CS 7643: Deep Learning


CS 7643: Deep Learning Topics: Toeplitz matrices and convolutions = matrix-mult Dilated/a-trous convolutions Backprop in conv layers Transposed convolutions Dhruv Batra Georgia Tech

Administrativia. HW1 extension: 09/22 → 09/25. HW2 + PS2 both coming out on 09/22 → 09/25. Note on class schedule coming up: switching to paper reading starting next week. https://docs.google.com/spreadsheets/d/1un31ycwag6nhjvYPUVKMy3vHwW-h9MZCe8yKCqw0RsU/edit#gid=0 First review due: Tue 09/26. First student presentation due: Thu 09/28. (C) Dhruv Batra 2

Recap of last time (C) Dhruv Batra 3

Convolutional Neural Networks (without the brain stuff) Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Convolutional Neural Networks: INPUT 32x32 → C1: feature maps 6@28x28 (convolutions) → S2: f. maps 6@14x14 (subsampling) → C3: f. maps 16@10x10 (convolutions) → S4: f. maps 16@5x5 (subsampling) → C5: layer 120 (full connection) → F6: layer 84 (full connection) → OUTPUT 10 (Gaussian connections). (C) Dhruv Batra Image Credit: Yann LeCun, Kevin Murphy 5

FC vs Conv Layer 6

Convolution Layer: 32x32x3 image, 5x5x3 filter. 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Convolution Layer: 32x32x3 image, 5x5x3 filter → 28x28x1 activation map: convolve (slide) the filter over all spatial locations. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

For example, if we had 6 5x5 filters, we'll get 6 separate 28x28 activation maps. We stack these up to get a new "image" of size 28x28x6! Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Preview: a ConvNet is a sequence of Convolution Layers, interspersed with activation functions: 32x32x3 → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → 24x24x10 → CONV, ReLU → ... Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

For an NxN input and FxF filter, output size: (N - F) / stride + 1. e.g. N = 7, F = 3: stride 1 => (7-3)/1 + 1 = 5; stride 2 => (7-3)/2 + 1 = 3; stride 3 => (7-3)/3 + 1 = 2.33 :\ Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

In practice: common to zero pad the border. e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output? 7x7 output! In general, common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2 (will preserve size spatially). e.g. F = 3 => zero pad with 1; F = 5 => zero pad with 2; F = 7 => zero pad with 3. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
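To make the arithmetic concrete, here is a small helper (illustrative, not from the slides) that computes the spatial output size with and without zero padding:

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a conv layer: (N - F + 2P) / stride + 1."""
    out = (n - f + 2 * pad) / stride + 1
    if out != int(out):
        raise ValueError("filter does not fit cleanly: %.2f" % out)
    return int(out)

# Examples from the slides:
print(conv_output_size(7, 3, stride=1))         # 5
print(conv_output_size(7, 3, stride=2))         # 3
print(conv_output_size(7, 3, stride=1, pad=1))  # 7 (size preserved)
print(conv_output_size(32, 5))                  # 28
```

With N = 7, F = 3, stride 3 the function raises instead of returning 2.33, matching the ":\" on the slide.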

(btw, 1x1 convolution layers make perfect sense): 56x56x64 input → 1x1 CONV with 32 filters → 56x56x32 output (each filter has size 1x1x64, and performs a 64-dimensional dot product). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
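A minimal numpy sketch (array sizes from the slide; variable names are mine) showing that a 1x1 convolution is just a matrix multiply over the channel axis:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 64))   # input: 56x56x64
w = rng.standard_normal((64, 32))       # 32 filters, each of size 1x1x64

# A 1x1 convolution takes a 64-dimensional dot product at every spatial
# location, i.e. a matrix multiply over the channel axis:
y = x @ w                               # output: 56x56x32
assert y.shape == (56, 56, 32)
```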

Pooling Layer By pooling (e.g., taking max) filter responses at different locations we gain robustness to the exact spatial location of features. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 14

MAX POOLING

Single depth slice (dim 1 x dim 2):
1 1 2 4
5 6 7 8       max pool with 2x2 filters       6 8
3 2 1 0       and stride 2            →       3 4
1 2 3 4

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
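The same 2x2, stride-2 max pool can be checked in numpy with a reshape trick (a sketch, not how frameworks implement it):

```python
import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# Max pool with 2x2 filters and stride 2: split into 2x2 blocks, take the
# max over each block's row and column axes.
out = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(out)
# [[6 8]
#  [3 4]]
```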

Pooling Layer: Examples

Max-pooling: $h^n_i(r,c) = \max_{\bar{r} \in N(r),\, \bar{c} \in N(c)} h^{n-1}_i(\bar{r}, \bar{c})$

Average-pooling: $h^n_i(r,c) = \operatorname{mean}_{\bar{r} \in N(r),\, \bar{c} \in N(c)} h^{n-1}_i(\bar{r}, \bar{c})$

L2-pooling: $h^n_i(r,c) = \sqrt{\sum_{\bar{r} \in N(r),\, \bar{c} \in N(c)} h^{n-1}_i(\bar{r}, \bar{c})^2}$

L2-pooling over features: $h^n_i(r,c) = \sqrt{\sum_{j \in N(i)} h^{n-1}_j(r, c)^2}$

(C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 16

Classical View (C) Dhruv Batra Figure Credit: [Long, Shelhamer, Darrell CVPR15] 17

H hidden units MxMxN, M small Fully conn. layer (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 18

Classical View = Inefficient (C) Dhruv Batra 19

Classical View (C) Dhruv Batra Figure Credit: [Long, Shelhamer, Darrell CVPR15] 20

Re-interpretation Just squint a little! (C) Dhruv Batra Figure Credit: [Long, Shelhamer, Darrell CVPR15] 21

Fully Convolutional Networks Can run on an image of any size! (C) Dhruv Batra Figure Credit: [Long, Shelhamer, Darrell CVPR15] 22

H hidden units / 1x1xH feature maps MxMxN, M small Fully conn. layer / Conv. layer (H kernels of size MxMxN) (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 23

K hidden units / 1x1xK feature maps H hidden units / 1x1xH feature maps MxMxN, M small Fully conn. layer / Conv. layer (H kernels of size MxMxN) Fully conn. layer / Conv. layer (K kernels of size 1x1xH) (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 24
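A small numpy check of this equivalence, with illustrative sizes (M = 7, N = 16, H = 32): the FC weight matrix, reshaped into H kernels of size MxMxN, produces the same 1x1xH output:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, H = 7, 16, 32                      # MxMxN input volume, H hidden units
x = rng.standard_normal((M, M, N))

# Fully connected view: flatten the volume, multiply by an H x (M*M*N) matrix.
W_fc = rng.standard_normal((H, M * M * N))
y_fc = W_fc @ x.reshape(-1)              # H hidden units

# Convolutional view: the same weights are H kernels of size MxMxN applied
# at the single valid spatial location -> a 1x1xH feature map.
W_conv = W_fc.reshape(H, M, M, N)
y_conv = np.einsum('hijk,ijk->h', W_conv, x)

assert np.allclose(y_fc, y_conv)
```

On a larger input the conv view simply produces a bigger spatial map of such outputs, which is what makes sliding-window evaluation free.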

Viewing fully connected layers as convolutional layers enables efficient use of convnets on bigger images (no need to slide windows but unroll network over space as needed to re-use computation). TRAINING TIME Input Image CNN TEST TIME Input Image CNN y x (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 25

Viewing fully connected layers as convolutional layers enables efficient use of convnets on bigger images (no need to slide windows but unroll network over space as needed to re-use computation). TRAINING TIME Input Image CNN TEST TIME CNNs work on any image size! Input Image CNN y x Unrolling is orders of magnitude more efficient than sliding windows! (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 26

Benefit of this thinking Mathematically elegant Efficiency Can run network on arbitrary image Without multiple crops (C) Dhruv Batra 27

Summary - ConvNets stack CONV,POOL,FC layers - Trend towards smaller filters and deeper architectures - Trend towards getting rid of POOL/FC layers (just CONV) - Typical architectures look like [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX where N is usually up to ~5, M is large, 0 <= K <= 2. - but recent advances such as ResNet/GoogLeNet challenge this paradigm Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Plan for Today Convolutional Neural Networks Toeplitz matrices and convolutions = matrix-mult Dilated/a-trous convolutions Backprop in conv layers Transposed convolutions (C) Dhruv Batra 29

Toeplitz Matrix: diagonals are constant, $A_{ij} = a_{i-j}$. (C) Dhruv Batra 30

Why do we care? (Discrete) Convolution = Matrix Multiplication with Toeplitz Matrices (C) Dhruv Batra 31

$$y = w * x = \begin{bmatrix} w_k & 0 & \cdots & 0 & 0 \\ w_{k-1} & w_k & \cdots & 0 & 0 \\ w_{k-2} & w_{k-1} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ w_1 & w_2 & \cdots & w_k & 0 \\ 0 & w_1 & \cdots & w_{k-1} & w_k \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & w_1 & w_2 \\ 0 & 0 & \cdots & 0 & w_1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}$$
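A sanity check in numpy for a short 1D filter: build the banded Toeplitz matrix and compare against np.convolve. (The indexing convention below puts w_1 in the top-left; the slide's matrix is the same band with the filter indexed the other way.)

```python
import numpy as np

w = np.array([1., 2., 3.])          # filter, k = 3
x = np.array([1., 0., 2., 1.])      # signal, n = 4

# Build the (n + k - 1) x n Toeplitz matrix T with T[i, j] = w[i - j]
# (zero outside the filter's support), so T @ x is the full convolution.
k, n = len(w), len(x)
T = np.zeros((n + k - 1, n))
for i in range(n + k - 1):
    for j in range(n):
        if 0 <= i - j < k:
            T[i, j] = w[i - j]

assert np.allclose(T @ x, np.convolve(w, x, mode='full'))
```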

"Convolution of box signal with itself2" by Convolution_of_box_signal_with_itself.gif: Brian Ambergderivative work: Tinos (talk) - Convolution_of_box_signal_with_itself.gif. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/file:convolution_of_box_signal_with_itself2.gif#/media/file:convolution_of_box_signal_wi th_itself2.gif (C) Dhruv Batra 32

(C) Dhruv Batra 33

Plan for Today Convolutional Neural Networks Toeplitz matrices and convolutions = matrix-mult Dilated/a-trous convolutions Backprop in conv layers Transposed convolutions (C) Dhruv Batra 34

Dilated Convolutions (C) Dhruv Batra 35

Dilated Convolutions (C) Dhruv Batra 36

(C) Dhruv Batra 37

(recall:) (N - k) / stride + 1 (C) Dhruv Batra 38

(C) Dhruv Batra 39 Figure Credit: Yu and Koltun, ICLR16
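One way to implement a dilated (a-trous) convolution, sketched in 1D numpy: stuff (rate - 1) zeros between the filter taps and apply an ordinary convolution. The helper name is mine:

```python
import numpy as np

def dilate_kernel(w, rate):
    """A-trous trick: insert (rate - 1) zeros between filter taps."""
    out = np.zeros((len(w) - 1) * rate + 1)
    out[::rate] = w
    return out

w = np.array([1., 1., 1.])
w2 = dilate_kernel(w, 2)
print(w2)                        # [1. 0. 1. 0. 1.] : effective size 5, still 3 parameters

# Applying it as an ordinary convolution gives the dilated receptive field:
x = np.arange(10, dtype=float)
y = np.convolve(x, w2, mode='valid')
assert len(y) == 10 - len(w2) + 1   # (N - k_eff) / stride + 1, as in the recap
```

The receptive field grows exponentially when dilation rates are stacked, while the parameter count stays fixed.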

Plan for Today Convolutional Neural Networks Toeplitz matrices and convolutions = matrix-mult Dilated/a-trous convolutions Backprop in conv layers Transposed convolutions (C) Dhruv Batra 40

Backprop in Convolutional Layers (C) Dhruv Batra 41

Backprop in Convolutional Layers (C) Dhruv Batra 42

Backprop in Convolutional Layers (C) Dhruv Batra 43

Backprop in Convolutional Layers (C) Dhruv Batra 44
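The backprop figures are not transcribed here, but the standard 1D result can be sketched and checked numerically in numpy: for a valid cross-correlation, the input gradient is a full convolution of the upstream gradient with the filter, and the filter gradient is a valid correlation of the input with the upstream gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)             # input
w = rng.standard_normal(3)             # filter
g = rng.standard_normal(8 - 3 + 1)     # upstream gradient dL/dy

# Forward: y[i] = sum_m x[i+m] * w[m]  (cross-correlation, 'valid')
y = np.correlate(x, w, mode='valid')

# Backward: dL/dx is the *full* convolution of g with w (filter flipped),
# and dL/dw is a valid correlation of x with g.
dx = np.convolve(g, w, mode='full')
dw = np.correlate(x, g, mode='valid')

# Numerical check on one coordinate of dx (L = y . g is linear in x):
eps = 1e-6
xp = x.copy(); xp[2] += eps
num = (np.correlate(xp, w, 'valid') - y) @ g / eps
assert abs(num - dx[2]) < 1e-4
```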

Plan for Today Convolutional Neural Networks Toeplitz matrices and convolutions = matrix-mult Dilated/a-trous convolutions Backprop in conv layers Transposed convolutions (C) Dhruv Batra 45

Transposed Convolutions. Other names:
-Deconvolution (bad)
-Upconvolution
-Fractionally strided convolution
-Backward strided convolution
(C) Dhruv Batra 46

So far: Image Classification This image is CC0 public domain Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission. Vector: 4096 Fully-Connected: 4096 to 1000 Class Scores Cat: 0.9 Dog: 0.05 Car: 0.01... Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Other Computer Vision Tasks. Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels). Classification + Localization: CAT (single object). Object Detection: DOG, DOG, CAT (multiple objects). Instance Segmentation: DOG, DOG, CAT (multiple objects). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n This image is CC0 public domain

Semantic Segmentation. Label each pixel in the image with a category label (e.g. Sky, Cow, Cat, Grass). Don't differentiate instances, only care about pixels. This image is CC0 public domain. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Semantic Segmentation Idea: Sliding Window Extract patch Classify center pixel with CNN Full image Cow Cow Grass Farabet et al, Learning Hierarchical Features for Scene Labeling, TPAMI 2013 Pinheiro and Collobert, Recurrent Convolutional Neural Networks for Scene Labeling, ICML 2014 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Semantic Segmentation Idea: Sliding Window Extract patch Classify center pixel with CNN Full image Cow Cow Grass Problem: Very inefficient! Not reusing shared features between overlapping patches Farabet et al, Learning Hierarchical Features for Scene Labeling, TPAMI 2013 Pinheiro and Collobert, Recurrent Convolutional Neural Networks for Scene Labeling, ICML 2014 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Semantic Segmentation Idea: Fully Convolutional Design a network as a bunch of convolutional layers to make predictions for pixels all at once! Conv Conv Conv Conv argmax Input: 3 x H x W Convolutions: D x H x W Scores: C x H x W Predictions: H x W Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
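The final argmax step, sketched in numpy (class count and sizes are illustrative):

```python
import numpy as np

C, H, W = 4, 6, 8                        # e.g. 4 classes: sky, cow, grass, cat
rng = np.random.default_rng(0)
scores = rng.standard_normal((C, H, W))  # final conv layer output: C x H x W

# Per-pixel prediction: argmax over the class dimension gives an H x W label map.
pred = scores.argmax(axis=0)
assert pred.shape == (H, W)
```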

Semantic Segmentation Idea: Fully Convolutional Design a network as a bunch of convolutional layers to make predictions for pixels all at once! Conv Conv Conv Conv argmax Input: 3 x H x W Problem: convolutions at original image resolution will be very expensive... Convolutions: D x H x W Scores: C x H x W Predictions: H x W Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Semantic Segmentation Idea: Fully Convolutional. Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network! Input: 3 x H x W → High-res: D1 x H/2 x W/2 → Med-res: D2 x H/4 x W/4 → Low-res: D3 x H/4 x W/4 → Med-res: D2 x H/4 x W/4 → High-res: D1 x H/2 x W/2 → Predictions: H x W. Long, Shelhamer, and Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR 2015. Noh et al, Learning Deconvolution Network for Semantic Segmentation, ICCV 2015. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Semantic Segmentation Idea: Fully Convolutional. Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network! Downsampling: pooling, strided convolution. Upsampling: ??? Input: 3 x H x W → High-res: D1 x H/2 x W/2 → Med-res: D2 x H/4 x W/4 → Low-res: D3 x H/4 x W/4 → Med-res: D2 x H/4 x W/4 → High-res: D1 x H/2 x W/2 → Predictions: H x W. Long, Shelhamer, and Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR 2015. Noh et al, Learning Deconvolution Network for Semantic Segmentation, ICCV 2015. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

In-Network upsampling: "Unpooling"

Nearest Neighbor: input 2 x 2 → output 4 x 4
1 2        1 1 2 2
3 4        1 1 2 2
           3 3 4 4
           3 3 4 4

Bed of Nails: input 2 x 2 → output 4 x 4
1 2        1 0 2 0
3 4        0 0 0 0
           3 0 4 0
           0 0 0 0

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
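Both unpooling schemes are one-liners in numpy; this sketch reproduces the 2x2 → 4x4 example on the slide:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Nearest neighbor: repeat each value across its 2x2 output block.
nn = x.repeat(2, axis=0).repeat(2, axis=1)

# Bed of nails: value in the top-left corner of each block, zeros elsewhere.
bed = np.zeros((4, 4), dtype=x.dtype)
bed[::2, ::2] = x

print(nn)
print(bed)
```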

In-Network upsampling: Max Unpooling

Max Pooling: remember which element was max!
Input: 4 x 4          Output: 2 x 2
1 2 6 3
3 5 2 1       →       5 6
1 2 2 1               7 8
7 3 4 8

... rest of the network ...

Max Unpooling: use positions from the corresponding pooling layer
Input: 2 x 2          Output: 4 x 4
1 2                   0 0 2 0
3 4           →       0 1 0 0
                      0 0 0 0
                      3 0 0 4

Corresponding pairs of downsampling and upsampling layers. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
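A numpy sketch of max unpooling on the slide's example: pool while remembering the argmax position in each 2x2 block, then scatter a later 2x2 tensor back into those positions:

```python
import numpy as np

x = np.array([[1, 2, 6, 3],
              [3, 5, 2, 1],
              [1, 2, 2, 1],
              [7, 3, 4, 8]])

# Max pool 2x2 / stride 2, one row per block, remembering argmax positions.
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
idx = blocks.argmax(axis=1)             # position of the max within each block
pooled = blocks.max(axis=1).reshape(2, 2)

# Max unpool a later 2x2 tensor using the remembered positions.
y = np.array([[1, 2],
              [3, 4]])
out_blocks = np.zeros((4, 4))
out_blocks[np.arange(4), idx] = y.reshape(-1)
out = out_blocks.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
print(out)
# [[0. 0. 2. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 0. 0.]
#  [3. 0. 0. 4.]]
```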

Learnable Upsampling: Transpose Convolution Recall: Typical 3 x 3 convolution, stride 1 pad 1 Input: 4 x 4 Output: 4 x 4 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution Recall: Normal 3 x 3 convolution, stride 1 pad 1 Dot product between filter and input Input: 4 x 4 Output: 4 x 4 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution Recall: Normal 3 x 3 convolution, stride 1 pad 1 Dot product between filter and input Input: 4 x 4 Output: 4 x 4 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution Recall: Normal 3 x 3 convolution, stride 2 pad 1 Input: 4 x 4 Output: 2 x 2 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution Recall: Normal 3 x 3 convolution, stride 2 pad 1 Dot product between filter and input Input: 4 x 4 Output: 2 x 2 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution Recall: Normal 3 x 3 convolution, stride 2 pad 1 Dot product between filter and input Input: 4 x 4 Output: 2 x 2 Filter moves 2 pixels in the input for every one pixel in the output Stride gives ratio between movement in input and output Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution 3 x 3 transpose convolution, stride 2 pad 1 Input: 2 x 2 Output: 4 x 4 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution 3 x 3 transpose convolution, stride 2 pad 1 Input gives weight for filter Input: 2 x 2 Output: 4 x 4 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution 3 x 3 transpose convolution, stride 2 pad 1 Sum where output overlaps Input gives weight for filter Input: 2 x 2 Output: 4 x 4 Filter moves 2 pixels in the output for every one pixel in the input Stride gives ratio between movement in output and input Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Learnable Upsampling: Transpose Convolution Other names: -Deconvolution (bad) -Upconvolution -Fractionally strided convolution -Backward strided convolution 3 x 3 transpose convolution, stride 2 pad 1 Input gives weight for filter Input: 2 x 2 Output: 4 x 4 Sum where output overlaps Filter moves 2 pixels in the output for every one pixel in the input Stride gives ratio between movement in output and input Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Transpose Convolution: 1D Example. Input [a, b], filter [x, y, z], stride 2. Output: [ax, ay, az + bx, by, bz]. The output contains copies of the filter weighted by the input, summing where the copies overlap in the output. Need to crop one pixel from the output to make the output exactly 2x the input. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
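The 1D example can be written directly in numpy (function name is mine); with a = 2, b = 3 and filter [1, 1, 1] at stride 2, the middle output is az + bx = 5:

```python
import numpy as np

def transpose_conv1d(x, w, stride=2):
    """Place a copy of filter w, scaled by each input value, every `stride`
    output positions; overlapping copies sum."""
    out = np.zeros((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        out[i * stride : i * stride + len(w)] += xi * w
    return out

x = np.array([2., 3.])            # input [a, b]
w = np.array([1., 1., 1.])        # filter [x, y, z]
print(transpose_conv1d(x, w))     # [2. 2. 5. 3. 3.]
```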

Transposed Convolution https://distill.pub/2016/deconv-checkerboard/ (C) Dhruv Batra 69