Convolutional neural networks


Themes. Curriculum: Ch 9.1, 9.2 and http://cs231n.github.io/convolutional-networks/. The simple motivation and idea. How it's done. Receptive field. Pooling. Dilated convolutions.

Resources / Learning goals. Chapter 9 (not great), http://cs231n.github.io/convolutional-networks/, cs231n video: https://www.youtube.com/watch?v=aqirpkraydg, video relevant for the motivation part: https://www.youtube.com/watch?v=sq67nbclV98. Why is a convolutional network good for images and audio? How does a normal convolutional network work? Why is the receptive field important? How can we increase the receptive field? What is pooling, why is it used, and what are possible downsides?

The simple motivation and idea

The simple idea. Image filters can enhance image attributes. Convolutional neural networks are similar to conventional image filtering, except that the filter kernels are learnt.
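
As a concrete illustration (a minimal NumPy sketch, not from the lecture): a hand-designed 3x3 edge filter slid over a grayscale image. In a convolutional network, exactly this kind of kernel is what gets learnt instead of hand-designed.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide a kernel over a grayscale image (valid padding, cross-correlation)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

# A classic hand-designed edge filter; in a CNN these weights would be learnt.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

image = np.zeros((8, 8))
image[:, 4:] = 1.0          # left half dark, right half bright
edges = filter2d(image, sobel_x)
print(edges.shape)          # (6, 6); responds only at the vertical edge
```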

How does a fully connected network see the world? A neural network or standard machine learning method has to learn that pixels close to each other are more related. A cat moved from one part of a picture to another is viewed as a completely different object.

A shifted frog is seen as completely different

Most image applications are absolute position invariant

Building absolute position invariance. We can make a sliding classifier, reusing the same classifier (e.g. an SVM) many times for each picture. Problems?

Building absolute position invariance. We can make a sliding classifier, reusing the same classifier many times for each picture. Problems: a restricted field of view, and still problems with different sizes.

Make every layer in a neural network slide

Make every layer in a neural network slide. Not only the cat classifier is reused, but also partial representations: edge, fur, eye and grass detectors. More tolerant to changes in shape and size? Large receptive field? Reuse from sliding is combined with reuse with depth.

Make every layer in a neural network slide. Reuse from sliding is combined with reuse with depth: with depth a detector can be reused for different classes etc., and with sliding a detector can also be reused for every position. A product relationship instead of a sum (no studies seen on this).

How it's done

Convolutional neural network. You should all know convolution. The difference between convolution and correlation is irrelevant here (it is just a flipped filter). When we deal with channels or features there are some options.
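
To see why the distinction does not matter for learnt filters, a small 1-D NumPy sketch (not from the lecture): correlation with a flipped kernel equals true convolution, so a network that learns its weights can absorb the flip.

```python
import numpy as np

def correlate(signal, kernel):
    """1-D cross-correlation, valid padding."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i+len(kernel)], kernel) for i in range(n)])

x = np.array([1., 2., 3., 4., 5.])
k = np.array([1., 0., -1.])

# True convolution = correlation with the kernel flipped.
conv = correlate(x, k[::-1])
print(np.allclose(conv, np.convolve(x, k, mode='valid')))  # True
```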

Filters and channels (standard method). An input image has a third dimension (say RGB). A filter/kernel always has the same third dimension.

Filters and channels. The overlapping area is multiplied elementwise and then summed (a dot product). With sliding you get a 28x28x1 output.
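
A minimal sketch of this sliding dot product (assuming, as on the slide, a 32x32x3 input and a 5x5x3 filter, which gives the 28x28 output):

```python
import numpy as np

def conv_single_filter(image, kernel):
    """Slide one kernel over the full depth of the input (valid padding).

    image: H x W x C, kernel: kh x kw x C  ->  output: (H-kh+1) x (W-kw+1)
    """
    kh, kw, _ = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for y in range(oh):
        for x in range(ow):
            # Overlapping volume multiplied elementwise, then summed: a dot product.
            out[y, x] = np.sum(image[y:y+kh, x:x+kw, :] * kernel)
    return out

image  = np.random.rand(32, 32, 3)   # e.g. a 32x32 RGB image
kernel = np.random.rand(5, 5, 3)     # the filter always spans all input channels
out = conv_single_filter(image, kernel)
print(out.shape)                     # (28, 28)
```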

Usually we use multiple filters per layer. A new kernel/filter slides over the same image and creates a new filtered image.

Many activation maps create a new image. If we filter the image 6 times, we get a new image with 6 channels.

A one-layer, two-filter network

In convolutional networks, layers are 3D...

...kernels are 4D. If we combine all the filters we get a 4D tensor. The operation can be viewed as a matrix multiplication for each spatial position, summed over the spatial dimensions. This is a useful representation, as many deep learning frameworks present it in this way.
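
A sketch of this view in NumPy (hypothetical shapes, not from the lecture): the 4D kernel tensor applied as one matrix multiplication per kernel offset, summed over the window.

```python
import numpy as np

H, W, C_in, C_out, k = 8, 8, 3, 6, 3
x = np.random.rand(H, W, C_in)
weights = np.random.rand(k, k, C_in, C_out)   # the 4D kernel tensor

# Conv as "a matrix multiply per spatial offset, summed over the kernel window":
out = np.zeros((H - k + 1, W - k + 1, C_out))
for dy in range(k):
    for dx in range(k):
        patch = x[dy:dy + out.shape[0], dx:dx + out.shape[1], :]
        # (oh, ow, C_in) @ (C_in, C_out): a matrix multiplication at each position
        out += patch @ weights[dy, dx]
print(out.shape)  # (6, 6, 6)
```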

Convolutional neural networks consist of multiple layers

Some stack many layers

Can a convolutional network remember positions? A fully connected network treats each position differently...

Can a convolutional network remember positions? A fully connected network treats each position differently. A convolutional network can first of all keep spatial information in the spatial dimensions of the filter bank. More on this later.

Receptive field: how much can the algorithm see?

How large an area influences the end result? With a sliding classifier (e.g. an SVM) you get the input size as the receptive field. Why do we even want a large receptive field?

How large an area influences the end result? With a convolutional network the receptive field increases with each layer.

How large an area influences the end result? With a convolutional network the receptive field increases with each layer. 3 inputs influence each node in the first hidden layer.

How large an area influences the end result? With a convolutional network the receptive field increases with each layer. 3 inputs influence each node in the first hidden layer, 5 influence the next...

How many inputs can influence each output?

The receptive field grows by k-1 for each layer

The receptive field grows by k-1 for each layer. Two 3x3 layers = one 5x5 layer.

The receptive field grows by k-1 for each layer. Two 3x3 layers = one 5x5 layer. So should we use 3x3 or 5x5?

The receptive field grows by k-1 for each layer. Two 3x3 layers = one 5x5 layer. So should we use 3x3 or 5x5? A 5x5 kernel has 5*5*(filters_in*filters_out) parameters; two 3x3 kernels have 2*3*3*(filters_in*filters_out) parameters.

Smaller spatial filter sizes are more parameter efficient. A network with many parameters generally needs more training data and computation time. A larger receptive field per parameter is good. More layers can give more reuse.
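
The arithmetic, with hypothetical channel counts (64 in, 64 out; biases ignored):

```python
# Same 5x5 receptive field, two different parameter costs:
filters_in, filters_out = 64, 64
one_5x5 = 5 * 5 * filters_in * filters_out       # a single 5x5 layer
two_3x3 = 2 * 3 * 3 * filters_in * filters_out   # two stacked 3x3 layers
print(one_5x5, two_3x3)  # 102400 73728: the 3x3 stack uses ~28% fewer parameters
```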

How large a receptive field did the 152-layer ResNet have (it used 3x3 convolutions)?

How large a receptive field did the 152-layer ResNet have (it used 3x3 convolutions)? 305
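
A sketch of the calculation. This is a simplification: the real ResNet-152 also uses strides, pooling and 1x1 convolutions, which are ignored here; the point is only the grows-by-k-1-per-layer rule.

```python
def receptive_field(num_layers, k=3):
    """Receptive field of stacked stride-1 k x k convolutions: grows by k-1 per layer."""
    rf = 1
    for _ in range(num_layers):
        rf += k - 1
    return rf

print(receptive_field(152))  # 305
```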

Increasing the receptive field more efficiently. Why do we need to?

Increasing the receptive field more efficiently. Why do we need to? We only need a certain level of abstraction (still a research topic, but indicated in: Wide Residual Networks; Wider or Deeper: Revisiting the ResNet Model for Visual Recognition; Residual Networks are Exponential Ensembles of Relatively Shallow Networks). Low-level features also need spatial context. Large networks are expensive in computation time and memory.

Strided convolutions. By skipping positions we can cover a larger area with less computation. The effect on the receptive field of the next layer is important.

The effect of strided convolutions

The effect of strided convolutions. We still cover the whole input. Do we have a larger receptive field? The next layer has a larger receptive field: 7 compared to 5. The effect can be seen from the receptive-field formula (shown on the slide).

The effect of strided convolutions. Essentially all the following layers will have their receptive-field growth multiplied by S. (Plot: green: stride = 2 everywhere; red: stride = 2 for the first layer only; blue: stride = 1.)

With strides, the spatial dimensions become smaller. Usually some of the network capacity is preserved through an increasing number of channels.

Can the network still remember positions?

Can the network still remember positions? Yes, the network can still encode positional information in the depth dimension: a network can pass positional information (right, left etc.) to different channels.

Pooling: spatial reduction and forcing invariance

Max-pooling. A strided maximum filter: choosing the maximum value inside the kernel range.
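
A minimal NumPy sketch of 2x2 max-pooling with stride 2 (not from the lecture):

```python
import numpy as np

def max_pool(x, k=2):
    """k x k max-pooling with stride k: keep the maximum in each window."""
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h*k, :w*k].reshape(h, k, w, k).max(axis=(1, 3))

x = np.array([[1., 3., 2., 1.],
              [4., 2., 0., 1.],
              [0., 1., 5., 2.],
              [1., 0., 2., 6.]])
print(max_pool(x))
# [[4. 2.]
#  [1. 6.]]
```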

Max-pooling: invariance built in. We saw that a network could learn max or average functions to create invariance. With max-pooling you explicitly remove some spatial information. This can help with both position and rotation invariance. As we know, many image analysis applications seek results that are invariant to position.

Max-pooling has some important problems. Even if we want our final results to be position invariant, we may need positional information in the earlier representations. Only a small part of the network is updated with gradients at each step (slower learning). We calculate a lot of values that are not used.

Can the network still remember positions?

Can the network still remember positions? Yes, in a similar way as with strides: give a high value to one channel if the target is to the right and a high value to another channel if the target is to the left. The book calls it "approximately invariant to small translations". Variant features will be harder to learn compared to invariant features.

Dilated convolutions: a larger receptive field without reducing spatial dimensions or increasing the number of parameters

Dilated convolutions. Skipping values in the kernel is the same as filling the kernel with every other value set to zero. We still cover all inputs: a larger kernel with no extra parameters.
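
A sketch of the zero-filling view in NumPy (hypothetical 3x3 kernel, not from the lecture): spreading the taps apart enlarges the kernel's extent while keeping the parameter count fixed.

```python
import numpy as np

def dilate_kernel(kernel, d):
    """Insert d-1 zeros between kernel taps: same parameters, larger extent."""
    k = kernel.shape[0]
    out = np.zeros((d * (k - 1) + 1, d * (k - 1) + 1))
    out[::d, ::d] = kernel
    return out

k3 = np.ones((3, 3))
k5 = dilate_kernel(k3, 2)
print(k5.shape)  # (5, 5): covers a 5x5 area with only 9 nonzero weights
```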

A growing dilation factor can give a similar effect as stride. With a constant dilation factor you get the same effect as using a larger kernel; with a growing dilation factor you can get an even larger receptive field while still covering all inputs.

A growing dilation factor can give a similar effect as stride. With audio signals, as in this application, the receptive field is even more important.

Next week. Monday: introduction to TensorFlow (small lecture and coding): why use TensorFlow? TensorFlow compared to NumPy. Friday: residual networks; convolutional neural networks for segmentation and localisation.