Understanding Neural Networks: Part II


TensorFlow Workshop 2018
Understanding Neural Networks, Part II: Convolutional Layers and Collaborative Filters
Nick Winovich, Department of Mathematics, Purdue University, July 2018

Outline
1. Convolutional Neural Networks: Convolutional Layers; Strides and Padding; Pooling and Upsampling
2. Advanced Network Design: Collaborative Filters; Residual Blocks; Dense Convolutional Blocks

Convolutional Layers
While fully-connected layers provide an effective tool for analyzing general data, their dense weight matrices can be inefficient to work with. Fully-connected layers are also unaware of spatial structure (consider reindexing the dataset inputs: the layer behaves identically). For spatially structured data (e.g. images or function values on a domain), convolutional layers provide an efficient, spatially aware approach to data processing. A further key advantage is that hardware accelerators, such as GPUs, are designed to apply the associated convolutional filters extremely efficiently.

Convolutional Filters/Kernels
The key concept behind convolutional network layers is that of filters, or kernels: small arrays of trainable weights, typically arranged as squares or rectangles. Though shaped like matrices, the multiplication between filter weights and input values is performed element-wise. Filters slide across the input values to detect spatial patterns in local regions; by combining several filters in series, patterns in larger regions can also be identified.
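
To make the element-wise sliding concrete, here is a minimal NumPy sketch (not from the slides; the helper name conv2d_single is hypothetical) of a single-channel, single-filter convolution with stride 1 and valid padding:

import numpy as np

def conv2d_single(x, w):
    # Slide a k x k filter w over a 2-D array x (stride 1, valid padding),
    # taking the element-wise product with each local patch and summing
    k = w.shape[0]
    R = x.shape[0] - k + 1
    y = np.zeros((R, R))
    for i in range(R):
        for j in range(R):
            y[i, j] = np.sum(x[i:i+k, j:j+k] * w)
    return y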

Example: Convolutional Layer (with Stride=2)

Matrix Representation
* The bias term and activation function have been omitted for brevity.

Floating Point Operation Count
For a convolutional layer with a filter of size k × k applied to a two-dimensional input array with resolution R × R, we have:
- k² · R² multiplication ops between filter weights and inputs
- (k² − 1) · R² addition ops to sum the k² values in each position
- R² addition ops for adding the bias term b to each entry
In total: ≈ 2 · k² · R² FLOPs. The true FLOP count depends on the choice of stride and padding, but it is generally close to the upper bound given above.

Transposed Convolutional Layers
Transposed convolutional layers play a complementary role to standard convolutional layers and are commonly used to increase the spatial resolution of data/features. As the name suggests, the matrix which defines this network layer is precisely the transpose of the matrix defining a standard convolutional layer.

Matrix Representation
* The bias term and activation function have been omitted for brevity.

Convolutional Layer: Multiple Channels and Filters
Up until now, we have only discussed convolutional layers between two arrays with a single channel. A convolutional layer between an input array with N channels and an output array with M channels can be defined by a collection of N · M distinct filters, with weight matrices W^(n,m) for n ∈ {1, ..., N} and m ∈ {1, ..., M}, which correspond to the connections between input and output channels. Each output channel is also assigned a bias term b^(m) ∈ R for m ∈ {1, ..., M}, and the final outputs for channel m are given by:

    y^(m) = f( Σ_n W^(n,m) x^(n) + b^(m) )

The weight matrices W^(n,m) typically correspond to filter weights w^(n,m) of the same shape; we will see later how to generalize this.
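
As an illustration, the formula above can be written directly in NumPy by reusing the conv2d_single sketch from earlier (hypothetical helper names; lists stand in for the channel dimensions):

def conv2d_multi(x, W, b, f=np.tanh):
    # x: list of N 2-D input channels; W[n][m]: filter connecting input
    # channel n to output channel m; b: list of M biases; f: activation
    N, M = len(x), len(b)
    y = []
    for m in range(M):
        z = sum(conv2d_single(x[n], W[n][m]) for n in range(N))
        y.append(f(z + b[m]))
    return y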

Number of Trainable Parameters
A convolutional layer between an input array with N channels and an output feature array with M channels therefore consists of:

    k² · M · N weights + M biases

Moreover, a calculation analogous to that used for the single-channel case shows that the FLOP count for the layer is:

    ≈ 2 · k² · R² · M · N FLOPs

Note: the filter size k must be kept relatively small in order to maintain a manageable number of trainable variables and FLOPs.
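
These two counts are easy to tabulate when sizing a network; the following is a small sketch of the formulas above (the helper name is hypothetical):

def conv_layer_counts(k, R, N, M):
    # Parameter and approximate FLOP counts for a k x k convolutional layer
    # mapping N input channels to M output channels at resolution R x R
    params = k**2 * M * N + M          # weights plus biases
    flops = 2 * k**2 * R**2 * M * N    # multiply/add upper bound
    return params, flops

# e.g. a 3x3 layer from 64 to 128 channels at 32x32 resolution:
# conv_layer_counts(3, 32, 64, 128) --> (73856, 150994944), i.e. ~151 MFLOPs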

Receptive Fields
While small filters may appear capable of only local detection, when used in series much larger patterns can also be found. The receptive fields, or regions of influence, for feature values later in the network are much larger than those at the beginning.
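
For stride-1 layers, each additional layer grows the receptive field by k − 1; a quick sketch of the general bookkeeping (hypothetical helper, assuming identical layers):

def receptive_field(num_layers, k=3, stride=1):
    # Each layer adds (k - 1) * jump input positions to the receptive
    # field, where jump is the product of the strides of all prior layers
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (k - 1) * jump
        jump *= stride
    return rf

# e.g. two stacked 3x3 convs: receptive_field(2) --> 5, i.e. a 5x5 region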

Sparsity and Hardware Accelerators
Hardware accelerators, such as GPUs, leverage the availability of thousands of cores to quickly compute the matrix-vector products associated with a convolutional layer in parallel. Weight matrices for convolutional layers are extremely sparse, highly structured, and have only a handful of distinct values. Specialized libraries exist with GPU-optimized implementations of the computational primitives used for these calculations:

Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B. and Shelhamer, E., 2014. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759.

Note on Half-Precision Computations
Gupta, S., Agrawal, A., Gopalakrishnan, K. and Narayanan, P., 2015, June. Deep learning with limited numerical precision. In International Conference on Machine Learning (pp. 1737-1746).

It is possible to train networks using half-precision (i.e. 16-bit) fixed-point number representations without losing the accuracy achieved by single-precision floating-point representations. This is possible in part due to the use of stochastic rounding, where ⌊x⌋ denotes x rounded down to the nearest multiple of the precision ε:

    Round(x) = ⌊x⌋       with probability 1 − (x − ⌊x⌋)/ε
               ⌊x⌋ + ε   with probability (x − ⌊x⌋)/ε
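
A minimal NumPy sketch of this rounding rule (hypothetical helper name; eps is the fixed-point resolution):

import numpy as np

def stochastic_round(x, eps):
    # Round to the lower or upper multiple of eps at random, with
    # probability proportional to proximity; unbiased in expectation
    lower = np.floor(x / eps) * eps
    p_up = (x - lower) / eps
    up = np.random.random(np.shape(x)) < p_up
    return lower + eps * up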

Outline
1. Convolutional Neural Networks: Convolutional Layers; Strides and Padding; Pooling and Upsampling
2. Advanced Network Design: Collaborative Filters; Residual Blocks; Dense Convolutional Blocks

Strides and Padding
When defining convolutional layers, it is also necessary to specify how quickly, and to what extent, the filter slides across the inputs; these properties are controlled by stride and padding parameters. A horizontal stride I and vertical stride J result in a filter which moves across rows in steps of I, e.g. x_{1,1}, x_{1,1+I}, x_{1,1+2I}, etc., and skips down rows in steps of J once the current row ends. Padding determines which positions are admissible for the filter (e.g. when the filter should proceed to the next row):

Same padding: zeros are added to pad the array if necessary
Valid padding: the filter is only permitted to continue to positions where all of its values fit entirely inside the array

Example: Stride=1 with Valid Padding

Example: Stride=1 with Same Padding

Same Padding vs. Valid Padding
Same padding: ensures that every input value is included, but also adds zeros near the boundary which are not in the original input.
Valid padding: only uses values from the original input; however, when the data resolution is not a multiple of the stride, some boundary values are ignored entirely in the feature calculation.
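
The resulting output resolutions are easy to check directly; a quick sketch using the TensorFlow 1.x API from the examples below (the 7×7 input is an arbitrary choice for illustration):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 7, 7, 1])

# same padding: output resolution = ceil(R / stride) = ceil(7/2) = 4
h_same = tf.layers.conv2d(x, 1, 3, strides=2, padding="same")

# valid padding: output resolution = ceil((R - k + 1) / stride) = 3
h_valid = tf.layers.conv2d(x, 1, 3, strides=2, padding="valid")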

Additional References
Additional references for visualizing and understanding the concepts of stride and padding in convolutional layers:

A guide to convolution arithmetic for deep learning by Vincent Dumoulin and Francesco Visin (2016): https://arxiv.org/abs/1603.07285
The associated GitHub page with animations and source files: https://github.com/vdumoulin/conv_arithmetic/

Outline
1. Convolutional Neural Networks: Convolutional Layers; Strides and Padding; Pooling and Upsampling
2. Advanced Network Design: Collaborative Filters; Residual Blocks; Dense Convolutional Blocks

Downsampling Techniques
As was shown earlier, convolutional layers with non-trivial stride reduce the spatial resolution. In some applications, performance can be improved by instead using a convolution with stride 1 followed by a dedicated downsampling procedure:

Max pooling: a filter shape, strides, and padding are specified, and the maximum value under the filter is returned at each position.
Average pooling: essentially the same as max pooling, but returns the average of the values under the filter.

Upsampling Techniques
Similarly, transposed convolutional layers can be used to increase the spatial resolution. However, it may be helpful to instead use a convolution with stride 1 and a dedicated upsampling procedure:

Bilinear/bicubic interpolation: used when the upsampled result is expected to have smooth, continuous values
Nearest-neighbor interpolation: useful when the result is expected to have sharp boundaries or discontinuities

Channels and Resolution
As the spatial resolution of features is decreased/downsampled, the channel count is typically increased, to avoid reducing the overall amount of information stored in the features too rapidly.

Channels and Resolution
Similarly, the channel counts of features are typically decreased whenever the spatial resolution is increased/upsampled.

Example Implementation: Convolution and Pooling

# Input Shape = [None, 64, 64, 1]

# CONV: [None, 64, 64, 1] --> [None, 64, 64, 4]
h = tf.layers.conv2d(x, 4, 3, padding="same", activation=tf.nn.relu)

# POOL: [None, 64, 64, 4] --> [None, 32, 32, 4]
h = tf.layers.max_pooling2d(h, 3, 2, padding="same")

# CONV: [None, 32, 32, 4] --> [None, 30, 30, 8]
h = tf.layers.conv2d(h, 8, 3, padding="valid", activation=tf.nn.relu)

# POOL: [None, 30, 30, 8] --> [None, 15, 15, 8]
h = tf.layers.max_pooling2d(h, 2, 2, padding="same")

Example Implementation: Transposed Convolution

# Shortened names for brevity
conv2d_transpose = tf.layers.conv2d_transpose
lrelu = tf.nn.leaky_relu

# Input Shape = [None, 4, 4, 128]

# TCONV: [None, 4, 4, 128] --> [None, 8, 8, 64]
h = conv2d_transpose(x, 64, 3, strides=(2, 2), padding="same", activation=lrelu)

# TCONV: [None, 8, 8, 64] --> [None, 17, 17, 32]
h = conv2d_transpose(h, 32, 3, strides=(2, 2), padding="valid", activation=lrelu)

Example Implementation: Bilinear Interpolation

# Shortened names for brevity
bilinear = tf.image.ResizeMethod.BILINEAR
lrelu = tf.nn.leaky_relu

# Input Shape = [None, 4, 4, 128]

# CONV: [None, 4, 4, 128] --> [None, 4, 4, 64]
h = tf.layers.conv2d(x, 64, 3, padding="same", activation=lrelu)

# INTERP: [None, 4, 4, 64] --> [None, 8, 8, 64]
h = tf.image.resize_images(h, [8, 8], method=bilinear)

Outline
1. Convolutional Neural Networks: Convolutional Layers; Strides and Padding; Pooling and Upsampling
2. Advanced Network Design: Collaborative Filters; Residual Blocks; Dense Convolutional Blocks

Collaborative Filters
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818-2826).

Network layers can be systematically organized in blocks, or modules, which facilitate collaboration between different filters. These modules provide a multi-scale, multimodal approach to processing input data and features throughout the network.

Inception v1 Block (naïve version) Diagram from Going deeper with convolutions

Using 1×1 Filters for Dimension Reduction
The pooling layer in this naïve version of the module produces features with the same number of channels as the original input. To balance the impact of each component in the module, it is natural to assign this channel count to the features from each layer; when this channel count is relatively high, however, the layers with larger filters can become prohibitively expensive. Alternatively, 1×1 convolutional layers can be used as a form of dimension reduction to help limit the computational demand and balance the size of the features produced by each component.
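
For instance (a hypothetical sketch with arbitrary channel counts), a 1×1 convolution can shrink the channel dimension before an expensive large-filter layer:

# Reduce 256 channels to 64 with a 1x1 conv before a 5x5 conv; the 5x5
# layer's weight count drops from 5*5*256*128 = 819,200 to 5*5*64*128 = 204,800
h = tf.layers.conv2d(x, 64, 1, padding="same", activation=tf.nn.relu)
h = tf.layers.conv2d(h, 128, 5, padding="same", activation=tf.nn.relu)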

Inception v1 Block (with dimension reduction) Diagram from Going deeper with convolutions

Factoring Large Filters for Improved Efficiency
While dimension reduction improves efficiency in part, large filter sizes still pose a problem. A compromise between the full expressiveness of large filters and the efficiency of small filters is to factor the larger filters into smaller, more efficient ones (from Rethinking the Inception Architecture for Computer Vision). This factorization can be approximated by using a series/tower of consecutive convolutional layers with smaller filters. By construction, the resulting component produces features with receptive fields identical to those of the original layer.
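
As a concrete sketch (hypothetical, using the shortened names from the block definitions below):

conv2d = tf.layers.conv2d
lrelu = tf.nn.leaky_relu

# A single 5x5 conv (25 weights per input/output channel pair) ...
h5 = conv2d(x, chans, 5, padding="same", activation=lrelu)

# ... has the same 5x5 receptive field as two stacked 3x3 convs,
# which use only 9 + 9 = 18 weights per channel pair
h3 = conv2d(x, chans, 3, padding="same", activation=lrelu)
h3 = conv2d(h3, chans, 3, padding="same", activation=lrelu)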

Inception v2 Block Diagram from Rethinking the Inception Architecture for Computer Vision

Definition for inception_v2(x, chans, name)

def inception_v2(x, chans, name):
    conv2d = tf.layers.conv2d
    lrelu = tf.nn.leaky_relu

    # 1x1 CONV + 3x3 CONV
    h1 = conv2d(x, chans, 1, activation=lrelu, padding="same", name=name + "_1a")
    h1 = conv2d(h1, chans, 3, activation=lrelu, padding="same", name=name + "_1b")

    # 1x1 CONV + 3x3 CONV + 3x3 CONV
    h2 = conv2d(x, chans, 1, activation=lrelu, padding="same", name=name + "_2a")
    h2 = conv2d(h2, chans, 3, activation=lrelu, padding="same", name=name + "_2b")
    h2 = conv2d(h2, chans, 3, activation=lrelu, padding="same", name=name + "_2c")

Definition for inception_v2(x, chans, name), continued

    # 3x3 MAX POOL + 1x1 CONV
    h3 = tf.layers.max_pooling2d(x, 3, 1, padding="same")
    h3 = conv2d(h3, chans, 1, activation=lrelu, padding="same", name=name + "_3")

    # 1x1 CONV
    h4 = conv2d(x, chans, 1, activation=lrelu, padding="same", name=name + "_4")

    # Concatenate the four branches along the channel dimension
    h = tf.concat([h1, h2, h3, h4], 3)
    return h
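
Each branch produces chans channels at the input's spatial resolution, so the concatenated output has 4 · chans channels; e.g. (hypothetical usage) inception_v2(x, 16, "incept1") maps an input of shape [None, R, R, C] to [None, R, R, 64].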

Implementation Note on Factorization
"If our main goal is to factorize the linear part of the computation, would it not suggest to keep linear activations in the first layer? We have ran several control experiments (for example see figure 2) and using linear activation was always inferior to using rectified linear units in all stages of the factorization." (Rethinking the Inception Architecture)

That is, while the motivation of factoring large filters suggests only using activations after the final layer of each series/tower in a block, including activation functions in the intermediate block layers as well tends to improve the network's performance in practice.

Outline
1. Convolutional Neural Networks: Convolutional Layers; Strides and Padding; Pooling and Upsampling
2. Advanced Network Design: Collaborative Filters; Residual Blocks; Dense Convolutional Blocks

Residual Learning
He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

Instead of training layers to produce the full set of features H(x) directly, we can design network layers to learn residual changes:

    F(x) = H(x) − x

This can be done by including shortcuts, or skip connections, which allow features to pass through without modification. These skip connections provide a way for the network to determine how many active layers are actually necessary.

Example ResNet Block Diagram from Deep Residual Learning for Image Recognition

Implementation of Example ResNet Block

# Define ResNet block with a 2-layer shortcut
def resnet_block(x, chans, kernel_size):
    # Layer 1
    r = tf.layers.conv2d(x, chans, kernel_size, padding="same", use_bias=False)
    r = tf.layers.batch_normalization(r)
    r = tf.nn.relu(r)

    # Layer 2
    r = tf.layers.conv2d(r, chans, kernel_size, padding="same", use_bias=False)
    r = tf.layers.batch_normalization(r)

    # Shortcut
    h = tf.nn.relu(tf.add(r, x))
    return h

Outline
1. Convolutional Neural Networks: Convolutional Layers; Strides and Padding; Pooling and Upsampling
2. Advanced Network Design: Collaborative Filters; Residual Blocks; Dense Convolutional Blocks

Densely Connected Convolutional Networks
Huang, G., Liu, Z., Weinberger, K.Q. and van der Maaten, L., 2017, July. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1, No. 2, p. 3).
He, K., Zhang, X., Ren, S. and Sun, J., 2016, October. Identity mappings in deep residual networks. In European conference on computer vision (pp. 630-645). Springer, Cham.

"DenseNets exploit the potential of the network through feature reuse, yielding condensed models that are easy to train and highly parameter-efficient." (Huang et al.)

A variation on the underlying idea behind skip connections is to pass the unmodified features of several previous network layers to the current layer all at once.

DenseNet Blocks Diagram from Densely Connected Convolutional Networks

Layers in DenseNet Blocks
The input to the l-th layer of a dense block consists of the features from all previous layers: [x_0, x_1, ..., x_{l−1}]. The new features x_l produced by the l-th block layer are the outputs of its 3×3 convolution. These new features are concatenated with the previous features and passed to the next layer.

Implementation of DenseNet Blocks

# BN-ReLU-Conv layers within DenseNet blocks
def block_layer(x, chans):
    h = tf.layers.batch_normalization(x)
    h = tf.nn.relu(h)
    h = tf.layers.conv2d(h, chans, 3, padding="same")
    return tf.concat([x, h], 3)

# Define a DenseNet block with k layers
def block(x, chans, k):
    for i in range(k):
        x = block_layer(x, chans)
    return x
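
Since each block_layer concatenates chans new channels onto its input, the channel count grows linearly with depth; e.g. (hypothetical usage) block(x, 12, 4) applied to an input of shape [None, 32, 32, 16] yields [None, 32, 32, 16 + 4·12] = [None, 32, 32, 64].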