CONVOLUTIONAL NEURAL NETWORKS: MOTIVATION, CONVOLUTION OPERATION, ALEXNET

MOTIVATION

Fully connected neural network
Example: 1000x1000 image, 1M hidden units
10^12 (= 10^6 x 10^6) parameters!
Observation: spatial correlation is local

Locally connected neural net
Example: 1000x1000 image, 1M hidden units
Filter size: 10x10
10^8 (= 10^6 x 10 x 10) parameters!
Observation: statistics are similar at different locations

Convolutional network
Share the same parameters across different locations
Convolution with learned kernels
Learn multiple filters
Example: 1000x1000 image, 100 filters
Filter size: 10x10
10,000 (= 100 x 10 x 10) parameters
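
The three parameter counts above can be reproduced with a few lines of arithmetic. This is a minimal sketch that counts weights only (biases are ignored, as on the slides); the variable names are illustrative and not part of the original deck.

```python
# Parameter counts for the three designs (weights only, biases ignored).
image_pixels = 1000 * 1000   # 1000x1000 input image
hidden_units = 1_000_000     # 1M hidden units
filter_size = 10 * 10        # 10x10 receptive field / kernel
num_filters = 100            # number of learned convolution kernels

# Fully connected: every hidden unit is connected to every pixel.
fully_connected = image_pixels * hidden_units

# Locally connected: every hidden unit has its own 10x10 patch of weights.
locally_connected = hidden_units * filter_size

# Convolutional: each filter's 10x10 weights are shared across all locations.
convolutional = num_filters * filter_size

print(f"fully connected:   {fully_connected:,}")
print(f"locally connected: {locally_connected:,}")
print(f"convolutional:     {convolutional:,}")
```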

Convolutional neural networks
We can design neural networks that are specifically adapted for these problems
Must deal with very high-dimensional inputs: 1000x1000 pixels
Can exploit the 2D topology of pixels
Can build in invariance to certain variations we can expect: translations, etc.
Ideas: local connectivity, parameter sharing

CONVOLUTION (IMAGE PROCESSING)

Convolution
from: https://developer.apple.com/library/ios/documentation/performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html

Linear filter

Linear filter (Gaussian)
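
As an illustration of a Gaussian linear filter, the sketch below builds a normalized Gaussian kernel and slides it over an image with a naive loop. The function names `gaussian_kernel` and `convolve2d_valid`, the 5x5 kernel size, and the random test image are assumptions made for this example; they do not come from the slides, and a real pipeline would use an optimized library routine instead.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalized size x size Gaussian kernel (weights sum to 1)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()

def convolve2d_valid(image, kernel):
    """Naive 2D convolution over the positions where the kernel fits entirely."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output pixel is a weighted sum of its neighbourhood.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32))          # stand-in grayscale image
blurred = convolve2d_valid(image, gaussian_kernel(5, 1.0))
print(blurred.shape)
```

Because the kernel weights sum to 1, a constant image is left unchanged, which is a quick sanity check for any smoothing filter.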

CONVOLUTION (DEEP LEARNING)

ALEXNET

THE IMAGENET LARGE SCALE VISUAL RECOGNITION CHALLENGE (ILSVRC)

Example classes: Flute, Strawberry, Traffic light, Backpack, Matchstick, Bathing cap, Sea lion, Racket

Large-scale recognition

Large Scale Visual Recognition Challenge (ILSVRC) 2010-2012
1000 object classes, 1,431,167 images
Example class shown: Dalmatian
http://image-net.org/challenges/lsvrc/{2010,2011,2012}

Variety of object classes in ILSVRC

ILSVRC Task 1: Classification
Ground truth: Steel drum
Output (correct): Scale, T-shirt, Steel drum, Drumstick, Mud turtle
Output (incorrect): Scale, T-shirt, Giant panda, Drumstick, Mud turtle
$\mathrm{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} 1[\text{correct on image } i]$, where $N$ is the number of images
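
The accuracy measure for Task 1 can be sketched as follows, using the two example outputs from the slide. The function name `top5_accuracy` is hypothetical; an image is counted as correct when its true label appears among the five guesses.

```python
def top5_accuracy(predictions, labels):
    """Fraction of images whose true label is among the (up to five) guesses."""
    correct = sum(
        1 for guesses, label in zip(predictions, labels) if label in guesses[:5]
    )
    return correct / len(labels)

# The two example outputs from the slide, both for a "Steel drum" image.
predictions = [
    ["Scale", "T-shirt", "Steel drum", "Drumstick", "Mud turtle"],
    ["Scale", "T-shirt", "Giant panda", "Drumstick", "Mud turtle"],
]
labels = ["Steel drum", "Steel drum"]

print(top5_accuracy(predictions, labels))  # 0.5
```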

ILSVRC Task 2: Classification + Localization
Ground truth: Steel drum
Output: Persian cat, Picket fence, Steel drum, Folding chair, Loudspeaker
Output (bad localization): Persian cat, Picket fence, Steel drum, Folding chair, Loudspeaker
Output (bad classification): Persian cat, Picket fence, King penguin, Folding chair, Loudspeaker
$\mathrm{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} 1[\text{correct on image } i]$, where $N$ is the number of images
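
For Task 2, a guess has to get both the label and the box right. The sketch below assumes boxes are given as (x, y, w, h) with (x, y) the top-left corner, and assumes the usual overlap criterion of intersection over union of at least 0.5; that threshold is not stated on the slides. All coordinates and function names are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def is_correct(guesses, true_label, true_box, threshold=0.5):
    """Correct if any guess has the right label AND sufficient box overlap."""
    return any(
        label == true_label and iou(box, true_box) >= threshold
        for label, box in guesses
    )

true_label, true_box = "Steel drum", (50, 50, 100, 100)

good = [("Persian cat", (0, 0, 40, 40)), ("Steel drum", (55, 55, 100, 100))]
bad_localization = [("Steel drum", (150, 150, 100, 100))]    # right label, wrong place
bad_classification = [("King penguin", (50, 50, 100, 100))]  # right place, wrong label

results = [
    is_correct(g, true_label, true_box)
    for g in (good, bad_localization, bad_classification)
]
accuracy = sum(results) / len(results)
print(results, accuracy)
```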

Classification: Comparison
Submission    Method                                  Error rate
SuperVision   Deep CNN                                0.16422
ISI           FV: SIFT, LBP, GIST, CSIFT              0.26172
XRCE/INRIA    FV: SIFT and color 1M-dim features      0.27058
OXFORD_VGG    FV: SIFT and color 270K-dim features    0.27302
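
To put the gap in the comparison in perspective, the relative error reduction of SuperVision over the next best entry can be computed directly from the listed error rates. This is a small worked calculation; the dictionary layout is only an illustrative choice.

```python
# Error rates as listed in the comparison.
errors = {
    "SuperVision": 0.16422,
    "ISI": 0.26172,
    "XRCE/INRIA": 0.27058,
    "OXFORD_VGG": 0.27302,
}

best = errors["SuperVision"]
runner_up = min(rate for name, rate in errors.items() if name != "SuperVision")

# Fraction of the runner-up's errors that the deep CNN eliminates.
relative_reduction = (runner_up - best) / runner_up
print(f"absolute gap: {runner_up - best:.5f}")
print(f"relative reduction: {relative_reduction:.1%}")
```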

Classification + Localization

SuperVision (SV)
Image classification: deep convolutional neural networks
7 hidden weight layers, 650K neurons, 60M parameters, 630M connections
Rectified Linear Units, max pooling, dropout trick
Randomly extracted 224x224 patches for more data
Trained with SGD on two GPUs for a week, fully supervised
Localization: regression on (x,y,w,h)
http://image-net.org/challenges/lsvrc/2012/supervision.pdf
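
The random-patch augmentation can be sketched as below. The 256x256 source size and the function name `random_crop` are assumptions for this example; the slide only states that 224x224 patches are extracted at random.

```python
import numpy as np

def random_crop(image, size=224, rng=None):
    """Return a random size x size patch from an H x W x C image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Top-left corner is chosen so that the patch stays inside the image.
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))   # assumed 256x256 RGB training image
patch = random_crop(image, 224, rng)
print(patch.shape)
```

Each call yields a slightly shifted view of the same image, so one labeled image provides many distinct training inputs.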

SuperVision

Object Recognition

ALEXNET

AlexNet
AlexNet won the 2012 ImageNet competition with about 40% fewer errors than the next best competitor
It is composed of 5 convolutional layers
The input is a color RGB image
Computation is divided across 2 GPUs
Learning uses artificial data augmentation and connection dropout to avoid overfitting

AlexNet in details
The first layer applies 96 kernels of size 3x11x11
34,848 parameters
Each kernel is applied with a stride of 4 pixels
(11x11x3)x(55x55x(48+48)) = 105,415,200 MACs
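
The first-layer figures can be checked with the standard output-size formula for a strided convolution. Note the assumption: a 227x227 input with no padding is used here because it yields the 55x55 feature map that appears in the MAC count; the input size is not given on this slide, and the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Assumed 227x227 input, 11x11 kernels, stride 4, no padding.
out = conv_output_size(227, 11, 4)

num_kernels = 48 + 48            # 96 kernels, split over the two GPUs
kernel_weights = 3 * 11 * 11     # weights per kernel (biases ignored)

params = num_kernels * kernel_weights
macs = kernel_weights * (out * out * num_kernels)

print(out, params, macs)
```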

AlexNet in details
The second layer applies 256 kernels of size 48x5x5, after a 3x3 max pooling with a stride of 2 pixels
307,200 parameters
256x(48x5x5)x(27x27) = 223,948,800 MACs

AlexNet in details
The third layer applies 384 kernels of size 256x3x3, after a 3x3 max pooling with a stride of 2 pixels
884,736 parameters
384x((128+128)x3x3)x(13x13) = 149,520,384 MACs

AlexNet in details
The fourth layer applies 384 kernels of size 192x3x3, without pooling
663,552 parameters
384x(192x3x3)x(13x13) = 112,140,288 MACs

AlexNet in details
The fifth layer applies 256 kernels of size 192x3x3, without pooling
442,368 parameters
256x(192x3x3)x(13x13) = 74,760,192 MACs

AlexNet in details
The output of the fifth layer (after a 3x3 max pooling with a stride of 2 pixels) is connected to a fully connected 3-layer perceptron
1st layer: (2x6x6x128)x4096 = 37,748,736 connections
2nd layer: 4096x4096 = 16,777,216 connections
3rd layer: 4096x1000 = 4,096,000 connections

AlexNet in details
About 60 million parameters, about 724M MAC ops (sum of the per-layer counts)
Parameters: 35K, 307K, 884K, 663K, 442K, 37M, 16M, 4M
MAC ops: 105M, 223M, 149M, 112M, 74M, 37M, 16M, 4M
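
The per-layer figures from the preceding slides can be tabulated and summed in a short script. This sketch uses only the kernel counts, kernel sizes and output sizes given in the slides, and ignores biases; the tuple layout and layer names are illustrative.

```python
# (name, num_kernels, kernel_channels, kernel_h, kernel_w, out_h, out_w)
conv_layers = [
    ("conv1", 96, 3, 11, 11, 55, 55),
    ("conv2", 256, 48, 5, 5, 27, 27),
    ("conv3", 384, 256, 3, 3, 13, 13),
    ("conv4", 384, 192, 3, 3, 13, 13),
    ("conv5", 256, 192, 3, 3, 13, 13),
]
# (name, inputs, outputs)
fc_layers = [
    ("fc1", 2 * 6 * 6 * 128, 4096),
    ("fc2", 4096, 4096),
    ("fc3", 4096, 1000),
]

total_params = 0
total_macs = 0

for name, k, c, kh, kw, oh, ow in conv_layers:
    params = k * c * kh * kw     # shared weights
    macs = params * oh * ow      # weights reused at every output position
    total_params += params
    total_macs += macs
    print(f"{name}: {params:>12,} params {macs:>14,} MACs")

for name, n_in, n_out in fc_layers:
    params = macs = n_in * n_out  # one MAC per connection
    total_params += params
    total_macs += macs
    print(f"{name}: {params:>12,} params {macs:>14,} MACs")

print(f"total: {total_params:,} params, {total_macs:,} MACs")
```

The convolutional layers account for most of the MAC operations, while the fully connected layers hold most of the parameters.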

BACKUPS

Complexity of a CNN classifier
Apply the filter bank
Each input image of size MxM is convolved with K kernels, each of size NxN
KxMxMxNxN MAC operations
Applying the non-linearity
Usually done through look-up tables
Performing pooling
Pooling aggregates the values of a VxV region by applying an average or a max operation
The image is subsampled by applying the pooling every P pixels
(MxM)/(PxP) pooling operations over sets of size VxV
Each fully connected layer of a perceptron involves Li x Lo MAC operations, where Li and Lo are the numbers of neurons in the input and output layers
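
The cost formulas above translate directly into code. This is a minimal sketch with hypothetical function names; like the slide, it assumes one output value per input pixel for the filter bank and ignores border effects.

```python
def conv_macs(K, M, N):
    """MACs to convolve an MxM image with K kernels of size NxN."""
    return K * M * M * N * N

def pooling_ops(M, P):
    """Number of pooling operations when pooling every P pixels on an MxM image."""
    return (M * M) // (P * P)

def fc_macs(L_in, L_out):
    """MACs for a fully connected layer with L_in inputs and L_out outputs."""
    return L_in * L_out

# Example values: the 1000x1000 image with 100 filters of size 10x10 from the
# motivation section, pooling every 2 pixels, and a 4096-to-1000 output layer.
print(conv_macs(K=100, M=1000, N=10))
print(pooling_ops(M=1000, P=2))
print(fc_macs(4096, 1000))
```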