Convolutional Networks
Lecture slides for Chapter 9 of Deep Learning
Ian Goodfellow
2016-09-12
Convolutional Networks
Scale up neural networks to process very large images / video sequences
Sparse connections
Parameter sharing
Automatically generalize across spatial translations of inputs
Applicable to any input that is laid out on a grid (1-D, 2-D, 3-D, ...)
Key Idea
Replace matrix multiplication in neural nets with convolution
Everything else stays the same
Maximum likelihood
Back-propagation
etc.
Matrix (Dot) Product
C = AB, defined by C_{i,j} = sum_k A_{i,k} B_{k,j}
Shapes: (m x p) = (m x n)(n x p); the inner dimension n must match
Matrix Transpose
The transpose of A is the mirror image of the matrix across its main diagonal: (A^T)_{i,j} = A_{j,i}
Useful properties:
A(B + C) = AB + AC
A(BC) = (AB)C
(AB)^T = B^T A^T
Matrix multiplication is not commutative (AB = BA does not always hold), but the dot product of two vectors is: x^T y = y^T x
2D Convolution
Input:
a b c d
e f g h
i j k l
Kernel:
w x
y z
Output:
aw + bx + ey + fz    bw + cx + fy + gz    cw + dx + gy + hz
ew + fx + iy + jz    fw + gx + jy + kz    gw + hx + ky + lz
Figure 9.1: An example of 2-D convolution without kernel flipping
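The output table above can be reproduced with a few lines of NumPy. This is a minimal sketch of valid 2-D convolution without kernel flipping (i.e., cross-correlation, as in Figure 9.1); the helper name `conv2d_valid` is my own.

```python
import numpy as np

# Minimal sketch of 2-D convolution without kernel flipping
# (cross-correlation), as in Figure 9.1.
def conv2d_valid(x, k):
    H, W = x.shape
    h, w = k.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output is the kernel dotted with one input patch
            out[i, j] = np.sum(x[i:i+h, j:j+w] * k)
    return out

x = np.arange(12, dtype=float).reshape(3, 4)  # stands in for a..l
k = np.array([[1., 2.],
              [3., 4.]])                      # stands in for w x / y z
print(conv2d_valid(x, k))                     # 2x3 output, as in the slide
```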
Three Operations
Convolution: like matrix multiplication
Take an input, produce an output (hidden layer)
Deconvolution: like multiplication by the transpose of a matrix
Used to back-propagate error from output to input
Reconstruction in autoencoder / RBM
Weight gradient computation
Used to back-propagate error from output to weights
Accounts for the parameter sharing
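The view of convolution as a matrix multiply, with the backward pass as a multiply by the transpose, can be sketched in 1-D NumPy; the helper name `conv_matrix` is my own.

```python
import numpy as np

# Sketch: 1-D valid convolution written as a banded matrix multiply,
# so back-propagation ("deconvolution") is a multiply by the transpose.
def conv_matrix(kernel, n_in):
    k = len(kernel)
    M = np.zeros((n_in - k + 1, n_in))
    for i in range(n_in - k + 1):
        M[i, i:i+k] = kernel        # same kernel on every row: sharing
    return M

w = np.array([1., 2., 3.])
x = np.arange(5, dtype=float)
M = conv_matrix(w, len(x))
y = M @ x                           # forward pass: convolution
grad_x = M.T @ np.ones_like(y)      # backward pass: transpose routes error
print(y, grad_x)
```

Note how every row of `M` repeats the same three weights: that repetition is exactly the parameter sharing the weight-gradient computation has to account for.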
Sparse Connectivity
Sparse connections due to a small convolution kernel, versus dense connections from matrix multiplication
Figure 9.2
Sparse Connectivity
Sparse connectivity, viewed from above: we highlight one output unit, s3, and the input units that affect it (its receptive field). (Top) When s is formed by convolution with a kernel of width three, only three inputs affect s3. (Bottom) When s is formed by matrix multiplication, connectivity is no longer sparse, so all of the inputs affect s3.
Figure 9.3
Growing Receptive Fields
The receptive field of the units in the deeper layers of a convolutional network is larger than the receptive field of the units in the shallow layers. This effect increases if the network includes architectural features like strided convolution (Figure 9.12) or pooling.
Figure 9.4
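The growth of the receptive field with depth can be computed directly. A minimal sketch, with the function name and `(kernel_size, stride)` interface my own:

```python
# Sketch: how the receptive field of one top-layer unit grows with depth.
# layers is a list of (kernel_size, stride) pairs, bottom to top.
def receptive_field(layers):
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field
        jump *= s              # stride compounds the widening above it
    return rf

print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # three 3-wide convs -> 7
print(receptive_field([(3, 2), (3, 2)]))          # two strided convs -> 7
```

Two strided layers reach the same receptive field as three unstrided ones, illustrating why strides make the field grow faster.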
Parameter Sharing
Convolution shares the same parameters across all spatial locations
Traditional matrix multiplication does not share any parameters
Figure 9.5: Parameter sharing. Black arrows indicate the connections that use a particular parameter.
Edge Detection by Convolution
Convolving the input image with the kernel [-1, 1] takes the difference between horizontally adjacent pixels, producing an output image that shows the vertical edges
Figure 9.6
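The edge detector above is a one-liner in NumPy: convolving each row with a two-element kernel is just the difference of adjacent pixels. The toy image below is my own stand-in for the figure's photograph.

```python
import numpy as np

# Sketch: vertical-edge detection with the two-element kernel [-1, 1].
img = np.zeros((4, 6))
img[:, 3:] = 1.0                   # dark left half, bright right half
edges = img[:, 1:] - img[:, :-1]   # valid convolution with kernel [-1, 1]
print(edges)                       # nonzero only at the vertical edge
```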
Efficiency of Convolution
Input size: 320 by 280
Kernel size: 2 by 1
Output size: 319 by 280

                     Convolution            Dense matrix             Sparse matrix
Stored floats        2                      319*280*320*280 > 8e9    2*319*280 = 178,640
Float muls or adds   319*280*3 = 267,960    > 16e9                   Same as convolution (267,960)
Convolutional Network Components
Complex layer terminology: a convolutional layer contains three stages
  Convolution stage: affine transform
  Detector stage: nonlinearity, e.g. rectified linear
  Pooling stage
Simple layer terminology: each stage is its own layer
  Convolution layer: affine transform
  Detector layer: nonlinearity, e.g. rectified linear
  Pooling layer
Figure 9.7: The components of a typical convolutional neural network layer
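The three stages compose as a single function. A minimal single-channel sketch, assuming valid convolution and non-overlapping max pooling; helper names are my own:

```python
import numpy as np

# Sketch of one "complex" convolutional layer on a single channel:
# convolution stage (affine), detector stage (ReLU), pooling stage.
def conv_layer(x, kernel, bias, pool=2):
    H, W = x.shape
    h, w = kernel.shape
    z = np.empty((H - h + 1, W - w + 1))
    for i in range(z.shape[0]):            # convolution stage: affine
        for j in range(z.shape[1]):
            z[i, j] = np.sum(x[i:i+h, j:j+w] * kernel) + bias
    a = np.maximum(z, 0)                   # detector stage: ReLU
    ph, pw = a.shape[0] // pool, a.shape[1] // pool
    # pooling stage: non-overlapping max pooling
    return a[:ph*pool, :pw*pool].reshape(ph, pool, pw, pool).max(axis=(1, 3))

x = np.random.randn(8, 8)
y = conv_layer(x, np.ones((3, 3)) / 9, bias=0.0)
print(y.shape)   # (3, 3): 8x8 -> 6x6 after conv -> 3x3 after pooling
```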
Max Pooling and Invariance to Translation
When the input is shifted by one pixel, every value in the detector stage changes, but only about half of the values in the max pooling stage change
Figure 9.8
Cross-Channel Pooling and Invariance to Learned Transformations
A large response in the pooling unit results from a large response in any of the detector units it pools over (e.g. detector unit 1 or detector unit 3), giving invariance to the transformations those detectors span
Figure 9.9
Pooling with Downsampling
Max pooling with a stride greater than one reduces the representation size, lowering the computational and statistical burden on the next layer
Figure 9.10
Example Classification Architectures (Figure 9.11)
All three networks share the same stack of feature layers:
  Input image: 256x256x3 (the second network accepts variable size)
  Convolution + ReLU: 256x256x64
  Pooling with stride 4: 64x64x64
  Convolution + ReLU: 64x64x64
  Pooling with stride 4: 16x16x64
Network 1 (fixed size, fully connected top):
  Reshape to vector: 16,384 units
  Matrix multiply: 1,000 units
  Softmax: 1,000 class probabilities
Network 2 (variable size, pool to a fixed grid):
  Pooling to a 3x3 grid: 3x3x64
  Reshape to vector: 576 units
  Matrix multiply: 1,000 units
  Softmax: 1,000 class probabilities
Network 3 (no fully connected layer):
  Convolution: 16x16x1,000
  Average pooling: 1,000 units
  Softmax: 1,000 class probabilities
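The shape arithmetic of the fixed-size network can be checked with a few lines of bookkeeping. A sketch assuming 'same' convolutions and exact stride-4 pooling; the helper names are my own:

```python
# Sketch: shape bookkeeping for the fixed-size network in Figure 9.11.
def conv(shape, out_channels):
    # 'same' convolution: spatial size preserved, channels replaced
    return (shape[0], shape[1], out_channels)

def pool(shape, stride):
    # stride-s pooling divides each spatial dimension by s
    return (shape[0] // stride, shape[1] // stride, shape[2])

shape = (256, 256, 3)          # input image
shape = conv(shape, 64)        # (256, 256, 64)
shape = pool(shape, 4)         # (64, 64, 64)
shape = conv(shape, 64)        # (64, 64, 64)
shape = pool(shape, 4)         # (16, 16, 64)
units = shape[0] * shape[1] * shape[2]
print(units)                   # 16,384 units feed the matrix multiply
```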
Convolution with Stride
Strided convolution is equivalent to full convolution followed by downsampling, but computes only the outputs that are kept
Figure 9.12
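The equivalence is easy to verify in 1-D. A sketch with my own helper name `conv1d`:

```python
import numpy as np

# Sketch: strided convolution equals convolve-then-downsample, but only
# computes the outputs that are kept.
def conv1d(x, k):
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i+len(k)], k) for i in range(n)])

x = np.arange(8, dtype=float)
k = np.array([1., 2., 3.])
downsampled = conv1d(x, k)[::2]    # wasteful: compute all 6, keep 3
strided = np.array([np.dot(x[i:i+3], k)
                    for i in range(0, 6, 2)])  # compute only the kept 3
print(np.allclose(downsampled, strided))   # True
```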
Zero Padding Controls Size
Without zero padding, the representation shrinks at every layer; with zero padding, the spatial size can be preserved, allowing arbitrarily deep networks
Figure 9.13
Kinds of Connectivity
Local connection: like convolution, but no sharing
Convolution
Fully connected
Figure 9.14: Comparison of local connections, convolution, and full connections. (Top) A locally connected layer with a patch size of two pixels. Each edge is labeled with a unique letter to show that each edge is associated with its own weight parameter.
Partial Connectivity Between Channels
Each output channel is connected to only a subset of the input channels; the tensors have both channel coordinates and spatial coordinates
Figure 9.15
Tiled Convolution
Local connection: no sharing
Tiled convolution: cycle between groups of shared parameters
Convolution: one group of parameters shared everywhere
Figure 9.16: A comparison of locally connected layers, tiled convolution, and standard convolution. All three have the same sets of connections between units when the same kernel size is used; the diagram illustrates a kernel that is two pixels wide.
Recurrent Pixel Labeling
An example of a recurrent convolutional network for pixel labeling: input X, hidden representations H^(1), H^(2), H^(3), and label estimates Ŷ^(1), Ŷ^(2), Ŷ^(3), with convolution kernels U, V, and W shared across time steps
Figure 9.17
Gabor Functions
Figure 9.18
Gabor-like Learned Kernels
Figure 9.19
Major Architectures
Spatial Transducer Net: input size scales with output size, all layers are convolutional
All Convolutional Net: no pooling layers, just use strided convolution to shrink the representation size
Inception: complicated architecture designed to achieve high accuracy with low computational cost
ResNet: blocks of layers with the same spatial size, with each layer's output added to the same buffer that is repeatedly updated. Very many updates = a very deep net, but without vanishing gradients.
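The ResNet bullet can be sketched in a few lines: a running buffer that each block adds into, so the identity path carries signal undiminished. The dense block here is my own stand-in for conv + ReLU.

```python
import numpy as np

# Sketch of the ResNet idea: repeatedly add each block's output to a
# running buffer (buf += f(buf)); the identity path is never multiplied
# by weights, which is what avoids vanishing gradients.
def block(x, w):
    return np.maximum(x @ w, 0)    # stand-in for conv + ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
buf = x.copy()
for _ in range(3):
    w = rng.standard_normal((8, 8)) * 0.1
    buf = buf + block(buf, w)      # residual update
print(buf.shape)                   # still (8,): size unchanged per block
```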