Spatial Transformer Networks


1 Spatial Transformer Networks
Kaichun Mo, Shanghai Jiao Tong University
July 28, 2015

2 Overview
1. Spatial Transformer Network
2. Caffe: A popular deep learning framework
3. Future Work
Kaichun Mo (SJTU@Cornell)

3 Start from the interesting part: Spatial Transformer Network!

4-7 Spatial Transformer Network: Ideal Images

8 Spatial Transformer Network: Real Images

9 Spatial Transformer Network: Spatial Variance

10 Introduction: Spatial Transformer Layer
A new type of layer designed to provide spatial invariance.
It is expected to be more powerful than a pooling layer.

11 Spatial Transformer Layer: Algorithm
Localisation Network: for each input i, output its specific transformation matrix θ_i.
Grid generator and Sampler: compute the transformed result using θ_i.

12 Spatial Transformer Network: Localisation Net
Localisation Network: for each input image, try to learn the specific transformation matrix θ_i by which it was deformed.
Once θ_i is learned, we can reverse the deformation as far as possible.
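A minimal sketch of what a localisation network computes, in plain Python. It assumes an affine transformation (six parameters, reshaped to a 2x3 matrix) and a single linear layer; both are illustrative choices and not taken from the slides. The function names are hypothetical. Starting from zero weights and an identity bias makes the layer begin as a no-op, which is the initialisation suggested in the Jaderberg et al. paper.

```python
def localisation_net(image, weights, bias):
    """Regress six affine parameters from an image (hypothetical one-layer net)."""
    pixels = [p for row in image for p in row]  # flatten the image
    theta = [sum(w * p for w, p in zip(w_row, pixels)) + b
             for w_row, b in zip(weights, bias)]
    # Reshape the six outputs into a 2x3 affine matrix.
    return [theta[0:3], theta[3:6]]


def identity_init(num_pixels):
    """Zero weights and an identity bias, so the predicted transform starts as identity."""
    weights = [[0.0] * num_pixels for _ in range(6)]
    bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    return weights, bias


image = [[0.1, 0.2], [0.3, 0.4]]
weights, bias = identity_init(num_pixels=4)
theta = localisation_net(image, weights, bias)
```

In a real network the regressor would usually be a small convolutional or fully connected stack, and its weights would be learned by backpropagation together with the rest of the model.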

13 Spatial Transformer Network: Algorithm
Localisation Network: for each input i, output its specific transformation parameter θ_i.
Grid generator and Sampler: compute the transformed result using parameter θ_i.

14 Spatial Transformer Network: Grid Generator
For each input, the localisation network gives us the transformation matrix that reverses the deformation.
We then apply this matrix to the input image, starting by generating the sampling grid.
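The grid generation step can be sketched as follows, in plain Python. The sketch assumes a 2x3 affine matrix and coordinates normalised to [-1, 1], as in the Jaderberg et al. paper; the function name affine_grid is hypothetical. For every pixel of the output it computes the position in the input image that should be sampled.

```python
def affine_grid(theta, out_h, out_w):
    """For each output pixel, return the (x, y) source coordinate in [-1, 1]."""
    grid = []
    for i in range(out_h):
        # Normalised target y coordinate; a single row maps to the centre.
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # Apply the 2x3 affine matrix to the homogeneous point (x_t, y_t, 1).
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shift_right = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
grid_id = affine_grid(identity, 3, 3)
grid_shift = affine_grid(shift_right, 3, 3)
```

With the identity matrix the grid reproduces the regular pixel lattice; with a translation term every sampling position moves by the same offset.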

15 Spatial Transformer Network: Sampler
Some projected points may not land exactly on a pixel of the input image.
We need an interpolation technique to sample them, e.g. bilinear interpolation.
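A minimal sketch of bilinear interpolation for a single sampling point, in plain Python. It assumes the point is given in pixel coordinates (x is the column, y is the row) and that everything outside the image counts as zero; the border rule and the function name bilinear_sample are assumptions for illustration.

```python
import math


def bilinear_sample(image, x, y):
    """Sample `image` (a list of rows) at a real-valued pixel position (x, y)."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0  # fractional offsets inside the pixel cell

    def pixel(r, c):
        # Assumed border rule: positions outside the image read as zero.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - dx) * pixel(y0, x0) + dx * pixel(y0, x0 + 1)
    bottom = (1 - dx) * pixel(y0 + 1, x0) + dx * pixel(y0 + 1, x0 + 1)
    return (1 - dy) * top + dy * bottom


image = [[0.0, 10.0], [20.0, 30.0]]
centre = bilinear_sample(image, 0.5, 0.5)
corner = bilinear_sample(image, 1.0, 0.0)
```

A point in the middle of four pixels receives their average, while a point that falls exactly on a pixel returns that pixel's value unchanged.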

16 Spatial Transformer Layer: Algorithm
Localisation Network: for each input i, output its specific transformation matrix θ_i.
Grid generator and Sampler: compute the transformed result using θ_i.

17 Spatial Transformer Network: Differentiability

18 Spatial Transformer Network: Differentiability
To learn the weights of the localisation network and perform backpropagation during training, the forward function must be differentiable.
Fortunately, it is, at least in the sense of subgradients!
More detail in the paper.
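As a sketch of what the subgradients look like for a bilinear sampler, the equations below follow the notation of the Jaderberg et al. paper and not the slides: U is the input feature map, V the output, c the channel index, and (x_i^s, y_i^s) the source coordinate of output pixel i in pixel units.

```latex
% Forward pass: bilinear sampling written as a sum over all input pixels (n, m).
V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c \,
        \max\bigl(0,\, 1 - |x_i^s - m|\bigr)\,
        \max\bigl(0,\, 1 - |y_i^s - n|\bigr)

% Gradient with respect to the input feature map.
\frac{\partial V_i^c}{\partial U_{nm}^c} =
        \max\bigl(0,\, 1 - |x_i^s - m|\bigr)\,
        \max\bigl(0,\, 1 - |y_i^s - n|\bigr)

% Subgradient with respect to the sampling coordinate x_i^s
% (the case for y_i^s is symmetric).
\frac{\partial V_i^c}{\partial x_i^s} =
        \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c \,
        \max\bigl(0,\, 1 - |y_i^s - n|\bigr)
        \begin{cases}
          0  & \text{if } |m - x_i^s| \ge 1 \\
          1  & \text{if } m \ge x_i^s \\
          -1 & \text{if } m < x_i^s
        \end{cases}
```

The max and absolute-value terms are not differentiable everywhere, which is why the result holds only in the subgradient sense. The gradient with respect to the sampling coordinates is what lets the loss flow back through the grid generator into θ_i and the localisation network.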

19 Spatial Transformer Network: Advantages
Efficiency: highly localized, highly parallelizable, GPU acceleration.
End-to-end training: can be seamlessly incorporated into a neural network; no pre-training is required.
Spatial invariance: makes the neural network less vulnerable to spatial transformations.

20 Introduction to Caffe!

21 Caffe: Website

22-23 Caffe: Why Use It?

24 Caffe: Network Definition

25 Caffe: Layer Definition

26 Caffe: Solver Definition

27 Future Work
Finish implementing the Spatial Transformer Layer in Caffe.
Test its performance on different vision tasks.
This layer should be powerful whenever images are not spatially aligned and attention or localisation is necessary.

28 References
Official Website:
Official Tutorial: DIY Deep Learning for Vision with Caffe
Spatial Transformer Networks, by Max Jaderberg et al.

29 Thank you for listening! Q&A
