Reminder: CNN Summary (ConvNet)


2 Reminder: CNN summary (ConvNet). A convolutional neural network (Convolutional Neural Network) is a type of deep neural network. Its learning model is supervised; the weights are adjusted with the back-propagation algorithm. It is well suited to large, matrix-structured data. Why convolutional neural networks? They remove the explicit feature-extraction stage for large data such as images. Comparison with non-deep (shallow) neural networks: the similarities are the components, neurons, functions, and learning algorithm; the difference is that the input is assumed to be an image.

3 Reminder: components of a convolutional neural network. Layers of a CNN: Input takes the raw pixel values of the image, e.g. a sample of size 32x32x3. Convolution (CONV) computes dot products between the neuron weights and a small region of the input volume, giving e.g. 32x32x12 (with 12 filters). Rectified linear unit (RELU) applies an element-wise function and does not change the volume dimensions (32x32x12). POOL downsamples along the spatial dimensions (width and height), e.g. to 16x16x12. FC computes the score of each class, e.g. 1x1x10.
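A minimal PyTorch sketch of this CONV, RELU, POOL, FC sequence (my illustration, not part of the original slides), assuming a 3x3 convolution with padding 1 and a 2x2 max-pool so that the example volume sizes above are reproduced:

```python
# Tiny CNN following the slide's example sizes: 32x32x3 -> 32x32x12 -> 16x16x12 -> 10 scores.
import torch
import torch.nn as nn

tiny_cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1),  # CONV: 32x32x3 -> 32x32x12
    nn.ReLU(),                                                            # RELU: element-wise, same shape
    nn.MaxPool2d(kernel_size=2, stride=2),                                # POOL: 32x32x12 -> 16x16x12
    nn.Flatten(),                                                         # 16*16*12 = 3072 values
    nn.Linear(16 * 16 * 12, 10),                                          # FC: one score per class (1x1x10)
)

x = torch.randn(1, 3, 32, 32)   # one raw RGB image (batch, channels, height, width)
scores = tiny_cnn(x)            # shape (1, 10): class scores
print(scores.shape)
```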

4 Operation of a convolutional neural network: the raw image on the left is mapped, through a score function, to scores on the right that express how strongly it belongs to each of the different classes.

5 CNN extensions. Examples of architectures and extensions built on CNNs: AlexNet, ZF Net, VGG Net, GoogLeNet, Microsoft ResNet, R-CNN, Fast R-CNN, Faster R-CNN, 3D-CNN, GAN, ...

6 AlexNet (2012). Some consider LeNet (1998) the starting point of CNNs. Paper title: ImageNet Classification with Deep Convolutional Neural Networks. Authors: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. Citations*: more than ... A large, deep convolutional neural network was used to win the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge), the annual Olympics of computer vision for tasks such as classification, localization, and detection. * All citation counts in this presentation are based on Google Scholar statistics as of Azar 1396 (November/December 2017).

7 AlexNet Architecture: a relatively simple layout compared to modern architectures, used for classification with 1000 possible categories. The architecture is very similar to LeNet, but deeper and bigger, and it featured convolutional layers stacked directly on top of each other (rather than a single CONV layer always immediately followed by a POOL layer).

8 AlexNet Architecture: eight layers; the first five are convolutional and the remaining three are fully-connected. The last fully-connected layer produces the scores for the 1000 class labels.

9 AlexNet Architecture: The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer that reside on the same GPU. The kernels of the third convolutional layer are connected to all kernel maps in the second layer. Max-pooling layers follow the first, second, and fifth convolutional layers. The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.

10 AlexNet Architecture: The first convolutional layer filters the input image with 96 kernels of size 11x11x3 with a stride of 4 pixels. The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192, and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully-connected layers have 4096 neurons each.

11 AlexNet Main Points: Trained the network on ImageNet data, which contained over 15 million annotated images from a total of over 22,000 categories. Used ReLU for the nonlinearity functions (faster to train than the conventional tanh function). Used data augmentation techniques that consisted of image translations, horizontal reflections, and patch extractions. Implemented dropout* layers in order to combat the problem of overfitting to the training data. Trained the model using batch stochastic gradient descent, with specific values for momentum and weight decay. Trained on two GTX 580 GPUs for five to six days. * Dropout: setting the output of each hidden neuron to zero with probability 0.5; the neurons that are dropped out in this way do not contribute to the forward pass and do not participate in back-propagation.
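The footnote's dropout idea in a minimal PyTorch sketch (my illustration, not the authors' code); note that PyTorch's Dropout rescales the surviving activations during training so the expected activation stays the same, whereas the original paper scales at test time instead:

```python
# Dropout as described in the footnote: each hidden activation is zeroed with probability 0.5.
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
hidden = torch.randn(4, 8)   # a batch of hidden-layer activations (toy values)

drop.train()                 # training mode: roughly half the units are dropped (and the rest rescaled)
print(drop(hidden))

drop.eval()                  # evaluation mode: dropout is disabled, activations pass through unchanged
print(drop(hidden))
```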

12 AlexNet Why It's Important: The first time a model performed so well on the historically difficult ImageNet dataset. It used techniques that are still in use today, such as data augmentation and dropout. It illustrated the benefits of CNNs and backed them up with record-breaking performance in the competition.

13 ZF Net: winner of ILSVRC 2013. Paper title: Visualizing and Understanding Convolutional Neural Networks. Authors: Matthew Zeiler and Rob Fergus. Citations: more than 2750. This architecture was more of a fine-tuning of the previous AlexNet structure, but it still developed some very key ideas about improving performance, and the authors spent a good amount of time explaining the intuition behind ConvNets and showing how to visualize the filters and weights correctly. The main contributions: a slightly modified AlexNet model and a very interesting way of visualizing feature maps.

14 ZF Net architecture

15 ZF Net Main Points Very similar architecture to AlexNet, except for a few minor modifications. AlexNet trained on 15 million images, while ZF Net trained on only 1.3 million images. Instead of using 11x11 sized filters in the first layer (same as AlexNet), ZF Net used filters of size 7x7 and a decreased stride value to keep a lot of original pixel information in the input volume. As the network grows, we also see a rise in the number of filters used. Used ReLUs for their activation functions, cross-entropy loss for the error function, and trained using batch stochastic gradient descent. Trained on a GTX 580 GPU for twelve days. Developed a visualization technique named Deconvolutional Network (DeConvNet), which helps to examine different feature activations and their relation to the input space. Called deconvnet because it maps features to pixels. 15

16 ZF Net DeConvNet: At every layer of the trained CNN, a deconvnet is attached that has a path back to the image pixels. Forward pass: an input image is fed into the CNN and activations are computed at each level. The deconvnet has the same filters as the original CNN. Steps: store the activations of the chosen feature map, but set all of the other activations in that layer to 0; pass this feature map as the input to the deconvnet; this input then goes through a series of unpool (reverse max-pooling), rectify, and filter operations for each preceding layer until the input space is reached.
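A rough sketch of one such deconvnet step (my reconstruction of the unpool/rectify/filter idea, not Zeiler and Fergus's code): the max-pooling switches recorded on the forward pass are reused to unpool, the result is rectified, and the same filters are applied transposed to project one layer closer to pixel space.

```python
# One deconvnet step: unpool with stored switches, rectify, filter with transposed weights.
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16)        # feature maps entering a pool layer (toy values)
weight = torch.randn(8, 3, 3, 3)     # conv filters of the preceding layer (8 maps from 3 channels)

# forward pass: remember where each max came from
pooled, switches = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# deconvnet path back toward the input space
unpooled = F.max_unpool2d(pooled, switches, kernel_size=2, stride=2)   # reverse max-pooling
rectified = F.relu(unpooled)                                           # rectify
reconstruction = F.conv_transpose2d(rectified, weight, padding=1)      # filter with transposed weights
print(reconstruction.shape)          # (1, 3, 16, 16): one layer closer to the pixels
```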

17 ZF Net DeConvNet

18 ZF Net DeConvNet: The first layer of your ConvNet is always a low-level feature detector that will detect simple edges or colors in this particular case. We can see that with the second layer, more circular features are being detected. Let's look at layers 3, 4, and 5.

19 ZF Net Why It's Important: ZF Net was not only the winner of the competition in 2013, but also provided great intuition as to the workings of CNNs and illustrated more ways to improve performance. The visualization approach described helps not only to explain the inner workings of CNNs, but also provides insight for improvements to network architectures. The fascinating deconv visualization approach and occlusion experiments make this one of my personal favorite papers.

20 VGG Net: year of publication 2014. Paper title: Very Deep Convolutional Networks for Large-Scale Image Recognition. Authors: Karen Simonyan and Andrew Zisserman (Visual Geometry Group). Citations: more than 7650. Simplicity and depth: a 19-layer CNN that strictly used 3x3 filters with stride and pad of 1, along with 2x2 max-pooling layers with stride 2.

21 VGG Net Architecture

22 VGG Net Main Points: The use of only 3x3 filters is quite different from AlexNet's and ZF Net's larger first-layer filters. A combination of two 3x3 conv layers has an effective receptive field of 5x5, and three 3x3 conv layers back to back have an effective receptive field of 7x7. As the spatial size of the input volumes at each layer decreases (a result of the conv and pool layers), the depth of the volumes increases due to the increased number of filters as you go down the network. The number of filters doubles after each maxpool layer. This reinforces the idea of shrinking spatial dimensions but growing depth.
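A quick check of the receptive-field claim above (my own illustration): for stride-1 convolutions without dilation, each layer widens the effective receptive field by (kernel size - 1), so stacked 3x3 filters reach 5x5 and 7x7.

```python
# Effective receptive field of a stack of conv layers (stride-1 default, no dilation).
def receptive_field(kernel_sizes, strides=None):
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the field by (k-1) input-pixel jumps
        jump *= s
    return rf

print(receptive_field([3, 3]))      # 5  -> same field as a single 5x5 filter
print(receptive_field([3, 3, 3]))   # 7  -> same field as a single 7x7 filter
```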

23 VGG Net Main Points: Worked well on both image classification and localization tasks. Built the model with the Caffe toolbox. Used scale jittering* as one data augmentation technique during training. Used ReLU layers after each conv layer and trained with batch gradient descent. Trained on 4 Nvidia Titan Black GPUs for two to three weeks. * Scale jittering: a single model is trained to recognize objects over a wide range of scales.

24 VGG Net Why It's Important: VGG Net is one of the most influential papers because it reinforced the notion that convolutional neural networks have to have a deep network of layers in order for this hierarchical representation of visual data to work. Keep it deep. Keep it simple.

25 GoogLeNet: winner of ILSVRC 2014. Paper title: Going Deeper with Convolutions. Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. Year of publication: 2015. Citations: more than 5400. Main features: very deep network; introduction of the Inception module.

26 GoogLeNet research motivation: The easiest way to improve the results of deep networks is to increase their size and depth, but is this the best way? The main problem with this approach is a multiplicative (polynomial-order) increase in computational complexity. An alternative is to approximate the probability distribution of the data with a sparse deep network, constructing the optimal network topology by analyzing, layer by layer, the correlation statistics of the previous layer, based on the Hebbian rule: neurons that fire together, wire together.

27 GoogLeNet Inception architecture: The main idea of the Inception architecture is to examine how an optimal sparse structure for a convolutional vision network can be approximated.

28 GoogLeNet Inception architecture: The second idea is that dimensionality reduction is essential even though it has some computational cost: pooled, low-dimensional matrices still contain a great deal of information about the image patch.

29 GoogLeNet: adding 1x1 convolutions for dimensionality reduction. For example, if the input volume has a given spatial size and depth, filtering it with 20 filters of size 1x1 produces an output volume with the same spatial size but a depth of 20.
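A sketch of this 1x1 "bottleneck" convolution in PyTorch (illustrative sizes, since the slide's exact example volume was lost in transcription): 20 filters of size 1x1 keep the spatial grid but shrink the depth of the volume to 20 channels.

```python
# 1x1 convolution used purely for depth reduction.
import torch
import torch.nn as nn

reduce_depth = nn.Conv2d(in_channels=64, out_channels=20, kernel_size=1)

x = torch.randn(1, 64, 56, 56)   # assumed input volume: depth 64, spatial size 56x56
y = reduce_depth(x)
print(y.shape)                   # torch.Size([1, 20, 56, 56]) -- same width/height, depth 20
```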

30 GoogLeNet: An Inception network is a network of modules of the types above stacked on top of each other, with occasional max-pooling layers of stride 2 used to halve the grid resolution. GoogLeNet is the name of the particular Inception architecture chosen for the ILSVRC 2014 competition. The authors also presented a deeper and wider Inception version that obtained better results. Based on their observations, the exact values of the architectural parameters had relatively little effect.

31 The most common Inception module
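A rough PyTorch sketch of the common Inception module shown here (my reconstruction, assuming the usual four parallel branches with 1x1 dimension reductions; the channel counts are illustrative, not taken from GoogLeNet's published tables):

```python
# Inception module: 1x1, 1x1->3x3, 1x1->5x5 and pool->1x1 branches, concatenated along depth.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)                       # 1x1 branch
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, 96, kernel_size=1),        # 1x1 reduce
                                     nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),        # 1x1 reduce
                                     nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                         nn.Conv2d(in_ch, 32, kernel_size=1))    # pool projection

    def forward(self, x):
        # all branches see the same input; outputs are concatenated along the depth axis
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

m = InceptionModule(in_ch=192)
print(m(torch.randn(1, 192, 28, 28)).shape)   # torch.Size([1, 256, 28, 28]) = 64+128+32+32
```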


33 GoogLeNet Architecture: a 22-layer CNN with sequential or parallel Inception modules.

34 GoogLeNet Architecture

35 GoogLeNet Main Points: Used 9 Inception modules in the whole architecture, with over 100 layers in total. No use of fully connected layers: an average pool is used instead, to go from a 7x7x1024 volume to a 1x1x1024 volume. This saves a huge number of parameters; GoogLeNet uses 12x fewer parameters than AlexNet. There are updated versions of the Inception module (Versions 6 and 7). Trained on GPU within a week.
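A small sketch of the average-pool trick mentioned above (my illustration, not GoogLeNet's actual code): a 7x7 average pool collapses the 7x7x1024 volume to 1x1x1024 with no learned parameters, whereas a fully connected layer at that point would need tens of millions of weights.

```python
# Global average pooling instead of a fully connected layer.
import torch
import torch.nn as nn

gap = nn.AvgPool2d(kernel_size=7)           # average over the whole 7x7 grid

features = torch.randn(1, 1024, 7, 7)       # final conv feature maps
pooled = gap(features)
print(pooled.shape)                          # torch.Size([1, 1024, 1, 1])

# for comparison: an FC layer mapping the flattened 7x7x1024 volume to 1024 units
fc = nn.Linear(7 * 7 * 1024, 1024)
print(sum(p.numel() for p in fc.parameters()))   # roughly 51 million weights that the pooling avoids
```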

36 Microsoft ResNet: Paper title: Deep Residual Learning for Image Recognition. Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Year of publication: 2015. Citations: more than 5100. Main features: much deeper than anything seen before; introduction of the residual block.

37 Microsoft ResNet: Question: is better network learning obtained simply by stacking more layers? Increasing network depth leads to the degradation problem. This error is not due to overfitting: adding layers to a network that already works well increases the training error. A constructed solution exists: the added layers are identity mappings and the remaining layers are copies of the trained shallower model. The existence of this solution shows that a deeper model should not have a higher error than its shallower counterpart.

38 Microsoft ResNet Residual Block: The solution in this paper is to introduce a deep residual learning framework. Instead of expecting a few stacked layers to fit a desired function directly, let them fit the residual. If the desired function is H(x), the stacked layers fit the mapping F(x) := H(x) - x, so the original mapping becomes F(x) + x. Optimizing the residual mapping is easier because, if the identity mapping is optimal, it suffices to drive the residual to zero.
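A minimal residual block sketch in PyTorch (my illustration of the F(x) + x idea, not the exact ResNet block, which also uses batch normalization and projection shortcuts when dimensions change):

```python
# Residual block: the stacked conv layers fit F(x); the skip connection adds x back.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # the stacked layers fit the residual F(x) = H(x) - x
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        residual = self.conv2(F.relu(self.conv1(x)))
        return F.relu(residual + x)      # the original mapping H(x) = F(x) + x

block = ResidualBlock(channels=64)
x = torch.randn(1, 64, 28, 28)
print(block(x).shape)                    # shape is preserved, so blocks can be stacked freely
```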

39 Architecture

40 Microsoft ResNet Main Points: Ultra-deep, up to 152 layers. The authors claim that a naïve increase of layers in plain nets results in higher training and test error. Trained on an 8-GPU machine for two to three weeks.

