Cyclic Annealing Training (CAT) CNNs for Image Classification with Noisy Labels

JiaWei Li, Tao Dai, QingTao Tang, YeLi Xing, Shu-Tao Xia
Tsinghua University
li-jw15@mails.tsinghua.edu.cn

October 8, 2018
Outline:
- Noisy Labels
  - Noisy Labels Problem
  - Noise Modeling with EM
- Speeding Up the Training in the M-cycle
  - Cyclic Annealing Training
- Aggregating M-cycle CNNs at Test Time
  - Bagging CNNs
- Algorithm Description
  - CAT on Noisy Labels
- Experiments
  - Performance on MNIST
  - Robustness on CIFAR
Noisy Labels Problem:
- Labeling an image dataset is cumbersome work and easily induces noise, which has a large impact on learning.

Figure 1: The left, middle, and right images might be labeled as dog, seal, and seal. (Copyright: http://www.dianliwenmi.com/postimg_3364775_6.html)
Noise patterns:
- Image x has a noisy label z; its true label y is unknown.

Figure 2: Two different noise patterns (graphical models over x, y, z).
- Left: the noisy label z only depends on the true label y.
- Right: z depends on both the true label y and the feature x.
Noise Modeling:

Figure 3: A typical label noise modeling procedure: the feature x feeds a CNN h(x) with parameters W; the predicted true label y = h(x) passes through a noise model with pattern parameter θ to produce the noisy label z.

Learning with EM:
- E-step: fix W and update the noise modeling parameter θ.
- M-step: use z, y = h(x; W), and θ to train W.
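To make the alternation concrete, here is a minimal NumPy sketch of one EM round under a class-conditional noise model, assuming θ is a k × k transition matrix with θ[i, j] ≈ p(z = j | y = i); the function names and the train_one_cycle hook are illustrative, not from the paper:

```python
import numpy as np

def e_step(probs, z, theta):
    """E-step (slide convention): with the network posteriors fixed,
    re-estimate the noise transition matrix theta[i, j] ~= p(z = j | y = i).
    probs: (n, k) posteriors p(y | x; W); z: (n,) noisy integer labels."""
    resp = probs * theta[:, z].T                     # (n, k): p(y=i|x_t) * p(z_t|y=i)
    resp /= resp.sum(axis=1, keepdims=True)          # posterior responsibility of class i
    new_theta = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        new_theta[:, j] = resp[z == j].sum(axis=0)   # expected counts of (y = i, z = j)
    return new_theta / (new_theta.sum(axis=1, keepdims=True) + 1e-12)

def m_step(train_one_cycle, model, x, z, theta):
    """M-step: with theta fixed, train the network against the noisy
    labels for one cycle (train_one_cycle is an assumed training hook)."""
    return train_one_cycle(model, x, z, theta)
```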
Cyclic Annealing Training (CAT):
- It abruptly raises the learning rate α and then quickly decreases it with a cosine function:

  $\alpha(t) = \frac{\alpha_0}{2}\left(\cos\left(\pi \cdot \frac{\operatorname{mod}(t-1,\ \lceil T/M \rceil)}{\lceil T/M \rceil}\right) + 1\right)$

  where T is the total number of epochs and M is the number of cycles.
- Align every annealing learning rate cycle to an M-step.
- Then use the obtained local-minimum CNN models to update the following E-step.
- Almost M times faster than the original EM approaches.
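Below is a direct transcription of the schedule into Python; alpha0, T, and M take illustrative defaults here, not the paper's settings:

```python
import math

def cyclic_lr(t, alpha0=0.1, T=300, M=6):
    """Cosine cyclic annealing: epoch t runs from 1 to T, split into M
    cycles of length ceil(T / M). The rate restarts at alpha0 at the
    start of each cycle and anneals toward 0 by the end of the cycle."""
    cycle_len = math.ceil(T / M)
    return (alpha0 / 2) * (math.cos(math.pi * ((t - 1) % cycle_len) / cycle_len) + 1)
```

At t = 1 the rate is exactly alpha0, and just before each restart it is close to 0, which is what drives the network into a local minimum at the end of every cycle.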
CAT vs. the standard training schedule:

Figure 4: Training accuracy (0.60 to 0.85) over 250 epochs when training DenseNet-40 on CIFAR-10 with the standard learning rate schedule vs. Cyclic Annealing Training (CAT).
Aggregate M-cycle CNNs at test time:

Figure 5: Using CAT for Snapshot Ensembles¹.
- Once the training has finished, collect all local-minimum CNNs.
- The aggregated output is: $h_{\mathrm{AVG}}(x) = \frac{1}{M}\sum_{c=1}^{M} h_c(x)$

¹ ICLR 2017, Gao Huang, et al., Snapshot Ensembles: Train 1, Get M for Free.
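As a sketch, aggregation at test time is a plain average of the snapshot outputs; snapshots is assumed to be the list of the M per-cycle models, each returning an (n, k) probability matrix:

```python
import numpy as np

def snapshot_predict(snapshots, x):
    """Average the softmax outputs of the M local-minimum snapshots
    collected at the end of each annealing cycle (the h_c above)."""
    return np.mean([h(x) for h in snapshots], axis=0)   # h_AVG(x)
```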
The log-likelihood of the model parameters is:

$L(W, \theta) = \sum_{t=1}^{n} \log\left(\sum_{i=1}^{k} p(z_t \mid y_t = i;\ \theta)\, p(y_t = i \mid x_t;\ W)\right)$

Algorithm 1: CAT on Noisy Labels
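The likelihood can be evaluated directly from the network posteriors and the transition matrix; this NumPy sketch reuses the illustrative shapes from the EM sketch earlier (probs: (n, k), z: (n,), theta: (k, k)):

```python
import numpy as np

def log_likelihood(probs, z, theta, eps=1e-12):
    """L(W, theta) = sum_t log sum_i p(z_t | y_t = i; theta) * p(y_t = i | x_t; W)."""
    marginal = (probs * theta[:, z].T).sum(axis=1)   # (n,): p(z_t | x_t; W, theta)
    return float(np.log(marginal + eps).sum())
```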
Noise Setting on MNIST:
- We apply a label flipping operation to the MNIST dataset.

Figure 6: Label flipping with noise pattern [7, 9, 0, 4, 2, 1, 3, 5, 6, 8].
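A minimal sketch of this corruption, assuming the pattern maps true class i to noisy class pattern[i] and that a random fraction of examples is flipped (the 46% from the next slide is used as the illustrative default):

```python
import numpy as np

def flip_labels(y, pattern=(7, 9, 0, 4, 2, 1, 3, 5, 6, 8),
                noise_frac=0.46, seed=0):
    """Corrupt MNIST labels: each selected example of true class i
    receives the noisy label pattern[i]."""
    rng = np.random.default_rng(seed)
    z = y.copy()
    mask = rng.random(len(y)) < noise_frac   # which examples to corrupt
    z[mask] = np.asarray(pattern)[y[mask]]
    return z
```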
Performance on MNIST:

Figure 7: The acquired transfer probability θ (true labels vs. noisy labels, classes 0-10) of CAT and Simple NAL.
- 46% noisy labels with noise pattern [7, 9, 0, 4, 2, 1, 3, 5, 6, 8].
- Simple NAL reaches 99.68% classification accuracy, while CAT achieves 99.77%.
Noise Setting on CIFAR-100:
- z depends on both the true label y and the feature x.

Figure 8: Randomly selected images from the noisy-label CIFAR.
Robustness on CIFAR-100:

Figure 9: Comparing the robustness of noise modeling methods on CIFAR-100 with random noise labels: test accuracy (0.225 to 0.400) vs. noise fraction (0.300 to 0.500) for Baseline CNN, Hard Bootstrap, EM, Simple NAL, Complex NAL, CAT without Bagging, and CAT.
Selected References:
1. TNNLS 2014, Classification in the Presence of Label Noise: A Survey.
2. ICLR 2015, Training Convolutional Networks with Noisy Labels.
3. ICLR 2015, Training Deep Neural Networks on Noisy Labels with Bootstrapping.
4. ICASSP 2016, Training Deep Neural-Networks Based on Unreliable Labels.
5. ICLR 2017, Snapshot Ensembles: Train 1, Get M for Free.
6. ICLR 2017, Training DNNs Using a Noise Adaptation Layer.

Some New Progress:
1. JMLR 2018, A Theory of Learning with Corrupted Labels.
2. ICML 2018, MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels.
3. ICML 2018, Dimensionality-Driven Learning with Noisy Labels.
4. CVPR 2018, Iterative Learning with Open-set Noisy Labels.
5. ICLR 2019 submission, Pumpout: A Meta Approach for Robustly Training Deep Neural Networks with Noisy Labels.
Thanks for listening!