Gated Recurrent Convolution Neural Network for OCR

1 Gated Recurrent Convolution Neural Network for OCR
Jianfeng Wang and Xiaolin Hu
Presented by Boyoung Kim, February 2, 2018
Boyoung Kim (SNU) RNN-NIPS2017 February 2, 2018

2 Optical Character Recognition (OCR)
OCR: recognizing text in natural images.
Example: [image]

3 Overall Architecture
The model consists of three stages:
1) Feature Sequence Extraction - uses a gated recurrent CNN (GRCNN) instead of a plain CNN
2) Sequence Modeling
3) Transcription
[1] Baoguang Shi, et al. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. IEEE.

4 1) Feature Sequence Extraction
CNN: extracts a sequence of features.
- The i-th feature vector is the concatenation of the i-th columns of all feature maps.
- Each feature vector reflects only a local region of the original image.
- Context modulation (capturing context) is therefore important.
- Recurrent synapses in the neocortex play an important role in visual perception.
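
The column-wise concatenation described on this slide can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code; the function name `feature_sequence` and the toy 2x3 feature maps are assumptions made for the example.

```python
def feature_sequence(feature_maps):
    """Turn C feature maps (each H x W) into W feature vectors of length C*H.

    feature_maps: list of C maps, each a list of H rows with W values.
    The i-th vector concatenates the i-th column of every map.
    """
    width = len(feature_maps[0][0])
    sequence = []
    for i in range(width):
        vector = []
        for fmap in feature_maps:
            vector.extend(row[i] for row in fmap)  # i-th column of this map
        sequence.append(vector)
    return sequence


# Two hypothetical 2x3 feature maps (C=2, H=2, W=3).
maps = [
    [[1, 2, 3],
     [4, 5, 6]],
    [[10, 20, 30],
     [40, 50, 60]],
]
seq = feature_sequence(maps)
print(seq[0])  # [1, 4, 10, 40]
```

Each of the W vectors only sees one column of the maps, which is why the slide stresses that context has to be brought in by other means.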

5 1) Feature Sequence Extraction
Recurrent Convolutional Layer (RCL) in RCNN: adds recurrent connections within the same convolutional layer.
$$x(t) = F\big(w^{F} * u(t) + w^{R} * x(t-1)\big)$$
where $F$ is the nonlinearity, $*$ denotes convolution, $u(t)$ and $x(t-1)$ are the feed-forward and recurrent inputs, and $w^{F}$, $w^{R}$ are the feed-forward and recurrent weights.
- The larger t is, the larger the receptive field of each feature vector.
- The goal is to weaken the signal coming from less relevant context.
[2] M. Liang, et al. Recurrent convolutional neural network for object recognition. CVPR.
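
The growth of the receptive field with t can be illustrated with a small 1-D sketch. It assumes ReLU as the nonlinearity F, a feed-forward input that stays the same at every step, x(0) computed from the feed-forward term alone, and hypothetical all-ones kernels of size 3; none of these choices are taken from the slides.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D convolution (cross-correlation form) keeping the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def rcl(u, w_f, w_r, steps):
    """Return [x(0), ..., x(steps)] with x(t) = ReLU(w_f * u + w_r * x(t-1))."""
    feed_forward = conv1d_same(u, w_f)        # same static input at every t
    x = [max(0.0, v) for v in feed_forward]   # x(0): no recurrent input yet
    history = [x]
    for _ in range(steps):
        recurrent = conv1d_same(x, w_r)
        x = [max(0.0, f + r) for f, r in zip(feed_forward, recurrent)]
        history.append(x)
    return history


# A single impulse in the middle of an 11-element input.
u = [0.0] * 11
u[5] = 1.0
states = rcl(u, w_f=[1.0, 1.0, 1.0], w_r=[1.0, 1.0, 1.0], steps=2)
widths = [sum(1 for v in x if v > 0) for x in states]
print(widths)  # [3, 5, 7]
```

The impulse spreads over 3, 5, then 7 positions, which mirrors how each output position draws on a wider input region as t increases.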

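6 1) Feature Sequence Extraction
Gated Recurrent Convolutional Layer (GRCL) in GRCNN: introduces a gate.
Gate of GRCL:
$$G(t) = \begin{cases} 0 & t = 0 \\ \mathrm{sigmoid}\big(\mathrm{BN}(w_g^{F} * u(t)) + \mathrm{BN}(w_g^{R} * x(t-1))\big) & t > 0 \end{cases}$$
GRCL:
$$x(t) = \begin{cases} \mathrm{ReLU}\big(\mathrm{BN}(w^{F} * u(t))\big) & t = 0 \\ \mathrm{ReLU}\big(\mathrm{BN}(w^{F} * u(t)) + \mathrm{BN}(\mathrm{BN}(w^{R} * x(t-1)) \odot G(t))\big) & t > 0 \end{cases}$$
where $\odot$ denotes element-wise multiplication.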
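
A rough 1-D sketch of the gated update may help to read the two case definitions. It is only an illustration under several assumptions: batch normalization is replaced by a parameter-free standardization over positions (`batch_norm` below is a stand-in, not real BN), the feed-forward input is the same at every step, and all kernel values are made up.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D convolution (cross-correlation form) keeping the input length."""
    half = len(kernel) // 2
    return [
        sum(w * signal[i + k - half]
            for k, w in enumerate(kernel)
            if 0 <= i + k - half < len(signal))
        for i in range(len(signal))
    ]


def batch_norm(values, eps=1e-5):
    """Stand-in for BN: standardize over positions, no learned scale or shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grcl(u, w_f, w_r, wg_f, wg_r, steps):
    """Apply the t = 0 case once, then the t > 0 case `steps` times."""
    ff = batch_norm(conv1d_same(u, w_f))        # BN(w^F * u)
    gate_ff = batch_norm(conv1d_same(u, wg_f))  # BN(w_g^F * u)
    x = [max(0.0, v) for v in ff]               # x(0) = ReLU(BN(w^F * u))
    for _ in range(steps):
        gate_rec = batch_norm(conv1d_same(x, wg_r))
        gate = [sigmoid(a + b) for a, b in zip(gate_ff, gate_rec)]  # G(t)
        rec = batch_norm(conv1d_same(x, w_r))                       # BN(w^R * x(t-1))
        gated = batch_norm([r * g for r, g in zip(rec, gate)])      # BN(... ⊙ G(t))
        x = [max(0.0, a + b) for a, b in zip(ff, gated)]
    return x


u = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 4.0, 1.0]
kernels = dict(w_f=[0.2, 0.5, 0.2], w_r=[0.1, 0.3, 0.1],
               wg_f=[0.3, -0.2, 0.3], wg_r=[0.2, 0.1, -0.1])
out = grcl(u, steps=2, **kernels)
print(len(out), min(out) >= 0.0)
```

Because the gate lies in (0, 1), it can only scale the recurrent contribution down, which is the mechanism the slide uses to suppress less relevant context.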

7 1) Feature Sequence Extraction
GRCL with T = 2 [figure]
- Convolution kernels drawn in the same color share the same weights.

8 2) Sequence Modeling
BLSTM: the output at time t depends on the inputs at both earlier and later time steps.
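
The bidirectional dependence can be checked with a toy example. The sketch below swaps the LSTM cell for a plain scalar tanh RNN to stay short, so it is not a BLSTM; the weights `w_in` and `w_rec` are arbitrary assumptions. It only demonstrates that the backward pass lets an output at time t react to later inputs.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Scalar tanh RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional(inputs, w_in=0.5, w_rec=0.8):
    """Pair each time step's forward state with its backward state."""
    forward = rnn_pass(inputs, w_in, w_rec)
    backward = rnn_pass(inputs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


a = bidirectional([1.0, 0.0, 0.0, 0.0])
b = bidirectional([1.0, 0.0, 0.0, 2.0])  # only the last input differs

# At t = 1 the forward state is unchanged, but the backward state differs.
print(a[1][0] == b[1][0], a[1][1] == b[1][1])  # True False
```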

9 3) Transcription
This stage converts the per-frame predictions into the actual label sequence.
$S = \{(I, l)\}$: dataset, where $I$ is a training image and $l$ is its ground-truth label.
Objective function (to be minimized):
$$O = -\sum_{(I, l) \in S} \log p(l \mid I)$$

10 3) Transcription
Connectionist Temporal Classification (CTC)
- Input: probability sequence $y = \{y^{1}, \dots, y^{T}\}$
- Output: label sequence $l$, e.g. "hello"
- Set of all labels: $L' = L \cup \{\text{blank}\}$, with the blank written as "-" and $L$ = {a-z, 0-9}
- Possible sequence: $\pi$ = "hh-ell-ll-oo"
- $B$ maps $\pi$ to $l$ by merging repeated labels and then removing blanks: $B(\pi) = l$, e.g. $B$("hh-ell-ll-oo") = "hello"
[3] A. Graves, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. ICML.
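
The mapping B is small enough to write out directly. The sketch below assumes the blank is written as "-"; the function name `ctc_collapse` is chosen for this example only.

```python
from itertools import groupby

BLANK = "-"


def ctc_collapse(path, blank=BLANK):
    """B: merge runs of identical labels, then drop the blank label."""
    merged = [label for label, _ in groupby(path)]
    return "".join(label for label in merged if label != blank)


print(ctc_collapse("hh-ell-ll-oo"))  # hello
print(ctc_collapse("hello"))         # helo
```

The second call shows why the blank is needed: without a blank between the two "l" frames, the repeated label is merged into one.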

11 3) Transcription
Probability of a label sequence:
$$p(l \mid y) = \sum_{\pi : B(\pi) = l} p(\pi \mid y), \qquad p(\pi \mid y) = \prod_{t=1}^{T} y^{t}_{\pi_t}$$
where $y^{t}_{\pi_t}$ is the probability of having label $\pi_t$ at time step $t$.
Lexicon-free transcription:
$$l^{*} = B\big(\arg\max_{\pi} p(\pi \mid y)\big)$$
Lexicon-based transcription (fixed-length lexicon $D$):
$$l^{*} = \arg\max_{l \in D} p(l \mid y)$$
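
These formulas can be evaluated by brute force on a tiny example. The sketch below enumerates every path, which is only feasible for very small T and is not how CTC is computed in practice (that uses a dynamic-programming recursion). The three-symbol alphabet, the probability table `y`, and all function names are assumptions made for illustration.

```python
from itertools import groupby, product

ALPHABET = ["-", "a", "b"]  # hypothetical toy alphabet; index 0 is the blank


def collapse(path):
    """B: merge repeated labels, then remove blanks."""
    return "".join(k for k, _ in groupby(path) if k != "-")


def path_prob(path, y):
    """p(pi | y): product over t of the probability of pi_t at step t."""
    p = 1.0
    for t, label in enumerate(path):
        p *= y[t][ALPHABET.index(label)]
    return p


def label_prob(label, y):
    """p(l | y): sum of p(pi | y) over all paths pi with B(pi) = l."""
    return sum(path_prob(pi, y)
               for pi in product(ALPHABET, repeat=len(y))
               if collapse(pi) == label)


def greedy_decode(y):
    """Lexicon-free: the most probable path is the per-step argmax; then apply B."""
    best = [ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)] for row in y]
    return collapse(best)


def lexicon_decode(y, lexicon):
    """Lexicon-based: pick the lexicon entry with the highest p(l | y)."""
    return max(lexicon, key=lambda word: label_prob(word, y))


# Rows are time steps, columns follow ALPHABET: [blank, a, b].
y = [[0.1, 0.8, 0.1],
     [0.6, 0.3, 0.1],
     [0.2, 0.1, 0.7]]

print(greedy_decode(y))                    # ab
print(round(label_prob("ab", y), 3))       # 0.597
print(lexicon_decode(y, ["a", "b", "ab"])) # ab
```

Five paths ("aab", "abb", "a-b", "-ab", "ab-") collapse to "ab", and their probabilities add up to 0.597.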

12 Recurrent Ladder Networks
Isabeau Prémont-Schwarz, et al.
Presented by Boyoung Kim, February 2, 2018

13 Semi-supervised Learning
Semi-supervised learning: labeled data + unlabeled data
- In most cases labeled data is scarce and unlabeled data is abundant.
- Unsupervised learning (auto-encoder): tries to learn a representation from which the observation can be reconstructed, so details cannot be discarded.
- Supervised learning: can discard the low-level representations of earlier layers in order to learn higher-level representations.
- These two objectives conflict.

14 Ladder Networks
Adding shortcut connections to the autoencoder model makes it possible to discard details.
Ladder Network: a neural network with
- encoder (bottom-up pass)
- decoder (top-down pass)
- lateral connections
- bottom task: denoising of the input
- top task: e.g. classification
Objective function:
$$O = -\log p(\hat{y} = y \mid x) + \sum_{l=0}^{L} \lambda_l \left\| z_e^{l} - z_d^{l} \right\|^2$$
[1] Rasmus et al. Semi-supervised learning with Ladder networks. NIPS.
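
The objective on this slide combines a supervised term with per-layer reconstruction penalties. A minimal sketch of that sum is given below, following the slide's formula only; the function name `ladder_cost`, the toy numbers, and the convention of passing `target=None` for an unlabeled example are assumptions of this example.

```python
import math


def ladder_cost(class_probs, target, z_enc, z_dec, lambdas):
    """O = -log p(y_hat = y | x) + sum_l lambda_l * ||z_e^l - z_d^l||^2.

    class_probs: predicted class probabilities
    target:      index of the true class, or None for an unlabeled example
    z_enc/z_dec: per-layer encoder and decoder activations (lists of lists)
    lambdas:     per-layer weights lambda_l
    """
    supervised = 0.0 if target is None else -math.log(class_probs[target])
    denoising = 0.0
    for lam, ze, zd in zip(lambdas, z_enc, z_dec):
        denoising += lam * sum((a - b) ** 2 for a, b in zip(ze, zd))
    return supervised + denoising


cost = ladder_cost(
    class_probs=[0.25, 0.5, 0.25], target=1,
    z_enc=[[1.0, 2.0], [0.5]],
    z_dec=[[1.0, 1.0], [0.0]],
    lambdas=[10.0, 0.1],
)
print(round(cost, 4))  # 10.7181
```

With `target=None` only the reconstruction terms remain, which is how unlabeled data can still contribute to training.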

15 Recurrent Ladder Network (RLadder)
Used for modeling time series.
Example: Occluded Moving MNIST - identify a digit that is only partially observed over 5 frames.

16 Encoder cell in the l-th layer (possibly an LSTM or GRU):
$$s_l(t) = f_{s,l}\big(e_{l-1}(t),\, d_l(t-1),\, s_l(t-1)\big) \quad \text{(state value)}$$
$$e_l(t) = f_{e,l}\big(e_{l-1}(t),\, d_l(t-1),\, s_l(t-1)\big) \quad \text{(output of encoder)}$$
Decoder cell in the l-th layer (inspired by the original Ladder network [1]):
$$d_l(t) = g_l\big(e_l(t),\, d_{l+1}(t)\big) \quad \text{(output of decoder)}$$
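
The order in which these quantities are computed at one time step can be sketched with scalar activations. The cells below are deliberately simplistic stand-ins (a tanh of a weighted sum, with the encoder output set equal to the new state), not the LSTM/GRU cells mentioned on the slide; all weights, the zero input above the top layer, and the function names are assumptions.

```python
import math


def encoder_cell(e_below, d_prev, s_prev, w=(0.6, 0.3, 0.5)):
    """Hypothetical f_{s,l} and f_{e,l}: new state and encoder output."""
    s = math.tanh(w[0] * e_below + w[1] * d_prev + w[2] * s_prev)
    e = s  # the output equals the new state in this sketch
    return s, e


def decoder_cell(e_same, d_above, w=(0.7, 0.4)):
    """Hypothetical g_l: combines the lateral input e_l(t) with d_{l+1}(t)."""
    return math.tanh(w[0] * e_same + w[1] * d_above)


def rladder_step(x, states, decoder_prev):
    """One time step: bottom-up encoder pass, then top-down decoder pass."""
    n = len(states)
    new_states, enc = [], []
    e_below = x                       # the input acts as e_0(t)
    for l in range(n):
        s, e = encoder_cell(e_below, decoder_prev[l], states[l])
        new_states.append(s)
        enc.append(e)
        e_below = e
    dec = [0.0] * n
    d_above = 0.0                     # nothing above the top layer (assumption)
    for l in reversed(range(n)):
        dec[l] = decoder_cell(enc[l], d_above)
        d_above = dec[l]
    return new_states, dec


def run(frames, n_layers=2):
    states = [0.0] * n_layers
    dec = [0.0] * n_layers
    for x in frames:
        states, dec = rladder_step(x, states, dec)
    return states, dec


final_states, final_dec = run([1.0, 0.0, 0.0])
print(final_states[0] != 0.0)  # True: the first frame still influences the state
```

The decoder output of step t-1 is fed back into the encoder at step t, which is the recurrent link that distinguishes this from the original Ladder network.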

17 Experiment: Occluded Moving MNIST
- Top-level task (primary): classification of a character that is only partially observed over 5 frames
- Low-level task (auxiliary): prediction of the next frame

18 Experiment: Occluded Moving MNIST
Fully-supervised learning result

19 Experiment: Occluded Moving MNIST
Semi-supervised learning result
