Recurrent neural networks: Modelling sequential data (MLP Lecture 9: Recurrent Networks)

Recurrent Networks. Steve Renals, Machine Learning Practical, MLP Lecture 9, 16 November 2016

Introduction: Recurrent Neural Networks (RNNs)
- Modelling sequential data
- Recurrent hidden unit connections
- Training RNNs: back-propagation through time
- LSTMs
- Examples (speech and language)

Sequential Data
- Modelling sequential data with time dependences between feature vectors
- We can model a fixed context with a feed-forward network, with the input vectors from previous timesteps added to the network input; the finite context is determined by the window width
  [Figure: feed-forward network whose input stacks the feature vectors (x1, x2, x3) from times t-2, t-1 and t, i.e. 2 frames of context]
- Alternatively, model sequential inputs using recurrent connections that learn a time-dependent state, giving potentially infinite context
  [Figure: network with recurrent connections on the hidden layer, taking only the current input at time t]
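
As a concrete illustration of the fixed-context approach, here is a minimal NumPy sketch (not from the slides) that stacks each input vector with its previous frames; the function name stack_context and the default of 2 frames of context are illustrative.

```python
import numpy as np

def stack_context(X, n_context=2):
    """Build fixed-context inputs for a feed-forward network.

    X: array of shape (T, d), one d-dimensional feature vector per timestep.
    Returns an array of shape (T, (n_context + 1) * d) whose row t is the
    concatenation of x(t - n_context), ..., x(t - 1), x(t), zero-padded at
    the start of the sequence.
    """
    T, d = X.shape
    padded = np.vstack([np.zeros((n_context, d)), X])
    return np.hstack([padded[i:i + T] for i in range(n_context + 1)])

# Example: 10 timesteps of 3-dimensional features, 2 frames of context
X = np.random.randn(10, 3)
print(stack_context(X).shape)  # (10, 9)
```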

Recurrent networks
If there were no external input, we could think of a recurrent network in terms of the dynamics of its recurrent hidden state:
- Settle to a fixed point: a stable representation
- Regular oscillation (a 'limit cycle'): learn some kind of repetition
- Chaotic dynamics (non-repetitive): theoretically interesting ('computation at the edge of chaos')
Useful behaviours of recurrent networks with external inputs:
- Recurrent state as memory: remember things for (potentially) an infinite time
- Recurrent state as information compression: compress a sequence into a state representation
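
The 'settle to a fixed point' behaviour can be seen by iterating a recurrent hidden state with no external input. The sketch below is an illustration, not from the slides; the small weight scale is an assumption chosen so that the map is contractive.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 8
W_R = 0.1 * rng.standard_normal((H, H))  # small recurrent weights give a contractive map
b = rng.standard_normal(H)               # bias, so the fixed point is not simply zero
h = rng.standard_normal(H)               # arbitrary initial hidden state

# Iterate the autonomous dynamics h(t) = tanh(W_R h(t-1) + b), with no external input
for t in range(200):
    h_new = np.tanh(W_R @ h + b)
    if np.linalg.norm(h_new - h) < 1e-6:
        print(f"settled to a fixed point after {t + 1} steps")
        break
    h = h_new
print(h)  # the stable representation; larger recurrent weights can instead give oscillation or chaos
```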

Vanilla RNNs

Simplest recurrent network

    h_j(t) = sigmoid( Σ_{s=0}^{d} w^{(1)}_{js} x_s(t) + Σ_{r=0}^{H} w^{(R)}_{jr} h_r(t-1) + b_j )
    y_k(t) = softmax( Σ_{r=0}^{H} w^{(2)}_{kr} h_r(t) + b_k )

where the Σ_r w^{(R)}_{jr} h_r(t-1) term is the recurrent part.
[Figure: Output(t) is computed from Hidden(t) via w^(2); Hidden(t) is computed from Input(t) via w^(1) and from Hidden(t-1) via the recurrent weights w^(R)]
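
The same computation written as a NumPy sketch; the helper names (rnn_forward, sigmoid, softmax) and the weight shapes are assumptions made for illustration, with W1, WR and W2 playing the roles of w^(1), w^(R) and w^(2) above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def rnn_forward(X, W1, WR, W2, b_h, b_y):
    """Forward pass of the simplest recurrent network.

    X: (T, d) sequence of input vectors.
    W1: (H, d) input-to-hidden, WR: (H, H) hidden-to-hidden (recurrent),
    W2: (K, H) hidden-to-output; biases b_h: (H,), b_y: (K,).
    Returns hidden states (T, H) and output distributions (T, K).
    """
    T, d = X.shape
    H = W1.shape[0]
    h_prev = np.zeros(H)                               # initial hidden state h(0)
    hs, ys = [], []
    for t in range(T):
        h = sigmoid(W1 @ X[t] + WR @ h_prev + b_h)     # WR @ h_prev is the recurrent part
        y = softmax(W2 @ h + b_y)
        hs.append(h)
        ys.append(y)
        h_prev = h
    return np.array(hs), np.array(ys)
```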

Recurrent network unfolded in time
[Figure: the network unrolled over timesteps t-1, t and t+1; each copy uses the same input-to-hidden weights w^(1), hidden-to-output weights w^(2) and hidden-to-hidden recurrent weights w^(R)]
- An RNN for a sequence of T inputs can be viewed as a deep T-layer network with shared weights
- We can train an RNN by doing backprop through this unfolded network, making sure we share the weights
- Weight sharing: if two weights are constrained to be equal (w_1 = w_2), then they will stay equal if the weight changes are equal (∂E/∂w_1 = ∂E/∂w_2); achieve this by updating each with (∂E/∂w_1 + ∂E/∂w_2) (cf. ConvNets)

Back-propagation through time (BPTT)
- We can train a network by unfolding and back-propagating through time, summing the derivatives for each weight as we go through the sequence
- More efficiently, run as a recurrent network:
  - cache the unit outputs at each timestep
  - cache the output errors at each timestep
  - then backprop from the final timestep back to time zero, computing the derivatives at each step
  - compute the weight updates by summing the derivatives across time
- Expensive: backprop for a 1,000-item sequence is equivalent to a 1,000-layer feed-forward network
- Truncated BPTT: backprop through just a few time steps (e.g. 20)
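
A sketch of BPTT for the simple RNN above, reusing the hypothetical rnn_forward from the earlier sketch and assuming a per-timestep softmax output trained with cross-entropy loss. Note how the gradients for the shared weights are summed across timesteps; truncated BPTT would run the same backward loop over only the last few (e.g. 20) timesteps.

```python
import numpy as np

def bptt(X, targets, W1, WR, W2, b_h, b_y):
    """Backprop through time for the simple RNN above (softmax output,
    cross-entropy loss at every timestep).

    X: (T, d) inputs; targets: (T,) integer class labels.
    Returns gradients with the same shapes as the parameters, summed over time.
    Requires rnn_forward from the previous sketch.
    """
    hs, ys = rnn_forward(X, W1, WR, W2, b_h, b_y)   # cache the unit outputs
    T, d = X.shape
    gW1, gWR, gW2 = np.zeros_like(W1), np.zeros_like(WR), np.zeros_like(W2)
    gb_h, gb_y = np.zeros_like(b_h), np.zeros_like(b_y)

    delta_a_next = np.zeros_like(b_h)               # error flowing back from timestep t+1
    for t in reversed(range(T)):                    # final timestep back to time zero
        dy = ys[t].copy()
        dy[targets[t]] -= 1.0                       # softmax + cross-entropy output error
        gW2 += np.outer(dy, hs[t])
        gb_y += dy
        dh = W2.T @ dy + WR.T @ delta_a_next        # error at h(t), including the recurrent path
        delta_a = dh * hs[t] * (1.0 - hs[t])        # sigmoid derivative
        h_prev = hs[t - 1] if t > 0 else np.zeros_like(b_h)
        gW1 += np.outer(delta_a, X[t])              # shared weights: derivatives summed over time
        gWR += np.outer(delta_a, h_prev)
        gb_h += delta_a
        delta_a_next = delta_a
    return gW1, gWR, gW2, gb_h, gb_y
```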

Vanishing and exploding gradients
- BPTT involves taking the product of many gradients (as in a very deep network); this can lead to vanishing (component gradients less than 1) or exploding (component gradients greater than 1) gradients, which can prevent effective training
- Modified optimisation algorithms:
  - RMSProp (and similar algorithms): normalise the gradient for each weight by an average of its magnitude, giving a separate learning rate for each weight
  - Hessian-free: an approximation to second-order approaches, which use curvature information
- Modified hidden unit transfer functions:
  - Long short-term memory (LSTM): linear self-recurrence for each hidden unit (long-term memory), and gates, dynamic weights which are a function of their inputs
  - Gated recurrent units
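
A sketch of the RMSProp-style update described above: each weight's gradient is divided by a running average of its magnitude, so every weight gets its own effective learning rate. The decay rate and epsilon values are illustrative defaults, not from the slides.

```python
import numpy as np

def rmsprop_update(param, grad, ms, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp step for a single parameter array.

    ms is the running mean of the squared gradients (same shape as param);
    it is updated in place, and the updated parameter is returned.
    """
    ms *= decay
    ms += (1.0 - decay) * grad ** 2
    return param - lr * grad / (np.sqrt(ms) + eps)
```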

LSTM

Vanilla RNN
[Figure: the RNN cell with input x(t), weights W_hx and W_hh, cell input g(t) and hidden state h(t)]

    g(t) = W_hx x(t) + W_hh h(t-1) + b_h
    h(t) = tanh(g(t))

LSTM
- Internal recurrent state ('cell') c(t) combines the previous state c(t-1) and the LSTM input g(t)

LSTM internal recurrent state
[Figure: the RNN cell extended with an internal state ('cell') c(t): the input x(t) and previous output h(t-1) form the cell input g(t) via W_hx and W_hh; c(t) combines c(t-1) with g(t) and produces the output h(t)]

LSTM
- Gates: weights dependent on the current input and the previous state
  - Input gate: controls how much of the input to the unit, g(t), is written to the internal state c(t)
  - Forget gate: controls how much of the previous internal state c(t-1) is written to the internal state c(t)
- Input and forget gates together allow the network to control what information is stored and overwritten at each step

LSTM input and forget gates
[Figure: the LSTM cell with an input gate I(t) applied to the cell input g(t) and a forget gate F(t) applied to the previous cell state c(t-1); both gates are functions of x(t) and h(t-1)]

    I(t) = σ(W_ix x(t) + W_ih h(t-1) + b_i)
    F(t) = σ(W_fx x(t) + W_fh h(t-1) + b_f)
    g(t) = W_hx x(t) + W_hh h(t-1) + b_h
    c(t) = F(t) ⊙ c(t-1) + I(t) ⊙ g(t)

where σ is the sigmoid function and ⊙ is element-wise vector multiplication.

LSTM
- Output gate: controls how much of each unit's activation is output by the hidden state; it allows the LSTM cell to keep information that is not relevant at the current time, but may be relevant later

LSTM output gate
[Figure: the LSTM cell with an output gate O(t), also a function of x(t) and h(t-1), applied to the cell state c(t) to form the hidden output h(t)]

    O(t) = σ(W_ox x(t) + W_oh h(t-1) + b_o)
    h(t) = tanh(O(t) ⊙ c(t))

LSTM
[Figure: the complete LSTM cell with input, forget and output gates]
The complete set of LSTM equations:

    I(t) = σ(W_ix x(t) + W_ih h(t-1) + b_i)
    F(t) = σ(W_fx x(t) + W_fh h(t-1) + b_f)
    O(t) = σ(W_ox x(t) + W_oh h(t-1) + b_o)
    g(t) = W_hx x(t) + W_hh h(t-1) + b_h
    c(t) = F(t) ⊙ c(t-1) + I(t) ⊙ g(t)
    h(t) = tanh(O(t) ⊙ c(t))
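
The equations transcribed as a single NumPy step of the LSTM cell; the parameter names and shapes are assumptions (each W_*x is H x d, each W_*h is H x H), and the output follows the slide's form h(t) = tanh(O(t) ⊙ c(t)).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, params):
    """One timestep of the LSTM cell defined by the equations above.

    params is a dict of weight matrices and biases:
    W_ix, W_fx, W_ox, W_hx (H x d); W_ih, W_fh, W_oh, W_hh (H x H);
    b_i, b_f, b_o, b_h (H,). Returns the new output h(t) and cell state c(t).
    """
    p = params
    I = sigmoid(p['W_ix'] @ x + p['W_ih'] @ h_prev + p['b_i'])   # input gate
    F = sigmoid(p['W_fx'] @ x + p['W_fh'] @ h_prev + p['b_f'])   # forget gate
    O = sigmoid(p['W_ox'] @ x + p['W_oh'] @ h_prev + p['b_o'])   # output gate
    g = p['W_hx'] @ x + p['W_hh'] @ h_prev + p['b_h']            # cell input
    c = F * c_prev + I * g                                       # element-wise gating
    h = np.tanh(O * c)                                           # output, as on the slide
    return h, c
```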

Example applications using RNNs

Example 1: speech recognition with recurrent networks
[Figure: a recurrent neural network mapping speech acoustics (a spectrogram, frequency in Hz against time in ms) to phoneme probabilities]
T Robinson et al (1996). The use of recurrent networks in continuous speech recognition, in Automatic Speech and Speaker Recognition: Advanced Topics (Lee et al, eds), Kluwer, pp. 233-258.

Example 2: speech recognition with stacked LSTMs
[Figure: four acoustic-model architectures built from LSTM layers: (a) LSTM, (b) DLSTM (deep, stacked LSTMs), (c) LSTMP (LSTM with a recurrent projection layer), (d) DLSTMP]
H Sak et al (2014). Long Short-Term Memory based Recurrent Neural Network Architectures for Large Scale Acoustic Modelling, Interspeech.

Example 3: recurrent network language models
T Mikolov et al (2010). Recurrent Neural Network Based Language Model, Interspeech.

Example 4: recurrent encoder-decoder (machine translation)
I Sutskever et al (2014). Sequence to Sequence Learning with Neural Networks, NIPS.
K Cho et al (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP.

Summary
- RNNs can model sequences
- Unfolding an RNN gives a deep feed-forward network
- Back-propagation through time
- LSTM
- More on recurrent networks next semester in NLU (and 1-2 lectures in ASR and MT)