Recurrent Neural Networks 1: Modelling sequential data (MLP Lecture 9)


Recurrent Neural Networks 1: Modelling sequential data. Steve Renals, Machine Learning Practical, MLP Lecture 9, 15 November 2017 / 20 November 2017.

Sequential Data

We often wish to model data that is a sequence or trajectory through time, for instance audio signals, text (sequences of characters/words), currency exchange rates, or the motion of an animal.

Modelling sequential data:
- Invariances across time
- The current state depends on the past
- Need to share data across time

Convolutional networks model invariances across space; can we do something similar across time? Yes: time-delay neural networks. Can we use units to act as memories? Yes: recurrent networks.

Recap: space invariance

[Figure: convolutional network architecture: 28x28 input, 3x24x24 feature maps, 3x12x12 pooling layers, 6x8x8 feature maps, 6x4x4 pooling layers.] Key properties: local connectivity and weight sharing.

Modelling sequences

Imagine modelling a time sequence of 3-dimensional vectors (x1, x2, x3) at time steps t = 0, 1, 2, 3, ...

Modelling sequences

Imagine modelling a time sequence of 3D vectors. We can model a fixed context with a feed-forward network, with the input vectors from previous time steps added to the network input. [Figure: feed-forward network with input, hidden and output layers; the inputs at t-2, t-1 and t (2 frames of context) are concatenated.]
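As an illustration of this fixed-context input, here is a minimal NumPy sketch (not from the lecture; names such as stack_context are hypothetical) that concatenates each frame with its two previous frames before feeding the stacked vector to an ordinary feed-forward layer.

```python
import numpy as np

def stack_context(x, n_context=2):
    """x: (T, d) sequence. Returns (T, d * (n_context + 1)) windowed inputs,
    padding the start of the sequence by repeating the first frame."""
    T, d = x.shape
    padded = np.vstack([np.repeat(x[:1], n_context, axis=0), x])
    # column blocks are [x(t-2), x(t-1), x(t)] for n_context = 2
    return np.hstack([padded[t:t + T] for t in range(n_context + 1)])

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))       # 100 time steps of 3-D vectors
X = stack_context(x, n_context=2)   # shape (100, 9): frames t-2, t-1, t
W = rng.normal(size=(9, 16)) * 0.1  # weights of a feed-forward hidden layer
h = np.tanh(X @ W)                  # hidden activations, one per time step
```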

Modelling sequences

Imagine modelling a time sequence of 3D vectors. A fixed context can be modelled with a feed-forward network with previous time input vectors added to the network input; equivalently, model this using 1-dimensional convolutions in time, a time-delay neural network (TDNN). [Figure: inputs from t-T to t (T frames of context) feed a 1D convolutional layer, then a fully-connected layer, then the output.] The network takes into account a finite context.
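The TDNN idea can be sketched as a 1-D convolution over time, in which the same weight matrix is applied to every window of frames. The following is a minimal illustrative sketch (names like tdnn_layer are made up for this example), not the lecture's code.

```python
import numpy as np

def tdnn_layer(x, W, b):
    """x: (T, d_in); W: (k, d_in, d_out) for a kernel spanning k frames;
    b: (d_out,). Returns (T - k + 1, d_out) outputs (no padding)."""
    T, d_in = x.shape
    k, _, d_out = W.shape
    out = np.empty((T - k + 1, d_out))
    for t in range(T - k + 1):
        window = x[t:t + k]                        # k consecutive frames
        out[t] = np.einsum('kd,kdo->o', window, W) + b
    return np.maximum(out, 0.0)                    # ReLU non-linearity

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
W = rng.normal(size=(5, 3, 16)) * 0.1              # 5-frame receptive field
h = tdnn_layer(x, W, np.zeros(16))                 # shape (96, 16)
```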

TDNNs

[Figure: computation in a TDNN with sub-sampling (red) and without sub-sampling (blue+red); layers 1-4 splice frames at increasing time offsets, covering 23 frames of input context at the top layer.] Sub-sampling speeds up training by about 5x relative to the baseline TDNN architecture. Peddinti et al, Reverberation robust acoustic modeling using i-vectors with time delay neural networks, Interspeech 2015, http://www.danielpovey.com/files/2015_interspeech_aspire.pdf

WaveNet. van den Oord et al (2016), WaveNet: A Generative Model for Raw Audio, https://arxiv.org/abs/1609.03499

Networks with state

Feed-forward = finite context: feed-forward networks (even fancy ones like WaveNet) compute the output based on a finite input history. Sometimes the required context is known, but often it is not.

State units: we would like a network with state across time, so that if an event happens we can potentially know about that event many time steps in the future.
- State units as memory: remember things for (potentially) an infinite time
- State units as information compression: compress a sequence into a state representation

Recurrent networks have state units. [Figure: hidden state h feeding back to itself through a delay, with input x.]

Recurrent networks

[Figure: network with inputs x1, x2, x3 at time t, a recurrent hidden layer (with self-connections through a time delay), and an output layer.]

Graphical model of a recurrent network

[Figure: input x feeds hidden state h, which feeds output y; h also feeds back to itself through a delay.]

Graphical model of a recurrent network

Unfold the recurrent network in time. [Figure: the recurrent network (x -> h -> y, with a delayed self-connection on h) unfolded into a chain: x_{t-1}, x_t, x_{t+1} feed h_{t-1}, h_t, h_{t+1}, which feed y_{t-1}, y_t, y_{t+1}, with connections h_{t-1} -> h_t -> h_{t+1}.]

Simple recurrent network

$$y_k(t) = \mathrm{softmax}\left(\sum_{r=0}^{H} w^{(2)}_{kr}\, h_r(t) + b_k\right)$$

$$h_j(t) = \mathrm{sigmoid}\Bigg(\sum_{s=0}^{d} w^{(1)}_{js}\, x_s(t) + \underbrace{\sum_{r=0}^{H} w^{(R)}_{jr}\, h_r(t-1)}_{\text{Recurrent part}} + b_j\Bigg)$$

where x(t), h(t) and y(t) are the input, hidden and output vectors at time t; w^{(1)} are the input-to-hidden weights, w^{(R)} the recurrent hidden-to-hidden weights and w^{(2)} the hidden-to-output weights. [Figure: hidden (t) receives input (t) and hidden (t-1); output (t) is computed from hidden (t).]
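A minimal NumPy sketch of the forward pass defined by these equations (W1, WR and W2 correspond to w^{(1)}, w^{(R)} and w^{(2)}; the function name rnn_forward is just for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def rnn_forward(x_seq, W1, WR, W2, b_h, b_y):
    """x_seq: (T, d). Returns hidden states (T, H) and outputs (T, K)."""
    H = b_h.shape[0]
    h_prev = np.zeros(H)                                  # h(0) initialised to zero
    hs, ys = [], []
    for x in x_seq:
        h = sigmoid(W1 @ x + WR @ h_prev + b_h)           # recurrent part: WR @ h(t-1)
        y = softmax(W2 @ h + b_y)
        hs.append(h); ys.append(y)
        h_prev = h
    return np.array(hs), np.array(ys)

rng = np.random.default_rng(0)
d, H, K, T = 3, 8, 4, 20
hs, ys = rnn_forward(rng.normal(size=(T, d)),
                     rng.normal(size=(H, d)) * 0.1, rng.normal(size=(H, H)) * 0.1,
                     rng.normal(size=(K, H)) * 0.1, np.zeros(H), np.zeros(K))
```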


Recurrent network unfolded in time

[Figure: the unfolded network at t-1, t, t+1, with input-to-hidden weights w^{(1)} and hidden-to-output weights w^{(2)} shared across time steps.]
- View an RNN for a sequence of T inputs as a T-layer network with shared weights.
- Train an RNN by doing backprop through this unfolded network.
- Weight sharing: if two weights are constrained to be equal (w_1 = w_2) then they will stay equal if the weight changes are equal ($\partial E/\partial w_1 = \partial E/\partial w_2$); achieve this by updating each with ($\partial E/\partial w_1 + \partial E/\partial w_2$) (cf. convolutional networks).
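A tiny sketch of this weight-sharing rule, using hypothetical gradient values: both copies of a tied weight receive the summed gradient, so if they start equal they remain equal after the update.

```python
w1 = w2 = 0.5                # tied weights, initially equal
dE_dw1, dE_dw2 = 0.3, -0.1   # per-copy gradients (hypothetical values)
g = dE_dw1 + dE_dw2          # the shared update uses the sum of the gradients
lr = 0.01
w1 -= lr * g
w2 -= lr * g
assert w1 == w2              # the constraint w1 == w2 is preserved
```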

Bidirectional RNN

[Figure: inputs at t-1, t, t+1 feed both a forward hidden layer (FHid) and a reverse hidden layer (RHid); both hidden layers feed the outputs.]
- Output a prediction that depends on the whole input sequence.
- A bidirectional RNN combines an RNN moving forward in time with one moving backwards in time.
- The state units provide a combined representation that depends on both the past and the future.
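A minimal, self-contained sketch along these lines (a simple tanh recurrence is assumed here purely for illustration; function names are made up): run one RNN left to right and another right to left, then concatenate their hidden states at each time step.

```python
import numpy as np

def run_rnn(x_seq, W_in, W_rec, b):
    """Simple tanh RNN returning hidden states, shape (T, H)."""
    h, hs = np.zeros(b.shape[0]), []
    for x in x_seq:
        h = np.tanh(W_in @ x + W_rec @ h + b)
        hs.append(h)
    return np.array(hs)

def birnn_hidden(x_seq, fwd, bwd):
    """fwd, bwd: (W_in, W_rec, b) parameter tuples for the two directions."""
    h_fwd = run_rnn(x_seq, *fwd)              # left-to-right pass
    h_bwd = run_rnn(x_seq[::-1], *bwd)[::-1]  # right-to-left pass, re-aligned in time
    return np.concatenate([h_fwd, h_bwd], axis=1)  # (T, 2H): depends on past and future

rng = np.random.default_rng(0)
d, H, T = 3, 8, 50
make = lambda: (rng.normal(size=(H, d)) * 0.1, rng.normal(size=(H, H)) * 0.1, np.zeros(H))
states = birnn_hidden(rng.normal(size=(T, d)), make(), make())   # shape (50, 16)
```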

Back-propagation through time (BPTT)

- We can train a network by unfolding and back-propagating through time, summing the derivatives for each weight as we go through the sequence.
- More efficiently, run as a recurrent network:
  - cache the unit outputs at each timestep
  - cache the output errors at each timestep
  - then backprop from the final timestep to time zero, computing the derivatives at each step
  - compute the weight updates by summing the derivatives across time
- Expensive: backprop for a 1,000-item sequence is equivalent to a 1,000-layer feed-forward network.
- Truncated BPTT: backprop through just a few time steps (e.g. 20).
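A minimal sketch of truncated BPTT for a simple tanh RNN with a linear output and squared-error loss (the loss and all names here are assumptions for illustration, not the lecture's code): the sequence is processed in chunks, the hidden state is carried forward across chunks, unit outputs are cached during the forward pass, derivatives are summed across time during the backward pass, and no gradient flows across chunk boundaries.

```python
import numpy as np

def truncated_bptt(x_seq, targets, W_in, W_rec, W_out, b, trunc=20):
    """x_seq: (T, d); targets: (T, K). Returns summed gradients for each weight."""
    T = len(x_seq)
    grads = {k: np.zeros_like(v) for k, v in
             dict(W_in=W_in, W_rec=W_rec, W_out=W_out, b=b).items()}
    h_prev = np.zeros(b.shape[0])
    for start in range(0, T, trunc):
        end = min(start + trunc, T)
        # forward pass through the chunk, caching the unit outputs at each timestep
        hs, h = [h_prev], h_prev
        for t in range(start, end):
            h = np.tanh(W_in @ x_seq[t] + W_rec @ h + b)
            hs.append(h)
        # backward pass from the end of the chunk back to its start
        dh_next = np.zeros_like(h_prev)
        for i in range(end - start - 1, -1, -1):
            t, h_t, h_tm1 = start + i, hs[i + 1], hs[i]
            dy = W_out @ h_t - targets[t]                        # output error
            grads['W_out'] += np.outer(dy, h_t)
            da = (W_out.T @ dy + dh_next) * (1.0 - h_t ** 2)     # back through tanh
            grads['W_in'] += np.outer(da, x_seq[t])
            grads['W_rec'] += np.outer(da, h_tm1)
            grads['b'] += da
            dh_next = W_rec.T @ da                               # error passed back in time
        h_prev = hs[-1]          # carry the state forward, but not the gradients
    return grads
```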

Example 1: speech recognition with recurrent networks

[Figure: speech acoustics (a spectrogram, 0-8000 Hz over about 1400 ms) fed to a recurrent neural network that outputs phoneme probabilities.] T Robinson et al (1996), The use of recurrent networks in continuous speech recognition, in Automatic Speech and Speaker Recognition: Advanced Topics (Lee et al, eds), Kluwer, pp. 233-258. http://www.cstr.ed.ac.uk/downloads/publications/1996/rnn4csr96.pdf

Example 2: recurrent network language models

T Mikolov et al (2010), Recurrent Neural Network Based Language Model, Interspeech 2010. http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_is100722.pdf

Summary

- Model sequences with finite context using feed-forward networks with convolutions in time (TDNNs, WaveNet).
- Model sequences with infinite context using recurrent neural networks (RNNs).
- Unfolding an RNN gives a deep feed-forward network with shared weights.
- Train using back-propagation through time (BPTT).
- (Historical) examples on speech recognition and language modelling.
- Reading: Goodfellow et al, chapter 10 (sections 10.1, 10.2, 10.3), http://www.deeplearningbook.org/contents/rnn.html
- Next lecture: LSTMs, sequence-to-sequence models.