Generating an appropriate sound for a video using WaveNet.


Australian National University
College of Engineering and Computer Science
Master of Computing

Generating an appropriate sound for a video using WaveNet.
COMP 8715 Individual Computing Project

Taku Ueki

1. Supervisor: Dr. Christian Walder
2. Supervisor: Dr. Benjamin Swift

October 27, 2017

Contents

1 Introduction
   1.1 Motivation
   1.2 Overview
2 Methods
   2.1 Artificial Neural Networks
       Overview
       The Artificial Neuron
       Feedforward Neural Network
       Multi Layer Perceptron
       Each Layer's Function in a Classification Task
       Training Neural Networks
   2.2 Convolutional Neural Networks
       Overview of CNN
       Convolutional Layer
       Pooling Layer
       VGG
       Architecture of VGG model
       Performance
       Discussion
       Generalisation of VGG model
   2.3 Recurrent Neural Networks
       RNN's Model
       Long Short Term Memory
   2.4 WaveNet
       Overview of WaveNet
       Model Overview
3 Implementation and Experiment Results
   3.1 Implementation for the Local Conditioning
       Simple Local Conditioning
       Upsampled Local Conditioning
   3.2 Testing the Model
       Fast and Slow Generation
   3.3 Converting a Video Frame to a Vector Using VGG Pre-trained Model
   3.4 Training the WaveNet with Videos
       Experiment 1: Train the model only with wave sound files
       Experiment 2: Training the Model with Sounds and Videos, and Generating Sounds without Local Condition
       Experiment 3: Training the Model with Sounds and Videos, and Generating Sound with Local Condition
4 Discussion
   4.1 Why doesn't local conditioning work for the video, while it works for the toy problem?
5 Future Works
6 Conclusion
Bibliography

Acknowledgments

Firstly, I would like to thank my supervisors, Christian Walder and Benjamin Swift, for their great support and accurate advice. I could not have done this project without them. I would also like to thank Florin Schimbinschi for his help starting off this project, specifically for helping me understand the WaveNet model, and for our insightful discussions. When I came up with this project's idea I did not expect to get this much support, so I cannot express my gratitude enough.

List of Figures

2.1 A depicted (a) biological neuron and (b) an artificial neuron.
2.2 Plotted activation functions: (a) sigmoid, (b) tanh and (c) ReLU.
2.3 An illustrated multi layer perceptron.
2.4 An example of simple convolution.
2.5 Max pooling layer with 2x2 filters and stride 2.
2.6 Configuration of each CNN they tested in their research.
2.7 A list of results of the evaluations for each CNN.
2.8 Performance comparison with other research groups' CNNs.
2.9 An unrolled recurrent neural network.
2.10 An unfolded LSTM cell.
2.11 Forget gate.
2.12 Input gate.
2.13 Output gate.
2.14 Overview of the WaveNet model.
2.15 Visualization of a stack of causal convolutional layers.
2.16 Visualization of a stack of dilated causal convolutional layers.
2.17 WaveNet overview with local condition.
3.1 Visualisation of the simple local conditioning for the second hidden layer.
3.2 Visualisation of the upsampled local conditioning for the second hidden layer.
3.3 Example of the training data.
3.4 Expected result for the test.
3.5 Upsampled model, 1000 iterations.
3.6 Simple model, 1000 iterations.
3.7 Differences between the logits from the slow and fast generation algorithms.
3.8 Error value through the training for 1000 epochs without local condition.
3.9 Error value through the training for 300 epochs without local condition.
5.1 Model overview of WaveNet with LSTM.
5.2 Visualisation of the second idea of improving the local conditioning with LSTM.
5.3 Model overview of the future work idea.

1 Introduction

1.1 Motivation

What does snow sound like? It was from this simple question that this project was born. If an artificial intelligence can learn the sounds of videos and predict an appropriate sound for a video, it may be able to answer that question. In this project, my goal was to develop a system that generates an appropriate sound for a video. Perhaps most people have not thought about the practice of generating sounds for videos, and may therefore think there is little value in creating appropriate sounds, so I would like to elaborate on my motivation for starting such a project. Firstly, as stated in the abstract, this is a practice used in the film industry. When a film is dubbed into other languages, the film makers have to record or create appropriate sounds for each scene [10]. This is sometimes because the actor's voice cannot be separated from the original sound data; the environmental sounds, such as footsteps, street noise, etc., therefore have to be recreated for the video. Also, animation and science fiction films need appropriate sounds for scenes that are created with computer graphics technologies, because those technologies only generate the visual effects and not the sound effects.

Additionally, this project may propose a new way of composing or performing music in digital art fields. For example, it is difficult to put music to an abstract video; this project may be able to generate music or sound for such videos.

1.2 Overview

Among research on raw audio generation, the topic of Text-to-Speech has been gaining increasing attention. As the name implies, the Text-to-Speech task is to convert speech data in text form into speech data in sound form. By combining voice recognition and Text-to-Speech technology, a new interface which does not require physical interaction between the user and the system can be implemented. As the system understands and communicates with the user through human language, this interface requires only conversation for its operation. WaveNet [6] achieved state-of-the-art performance on Text-to-Speech tasks in both English and Mandarin Chinese. WaveNet is a generative deep learning model for raw audio waveform generation. Its approach is to predict new audio samples based on a long range of previous audio samples using dilated causal convolution layers. Audio data is normally stored as an array of float values, and the sample rate is usually at least 16,000 Hz, so to capture only one second of audio waveform the model has to deal with 16,000 values. Moreover, audio data holds sequential temporal information as well as the audio signal of each time step. Hence, the model needs to be able to consider a long range of previous audio samples while keeping the temporal causality of the waveform in order to capture its patterns. To treat sequential data with deep learning techniques, recurrent neural networks or causal convolutions are used. However, for raw audio generation, as mentioned above, the input sequence is very long, and because of this unusual length these techniques cannot learn the patterns of the waveform efficiently.

Therefore, they used dilated causal convolutional layers to cover the long input array. In the hidden layers of dilated causal convolutions, the convolution skips a certain number of inputs, so that the model can cover a long input array more efficiently without losing the temporal causality of the original waveform. Additionally, WaveNet receives a local condition as another input. With this local condition, the WaveNet model predicts new audio samples based not only on the previous audio samples but also on other external information. For example, for Text-to-Speech tasks, the phonemes of the input text's words are fed into the WaveNet as local conditions, so that the model can generate the audio samples the user intends. For this project, video frames are given to the model as local conditions. Just as we utter different sounds based on the words and their phonemes, environmental sounds occur based on phenomena or physical interactions between objects, so there are correlations between videos of these actions and the corresponding sounds. Thus, the goal of this project is to develop a system that recognises the differences between video frames and generates an appropriate sound for each scene. To feed video frames into the WaveNet model, they need to be converted into a form that the WaveNet can treat easily and effectively. Additionally, image data holds spatial information, so it is crucial for the model to consider this spatial information to recognise each video frame. Accordingly, the input video frames are converted to vectors through a pre-trained convolutional neural network. For image recognition tasks, the spatial information of the input images is extracted through the CNN and converted into a vector so that the fully connected layers following the CNN layers can treat it adequately to conduct the classification. Therefore, a trained CNN can be used to extract the spatial information of images and convert them to vectors. The Visual Geometry Group achieved state-of-the-art performance on image recognition tasks using a CNN. They publicly released the CNN model, which was trained on an image classification benchmark task, to facilitate computer vision related research.

It has also been proved that this pre-trained model can be used for different types of image recognition tasks, so it has been used in much related research.

2 Methods

In this section, I introduce the basics of the existing models and tools that I used in this project. Before going into the details of my project's model, since I mainly used several different types of deep neural networks, I will start by explaining the most standard architecture of neural networks.

2.1 Artificial Neural Networks

Overview

An artificial neural network (ANN) is a biologically inspired network of artificial neurons which is designed to perform certain tasks such as classification, regression, etc. The basic idea of an ANN is to imitate the architecture of a human brain and simulate its operations on a computer.

The Artificial Neuron

The artificial neuron is the smallest unit that constitutes an ANN. As the name implies, its function is almost identical to that of a biological neuron.

Figure 2.1: A depicted (a) biological neuron and (b) an artificial neuron.

A biological neuron cell receives signals from other neurons through its dendrites and sums these signals up. If the summation of the signals exceeds a threshold, the neuron cell becomes activated. Activated neuron cells transmit a signal to other connected neurons through synapses. Artificial neurons work almost identically. An artificial neuron receives input signals from connected neurons. Each input signal has an associated weight value; the signals are multiplied by their weights as they are passed between connected neurons. As the biological neuron sums up the received signals, the artificial neuron also sums up all the signals transmitted from connected neurons. The summation is passed to an arbitrary activation function which decides whether to activate the neuron or not. If the neuron is activated by the received signals, it transmits a signal to other connected neurons through the axon. The mathematical form of this process is expressed as

$$y(\mathbf{x}, \mathbf{w}) = f\!\left(\sum_{i=0}^{M} w_i x_i\right)$$

where $f(\cdot)$ is an activation function, $x_i$ is the input from the previous layer's $i$-th neuron, and $w_i$ is the associated weight for that input.
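To make this concrete, here is a minimal NumPy sketch of a single artificial neuron; the function and variable names are illustrative, not taken from the report:

```python
import numpy as np

def sigmoid(z):
    # Squash a real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(x, w, f=sigmoid):
    # x: inputs from the previous layer's neurons, shape (M+1,)
    # w: associated weights, shape (M+1,); w[0] acts as a bias
    #    if x[0] is fixed to 1, matching the sum from i = 0 to M.
    z = np.dot(w, x)       # weighted sum of the incoming signals
    return f(z)            # the activation decides the neuron's output

# Example: three inputs (the first one is the bias input fixed to 1)
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.1, 0.8, 0.3])
print(artificial_neuron(x, w))
```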

There are many types of activation functions; I will introduce three popular ones.

Figure 2.2: Plotted activation functions: (a) sigmoid, (b) tanh and (c) ReLU.

Sigmoid: The input real value is squashed to a value between 0 and 1. The backpropagation algorithm, which is used to update the learnable weight values in a neural network, computes the gradient of the cost function with respect to each weight value and updates the weight based on that gradient. However, the sigmoid function's gradient vanishes as the input value becomes very large or very small. Consequently, the weights cannot be updated efficiently.

Tanh: The input real value is squashed to a value between -1 and 1. This activation function has the same vanishing-gradient problem as the sigmoid.

Rectified Linear Unit (ReLU): The ReLU activation function ignores all negative inputs and returns positive inputs as they are. As the gradient of this function does not vanish or explode, it can update the weight values more efficiently and helps the model converge faster.
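A minimal NumPy sketch of the three activation functions described above (illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negatives, identity otherwise

z = np.linspace(-5, 5, 11)
for f in (sigmoid, tanh, relu):
    print(f.__name__, np.round(f(z), 3))
```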

Feedforward Neural Network

The feedforward neural network is the simplest type of neural network. Feedforward neural networks are comprised of layers of artificial neurons, and each layer's artificial neurons are connected to the adjacent layers' artificial neurons. An artificial neuron receives signals from the previous layer's artificial neurons and, if activated, transmits a signal to the next layer's artificial neurons.

Multi Layer Perceptron

A multi layer perceptron is comprised of an input layer, one or more hidden layers, and an output layer. Although a single layer perceptron can only solve linearly separable classification problems, a multi layer perceptron can solve nonlinear problems [3]. The basic architecture of a multi layer perceptron is shown below.

Figure 2.3: An illustrated multi layer perceptron.

Input layer: The left layer of the figure is the input layer. In this figure, there are two input nodes and one bias node. All input nodes are connected to all the hidden nodes except for the hidden layer's bias node. Each connection has an associated weight value. The value of each input node is multiplied by the associated weight and passed on to the hidden nodes.

Hidden layer: The hidden nodes receive the product of each input node's value and the associated weight from all connected nodes. Each node sums these values up and passes the summation to its activation function. The output of each hidden node is multiplied by its associated weight value and passed on to the next layer's nodes.

Output layer: The output layer's nodes receive values from all the connected nodes of the previous hidden layer. They also pass the summation of these received values to their activation function.

Each Layer's Function in a Classification Task

The number of input nodes is the same as the number of parameters from which you predict the correct class. To respond to many different inputs, the neurons in the hidden layers activate other nodes when a certain input is transmitted. The number of output layer nodes indicates the number of classes in the classification task, and each output node corresponds to a certain class. Therefore, in a classification task, the label for each data point is expressed in the form of a one-hot vector. For example, if there are three classes, the label for class one is expressed as [1, 0, 0], class two as [0, 1, 0], and likewise for class three. Hence, in a classification task, the outcome of each output node represents the probability that the data point is classified to that class.
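Putting the layers together, here is a hedged NumPy sketch of a forward pass through a small multi layer perceptron for a three-class problem, using the softmax function introduced below; all sizes and names are illustrative:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def mlp_forward(x, W1, b1, W2, b2):
    # One hidden layer with tanh, then an output layer with softmax.
    h = np.tanh(W1 @ x + b1)          # hidden activations
    return softmax(W2 @ h + b2)       # class probabilities

rng = np.random.default_rng(0)
x = rng.normal(size=2)                          # two input nodes
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # four hidden nodes
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)   # three classes
print(mlp_forward(x, W1, b1, W2, b2))           # probabilities summing to 1
```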

To convert the outcomes to probabilities, the softmax function is usually used:

$$\sigma(x)_i = \frac{e^{x_i}}{\sum_{k=1}^{K} e^{x_k}}$$

where $K$ is the number of classes; the softmax function converts the outcome of the $i$-th node into a probability.

Training Neural Networks

There are two phases for neural networks. One is the prediction phase, which propagates forward from input to output to get results from the input values. The other is the training phase. During the training phase, the error between the correct answer and the prediction is calculated. The objective of the training phase is to minimise this error so as to optimise the network's trainable weight values. The backpropagation algorithm is used to train neural networks: it calculates the gradient of the error and updates the weights to reduce the error according to the gradient. There are several error functions, of which I will briefly introduce two common ones.

Loss Functions

The Mean Squared Error (MSE) function:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2$$

The MSE function computes the average of the squared difference between the prediction and the label of each data point.

The Cross Entropy loss function:

$$\mathrm{Loss}(X, Y) = -\frac{1}{n}\sum_{i=1}^{n}\left[Y_i \ln(X_i) + (1 - Y_i)\ln(1 - X_i)\right]$$

where $X$ is the prediction and $Y$ is the corresponding label. The average of all the cross entropies is computed.
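A small NumPy sketch of the two loss functions, assuming binary labels for the cross entropy case (names illustrative):

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean of the squared differences between predictions and labels.
    return np.mean((y_pred - y_true) ** 2)

def binary_cross_entropy(y_pred, y_true, eps=1e-12):
    # Clip to avoid log(0); average the cross entropy over all points.
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_pred, y_true), binary_cross_entropy(y_pred, y_true))
```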

The learnable weight values are updated based on the computed gradients of the loss function with respect to them. There are several optimisation algorithms, such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), and the weights are updated differently depending on the algorithm. Adam is one of the most popular optimisation algorithms and is used for a variety of deep learning tasks. It uses exponentially decaying averages of past gradients and squared gradients to help SGD converge to a better value, and faster [8].

2.2 Convolutional Neural Networks

The Convolutional Neural Network (CNN) is a type of deep neural network that has achieved outstanding performance on image classification tasks. In image classification tasks, a CNN is used to extract features from an input image. The extracted features are expressed as a vector and passed to a feedforward neural network which conducts the classification. CNNs can be trained to extract features that help the feedforward neural network classify the input image more accurately. Therefore, a trained CNN is often used to extract features and convert an input image into a descriptive vector of that image. The Visual Geometry Group (VGG) from the University of Oxford published a paper that investigated the effect of the depth of a CNN on its accuracy in large-scale image classification tasks [9]. In this research, they demonstrated state-of-the-art performance on image classification benchmark tasks, and they released the CNN model trained on one of those benchmarks. In this project, the input video's frames are converted into vectors that describe their features; thus, VGG's pre-trained model was used to convert each frame of the video into a descriptive vector.

Overview of CNN

I will explain the overview of the CNN, the VGG CNN architecture, and their pre-trained model. A basic CNN is comprised of a number of convolutional layers, pooling layers and fully connected layers. Convolutional layers have an arbitrary number of convolutional filters. These filters are slid over the input image, convolving the filter with each area of the image. Pooling layers are applied to reduce the spatial size of the outcomes of the convolutional layers. The purpose of applying these layers is to reduce the number of parameters and the computation cost, and also to avoid overfitting the model [2]. The outcomes of the convolutional and pooling layers are passed to a fully connected layer, which conducts the prediction based on them.

Convolutional Layer

The convolutional layer is the core part of the CNN. A set of trainable filters constitutes this layer. Since the input data (an image) has three dimensions, width, height and depth (colour channels), these filters also have a 3-dimensional shape. However, the shape and the number of dimensions depend on the input data's dimensionality and the designer of the CNN; the filters' spatial size has to be smaller than the input image. In the forward propagation, each filter is slid across the entire input image, and the dot products between the filter and the corresponding regions of the input image are computed.

Figure 2.4: An example of simple convolution.

This figure shows how the convolution is performed. The yellow region of the left-side matrix is the filter and the red values are the weights of the filter. The dot product between the filter's weights and the image's overlapping values is computed, and the dot products are stored in the matrix on the right side of the figure, which has the same dimensions. This matrix is the input data for the next convolutional layer. Although only 2D convolution is demonstrated in this example, the filter and the input data can have a third dimension (depth). In this case, the filter is applied across the corresponding depth of the image when calculating the dot product. Hence, the output of the convolution is transformed from 3D space into 2D space.
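A minimal NumPy sketch of the sliding dot-product computation described above, for a single-channel image without padding (illustrative, not the report's code):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    # Valid (no padding) 2D convolution of a single-channel image.
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            region = image[y*stride:y*stride+kh, x*stride:x*stride+kw]
            out[y, x] = np.sum(region * kernel)  # dot product with the filter
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d(image, kernel))   # a 4x4 feature map
```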

Pooling Layer

Pooling layers play an important role in CNNs. One of the objectives of using a CNN is to reduce the dimensionality of the input data to make it more tractable. For example, an image whose size is 300x300x3 (width x height x depth) has 270,000 parameters, which can easily lead to overfitting. Therefore, pooling layers are used to reduce the spatial size of the input volume. There are several algorithms for pooling layers: max pooling, average pooling, L2-norm pooling, etc. However, it has been demonstrated that max pooling performs better than the other algorithms [2].

Figure 2.5: Max pooling layer with 2x2 filters and stride 2.

This figure shows how max pooling reduces the spatial size of 2D input data. In each differently coloured region of the matrix, only the highest value is picked and stored, as that value represents the region. Consequently, if a max pooling layer with a size of 2x2 is used, the input image is reduced to half its width and half its height.
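A corresponding sketch of max pooling with a 2x2 filter and stride 2 (illustrative):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Keep only the maximum value in each size x size region.
    h, w = feature_map.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            region = feature_map[y*stride:y*stride+size, x*stride:x*stride+size]
            out[y, x] = region.max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 1, 8]], dtype=float)
print(max_pool(fm))   # [[6, 4], [7, 9]] -- width and height halved
```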

VGG

As mentioned above, the Visual Geometry Group (VGG) is a research group at the University of Oxford. They investigated the effect of the depth of a CNN on its accuracy in a large-scale image classification task, constructing CNNs with depths ranging from 11 to 19 layers. They used only 3x3 convolution filters so as to evaluate the effect of depth consistently. As a result, they not only achieved exceptional performance on the ILSVRC classification and localisation tasks, but also demonstrated the applicability of the model to other image recognition datasets. To facilitate further computer vision related research, they released their pre-trained model publicly.

Architecture of VGG model

In their work, they adopted convolutional layers comprised of 3x3 convolution filters with a certain number of channels. Apart from these layers, they used max pooling layers, 1x1 convolutional layers and three fully connected layers. Max pooling is performed by a 2x2 filter with a stride of 2. The 1x1 convolutional layers are utilised to increase nonlinearity without affecting the receptive fields of the convolutional layers. The last layer of the stack of convolutional and pooling layers is connected to three fully connected layers; the first two fully connected layers have 4096 hidden nodes each, and the last layer has 1000 nodes. The last layer is followed by a softmax layer that computes the probability for each class.

Figure 2.6: Configuration of each CNN they tested in their research.

Figure 2.7: A list of results of the evaluations for each CNN.

Performance

This table shows the results of the evaluations for each configuration. They showed that as the CNN gets deeper, the performance improves. Additionally, they demonstrated that the model can be improved by changing the training image size randomly (in this work, from 256 to 512); the third row for configurations C, D and E shows the performance with varying training image sizes.

Discussion

Other CNNs that achieved outstanding accuracy on the image classification competition task consist of convolutional filters with larger receptive fields and larger strides, especially in the first convolutional layer. The effect of adopting small 3x3 filters throughout the entire CNN is discussed. Firstly, in terms of the size of the receptive field, a set of three 3x3 filters is equivalent to a single 7x7 filter. Thus, three non-linear rectification layers can be obtained by using 3x3 filters, while only one layer is obtained with a single 7x7 filter. As the 3x3 filters produce more outcomes, they are more informative for the decision functions than the single outcome from a 7x7 filter. Secondly, the number of parameters is decreased by adopting smaller filters: if the input volume has C channels, the number of parameters for a single 7x7 filter is 7 x 7 x C x C = 49C^2, whereas the number of parameters for a set of three 3x3 filters is 3 x (3 x 3 x C x C) = 27C^2. Hence, using a set of three 3x3 filters is more discriminative for the decision function and also decreases the number of parameters, which implies the filters can be trained more efficiently with less computation cost [2].
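A quick sanity check of these parameter counts (illustrative):

```python
# Parameter counts for equal receptive fields (C input and C output channels).
def params_7x7(C):
    return 7 * 7 * C * C            # one 7x7 layer: 49 C^2

def params_three_3x3(C):
    return 3 * (3 * 3 * C * C)      # three stacked 3x3 layers: 27 C^2

C = 64
print(params_7x7(C), params_three_3x3(C))   # 200704 vs 110592
```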

Generalisation of VGG model

They utilised their model pre-trained on the ILSVRC dataset for image classification on other datasets. Specifically, they compared their model's performance on 4 different benchmarks, VOC-2007, VOC-2012, Caltech-101 and Caltech-256, against other state-of-the-art models.

Figure 2.8: Performance comparison with other research groups' CNNs.

This figure shows the comparison with the other CNNs. VGG's 19-layer CNN achieved the highest scores on VOC-2007, VOC-2012 and Caltech-256. In addition, it shows that a combination of the 16-layer and 19-layer CNNs improves accuracy further. Since the pre-trained model was released, it has been used in a wide range of image recognition research.

2.3 Recurrent Neural Networks

We understand sequential data such as text, audio, video, etc. by recognising each component of the sequence step by step from the start to the end.

Sequential data contains information not only in its contents but also in the ordering of the sequence. Therefore, for tasks which treat sequential data, the temporal dependencies of the input data have to be considered to achieve good performance. Recurrent neural networks (RNNs) have achieved great performance on various types of problems, including language modelling, translation, speech recognition, etc. Specifically, Long Short Term Memory (LSTM), which is one architecture of RNN, has achieved outstanding results on these tasks. As I treat sequential data (audio and video) in this project, I will explore RNNs in the following sections.

RNN's Model

Figure 2.9: An unrolled recurrent neural network.

A cell receives an input x_t and produces an output o_t, and also feeds information to itself recurrently, as the left side of the figure shows. As the cell receives each step's data one by one in order and generates an output for the corresponding step, it can be unfolded as the right side of the figure shows. Additionally, each time step's cell stores its state s_t, which can be interpreted as the memory of the network at that time step. The state is computed as

$$s_t = f(U x_t + W s_{t-1})$$

where $f(\cdot)$ is an arbitrary activation function, $U$ is a learnable weight for the input, and $W$ is a learnable weight for the previous step's hidden state. The output is computed as

$$o_t = \mathrm{softmax}(V s_t)$$

where $V$ is a learnable weight for the current hidden state.
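A minimal NumPy sketch of one step of this basic RNN, with tanh as the activation f (all sizes and names illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, s_prev, U, W, V):
    # s_t = tanh(U x_t + W s_{t-1});  o_t = softmax(V s_t)
    s_t = np.tanh(U @ x_t + W @ s_prev)
    o_t = softmax(V @ s_t)
    return s_t, o_t

rng = np.random.default_rng(0)
U = rng.normal(size=(8, 4))    # input-to-state weights
W = rng.normal(size=(8, 8))    # state-to-state weights
V = rng.normal(size=(3, 8))    # state-to-output weights
s = np.zeros(8)
for x in rng.normal(size=(5, 4)):   # a sequence of 5 input vectors
    s, o = rnn_step(x, s, U, W, V)
print(o)
```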

Although the temporal features of the input sequence are captured by this basic RNN model, it is not flexible enough to solve complicated tasks. For instance, as this model simply sums up the previous steps' states, the information from far-previous nodes is gradually lost. Hence, this simple RNN model can deal with short memories, but cannot keep longer ones; for tasks which require both short and long memory, it cannot handle the data flexibly. To overcome this problem, a different RNN architecture called Long Short Term Memory (LSTM) was developed.

Long Short Term Memory

Long Short Term Memory (LSTM) is an advanced type of RNN which can learn both long and short term dependencies in sequential data. An LSTM is capable of selecting features to remember and also to forget, and this regulation system consists of three different gates: the forget gate, the input gate and the output gate [4]. Each gate takes a different role; in this section, I will introduce these gates one by one.

Overview of the model

In an LSTM cell, there are two parallel flows of information. One of the flows (the upper flow in Figure 2.10) carries the previous step's state information through the cell. Meanwhile, the other flow (the lower flow in Figure 2.10) also carries the previous step's information; however, it receives the external input x_t and selects which parameters to forget, input and output.

Figure 2.10: An unfolded LSTM cell.

As a result, the lower flow interacts with the upper flow to regulate its information. This process is repeated through the LSTM network.

Forget gate

Figure 2.11: Forget gate.

The forget gate selects the values to forget from the previous step's state information. As mentioned above, this state acts as the memory of the network, so this gate makes the network forget specific values. In this gate, the input h_{t-1} received from the previous step and the external input x_t are multiplied by different weight matrices and summed element-wise. The outcome is passed to the sigmoid function, mapping all the values to values between 0 and 1. These values are multiplied with the upper flow, and the values of the previous state are decreased depending on the corresponding outcome of the sigmoid function. For instance, if the outcome of the sigmoid function is 0, the parameter is forgotten completely.

Input gate

Figure 2.12: Input gate.

The input gate computes two things. Firstly, it decides which values of the input to use to update the current state. As in the forget gate, the input gate maps values to between 0 and 1 using the sigmoid function, so it can select the parameters to update by multiplying with the input values. Secondly, it creates candidate values that may be added to the current state to update it; the tanh activation function is applied to these values. The values to be added to the current state are selected by the outcomes of the sigmoid function.

Output gate

Figure 2.13: Output gate.

The output gate selects which values of the current state to produce as the output of this time step. Both the input from the previous step and the external input are multiplied by their weight matrices and passed to the sigmoid function to obtain a selective vector. This vector is multiplied with the updated current state so that only the values selected by this gate are output.
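To tie the three gates together, here is a hedged NumPy sketch of one LSTM cell step; bias terms are omitted for brevity and all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wc, Wo):
    # z concatenates the previous output and the external input.
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z)             # forget gate: what to drop from the state
    i = sigmoid(Wi @ z)             # input gate: which values to update
    c_tilde = np.tanh(Wc @ z)       # candidate values for the state
    c_t = f * c_prev + i * c_tilde  # updated cell state (the upper flow)
    o = sigmoid(Wo @ z)             # output gate: which values to expose
    h_t = o * np.tanh(c_t)          # selected view of the updated state
    return h_t, c_t

rng = np.random.default_rng(0)
n_h, n_x = 8, 4
Wf, Wi, Wc, Wo = (rng.normal(size=(n_h, n_h + n_x)) for _ in range(4))
h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(5, n_x)):
    h, c = lstm_step(x, h, c, Wf, Wi, Wc, Wo)
print(h)
```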

2.4 WaveNet

Overview of WaveNet

WaveNet is a deep neural network for generating raw audio waveforms, developed by Google DeepMind. This model predicts each audio sample by computing a predictive distribution based on a certain length of previous samples, a global condition and a local condition. It has achieved state-of-the-art performance on text-to-speech tasks in both English and Mandarin [6].

Figure 2.14: Overview of the WaveNet model.

Model Overview

The predictive distribution for each audio sample x_t is computed conditioned on the previous timesteps. Therefore, the joint probability of a waveform $\mathbf{x} = \{x_1, \ldots, x_T\}$ is expressed as a product of conditional probabilities:

$$p(\mathbf{x}) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1})$$

The WaveNet consists of three parts: a causal convolutional layer, a stack of dilated causal convolutional layers, and fully connected layers. Firstly, the input raw audio samples are transformed into one-hot vectors with a certain number of categories. A 2x1 causal convolution is applied to these transformed vectors. (These new arrays have the same depth as the dilated causal convolutional layers' outputs.) This sequence of vectors is passed to the first layer of the dilation layers.

Secondly, dilated causal convolutions are applied to the passed sequence. In each layer, the dilated causal convolution produces two identical sequences of convolved vectors; the tanh function is applied to one of them, and the sigmoid function to the other. An element-wise multiplication is applied to these vectors, and a 1x1 1D convolution is applied to the outcome of this element-wise multiplication. The result is stored as the skip output of this layer; it is also added to the layer's input sequence of vectors and passed to the next layer as the residual output. This process is repeated with different dilation values. Finally, the skip outputs are summed up and passed through two fully connected layers and a softmax layer to compute the predictive distribution for each audio sample.

Causal Convolutions

Figure 2.15: Visualization of a stack of causal convolutional layers.

To keep the temporal dependencies of the audio samples, causal convolution is adopted in signal processing tasks. It is implemented as a 1D convolution with filters of width 2, and the same operation is applied to the output of each layer to build the stack of causal convolutional layers. An RNN is also a neural network that can process sequential data; as there are no recurrent connections in causal convolutions, they can learn time series data faster than an RNN. However, many layers are required to increase the receptive field, and a long receptive field is a crucial factor in capturing the patterns of audio waveforms. Hence, the main part of the WaveNet uses a stack of dilated causal convolutional layers.

Dilated Causal Convolutions

Figure 2.16: Visualization of a stack of dilated causal convolutional layers.

A dilated causal convolution is comprised of causal convolutions with a certain number of skips. As the figure shows, a 1D convolution is computed in each layer over the output of the previous layer; however, dilation - 1 nodes are skipped when selecting the pairs of nodes to be convolved. Therefore, the size of the receptive field can be dramatically increased with fewer layers compared to a stack of ordinary causal convolutional layers.
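A minimal NumPy sketch of a dilated causal convolution with a width-2 filter, stacked with doubling dilations as in the figure (illustrative, not the WaveNet implementation itself):

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    # x: input sequence, shape (T,); w: filter of width 2, shape (2,).
    # Each output sees the current sample and the one `dilation` steps back,
    # so no future samples leak into the prediction (causality).
    T = len(x)
    out = np.zeros(T)
    for t in range(T):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        out[t] = w[0] * past + w[1] * x[t]
    return out

x = np.sin(np.linspace(0, 4 * np.pi, 32))
h = x
for d in (1, 2, 4, 8):    # doubling dilations grow the receptive field quickly
    h = np.tanh(dilated_causal_conv(h, np.array([0.5, 0.5]), d))
print(h[-4:])             # receptive field is 1 + 1 + 2 + 4 + 8 = 16 samples
```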

Softmax Distributions

Typically, audio data is stored with a 16-bit audio bit depth, so each sample has 65,536 possible values. To make this more tractable, each sample of the input raw audio is quantised to 256 possible values by applying a mu-law companding transformation [5]:

$$f(x_t) = \mathrm{sign}(x_t)\,\frac{\ln(1 + \mu |x_t|)}{\ln(1 + \mu)}$$

where $-1 < x_t < 1$ and $\mu = 255$. This quantisation produces a significantly better reconstruction than a simple linear quantisation scheme, especially for speech data [6].
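A small NumPy sketch of mu-law encoding and decoding with mu = 255 (illustrative):

```python
import numpy as np

def mu_law_encode(x, mu=255):
    # x in (-1, 1) -> 256 integer categories, following the mu-law formula.
    fx = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((fx + 1) / 2 * mu + 0.5).astype(np.int32)   # map (-1,1) to {0..255}

def mu_law_decode(q, mu=255):
    # Inverse transform back to a waveform in (-1, 1).
    fx = 2 * (q.astype(np.float64) / mu) - 1
    return np.sign(fx) * ((1 + mu) ** np.abs(fx) - 1) / mu

x = np.sin(np.linspace(0, 2 * np.pi, 8)) * 0.9
q = mu_law_encode(x)
print(q)                               # quantised categories
print(np.round(mu_law_decode(q), 3))   # close to the original samples
```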

Gated Activation Units

In the stack of dilated causal convolutional layers, gated activation units are applied to the outcomes of the dilated causal convolutions. These units consist of two activation functions: sigmoid and tanh. The outcomes of the dilated causal convolutions are multiplied by learnable weight matrices and passed through the two activation functions separately, and the outcomes of the two activation functions are multiplied element-wise. This is identical to the input gate of the LSTM cell, so the gated activation units can be interpreted as working as an input gate: they select which parameters of the outcome to feed to the next layer of the stacked dilated causal convolutional layers as the residual output.

Residual and Skip connections

The outcomes of the gated activation units are passed to skip connections and also to the next dilated causal convolutional layer. As the dilated causal convolutional stack gets deeper, a longer range of audio samples is considered. In each layer, the important parameters are selected by the gated activation units, fed to the next layer, and convolved with a longer range of audio samples. Each layer also outputs the outcomes of its gated activation units; the outcomes of all layers of the stacked dilated causal convolutional layers are summed up and fed to the following fully connected layers.

Conditional WaveNet

Figure 2.17: WaveNet overview with local condition.

The WaveNet can also compute the conditional distribution $p(\mathbf{x} \mid \mathbf{h})$ of the next sample of the waveform:

$$p(\mathbf{x} \mid \mathbf{h}) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}, \mathbf{h})$$

By feeding a condition into the dilated causal layers, the WaveNet can compute the predictive distribution based on both the previous steps and the given condition. For instance, for text-to-speech tasks, the phonemes of each word in the given sentence are fed into the layers so that the WaveNet can generate different waveforms for different words. In their work, it is reported that the WaveNet generates human language-like sound without this condition, while with the condition it generates recognisable human language sound.

The model can accept two types of conditions: a global condition and a local condition. The global condition does not change through time, but the local condition does. Therefore, in text-to-speech tasks, the global condition is used to tell the model the type of speaker, while the local condition guides the model on which phoneme to generate at each time step.

3 Implementation and Experiment Results

3.1 Implementation for the Local Conditioning

There is an open source implementation of WaveNet without local conditioning, so I implemented the local conditioning on top of that code. In this project, each video frame's vector is passed as a local condition to predict the audio samples corresponding to that frame. For example, if the sample rate of the sound file is 16,000 samples per second and the frame rate of the video is 25 frames per second, each video frame covers 16,000 / 25 = 640 samples. Therefore, the WaveNet receives video frame vectors and predicts 640 audio samples based on a receptive field of previous audio samples and the corresponding video frame vector. One way of implementing the local conditioning is to pass only one video frame vector to the WaveNet to predict the corresponding 640 audio samples (I call this simple local conditioning). The other way is to upsample the video frame vectors to match their frame rate to the audio data's sample rate, and pass this sequence of video frame vectors (I call this upsampled local conditioning).
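A minimal sketch of the upsampling step under the 16,000 Hz / 25 fps assumption above (names illustrative):

```python
import numpy as np

SAMPLE_RATE = 16000   # audio samples per second
FRAME_RATE = 25       # video frames per second
SAMPLES_PER_FRAME = SAMPLE_RATE // FRAME_RATE   # 640

def upsample_conditions(frame_vectors):
    # frame_vectors: shape (n_frames, dim) -- one descriptive vector per frame.
    # Repeat each frame vector so every audio sample has a matching condition.
    return np.repeat(frame_vectors, SAMPLES_PER_FRAME, axis=0)

frames = np.random.rand(3, 512)    # 3 frames of 512-dim descriptive vectors
conds = upsample_conditions(frames)
print(conds.shape)                 # (1920, 512): one row per audio sample
```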

Simple Local Conditioning

Figure 3.1: Visualisation of the simple local conditioning for the second hidden layer.

In the simple local conditioning implementation, only the newest local condition, which corresponds to the newest audio sample, is added to the dilated causal convolutions. For instance, in this figure, even though the local condition changes every 4 audio samples, only the newest local condition is added. During the training phase, audio samples and a video frame vector are passed to the model. The length of the audio samples is the receptive field's size plus an arbitrary number of target samples. As only one local condition can be passed to the model, the target audio samples must correspond to the video frame. Additionally, in the dilated causal convolutional layers, this local condition is added to the outcome of each layer. It should be noted that all the audio samples given to the model are convolved with different dilations and added to the local condition. The size of the audio samples is the receptive field plus the number of samples per video frame. Concretely, the number of samples per video frame is 640 and the receptive field's size is usually more than 5,000, which is equivalent to about 8 video frames. Therefore, the samples in the receptive field originally had different local conditions; however, in this simple implementation, the given video frame vector is added to the convolved samples even though they do not correspond to it.
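To illustrate where the condition enters, here is a hedged NumPy sketch of one gated layer that broadcast-adds a single projected condition vector to every sample, as the simple variant does; the projection matrices and shapes are my assumptions, not the actual implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_layer_with_condition(x_conv, h_cond, Vf, Vg):
    # x_conv: two pre-activation copies from the dilated causal convolution,
    #         each of shape (T, channels); h_cond: one condition, shape (dim,).
    # The projected condition is broadcast-added before the gated activation,
    # so the same condition touches every sample in the receptive field.
    filt = np.tanh(x_conv[0] + h_cond @ Vf)     # filter half of the gate
    gate = sigmoid(x_conv[1] + h_cond @ Vg)     # gate half of the gate
    return filt * gate                          # element-wise gated output

rng = np.random.default_rng(0)
T, ch, dim = 16, 32, 512
x_conv = rng.normal(size=(2, T, ch))
h_cond = rng.normal(size=dim)
Vf, Vg = rng.normal(size=(dim, ch)), rng.normal(size=(dim, ch))
print(gated_layer_with_condition(x_conv, h_cond, Vf, Vg).shape)   # (16, 32)
```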

Upsampled Local Conditioning

Figure 3.2: Visualisation of the upsampled local conditioning for the second hidden layer.

In the upsampled implementation, on the other hand, the local conditions are upsampled before they are passed to the model, so the number of audio samples and the number of local conditions are the same. Hence, in each dilated causal convolutional layer, every convolved sample is added to its corresponding local condition. To do this, the size of the local conditions is adjusted as the dilated convolution layers go deeper, because the size of the convolution output changes according to each layer's dilation.

3.2 Testing the Model

Before I started working on the training and audio synthesis for a video, both the simple and upsampled models were tested on a toy problem. The toy problem is to learn and generate sine waves of specific frequencies based on the local conditions. The training data consists of three different notes.

One of these notes lasts for an arbitrary duration and then changes to another note, and this is repeated several times in one dataset. The ID of each frequency is given to the model as a local condition. Therefore, the model learns the waveform patterns of each note from the audio samples, and it also learns the relationship between the audio samples and the given local condition. For the training, 30 training datasets were prepared, with 3 local conditions (notes): D# (155 Hz), G (196 Hz) and A# (233 Hz). Figure 3.3 shows one of the training datasets.

Figure 3.3: Example of the training data.

The objective of the generation part is to generate waveforms of different frequencies by changing the local condition through time. In the generation phase, 900 audio samples are generated, and every 300 samples the local condition is incremented from 1 to 3. Therefore, the expected result changes its frequency as the local condition varies.
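A sketch of how such toy training data could be generated; the frequencies follow the text, while the function name and the fixed 300-sample segments are illustrative (phase continuity between notes is ignored):

```python
import numpy as np

SAMPLE_RATE = 16000
NOTES = {1: 155.0, 2: 196.0, 3: 233.0}   # D#, G, A# in Hz

def make_toy_example(note_ids, samples_per_note=300):
    # Concatenate sine segments; emit one local-condition ID per sample.
    audio, conds = [], []
    for nid in note_ids:
        t = np.arange(samples_per_note) / SAMPLE_RATE
        audio.append(np.sin(2 * np.pi * NOTES[nid] * t))
        conds.append(np.full(samples_per_note, nid))
    return np.concatenate(audio), np.concatenate(conds)

audio, conds = make_toy_example([1, 2, 3])   # 900 samples, condition 1 -> 2 -> 3
print(audio.shape, conds[::300])             # (900,) [1 2 3]
```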

Figure 3.4: Expected result for the test.

Figure 3.5 shows the results of the test for the upsampled model. The power spectrum of the generated waveform is shown in the upper graph, and the lower graph shows the waveform. For each local condition, the waveform is cut out and its power spectrum is shown separately in (a), (b) and (c). As mentioned, 3 different notes (155 Hz, 196 Hz and 233 Hz) were to be generated, and the power spectrum in (d) shows that these frequencies are present in the waveform. Also, (a), (b) and (c) show that the expected frequencies were generated for each local condition.

Figure 3.5: Upsampled model, 1000 iterations.

Figure 3.6 shows the results for the simple model. Panel (d) shows the overall generated waveform. It clearly shows that although 3 different local conditions were input to the model every 300 samples, only 2 notes were generated. I ran this test program repeatedly, but it only generated one of the notes, chosen seemingly at random, for an arbitrary length. Hence, the local condition is totally ignored by the simple model.

Figure 3.6: Simple model, 1000 iterations.

Fast and Slow Generation

A fast generation algorithm has been introduced [7]. The basic concept of this algorithm is to store the outcomes of each layer of the dilated causal convolutional layers and reuse these values to generate the next audio sample. For instance, in Figure 2.16, the output is computed by convolving the newest audio sample with the outcomes of the other hidden layers' convolutions. The hidden layers' values can therefore be reused when computing the convolutions for new samples, because they are not affected by the new samples. This algorithm accelerates the generation dramatically, so it is highly recommended.

Therefore, the generation with this algorithm was examined to see whether it produces the same logits as the slow generation algorithm, which computes all the dilated causal convolutions from scratch.

Figure 3.7: Differences between the logits from the slow and fast generation algorithms.

To test the fast and slow generation, the logits for each time step are computed based on the same previous audio samples. A logit is a 256-dimensional vector which represents the probability distribution of that time step's audio sample. The element-wise differences between the logits generated by the slow and fast algorithms are computed and squashed to a scalar value by summing over the 256-dimensional vector. The logits are generated for 900 samples and the squashed differences are plotted in Figure 3.7. As the slow algorithm adds the local condition to the entire dilated causal convolution, while the fast algorithm only updates the local condition of the new sample, there are large differences when a new local condition is fed into the WaveNet. For this test, the size of the receptive field was set to 256, and at time steps 300 and 600 the local condition was changed. After 256 steps, the WaveNet receives exactly the same audio samples and local conditions, so there are no differences between slow and fast, as the figure shows.

3.3 Converting a Video Frame to a Vector Using the VGG Pre-trained Model

As the WaveNet's performance was confirmed in the previous section, I now move to the next step: converting an image into a descriptive vector using VGG's pre-trained model. The VGG pre-trained model is accessible via the Internet. The original pre-trained model is compatible with Caffe [1]; however, the same pre-trained model in formats compatible with other deep learning libraries (e.g. TensorFlow [2], Keras [3], etc.) is also publicly available. TensorFlow is used for all deep learning implementations throughout this project, so the pre-trained model for TensorFlow was obtained. Two pre-trained models are available, a 16-layer and a 19-layer model; the 19-layer model is used for this project. This 19-layer model is comprised of 16 CNN layers and 3 fully connected layers. The filters and biases stored in the pre-trained model are loaded as constant values: the 16 CNN layers are modelled with TensorFlow library functions, and the loaded filters and biases are passed to the corresponding layers' convolution functions as constant filters and biases. The size of the video frames is reduced to 32 x 18 before they are passed to the CNN. Although the size of the input images in the original research is 224 x 224, there are some benefits to reducing the size. Firstly, the conversion becomes computationally efficient: as the filters are slid across the image in each convolutional layer, a smaller input image reduces the computation cost. Secondly, the size of the output vector is reduced. While original-size input images (224 x 224) are converted into 1024-dimensional vectors, 32 x 18 images become 512-dimensional vectors. It may seem that conversion from 32 x 18 (= 576) to 512 is not an efficient dimension reduction; however, the image has 3 channels (RGB), so in fact 32 x 18 x 3 (= 1,728) values are reduced to 512, and this vector holds the spatial information of the input image. Additionally, too many parameters could lead to overfitting [2], and the objects in the image do not need to be recognised in this project; all the WaveNet model needs to do is find the correlation between the image's spatial information and the sound. It is for these reasons that the input video frames are downsampled.

[1] One of the most popular deep learning frameworks, originally developed at UC Berkeley.
[2] An open-source machine learning library for Python and C++, developed by the Google Brain team.
[3] An open-source neural network library for Python; the second fastest growing deep learning framework after TensorFlow.
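The report loads the converted VGG weights by hand and feeds downsampled 32 x 18 frames; as a rough illustrative alternative, the sketch below uses the stock Keras VGG19 at its native input size and average-pools an intermediate feature map into a 512-dimensional descriptor. The layer choice and pooling are my assumptions, not the report's exact pipeline:

```python
import numpy as np
import tensorflow as tf

# Pre-trained VGG19 convolutional layers only (no fully connected head).
base = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
# Take features from the last conv layer of block 5, which has 512 channels.
feat = tf.keras.Model(base.input, base.get_layer("block5_conv4").output)

def frame_to_vector(frame):
    # frame: HxWx3 uint8 RGB image taken from the video.
    x = tf.image.resize(frame, (224, 224))            # network's expected size
    x = tf.keras.applications.vgg19.preprocess_input(x[tf.newaxis, ...])
    fmap = feat(x)                                    # (1, 14, 14, 512)
    return tf.reduce_mean(fmap, axis=[1, 2])[0]       # 512-dim descriptor

frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
print(frame_to_vector(frame).shape)                   # (512,)
```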

3.4 Training the WaveNet with Videos

Now I move on to the training and audio generation with WaveNet using videos as training data. For this experiment I used a video featuring waves on a beach: https://. The wave sound varies as the wave movements change: when a wave comes to the shore, the wave sound becomes louder, and as the wave goes back to the sea, the sound becomes quiet. The original video is divided into three videos: 1 minute for training and 30 seconds for validation. The 1 minute training video is further divided into 6 videos of 10 seconds each. I conducted three different experiments:

1. Training the model only with the wave sound files.

2. Training the model with the wave sound files and corresponding videos, and generating a sound file without feeding the video frames (local conditions).

3. Training the model with the wave sound files and corresponding videos, and generating a sound file with the video frames (local conditions).

(The second and third experiments were done for both simple and upsampled local conditioning.)

Experiment 1: Train the model only with wave sound files

This experiment was conducted to confirm that the WaveNet model can learn and generate wave sounds. Additionally, the model was trained with different hyperparameters to inspect how they affect the quality of the generation.

Figure 3.8: Error value through the training for 1000 epochs without local condition.

You can listen to the generated wave sound here: As you can hear, the generated wave sounds are stable, wave-like sounds. As I did not feed the video frames as local conditions, the model captured the pattern of the wave sound in the training videos. Figure 3.8 shows the plotted error values throughout the training phase. For each epoch I plotted the error value 50 times, so there are 50 x 1000 = 50,000 points plotted. The initial error value is approximately 5.5, and it decreased to around 2. However, there are error values around 4 consistently throughout the training, so it can be surmised that the model fitted to some particular parts of the training videos; the generated sound may also be considered an imitation of those parts. It may be impossible to generate sound that perfectly matches the training sound data; however, the model captured the patterns of the training wave sounds and generated a sound that resembles a wave sound. Therefore, it is confirmed that this WaveNet model can learn the wave sound and generate it.

Experiment 2: Training the Model with Sounds and Videos, and Generating Sounds without Local Condition

In this experiment, the model is trained with audio data and the corresponding video frames as local conditions. However, in the generation phase, the local condition is not given to the model; therefore, the expected result would be similar to the results of the first experiment. This experiment was conducted to illustrate the effect of local conditioning on the generation clearly, by comparing its result with the next experiment's result. The generated sound can be listened to here: TPDuPzg

Figure 3.9: Error value through the training for 300 epochs without local condition.

I trained the model for 300 epochs. Unlike the training without local conditioning, the error value did not decrease to around 2. Also, the generated sound is similar to the one generated by the model trained without local conditions, but it is close to white noise, and people may not be able to recognise it as a wave sound.

Experiment 3: Training the Model with Sounds and Videos, and Generating Sound with Local Condition

In this experiment, the video frames are fed into the model in both the training and generation phases. Hence, the generated sound is expected to correspond to the given video.

The generated sound can be heard here: Although I fed the local conditions to the model in both the training and generation phases, unfortunately the generated sound does not correspond to the video frames. Moreover, the generated sound does not vary through time, even though the model receives different local conditions throughout the generation phase.

4 Discussion

4.1 Why doesn't local conditioning work for the video, while it works for the toy problem?

There are several possible causes.

Implementation mistakes or bugs. The first possible cause is a wrong implementation. Although the performance of the local conditioning and the generation were tested with the toy problem, there may still be bugs in the code. Specifically, for the generation, despite giving the local conditions to the model during the generation phase, the generated sound was stable white noise, which is a strange result. However, the training error did not decrease greatly, so the model may have generated essentially random values for any input values. This is also a possible reason why the generated sound does not vary even when the local condition is changed.

The training video may not be appropriate for this task.

First of all, the wave sound is composed of white noise with certain dynamics which correspond to the movements of the waves; therefore, white noise can be said to be one of the patterns of wave sounds. Additionally, the cycle of the waves' dynamics is more than 5 seconds long, but the WaveNet model I used for the training can deal with only approximately 0.3 seconds of audio waveform. So, the model may not be able to capture the pattern of the waves (not the pattern of the wave sounds, but the pattern of the waves themselves). However, to help the model recognise the pattern of the waves, the video frames are fed in as local conditions. There is another problem with using the wave videos as training data: as the video was recorded at a beach, waves which were not in the video frame also make wave sounds and are recorded, so the video frames and the wave sounds do not match perfectly.

Alternative training video ideas: Although I could only conduct the experiments with the wave videos in the short amount of time available, other types of videos were also prepared, such as fireworks, street scenes and speech. It has been proved that the WaveNet model can generate human language-like sound even when trained with only speech sound data; therefore, if the model were trained with the speech videos, it might generate human language-like sound only when the speaker opens his or her mouth.

The training was not enough. Although the training without local conditioning was conducted for 1,000 epochs, the model with local conditioning was trained for only 300 epochs, so the training may not have been sufficient. Despite this, it is more likely that the other possible causes are the actual ones, because the plotted log of the training error values differs from that of the training without local conditioning.

The hyperparameters need to be tuned well.

There are many hyperparameters in this project's model, but their effects on the performance have not yet been examined comprehensively. In the first experiment, the training and generation were conducted with different values of the dilations, residual channels, dilation channels, skip channels and initial filter width, but these were not inspected comprehensively: for each experiment, only one of them was changed from the original setting. In the model, these hyperparameters affect each other, so the performance could change dramatically with different combinations of parameters. However, each iteration is quite time consuming, which is why they could not be examined comprehensively. Once an adequate result for the training and generation with video frame local conditions is obtained, they will need to be inspected more carefully to achieve the best result.

5 Future Works

For future projects, I have considered training the model with multiple types of videos at the same time, and improving the model with recurrent neural networks. I am curious about what kinds of sounds the model can generate for videos that it does not see during the training phase. If the model can learn the sounds and video frames of multiple videos, it could guess the sounds of videos which originally have no sound, such as snow or blooming flowers. Additionally, the model could be improved by applying a recurrent neural network to the local condition [1]. As mentioned, although the WaveNet model's receptive field size is equivalent to about 0.3 seconds, the cycle of the waves is more than 5 seconds. Therefore, the prediction could be improved by considering a longer range of the input. However, because of the computational cost, the receptive field cannot be increased to 5 seconds. On the other hand, the frame rate of the input video is 25 frames per second, so it is more feasible to consider a longer range of video frames: even if 5 seconds of video frames are taken into account, that is only 125 frames (= 25 frames per second x 5 seconds). The recurrent neural network model to apply to the video frame local conditions is as follows.

For the normal video frame local conditioning, the descriptive vector of each video frame is upsampled to 16 kHz, the same as the sample rate of the audio data; therefore, each audio sample is given exactly one vector, the one corresponding to it. For the model with a recurrent neural network, the local condition is instead a sequence of video frames. The sequence is fed into the recurrent neural network, and the output at the end of the recurrent neural network is passed to the WaveNet as the local condition.

Figure 5.1: Model overview of WaveNet with LSTM.

This first idea feeds the same local condition to all the audio samples, including the ones covered by the receptive field. However, as the model test showed, each audio sample should receive a corresponding local condition, and a second idea is designed to solve this problem. The problem with the first idea is that only the last output of the recurrent neural network is taken and fed into the WaveNet. In the second idea, each audio sample receives the corresponding part of the recurrent neural network's output.

Figure 5.2 shows how this model feeds local conditions.

Figure 5.2: Visualisation of the second idea of improving the local conditioning with LSTM.

The blue line in Figure 5.2 is the waveform, and the diagram above it illustrates the unfolded recurrent neural network. The inputs to the recurrent neural network (X_0, ..., X_{n+3}) are the video frames. The normal model with upsampled local conditioning (without a recurrent neural network) receives X_0, ..., X_{n+3} as local conditions: X_0 is given to the corresponding audio samples, which lie inside the same red grid in the figure. In this idea, the corresponding output of the recurrent neural network is given to the audio samples instead; namely, h_0, ..., h_{n+3} are fed into the WaveNet model in place of X_0, ..., X_{n+3}. Figure 5.3 shows the model overview of this idea.
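Under the same assumptions as the previous sketch, the second idea can be expressed by upsampling the per-frame LSTM outputs rather than taking only the last one: each h_t is repeated for the 640 audio samples (16,000 samples per second / 25 frames per second) that belong to frame t.

    import torch
    import torch.nn as nn

    frames = torch.randn(1, 125, 1000)   # placeholder frame descriptors
    lstm = nn.LSTM(input_size=1000, hidden_size=128, batch_first=True)
    outputs, _ = lstm(frames)            # shape (1, 125, 128): h_0, ..., h_124

    # 16,000 audio samples per second / 25 video frames per second
    samples_per_frame = 16000 // 25      # 640

    # Second idea: each audio sample receives the recurrent output h_t of its
    # own frame, instead of the raw frame vector X_t or only the final output.
    local_condition = outputs.repeat_interleave(samples_per_frame, dim=1)
    # shape (1, 80000, 128): one conditioning vector per audio sample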
