Music Recommendation using Recurrent Neural Networks

Size: px
Start display at page:

Download "Music Recommendation using Recurrent Neural Networks"

Transcription

1 Music Recommendation using Recurrent Neural Networks Ashustosh Choudhary * ashutoshchou@cs.umass.edu Mayank Agarwal * mayankagarwa@cs.umass.edu Abstract A large amount of information is contained in the sequence of songs in a particular playlist. Typical collaborative filtering based recommendation approaches do not consider this information for recommendations. This project models the music recommendation problem as a sequence prediction problem and explores the application of Long Short Term Memory (LSTM) networks to the problem. 1 Introduction Music listening activity typically spans over a sequence of songs rather than a single song in isolation. The relative position of each song in the user s playlist has significance, and the user satisfaction can significantly change if the same songs are played but in a different order. Typical collaborative filtering approaches do not model the sequence aspect of music recommendation. Long Short Term Memory network is a Recurrent Neural Network (RNN) architecture that unlike traditional RNNs are capable of learning longterm dependencies in the data. Since the user playing history can span years, LSTMs are apt to model the music recommendation problem as a sequence and also learn the long term dependencies in the user s music preference. This project experiments with the application of LSTM network to music rcommendation. The network is trained on the Million Song Dataset along with Last.fm user playing history dataset. 2 Related Work Liebman et al. in [1], model the muusic playlist recommendation problem as a sequence decision making task. The paper introduces a new Reinforcement Learning based method for music recommendation task, and concludes that modelling music recommendation as a sequence decision making task gives the proposed method a small but significant boost in performance compared to only reason about song preferences. Oord et al. in [2], apply convolutional neural networks to learn latent factors of music audio and use these to predict songs to users. This paper showed that recent advances in deep learning methods along with the suggested approach translates very well to the music recommendation task. Learning from audio data alone, the suggested model performs sensibly on the Million Song Dataset. 3 LSTM LSTMs have been used extensively in language models and text generation tasks. The reason for this reliance on LSTMs for sequence based tasks is the property of LSTMs to effectively propagate long-term dependencies in a sequence of objects. In case of text, these objects are either one-hot encoded vectors or word-embeddings. Embeddings are fixed size vector representation of input datapoint i.e. input feature vector. LSTMs achieve the mentioned effective gradient propagation by making use of memory units called cell states as well as a combination of gates. These gates have an added advantage of learning to give appropriate weights to more prominent or essential parts of the sequence. There are other types of RNNs such as Gated Recurrent Units(GRUs) which are also used for similar tasks, however, unlike LSTMs they have lesser number of gates hence lesser control on gradient flow as well as they do not have memory units and expose the complete memory. Both the gated networks are superior than a simple RNN but there is no conclusive evidence to claim one is more superior than the other.[3]. Given a sequence of inputs X = {x 1, x 2,..., x nx } an LSTM associates each time step with an input gate, a memory gate and an output gate, de-

2 noted by i t,f t and o t. Let e t represent the inputembedding unit i.e. word in a sentence or song in a playlist. If c t and h t represent the cell state vector and produced hidden state at time t, then vector representation h t for each time step is given by: i t f t o t = l t tanh W. [ ht 1 e s t ] (1) c t = f t.c t 1 + i t.l t (2) h s t = o t.tanh(c t ) (3) where W i, W f, W o, W t R K 2K 4 Sequence-to-Sequence Models As the name suggests, sequence-to-sequence(seqto-seq) models are used to generate a sequence with another sequence as its input. Seq-to-seq models have been used extensively in machine translation and dialog systems. Introduced by Cho et. el [4], a basic seq-to-seq model consists of two LSTMs: an encoder that processes the input sequence and a decoder that generates the output sequence. The encoder is used to obtain a vector representation of input sequence. The final hidden state of encoder is this representation which is then fed to the decoder as part of its input at each time step. The decoder, thus at each time step, has the information pertaining to the dependent sequence, current sequence partial information till that time step and the sequence element generated in the previous time step. To increase the capacity of the model, multi-layer cells have been successfully used in seq-to-seq models [5]. If input sequence lengths are really large, terms seen long back in the sequence may be forgotten by a basic seq-to-seq model. This is despite the fact that gated units are able to remember quite long sequences. However, it has been shown that for sequence lengths of more than 30 terms, lead to gradients decaying to zero. Thus no useful information is passed from the terms more than a threshold terms away. To allow decoder a more direct access to the input, an attention mechanism was introduce in [6]. In this mechanism, the model allows decoder to peek into the input at every decoding step. Figure 1: A basic seq-to-seq model with encoder decoder architecture. 5 Proposed model We tried to model a user playlist as a sequence of songs and tried to recommend another sequence i.e. a playlist for the user. This is based on a minimalistic approach wherein no rating data needs to be available for user-song pair in order to recommend next few songs. This approach thus seems to tackle the cold start or data sparsity problem. A sequence-to-sequence model was chosen to recommend a sequence of songs to the user based upon a sequence he had already listened to. A new user s first song is suggested based on the song which is most likely to be listened first by a user. This learning is imparted in the deep RNN network by passing each decoder sequence s start as a special SOS ( start of sequence ) symbol. Each song is represented as an embedding of fixed size 65. This embedding is either learned from the sequence data or is extracted from additional data as explained later in the datasets section. An encoder with LSTM unit is used to encode the sequence of songs as an fixed vector and a decoder is used to output another sequence of songs which in our problem statement corresponds to the recommended playlist. Since the length of playlist listened by a single user, as provided in the dataset, is over a span of years, the use of attention mechanism is an obvious extension to the model as discussed in the future works section. The average length of song playlist is around 900 songs. The equation for the encoder and decoder LSTM remains the same as in (1),(2) and (3). This supervised learning of song-sequence is based upon the premise of considering the next song played being the ground truth of the output for the current time step. Thus for a playlist of songs S = (s 1, s 2,...s N ) and e t being the embedding of t th input song, the x and y vectors are: x = S[0...N 1] = e(s 0 ), e(s 1 ),...e(s N 1 ) (4)

3 y = S[1...N] = e(s 1 ), e(s 2 ),...e(s N ) (5) where each e(s t ) R 1 D. The final encoder state after an entire sequence of x has been processed by the encoder is the required vector representation of the playlist of the user. Now, this vector h maxt R 1 H is concatenated with each input e(s t ) in the decoder to generate or recommend a song appropriate for this time step. This is the recommended or predicted song ŷ. The loss is defined at the decoder final output state and back-propagated through the entire decoderencoder model end-to-end. If embeddings are being learned, then the loss is back-propagated even to the embedding layer. The loss used is cross-entropy on log softmax. The softmax function is given by: (z) j = e z j k k=1 ez j for j = 1, 2,..K (6) Thus softmax provides a probability distribution over all possible songs in the song vocabulary, while the ground truth probability distribution is a one hot encoded vector of the vovabulary size as well. A cross-entropy is calculated between these two probability distribution using formula: pairs. Thus, while the MSD provides us with the song metadata and audio features, the Last.fm dataset provides us with the user s song listening history. 6.1 Data preparation The two datasets used for this project lacked a common key to correlate the two dataset entries. Therefore, the two datasets were loosely correlated using a matching on the artists musicbrainz ID, and the track name. We call this a loose approach because a small number of records remained unmatched in the song database because they didn t have the artist musicbrainz ID, or a minor difference in the track name. 6.2 Training, Test, and Validation splits Playlist data for 992 users was split into three components: 70% as training data, 15% as validation data, and 15% as test data. This split was carried on a user level, i.e., either a user and his entire playing history is tagged as training data, test data, or validation data. Finally, the training data contained 58,913 user-song pairs, the validation data contained 11,349 user-song pairs, and the test data contained 14,536 user-song pairs. H(p, q) = x p(x)logq(x) (7) 7 Features Here, p(x) is the predicted probability distribution while q(x) is the actual probability distribution. 6 Datasets The Million Song Dataset (MSD) [7] is a freelyavailable collection of audio features and metadata for a million contempory popular music tracks. The dataset is available in HDF5 format and contains the track, song, release, and artist information for every track included in the dataset. The complete MSD is 273 Gbs in size and therefore, presents a challenge in storage and also increases the time required for pre-processing of data. Thus, we decided to work with a subset of the MSD containing 10,000 randomly sub-sampled songs from the original million songs. While the Million Song Dataset provides us with the audio features and metadata for songs, we also require the playlist data for multiple users for training. For this purpose, we chose the Last.fm dataset [8] which contains the listening history for 992 unique users, and nearly 19 million user-song The features used to encode the song and the user information is listed in Table 1, and Table 2 respectively. The song encoding is a 56 dimensional vector, while the user encoding is 6 dimensional. Thus, every data sample is 62 dimensional vector. 8 Evaluation metric To evaluate the performance of the music recommender system, we chose the Mean Average Precision (MAP) evaluation metric. MAP is a ranked precision metric that places emphasis on highly ranked correct precisions [9]. Mean Average Precision is defined as: MAP = Q q=1 AP (q) Q where AP(q) is the average precision of each user. The average precision AP(q) is defined as:

4 Feature name artist familiarity artist hotness artist latitude, and artist longitude artist tags total beats danceability energy key loudness mode release total sections section pitches song hotness tempo time signature Year Feature name Gender Age Age Age 2 Latitude Longitude Description 0-1 scale familiarity as determined by the EchoNest API 0-1 scale hotness of the artist as detemined by the EchoNest API Latitude of the country the artist is based in. Top 3 tags associated to the artist Total number of beats in the song 0-1 scale danceability on the songs as determined by the EchoNest API duration : Duration (in seconds) of the song Energy in the song from the listeners point of view The key the song is in Loudness of the song in db Mode the song is in (major/minor) Album name Total number of sections present in the song Pitches of the longest 3 segments in the song 0-1 scale hotness of the song as determined by the EchoNest API Tempo of the song Time signature of the song Year of release Table 1: Song features Description Gender of the user Age of the user Square root of the age of the user Square of the age of the user Latitude of the country the user is in. Longitude of the country the user is in. Table 2: User features AP (n) = n k=1 P (k) m where P(k) is the precision at cutoff point k in the song recommendation list, and m is the number of correctly predicted nodes. 9 Experiments and Results 9.1 Baseline Baseline for the experiments is a simple collaborative filtering technique using Matrix factorization.for the sake of comparison, for the baseline techniques involving Matrix factorization, only user-item pairs with ratings present are considered. A deep model baseline for the experiments is chosen to be a simple LSTM network without the encoding-decoding logic. The LSTM is trained on sequence of songs listened by the user and is trained to start predicting after receiving a special symbol SOS. In one part of the baseline experiments, embeddings for each song is learned from the data and hence has information pertaining to its neighboring songs. In the other part, this embedding is hand-crafted with additional data as mentioned in the dataset section. To obtain recommendations in this case at each time step top-k (top-30) values have been extracted based on their probability values after the softmax layer. 9.2 Experiments Experiments have been performed using different configurations of each of the components of the seq-to-seq model. As done in the baseline, two options for song embeddings have been used: learned from data, extracted from additional data. Also, experiments with number of layers in the LSTM cell is done with number of layers in the set 1, 2, 5. Initial learning rates of AdamOptimizer have been taken in a range of 1e 5, 1e 4, 1e 3. Decay rate is the default Results Results are tabulated in Table Conclusions This project aims to utilize the sequence information present in music playlists to provide better recommendations to the user. Results show that modelling the music recommendation problem as

5 Model Embedding type Number of layers Top K for MAE Learned Extracted 1 layer 2 layers 5 layers top 10 top 20 top 30 Matrix Factorization N.A N.A. N.A. N.A N.A. N.A. Basic LSTM Seq-to-Seq Table 3: Results and comparison with baselines a sequence decision making task through LSTM and a Seq-to-Seq model improves the Mean Average Precision of the recommender system. In comparison with the baseline Matrix Factorization model, which does not take the sequence information in consideration, LSTM and seq-toseq models show better recommendations and better MAP values. This validates our premise that modelling music recommendation as a sequence decision problem improves the quality of recommendations of the system. Kaggle - Mean Average Precision. MeanAveragePrecision References Liebman, et al. Dj-mc: A reinforcement-learning agent for music playlist recommendation. Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, Van den Oord, et al. Deep content-based music recommendation. Advances in neural information processing systems Chung, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. arxiv preprint arxiv: (2014). Cho, Kyunghyun, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arxiv preprint arxiv: (2014). Cho, et al. Sequence to sequence learning with neural networks. Advances in neural information processing systems Bahdanau, et al. Neural machine translation by jointly learning to align and translate. arxiv preprint arxiv: (2014). Thierry Bertin-Mahieux, et al. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011) last.fm - listen to free music and watch videos with the largest music catalogue online. last.fm/

Attention-based Multi-Encoder-Decoder Recurrent Neural Networks

Attention-based Multi-Encoder-Decoder Recurrent Neural Networks Attention-based Multi-Encoder-Decoder Recurrent Neural Networks Stephan Baier 1, Sigurd Spieckermann 2 and Volker Tresp 1,2 1- Ludwig Maximilian University Oettingenstr. 67, Munich, Germany 2- Siemens

More information

신경망기반자동번역기술. Konkuk University Computational Intelligence Lab. 김강일

신경망기반자동번역기술. Konkuk University Computational Intelligence Lab.  김강일 신경망기반자동번역기술 Konkuk University Computational Intelligence Lab. http://ci.konkuk.ac.kr kikim01@kunkuk.ac.kr 김강일 Index Issues in AI and Deep Learning Overview of Machine Translation Advanced Techniques in

More information

Attentive Neural Architecture Incorporating Song Features For Music Recommendation

Attentive Neural Architecture Incorporating Song Features For Music Recommendation Attentive Neural Architecture Incorporating Song Features For Music Recommendation by Noveen Sachdeva, Kartik Gupta, Vikram Pudi in 12th ACM Conference on Recommender Systems (RECSYS-2018) Vancouver, Canada

More information

Million Song Dataset Challenge!

Million Song Dataset Challenge! 1 Introduction Million Song Dataset Challenge Fengxuan Niu, Ming Yin, Cathy Tianjiao Zhang Million Song Dataset (MSD) is a freely available collection of data for one million of contemporary songs (http://labrosa.ee.columbia.edu/millionsong/).

More information

11/13/18. Introduction to RNNs for NLP. About Me. Overview SHANG GAO

11/13/18. Introduction to RNNs for NLP. About Me. Overview SHANG GAO Introduction to RNNs for NLP SHANG GAO About Me PhD student in the Data Science and Engineering program Took Deep Learning last year Work in the Biomedical Sciences, Engineering, and Computing group at

More information

Recurrent neural networks Modelling sequential data. MLP Lecture 9 Recurrent Networks 1

Recurrent neural networks Modelling sequential data. MLP Lecture 9 Recurrent Networks 1 Recurrent neural networks Modelling sequential data MLP Lecture 9 Recurrent Networks 1 Recurrent Networks Steve Renals Machine Learning Practical MLP Lecture 9 16 November 2016 MLP Lecture 9 Recurrent

More information

Recurrent neural networks Modelling sequential data. MLP Lecture 9 / 13 November 2018 Recurrent Neural Networks 1: Modelling sequential data 1

Recurrent neural networks Modelling sequential data. MLP Lecture 9 / 13 November 2018 Recurrent Neural Networks 1: Modelling sequential data 1 Recurrent neural networks Modelling sequential data MLP Lecture 9 / 13 November 2018 Recurrent Neural Networks 1: Modelling sequential data 1 Recurrent Neural Networks 1: Modelling sequential data Steve

More information

Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks

Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks Stephan Baier1, Sigurd Spieckermann2 and Volker Tresp1,2 1- Ludwig Maximilian University Oettingenstr. 67, Munich,

More information

Recurrent neural networks Modelling sequential data. MLP Lecture 9 Recurrent Neural Networks 1: Modelling sequential data 1

Recurrent neural networks Modelling sequential data. MLP Lecture 9 Recurrent Neural Networks 1: Modelling sequential data 1 Recurrent neural networks Modelling sequential data MLP Lecture 9 Recurrent Neural Networks 1: Modelling sequential data 1 Recurrent Neural Networks 1: Modelling sequential data Steve Renals Machine Learning

More information

Lesson 08. Convolutional Neural Network. Ing. Marek Hrúz, Ph.D. Katedra Kybernetiky Fakulta aplikovaných věd Západočeská univerzita v Plzni.

Lesson 08. Convolutional Neural Network. Ing. Marek Hrúz, Ph.D. Katedra Kybernetiky Fakulta aplikovaných věd Západočeská univerzita v Plzni. Lesson 08 Convolutional Neural Network Ing. Marek Hrúz, Ph.D. Katedra Kybernetiky Fakulta aplikovaných věd Západočeská univerzita v Plzni Lesson 08 Convolution we will consider 2D convolution the result

More information

Using Deep Learning for Sentiment Analysis and Opinion Mining

Using Deep Learning for Sentiment Analysis and Opinion Mining Using Deep Learning for Sentiment Analysis and Opinion Mining Gauging opinions is faster and more accurate. Abstract How does a computer analyze sentiment? How does a computer determine if a comment or

More information

Deep Learning Basics Lecture 9: Recurrent Neural Networks. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 9: Recurrent Neural Networks. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 9: Recurrent Neural Networks Princeton University COS 495 Instructor: Yingyu Liang Introduction Recurrent neural networks Dates back to (Rumelhart et al., 1986) A family of

More information

arxiv: v2 [cs.sd] 22 May 2017

arxiv: v2 [cs.sd] 22 May 2017 SAMPLE-LEVEL DEEP CONVOLUTIONAL NEURAL NETWORKS FOR MUSIC AUTO-TAGGING USING RAW WAVEFORMS Jongpil Lee Jiyoung Park Keunhyoung Luke Kim Juhan Nam Korea Advanced Institute of Science and Technology (KAIST)

More information

Continuous Gesture Recognition Fact Sheet

Continuous Gesture Recognition Fact Sheet Continuous Gesture Recognition Fact Sheet August 17, 2016 1 Team details Team name: ICT NHCI Team leader name: Xiujuan Chai Team leader address, phone number and email Address: No.6 Kexueyuan South Road

More information

Deep learning architectures for music audio classification: a personal (re)view

Deep learning architectures for music audio classification: a personal (re)view Deep learning architectures for music audio classification: a personal (re)view Jordi Pons jordipons.me @jordiponsdotme Music Technology Group Universitat Pompeu Fabra, Barcelona Acronyms MLP: multi layer

More information

A Comparison of Playlist Generation Strategies for Music Recommendation and a New Baseline Scheme

A Comparison of Playlist Generation Strategies for Music Recommendation and a New Baseline Scheme Intelligent Techniques for Web Personalization and Recommendation: Papers from the AAAI 13 Workshop A Comparison of Playlist Generation Strategies for Music Recommendation and a New Baseline Scheme Geoffray

More information

arxiv: v1 [cs.lg] 2 Jan 2018

arxiv: v1 [cs.lg] 2 Jan 2018 Deep Learning for Identifying Potential Conceptual Shifts for Co-creative Drawing arxiv:1801.00723v1 [cs.lg] 2 Jan 2018 Pegah Karimi pkarimi@uncc.edu Kazjon Grace The University of Sydney Sydney, NSW 2006

More information

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to publication record in Explore Bristol Research PDF-document

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to publication record in Explore Bristol Research PDF-document Hepburn, A., McConville, R., & Santos-Rodriguez, R. (2017). Album cover generation from genre tags. Paper presented at 10th International Workshop on Machine Learning and Music, Barcelona, Spain. Peer

More information

Neural Network Part 4: Recurrent Neural Networks

Neural Network Part 4: Recurrent Neural Networks Neural Network Part 4: Recurrent Neural Networks Yingyu Liang Computer Sciences 760 Fall 2017 http://pages.cs.wisc.edu/~yliang/cs760/ Some of the slides in these lectures have been adapted/borrowed from

More information

Your Neighbors Affect Your Ratings: On Geographical Neighborhood Influence to Rating Prediction

Your Neighbors Affect Your Ratings: On Geographical Neighborhood Influence to Rating Prediction Your Neighbors Affect Your Ratings: On Geographical Neighborhood Influence to Rating Prediction Longke Hu Aixin Sun Yong Liu Nanyang Technological University Singapore Outline 1 Introduction 2 Data analysis

More information

Conversational Systems in the Era of Deep Learning and Big Data. Ian Lane Carnegie Mellon University

Conversational Systems in the Era of Deep Learning and Big Data. Ian Lane Carnegie Mellon University Conversational Systems in the Era of Deep Learning and Big Data Ian Lane Carnegie Mellon University End-to-End Trainable Neural Network Models for Task Oriented Dialog Ian Lane Carnegie Mellon University

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

Deep Neural Network Architectures for Modulation Classification

Deep Neural Network Architectures for Modulation Classification Deep Neural Network Architectures for Modulation Classification Xiaoyu Liu, Diyu Yang, and Aly El Gamal School of Electrical and Computer Engineering Purdue University Email: {liu1962, yang1467, elgamala}@purdue.edu

More information

Generating an appropriate sound for a video using WaveNet.

Generating an appropriate sound for a video using WaveNet. Australian National University College of Engineering and Computer Science Master of Computing Generating an appropriate sound for a video using WaveNet. COMP 8715 Individual Computing Project Taku Ueki

More information

Neural Network-Based Abstract Generation for Opinions and Arguments

Neural Network-Based Abstract Generation for Opinions and Arguments Neural Network-Based Abstract Generation for Opinions and Arguments Lu Wang Wang Ling Opinions What do you think? [source: www.cartoonbank.com] Mundane tasks Which movie to watch tonight? Which hotel should

More information

AUTOMATED MUSIC TRACK GENERATION

AUTOMATED MUSIC TRACK GENERATION AUTOMATED MUSIC TRACK GENERATION LOUIS EUGENE Stanford University leugene@stanford.edu GUILLAUME ROSTAING Stanford University rostaing@stanford.edu Abstract: This paper aims at presenting our method to

More information

CROSS-LAYER FEATURES IN CONVOLUTIONAL NEURAL NETWORKS FOR GENERIC CLASSIFICATION TASKS. Kuan-Chuan Peng and Tsuhan Chen

CROSS-LAYER FEATURES IN CONVOLUTIONAL NEURAL NETWORKS FOR GENERIC CLASSIFICATION TASKS. Kuan-Chuan Peng and Tsuhan Chen CROSS-LAYER FEATURES IN CONVOLUTIONAL NEURAL NETWORKS FOR GENERIC CLASSIFICATION TASKS Kuan-Chuan Peng and Tsuhan Chen Cornell University School of Electrical and Computer Engineering Ithaca, NY 14850

More information

Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks

Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks Mariam Yiwere 1 and Eun Joo Rhee 2 1 Department of Computer Engineering, Hanbat National University,

More information

Automatic Playlist Generation

Automatic Playlist Generation Automatic Generation Xingting Gong and Xu Chen Stanford University gongx@stanford.edu xchen91@stanford.edu I. Introduction Digital music applications have become an increasingly popular means of listening

More information

Study Impact of Architectural Style and Partial View on Landmark Recognition

Study Impact of Architectural Style and Partial View on Landmark Recognition Study Impact of Architectural Style and Partial View on Landmark Recognition Ying Chen smileyc@stanford.edu 1. Introduction Landmark recognition in image processing is one of the important object recognition

More information

A Kinect-based 3D hand-gesture interface for 3D databases

A Kinect-based 3D hand-gesture interface for 3D databases A Kinect-based 3D hand-gesture interface for 3D databases Abstract. The use of natural interfaces improves significantly aspects related to human-computer interaction and consequently the productivity

More information

AN IMPROVED NEURAL NETWORK-BASED DECODER SCHEME FOR SYSTEMATIC CONVOLUTIONAL CODE. A Thesis by. Andrew J. Zerngast

AN IMPROVED NEURAL NETWORK-BASED DECODER SCHEME FOR SYSTEMATIC CONVOLUTIONAL CODE. A Thesis by. Andrew J. Zerngast AN IMPROVED NEURAL NETWORK-BASED DECODER SCHEME FOR SYSTEMATIC CONVOLUTIONAL CODE A Thesis by Andrew J. Zerngast Bachelor of Science, Wichita State University, 2008 Submitted to the Department of Electrical

More information

Convolutional Neural Networks for Small-footprint Keyword Spotting

Convolutional Neural Networks for Small-footprint Keyword Spotting INTERSPEECH 2015 Convolutional Neural Networks for Small-footprint Keyword Spotting Tara N. Sainath, Carolina Parada Google, Inc. New York, NY, U.S.A {tsainath, carolinap}@google.com Abstract We explore

More information

Generating Groove: Predicting Jazz Harmonization

Generating Groove: Predicting Jazz Harmonization Generating Groove: Predicting Jazz Harmonization Nicholas Bien (nbien@stanford.edu) Lincoln Valdez (lincolnv@stanford.edu) December 15, 2017 1 Background We aim to generate an appropriate jazz chord progression

More information

Applications of Music Processing

Applications of Music Processing Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite

More information

Dota2 is a very popular video game currently.

Dota2 is a very popular video game currently. Dota2 Outcome Prediction Zhengyao Li 1, Dingyue Cui 2 and Chen Li 3 1 ID: A53210709, Email: zhl380@eng.ucsd.edu 2 ID: A53211051, Email: dicui@eng.ucsd.edu 3 ID: A53218665, Email: lic055@eng.ucsd.edu March

More information

Statistical Tests: More Complicated Discriminants

Statistical Tests: More Complicated Discriminants 03/07/07 PHY310: Statistical Data Analysis 1 PHY310: Lecture 14 Statistical Tests: More Complicated Discriminants Road Map When the likelihood discriminant will fail The Multi Layer Perceptron discriminant

More information

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Rhythmic Similarity -- a quick paper review Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Contents Introduction Three examples J. Foote 2001, 2002 J. Paulus 2002 S. Dixon 2004

More information

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation

More information

Radio Deep Learning Efforts Showcase Presentation

Radio Deep Learning Efforts Showcase Presentation Radio Deep Learning Efforts Showcase Presentation November 2016 hume@vt.edu www.hume.vt.edu Tim O Shea Senior Research Associate Program Overview Program Objective: Rethink fundamental approaches to how

More information

Drum Transcription Based on Independent Subspace Analysis

Drum Transcription Based on Independent Subspace Analysis Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,

More information

CHORD RECOGNITION USING INSTRUMENT VOICING CONSTRAINTS

CHORD RECOGNITION USING INSTRUMENT VOICING CONSTRAINTS CHORD RECOGNITION USING INSTRUMENT VOICING CONSTRAINTS Xinglin Zhang Dept. of Computer Science University of Regina Regina, SK CANADA S4S 0A2 zhang46x@cs.uregina.ca David Gerhard Dept. of Computer Science,

More information

Playing CHIP-8 Games with Reinforcement Learning

Playing CHIP-8 Games with Reinforcement Learning Playing CHIP-8 Games with Reinforcement Learning Niven Achenjang, Patrick DeMichele, Sam Rogers Stanford University Abstract We begin with some background in the history of CHIP-8 games and the use of

More information

DYNAMIC CONVOLUTIONAL NEURAL NETWORK FOR IMAGE SUPER- RESOLUTION

DYNAMIC CONVOLUTIONAL NEURAL NETWORK FOR IMAGE SUPER- RESOLUTION Journal of Advanced College of Engineering and Management, Vol. 3, 2017 DYNAMIC CONVOLUTIONAL NEURAL NETWORK FOR IMAGE SUPER- RESOLUTION Anil Bhujel 1, Dibakar Raj Pant 2 1 Ministry of Information and

More information

Neural Architectures for Named Entity Recognition

Neural Architectures for Named Entity Recognition Neural Architectures for Named Entity Recognition Presented by Allan June 16, 2017 Slides: http://www.statnlp.org/event/naner.html Some content is taken from the original slides. Named Entity Recognition

More information

Automatic Generation of Social Tags for Music Recommendation

Automatic Generation of Social Tags for Music Recommendation Automatic Generation of Social Tags for Music Recommendation Douglas Eck Sun Labs, Sun Microsystems Burlington, Mass, USA douglas.eck@umontreal.ca Thierry Bertin-Mahieux Sun Labs, Sun Microsystems Burlington,

More information

Artificial Intelligence and Deep Learning

Artificial Intelligence and Deep Learning Artificial Intelligence and Deep Learning Cars are now driving themselves (far from perfectly, though) Speaking to a Bot is No Longer Unusual March 2016: World Go Champion Beaten by Machine AI: The Upcoming

More information

Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007)

Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007) Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007) Qin Huazheng 2014/10/15 Graph-of-word and TW-IDF: New Approach

More information

BeatTheBeat Music-Based Procedural Content Generation In a Mobile Game

BeatTheBeat Music-Based Procedural Content Generation In a Mobile Game September 13, 2012 BeatTheBeat Music-Based Procedural Content Generation In a Mobile Game Annika Jordan, Dimitri Scheftelowitsch, Jan Lahni, Jannic Hartwecker, Matthias Kuchem, Mirko Walter-Huber, Nils

More information

DESIGN & IMPLEMENTATION OF FIXED WIDTH MODIFIED BOOTH MULTIPLIER

DESIGN & IMPLEMENTATION OF FIXED WIDTH MODIFIED BOOTH MULTIPLIER DESIGN & IMPLEMENTATION OF FIXED WIDTH MODIFIED BOOTH MULTIPLIER 1 SAROJ P. SAHU, 2 RASHMI KEOTE 1 M.tech IVth Sem( Electronics Engg.), 2 Assistant Professor,Yeshwantrao Chavan College of Engineering,

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Deep Learning Barnabás Póczos Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey Hinton Yann LeCun 2

More information

Sound Recognition. ~ CSE 352 Team 3 ~ Jason Park Evan Glover. Kevin Lui Aman Rawat. Prof. Anita Wasilewska

Sound Recognition. ~ CSE 352 Team 3 ~ Jason Park Evan Glover. Kevin Lui Aman Rawat. Prof. Anita Wasilewska Sound Recognition ~ CSE 352 Team 3 ~ Jason Park Evan Glover Kevin Lui Aman Rawat Prof. Anita Wasilewska What is Sound? Sound is a vibration that propagates as a typically audible mechanical wave of pressure

More information

CHORD DETECTION USING CHROMAGRAM OPTIMIZED BY EXTRACTING ADDITIONAL FEATURES

CHORD DETECTION USING CHROMAGRAM OPTIMIZED BY EXTRACTING ADDITIONAL FEATURES CHORD DETECTION USING CHROMAGRAM OPTIMIZED BY EXTRACTING ADDITIONAL FEATURES Jean-Baptiste Rolland Steinberg Media Technologies GmbH jb.rolland@steinberg.de ABSTRACT This paper presents some concepts regarding

More information

Recommender Systems TIETS43 Collaborative Filtering

Recommender Systems TIETS43 Collaborative Filtering + Recommender Systems TIETS43 Collaborative Filtering Fall 2017 Kostas Stefanidis kostas.stefanidis@uta.fi https://coursepages.uta.fi/tiets43/ selection Amazon generates 35% of their sales through recommendations

More information

A New Framework for Supervised Speech Enhancement in the Time Domain

A New Framework for Supervised Speech Enhancement in the Time Domain Interspeech 2018 2-6 September 2018, Hyderabad A New Framework for Supervised Speech Enhancement in the Time Domain Ashutosh Pandey 1 and Deliang Wang 1,2 1 Department of Computer Science and Engineering,

More information

Mobile Cognitive Indoor Assistive Navigation for the Visually Impaired

Mobile Cognitive Indoor Assistive Navigation for the Visually Impaired 1 Mobile Cognitive Indoor Assistive Navigation for the Visually Impaired Bing Li 1, Manjekar Budhai 2, Bowen Xiao 3, Liang Yang 1, Jizhong Xiao 1 1 Department of Electrical Engineering, The City College,

More information

Deep Learning for Human Activity Recognition: A Resource Efficient Implementation on Low-Power Devices

Deep Learning for Human Activity Recognition: A Resource Efficient Implementation on Low-Power Devices Deep Learning for Human Activity Recognition: A Resource Efficient Implementation on Low-Power Devices Daniele Ravì, Charence Wong, Benny Lo and Guang-Zhong Yang To appear in the proceedings of the IEEE

More information

MODIFIED LASSO SCREENING FOR AUDIO WORD-BASED MUSIC CLASSIFICATION USING LARGE-SCALE DICTIONARY

MODIFIED LASSO SCREENING FOR AUDIO WORD-BASED MUSIC CLASSIFICATION USING LARGE-SCALE DICTIONARY 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) MODIFIED LASSO SCREENING FOR AUDIO WORD-BASED MUSIC CLASSIFICATION USING LARGE-SCALE DICTIONARY Ping-Keng Jao, Chin-Chia

More information

GESTURE RECOGNITION FOR ROBOTIC CONTROL USING DEEP LEARNING

GESTURE RECOGNITION FOR ROBOTIC CONTROL USING DEEP LEARNING 2017 NDIA GROUND VEHICLE SYSTEMS ENGINEERING AND TECHNOLOGY SYMPOSIUM AUTONOMOUS GROUND SYSTEMS (AGS) TECHNICAL SESSION AUGUST 8-10, 2017 - NOVI, MICHIGAN GESTURE RECOGNITION FOR ROBOTIC CONTROL USING

More information

Tag Propaga)on based on Ar)st Similarity

Tag Propaga)on based on Ar)st Similarity Tag Propaga)on based on Ar)st Similarity Joon Hee Kim Brian Tomasik Douglas Turnbull Swarthmore College ISMIR 2009 Ar)st Annota)on with Tags Ani Difranco Acoustic Instrumentation Folk Rock Feminist Lyrics

More information

Michael Clausen Frank Kurth University of Bonn. Proceedings of the Second International Conference on WEB Delivering of Music 2002 IEEE

Michael Clausen Frank Kurth University of Bonn. Proceedings of the Second International Conference on WEB Delivering of Music 2002 IEEE Michael Clausen Frank Kurth University of Bonn Proceedings of the Second International Conference on WEB Delivering of Music 2002 IEEE 1 Andreas Ribbrock Frank Kurth University of Bonn 2 Introduction Data

More information

DEEP LEARNING FOR MUSIC RECOMMENDATION:

DEEP LEARNING FOR MUSIC RECOMMENDATION: DEEP LEARNING FOR MUSIC RECOMMENDATION: Machine Listening & Collaborative Filtering ORIOL NIETO ONIETO@PANDORA.COM SEMINAR ON MUSIC KNOWLEDGE EXTRACTION USING MACHINE LEARNING POMPEU FABRA UNIVERSITY BARCELONA

More information

ENHANCED BEAT TRACKING WITH CONTEXT-AWARE NEURAL NETWORKS

ENHANCED BEAT TRACKING WITH CONTEXT-AWARE NEURAL NETWORKS ENHANCED BEAT TRACKING WITH CONTEXT-AWARE NEURAL NETWORKS Sebastian Böck, Markus Schedl Department of Computational Perception Johannes Kepler University, Linz Austria sebastian.boeck@jku.at ABSTRACT We

More information

Semantic Segmentation on Resource Constrained Devices

Semantic Segmentation on Resource Constrained Devices Semantic Segmentation on Resource Constrained Devices Sachin Mehta University of Washington, Seattle In collaboration with Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi Project

More information

Auto-tagging The Facebook

Auto-tagging The Facebook Auto-tagging The Facebook Jonathan Michelson and Jorge Ortiz Stanford University 2006 E-mail: JonMich@Stanford.edu, jorge.ortiz@stanford.com Introduction For those not familiar, The Facebook is an extremely

More information

BEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor

BEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor BEAT DETECTION BY DYNAMIC PROGRAMMING Racquel Ivy Awuor University of Rochester Department of Electrical and Computer Engineering Rochester, NY 14627 rawuor@ur.rochester.edu ABSTRACT A beat is a salient

More information

Reproducing Pitch Experiments in Measuring the Evolution of Contemporary Western Popular Music

Reproducing Pitch Experiments in Measuring the Evolution of Contemporary Western Popular Music Reproducing Pitch Experiments in Measuring the Evolution of Contemporary Western Popular Music Colin Raffel, Dan Ellis 1 Introduction The recent work Measuring the Evolution of Contemporary Western Popular

More information

HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM

HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM DR. D.C. DHUBKARYA AND SONAM DUBEY 2 Email at: sonamdubey2000@gmail.com, Electronic and communication department Bundelkhand

More information

REAL TIME EMULATION OF PARAMETRIC GUITAR TUBE AMPLIFIER WITH LONG SHORT TERM MEMORY NEURAL NETWORK

REAL TIME EMULATION OF PARAMETRIC GUITAR TUBE AMPLIFIER WITH LONG SHORT TERM MEMORY NEURAL NETWORK REAL TIME EMULATION OF PARAMETRIC GUITAR TUBE AMPLIFIER WITH LONG SHORT TERM MEMORY NEURAL NETWORK Thomas Schmitz and Jean-Jacques Embrechts 1 1 Department of Electrical Engineering and Computer Science,

More information

CS 229 Final Project: Using Reinforcement Learning to Play Othello

CS 229 Final Project: Using Reinforcement Learning to Play Othello CS 229 Final Project: Using Reinforcement Learning to Play Othello Kevin Fry Frank Zheng Xianming Li ID: kfry ID: fzheng ID: xmli 16 December 2016 Abstract We built an AI that learned to play Othello.

More information

Creating Intelligence at the Edge

Creating Intelligence at the Edge Creating Intelligence at the Edge Vladimir Stojanović E3S Retreat September 8, 2017 The growing importance of machine learning Page 2 Applications exploding in the cloud Huge interest to move to the edge

More information

HYBRID MUSIC RECOMMENDER USING CONTENT-BASED AND SOCIAL INFORMATION. Paulo Chiliguano, Gyorgy Fazekas

HYBRID MUSIC RECOMMENDER USING CONTENT-BASED AND SOCIAL INFORMATION. Paulo Chiliguano, Gyorgy Fazekas HYBRID MUSIC RECOMMENDER USING CONTENT-BASED AND SOCIAL INFORMATION Paulo Chiliguano, Gyorgy Fazekas Queen Mary, University of London School of Electronic Engineering and Computer Science Mile End Road,

More information

arxiv: v1 [cs.ce] 9 Jan 2018

arxiv: v1 [cs.ce] 9 Jan 2018 Predict Forex Trend via Convolutional Neural Networks Yun-Cheng Tsai, 1 Jun-Hao Chen, 2 Jun-Jie Wang 3 arxiv:1801.03018v1 [cs.ce] 9 Jan 2018 1 Center for General Education 2,3 Department of Computer Science

More information

Random Walk with Restart for Automatic Playlist Continuation and Query-Specific Adaptations

Random Walk with Restart for Automatic Playlist Continuation and Query-Specific Adaptations Random Walk with Restart for Automatic Playlist Continuation and Query-Specific Adaptations Master s Thesis Timo van Niedek Radboud University, Nijmegen timo.niedek@science.ru.nl 2018-08-22 First Supervisor

More information

SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING DEEP RECURRENT NEURAL NETWORKS

SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING DEEP RECURRENT NEURAL NETWORKS SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING DEEP RECURRENT NEURAL NETWORKS Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis Department of Electrical and Computer Engineering,

More information

Background Dirty Paper Coding Codeword Binning Code construction Remaining problems. Information Hiding. Phil Regalia

Background Dirty Paper Coding Codeword Binning Code construction Remaining problems. Information Hiding. Phil Regalia Information Hiding Phil Regalia Department of Electrical Engineering and Computer Science Catholic University of America Washington, DC 20064 regalia@cua.edu Baltimore IEEE Signal Processing Society Chapter,

More information

Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA

Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA ECE-492/3 Senior Design Project Spring 2015 Electrical and Computer Engineering Department Volgenau

More information

Predicting Video Game Popularity With Tweets

Predicting Video Game Popularity With Tweets Predicting Video Game Popularity With Tweets Casey Cabrales (caseycab), Helen Fang (hfang9) December 10,2015 Task Definition Given a set of Twitter tweets from a given day, we want to determine the peak

More information

Machine Learning Practical Part 2: Group Projects. MLP Lecture 11 MLP Part 2: Group Projects 1

Machine Learning Practical Part 2: Group Projects. MLP Lecture 11 MLP Part 2: Group Projects 1 Machine Learning Practical Part 2: Group Projects MLP Lecture 11 MLP Part 2: Group Projects 1 MLP Part 2: Group Projects Steve Renals Machine Learning Practical MLP Lecture 11 24 January 2018 http://www.inf.ed.ac.uk/teaching/courses/mlp/

More information

Prof. Maria Papadopouli

Prof. Maria Papadopouli Lecture on Positioning Prof. Maria Papadopouli University of Crete ICS-FORTH http://www.ics.forth.gr/mobile 1 Roadmap Location Sensing Overview Location sensing techniques Location sensing properties Survey

More information

A Network-based End-to-End Trainable Task-oriented Dialogue System

A Network-based End-to-End Trainable Task-oriented Dialogue System A Network-based End-to-End Trainable Task-oriented Dialogue System Deep Learning Summer school, 05 Aug 2016 Tsung-Hsien (Shawn) Wen Dialogue Systems Group Outline 2 Intro Neural Dialogue System Wizard-of-Oz

More information

Google DeepMind s AlphaGo vs. world Go champion Lee Sedol

Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Review of Nature paper: Mastering the game of Go with Deep Neural Networks & Tree Search Tapani Raiko Thanks to Antti Tarvainen for some slides

More information

MUSICAL GENRE CLASSIFICATION OF AUDIO DATA USING SOURCE SEPARATION TECHNIQUES. P.S. Lampropoulou, A.S. Lampropoulos and G.A.

MUSICAL GENRE CLASSIFICATION OF AUDIO DATA USING SOURCE SEPARATION TECHNIQUES. P.S. Lampropoulou, A.S. Lampropoulos and G.A. MUSICAL GENRE CLASSIFICATION OF AUDIO DATA USING SOURCE SEPARATION TECHNIQUES P.S. Lampropoulou, A.S. Lampropoulos and G.A. Tsihrintzis Department of Informatics, University of Piraeus 80 Karaoli & Dimitriou

More information

Reinforcement Learning Agent for Scrolling Shooter Game

Reinforcement Learning Agent for Scrolling Shooter Game Reinforcement Learning Agent for Scrolling Shooter Game Peng Yuan (pengy@stanford.edu) Yangxin Zhong (yangxin@stanford.edu) Zibo Gong (zibo@stanford.edu) 1 Introduction and Task Definition 1.1 Game Agent

More information

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,

More information

Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems

Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems Emeric Stéphane Boigné eboigne@stanford.edu Jan Felix Heyse heyse@stanford.edu Abstract Scaling

More information

Energy Consumption Prediction for Optimum Storage Utilization

Energy Consumption Prediction for Optimum Storage Utilization Energy Consumption Prediction for Optimum Storage Utilization Eric Boucher, Robin Schucker, Jose Ignacio del Villar December 12, 2015 Introduction Continuous access to energy for commercial and industrial

More information

UNEQUAL POWER ALLOCATION FOR JPEG TRANSMISSION OVER MIMO SYSTEMS. Muhammad F. Sabir, Robert W. Heath Jr. and Alan C. Bovik

UNEQUAL POWER ALLOCATION FOR JPEG TRANSMISSION OVER MIMO SYSTEMS. Muhammad F. Sabir, Robert W. Heath Jr. and Alan C. Bovik UNEQUAL POWER ALLOCATION FOR JPEG TRANSMISSION OVER MIMO SYSTEMS Muhammad F. Sabir, Robert W. Heath Jr. and Alan C. Bovik Department of Electrical and Computer Engineering, The University of Texas at Austin,

More information

Convolutional Networks for Image Segmentation: U-Net 1, DeconvNet 2, and SegNet 3

Convolutional Networks for Image Segmentation: U-Net 1, DeconvNet 2, and SegNet 3 Convolutional Networks for Image Segmentation: U-Net 1, DeconvNet 2, and SegNet 3 1 Olaf Ronneberger, Philipp Fischer, Thomas Brox (Freiburg, Germany) 2 Hyeonwoo Noh, Seunghoon Hong, Bohyung Han (POSTECH,

More information

Sound is the human ear s perceived effect of pressure changes in the ambient air. Sound can be modeled as a function of time.

Sound is the human ear s perceived effect of pressure changes in the ambient air. Sound can be modeled as a function of time. 2. Physical sound 2.1 What is sound? Sound is the human ear s perceived effect of pressure changes in the ambient air. Sound can be modeled as a function of time. Figure 2.1: A 0.56-second audio clip of

More information

CP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS

CP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS CP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS Hamid Eghbal-Zadeh Bernhard Lehner Matthias Dorfer Gerhard Widmer Department of Computational

More information

SELECTING RELEVANT DATA

SELECTING RELEVANT DATA EXPLORATORY ANALYSIS The data that will be used comes from the reviews_beauty.json.gz file which contains information about beauty products that were bought and reviewed on Amazon.com. Each data point

More information

INTRODUCTION TO DEEP LEARNING. Steve Tjoa June 2013

INTRODUCTION TO DEEP LEARNING. Steve Tjoa June 2013 INTRODUCTION TO DEEP LEARNING Steve Tjoa kiemyang@gmail.com June 2013 Acknowledgements http://ufldl.stanford.edu/wiki/index.php/ UFLDL_Tutorial http://youtu.be/ayzoubkuf3m http://youtu.be/zmnoatzigik 2

More information

CS 229, Project Progress Report SUNet ID: Name: Ajay Shanker Tripathi

CS 229, Project Progress Report SUNet ID: Name: Ajay Shanker Tripathi CS 229, Project Progress Report SUNet ID: 06044535 Name: Ajay Shanker Tripathi Title: Voice Transmogrifier: Spoofing My Girlfriend s Voice Project Category: Audio and Music The project idea is an easy-to-state

More information

Artificial Neural Networks. Artificial Intelligence Santa Clara, 2016

Artificial Neural Networks. Artificial Intelligence Santa Clara, 2016 Artificial Neural Networks Artificial Intelligence Santa Clara, 2016 Simulate the functioning of the brain Can simulate actual neurons: Computational neuroscience Can introduce simplified neurons: Neural

More information

Are there alternatives to Sigmoid Hidden Units? MLP Lecture 6 Hidden Units / Initialisation 1

Are there alternatives to Sigmoid Hidden Units? MLP Lecture 6 Hidden Units / Initialisation 1 Are there alternatives to Sigmoid Hidden Units? MLP Lecture 6 Hidden Units / Initialisation 1 Hidden Unit Transfer Functions Initialising Deep Networks Steve Renals Machine Learning Practical MLP Lecture

More information

Prediction of Cluster System Load Using Artificial Neural Networks

Prediction of Cluster System Load Using Artificial Neural Networks Prediction of Cluster System Load Using Artificial Neural Networks Y.S. Artamonov 1 1 Samara National Research University, 34 Moskovskoe Shosse, 443086, Samara, Russia Abstract Currently, a wide range

More information

ConvNets and Forward Modeling for StarCraft AI

ConvNets and Forward Modeling for StarCraft AI ConvNets and Forward Modeling for StarCraft AI Alex Auvolat September 15, 2016 ConvNets and Forward Modeling for StarCraft AI 1 / 20 Overview ConvNets and Forward Modeling for StarCraft AI 2 / 20 Section

More information

Audio Effects Emulation with Neural Networks

Audio Effects Emulation with Neural Networks Escola Tècnica Superior d Enginyeria Informàtica Universitat Politècnica de València Audio Effects Emulation with Neural Networks Trabajo Fin de Grado Grado en Ingeniería Informática Autor: Omar del Tejo

More information

A simple RNN-plus-highway network for statistical

A simple RNN-plus-highway network for statistical ISSN 1346-5597 NII Technical Report A simple RNN-plus-highway network for statistical parametric speech synthesis Xin Wang, Shinji Takaki, Junichi Yamagishi NII-2017-003E Apr. 2017 A simple RNN-plus-highway

More information