Music Recommendation using Recurrent Neural Networks
Ashutosh Choudhary * ashutoshchou@cs.umass.edu
Mayank Agarwal * mayankagarwa@cs.umass.edu

Abstract

A large amount of information is contained in the sequence of songs in a playlist. Typical recommendation approaches based on collaborative filtering do not use this information. This project models music recommendation as a sequence prediction problem and explores the application of Long Short-Term Memory (LSTM) networks to it.

1 Introduction

Music listening typically spans a sequence of songs rather than a single song in isolation. The relative position of each song in the user's playlist is significant, and user satisfaction can change considerably if the same songs are played in a different order. Typical collaborative filtering approaches do not model this sequential aspect of music recommendation.

The Long Short-Term Memory network is a Recurrent Neural Network (RNN) architecture that, unlike traditional RNNs, is capable of learning long-term dependencies in the data. Since a user's playing history can span years, LSTMs are well suited to modelling music recommendation as a sequence problem while also learning the long-term dependencies in the user's music preferences. This project experiments with the application of an LSTM network to music recommendation. The network is trained on the Million Song Dataset along with the Last.fm user playing history dataset.

2 Related Work

Liebman et al. [1] model music playlist recommendation as a sequential decision-making task. The paper introduces a new method for music recommendation based on reinforcement learning and concludes that modelling recommendation as a sequential decision-making task gives the proposed method a small but significant boost in performance compared to reasoning only about song preferences. Van den Oord et al.
[2] apply convolutional neural networks to learn latent factors from music audio and use these to recommend songs to users. Their paper showed that recent advances in deep learning translate very well to the music recommendation task: learning from audio data alone, the suggested model performs sensibly on the Million Song Dataset.

3 LSTM

LSTMs have been used extensively in language models and text generation tasks. The reason for this reliance on LSTMs for sequence-based tasks is their ability to propagate long-term dependencies effectively through a sequence of objects. In the case of text, these objects are either one-hot encoded vectors or word embeddings. An embedding is a fixed-size vector representation of an input data point, i.e. the input feature vector. LSTMs achieve this effective gradient propagation by using memory units called cell states together with a combination of gates. The gates have the added advantage of learning to give appropriate weight to the more prominent or essential parts of the sequence.

Other gated RNNs, such as Gated Recurrent Units (GRUs), are used for similar tasks. Unlike LSTMs, GRUs have fewer gates and hence less control over gradient flow; they also have no separate memory unit and expose their complete memory. Both gated networks are superior to a simple RNN, but there is no conclusive evidence that one is superior to the other [3].

Given a sequence of inputs $X = \{x_1, x_2, \ldots, x_{n_X}\}$, an LSTM associates each time step with an input gate, a memory gate and an output gate, denoted by $i_t$, $f_t$ and $o_t$ respectively. Let $e_t$ represent the input embedding at time $t$, i.e. a word in a sentence or a song in a playlist. If $c_t$ and $h_t$ represent the cell state vector and the hidden state produced at time $t$, then the vector representation $h_t$ for each time step is given by:

$$\begin{bmatrix} i_t \\ f_t \\ o_t \\ l_t \end{bmatrix} = \begin{bmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{bmatrix} W \cdot \begin{bmatrix} h_{t-1} \\ e_t \end{bmatrix} \tag{1}$$

$$c_t = f_t \cdot c_{t-1} + i_t \cdot l_t \tag{2}$$

$$h_t = o_t \cdot \tanh(c_t) \tag{3}$$

where $W_i, W_f, W_o, W_l \in \mathbb{R}^{K \times 2K}$ are the blocks of $W$, $\sigma$ is the logistic sigmoid, and the products in (2) and (3) are element-wise.

4 Sequence-to-Sequence Models

As the name suggests, sequence-to-sequence (seq-to-seq) models generate a sequence from another sequence given as input. Seq-to-seq models have been used extensively in machine translation and dialog systems. Introduced by Cho et al. [4], a basic seq-to-seq model consists of two LSTMs: an encoder that processes the input sequence and a decoder that generates the output sequence. The encoder is used to obtain a vector representation of the input sequence. This representation is the final hidden state of the encoder, which is fed to the decoder as part of its input at each time step. At each time step the decoder thus has access to the representation of the input sequence, the partial output sequence generated so far, and the element generated at the previous time step. To increase the capacity of the model, multi-layer cells have been used successfully in seq-to-seq models [5].

If the input sequence is very long, terms seen far back in the sequence may be forgotten by a basic seq-to-seq model, even though gated units are able to remember quite long sequences. It has been shown that sequence lengths of more than 30 terms lead to gradients decaying to zero, so no useful information is passed on from terms that lie more than a threshold number of steps back. To give the decoder more direct access to the input, an attention mechanism was introduced in [6]. With this mechanism, the model allows the decoder to peek into the input at every decoding step.
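As an illustration of equations (1)-(3) and of how an encoder's final hidden state summarizes a sequence, the following is a minimal pure-Python sketch, not the implementation used in this project. The function names `lstm_step` and `encode`, the row ordering of the stacked matrix `W` (input, memory, output, candidate), the zero initial states, and the absence of bias terms are assumptions made for the example; the embedding size is taken equal to K so that W has the 4K x 2K shape implied by the text.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(W, h_prev, c_prev, e_t):
    """One LSTM time step following equations (1)-(3).

    W      : 4K x 2K stacked weights, rows ordered [W_i; W_f; W_o; W_l] (assumed)
    h_prev : previous hidden state h_{t-1}, length K
    c_prev : previous cell state c_{t-1}, length K
    e_t    : input embedding at time t, length K
    """
    K = len(h_prev)
    z = matvec(W, h_prev + e_t)                  # W . [h_{t-1}; e_t]
    i_t = [sigmoid(a) for a in z[0:K]]           # input gate
    f_t = [sigmoid(a) for a in z[K:2 * K]]       # memory gate
    o_t = [sigmoid(a) for a in z[2 * K:3 * K]]   # output gate
    l_t = [math.tanh(a) for a in z[3 * K:4 * K]] # candidate values
    c_t = [f * c + i * l for f, c, i, l in zip(f_t, c_prev, i_t, l_t)]  # eq. (2)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                  # eq. (3)
    return h_t, c_t


def encode(W, embeddings):
    """Run the LSTM over a sequence and return the final hidden state."""
    K = len(W) // 4
    h, c = [0.0] * K, [0.0] * K   # zero initial states (assumed)
    for e_t in embeddings:
        h, c = lstm_step(W, h, c, e_t)
    return h
```

In a seq-to-seq setting, the vector returned by `encode` plays the role of the input-sequence representation that is passed to the decoder at every time step.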
Figure 1: A basic seq-to-seq model with encoder-decoder architecture.

5 Proposed model

We model a user's playlist as a sequence of songs and recommend another sequence, i.e. a playlist, for that user. The approach is minimalistic in that no rating data needs to be available for a user-song pair in order to recommend the next few songs. It therefore seems to tackle the cold start or data sparsity problem. A sequence-to-sequence model was chosen to recommend a sequence of songs to the user based on a sequence the user has already listened to. A new user's first song is the song that is most likely to be listened to first by a user. The deep RNN learns this because the start of each decoder sequence is passed as a special SOS ("start of sequence") symbol.

Each song is represented as an embedding of fixed size 65. This embedding is either learned from the sequence data or extracted from additional data, as explained later in the Datasets section. An encoder with LSTM units encodes the sequence of songs as a fixed-size vector, and a decoder outputs another sequence of songs, which in our problem statement corresponds to the recommended playlist. Since the playlist of a single user, as provided in the dataset, spans years, the use of an attention mechanism is an obvious extension to the model, as discussed in the future works section. The average playlist length is around 900 songs. The equations for the encoder and decoder LSTMs remain the same as in (1), (2) and (3).

This supervised learning of song sequences rests on treating the next song played as the ground truth of the output for the current time step. Thus, for a playlist of songs $S = (s_0, s_1, \ldots, s_N)$, with $e(s_t)$ denoting the embedding of the $t$-th song, the input and target sequences $x$ and $y$ are:

$$x = S[0 \ldots N-1] = e(s_0), e(s_1), \ldots, e(s_{N-1}) \tag{4}$$
$$y = S[1 \ldots N] = e(s_1), e(s_2), \ldots, e(s_N) \tag{5}$$

where each $e(s_t) \in \mathbb{R}^{1 \times D}$. The final encoder state, obtained after the entire sequence $x$ has been processed by the encoder, is the required vector representation of the user's playlist. This vector $h_{\max_t} \in \mathbb{R}^{1 \times H}$ is then concatenated with each input $e(s_t)$ in the decoder to generate, or recommend, a song appropriate for that time step. This is the recommended or predicted song $\hat{y}$.

The loss is defined at the decoder's final output state and back-propagated end-to-end through the entire decoder-encoder model. If the embeddings are being learned, the loss is back-propagated to the embedding layer as well. The loss used is cross-entropy on log softmax. The softmax function is given by:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, 2, \ldots, K \tag{6}$$

Softmax thus provides a probability distribution over all possible songs in the song vocabulary, while the ground-truth probability distribution is a one-hot encoded vector of the vocabulary size. The cross-entropy between these two probability distributions is calculated using the formula:

$$H(p, q) = -\sum_{x} p(x) \log q(x) \tag{7}$$

Here, $p(x)$ is the actual (ground-truth) probability distribution while $q(x)$ is the predicted probability distribution.

6 Datasets

The Million Song Dataset (MSD) [7] is a freely available collection of audio features and metadata for a million contemporary popular music tracks. The dataset is available in HDF5 format and contains the track, song, release, and artist information for every track included in the dataset. The complete MSD is 273 GB in size and therefore presents a storage challenge and also increases the time required for pre-processing. We therefore decided to work with a subset of the MSD containing 10,000 songs randomly sub-sampled from the original million.

While the Million Song Dataset provides the audio features and metadata for songs, we also require playlist data for multiple users for training. For this purpose, we chose the Last.fm dataset [8], which contains the listening history of 992 unique users and nearly 19 million user-song pairs. Thus, while the MSD provides the song metadata and audio features, the Last.fm dataset provides the users' song listening history.

6.1 Data preparation

The two datasets used for this project lack a common key to correlate their entries. The two datasets were therefore loosely correlated by matching on the artist's MusicBrainz ID and the track name. We call this a loose approach because a small number of records remained unmatched in the song database, either because they did not have the artist's MusicBrainz ID or because of a minor difference in the track name.

6.2 Training, Test, and Validation splits

Playlist data for 992 users was split into three components: 70% as training data, 15% as validation data, and 15% as test data. The split was carried out at the user level, i.e. a user and their entire playing history are assigned wholly to the training, validation, or test data. Finally, the training data contained 58,913 user-song pairs, the validation data contained 11,349 user-song pairs, and the test data contained 14,536 user-song pairs.

7 Features

The features used to encode the song and the user information are listed in Table 1 and Table 2 respectively. The song encoding is a 56-dimensional vector, while the user encoding is 6-dimensional. Thus, every data sample is a 62-dimensional vector.

| Feature name | Description |
| --- | --- |
| artist familiarity | 0-1 scale familiarity as determined by the EchoNest API |
| artist hotness | 0-1 scale hotness of the artist as determined by the EchoNest API |
| artist latitude, and artist longitude | Latitude (and longitude) of the country the artist is based in |
| artist tags | Top 3 tags associated with the artist |
| total beats | Total number of beats in the song |
| danceability | 0-1 scale danceability of the song as determined by the EchoNest API |
| duration | Duration (in seconds) of the song |
| energy | Energy in the song from the listener's point of view |
| key | The key the song is in |
| loudness | Loudness of the song in dB |
| mode | Mode the song is in (major/minor) |
| release | Album name |
| total sections | Total number of sections present in the song |
| section pitches | Pitches of the longest 3 segments in the song |
| song hotness | 0-1 scale hotness of the song as determined by the EchoNest API |
| tempo | Tempo of the song |
| time signature | Time signature of the song |
| Year | Year of release |

Table 1: Song features

| Feature name | Description |
| --- | --- |
| Gender | Gender of the user |
| Age | Age of the user |
| √Age | Square root of the age of the user |
| Age² | Square of the age of the user |
| Latitude | Latitude of the country the user is in |
| Longitude | Longitude of the country the user is in |

Table 2: User features

8 Evaluation metric

To evaluate the performance of the music recommender system, we chose the Mean Average Precision (MAP) evaluation metric. MAP is a ranked precision metric that places emphasis on highly ranked correct predictions [9]. Mean Average Precision is defined as:

$$\mathrm{MAP} = \frac{\sum_{q=1}^{Q} AP(q)}{Q}$$

where $Q$ is the number of users and $AP(q)$ is the average precision for user $q$. The average precision over the top $n$ recommendations is defined as:

$$AP(n) = \frac{\sum_{k=1}^{n} P(k)}{m}$$

where $P(k)$ is the precision at cutoff point $k$ in the song recommendation list, and $m$ is the number of correctly predicted songs.

9 Experiments and Results

9.1 Baseline

The baseline for the experiments is a simple collaborative filtering technique using matrix factorization. For the sake of comparison, the baseline techniques involving matrix factorization consider only user-item pairs for which ratings are present. The deep model baseline for the experiments is a simple LSTM network without the encoding-decoding logic. This LSTM is trained on the sequence of songs listened to by the user and is trained to start predicting after receiving a special SOS symbol.
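The evaluation metric of Section 8, which is used to compare these baselines with the proposed model, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the evaluation code used in the experiments: the function names `average_precision` and `mean_average_precision` are hypothetical, P(k) is accumulated only at ranks where the recommended song is a hit (a common convention), m is taken as the number of hits within the top n, and a user with no hits is assigned an average precision of 0.

```python
def average_precision(recommended, relevant, n=30):
    """AP(n) = sum_k P(k) / m over the top-n recommended songs.

    recommended : ranked list of song ids (best first)
    relevant    : iterable of song ids the user actually listened to
    n           : cutoff; 30 mirrors the top-30 setting mentioned in the text
    """
    relevant = set(relevant)
    hits = 0
    precision_sum = 0.0
    for k, song in enumerate(recommended[:n], start=1):
        if song in relevant:
            hits += 1
            precision_sum += hits / k   # P(k): precision at cutoff k
    # m = number of correctly predicted songs (assumed: hits within top n)
    return precision_sum / hits if hits else 0.0


def mean_average_precision(recs_by_user, relevant_by_user, n=30):
    """MAP = mean of AP(q) over all users q."""
    users = list(recs_by_user)
    if not users:
        return 0.0
    total = sum(
        average_precision(recs_by_user[u], relevant_by_user.get(u, ()), n)
        for u in users
    )
    return total / len(users)
```

For example, a ranked list `["a", "b", "c", "d"]` with relevant songs `{"a", "c"}` has hits at ranks 1 and 3, so the sketch computes (1/1 + 2/3) / 2.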
In one part of the baseline experiments, the embedding for each song is learned from the data and hence carries information about its neighbouring songs. In the other part, the embedding is hand-crafted from additional data, as described in the Datasets section. To obtain recommendations in this case, the top-k (top-30) songs are extracted at each time step based on their probability values after the softmax layer.

9.2 Experiments

Experiments have been performed using different configurations of each of the components of the seq-to-seq model. As in the baseline, two options for song embeddings have been used: learned from data, and extracted from additional data. The number of layers in the LSTM cell has been varied over the set {1, 2, 5}. Initial learning rates of AdamOptimizer have been taken from the set {1e-5, 1e-4, 1e-3}. Decay rate is the default.

9.3 Results

Results are tabulated in Table 3.

Table 3: Results and comparison with baselines. Columns: Model; Embedding type (Learned, Extracted); Number of layers (1 layer, 2 layers, 5 layers); Top K for MAP (top 10, top 20, top 30). Rows: Matrix Factorization (six entries marked N.A.), Basic LSTM, Seq-to-Seq.

10 Conclusions

This project aims to utilize the sequence information present in music playlists to provide better recommendations to the user. Results show that modelling the music recommendation problem as a sequential decision-making task through an LSTM and a seq-to-seq model improves the Mean Average Precision of the recommender system. In comparison with the baseline matrix factorization model, which does not take the sequence information into consideration, the LSTM and seq-to-seq models show better recommendations and better MAP values. This validates our premise that modelling music recommendation as a sequential decision problem improves the quality of the system's recommendations.

References

[1] Liebman, et al. DJ-MC: A reinforcement-learning agent for music playlist recommendation. Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems.
[2] Van den Oord, et al. Deep content-based music recommendation. Advances in Neural Information Processing Systems.
[3] Chung, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint (2014).
[4] Cho, Kyunghyun, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint (2014).
[5] Sutskever, et al. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems.
[6] Bahdanau, et al. Neural machine translation by jointly learning to align and translate. arXiv preprint (2014).
[7] Thierry Bertin-Mahieux, et al. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011).
[8] last.fm - listen to free music and watch videos with the largest music catalogue online. last.fm/
[9] Kaggle - Mean Average Precision. MeanAveragePrecision
Michael Clausen Frank Kurth University of Bonn Proceedings of the Second International Conference on WEB Delivering of Music 2002 IEEE 1 Andreas Ribbrock Frank Kurth University of Bonn 2 Introduction Data
More informationDEEP LEARNING FOR MUSIC RECOMMENDATION:
DEEP LEARNING FOR MUSIC RECOMMENDATION: Machine Listening & Collaborative Filtering ORIOL NIETO ONIETO@PANDORA.COM SEMINAR ON MUSIC KNOWLEDGE EXTRACTION USING MACHINE LEARNING POMPEU FABRA UNIVERSITY BARCELONA
More informationENHANCED BEAT TRACKING WITH CONTEXT-AWARE NEURAL NETWORKS
ENHANCED BEAT TRACKING WITH CONTEXT-AWARE NEURAL NETWORKS Sebastian Böck, Markus Schedl Department of Computational Perception Johannes Kepler University, Linz Austria sebastian.boeck@jku.at ABSTRACT We
More informationSemantic Segmentation on Resource Constrained Devices
Semantic Segmentation on Resource Constrained Devices Sachin Mehta University of Washington, Seattle In collaboration with Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi Project
More informationAuto-tagging The Facebook
Auto-tagging The Facebook Jonathan Michelson and Jorge Ortiz Stanford University 2006 E-mail: JonMich@Stanford.edu, jorge.ortiz@stanford.com Introduction For those not familiar, The Facebook is an extremely
More informationBEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor
BEAT DETECTION BY DYNAMIC PROGRAMMING Racquel Ivy Awuor University of Rochester Department of Electrical and Computer Engineering Rochester, NY 14627 rawuor@ur.rochester.edu ABSTRACT A beat is a salient
More informationReproducing Pitch Experiments in Measuring the Evolution of Contemporary Western Popular Music
Reproducing Pitch Experiments in Measuring the Evolution of Contemporary Western Popular Music Colin Raffel, Dan Ellis 1 Introduction The recent work Measuring the Evolution of Contemporary Western Popular
More informationHIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM
HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM DR. D.C. DHUBKARYA AND SONAM DUBEY 2 Email at: sonamdubey2000@gmail.com, Electronic and communication department Bundelkhand
More informationREAL TIME EMULATION OF PARAMETRIC GUITAR TUBE AMPLIFIER WITH LONG SHORT TERM MEMORY NEURAL NETWORK
REAL TIME EMULATION OF PARAMETRIC GUITAR TUBE AMPLIFIER WITH LONG SHORT TERM MEMORY NEURAL NETWORK Thomas Schmitz and Jean-Jacques Embrechts 1 1 Department of Electrical Engineering and Computer Science,
More informationCS 229 Final Project: Using Reinforcement Learning to Play Othello
CS 229 Final Project: Using Reinforcement Learning to Play Othello Kevin Fry Frank Zheng Xianming Li ID: kfry ID: fzheng ID: xmli 16 December 2016 Abstract We built an AI that learned to play Othello.
More informationCreating Intelligence at the Edge
Creating Intelligence at the Edge Vladimir Stojanović E3S Retreat September 8, 2017 The growing importance of machine learning Page 2 Applications exploding in the cloud Huge interest to move to the edge
More informationHYBRID MUSIC RECOMMENDER USING CONTENT-BASED AND SOCIAL INFORMATION. Paulo Chiliguano, Gyorgy Fazekas
HYBRID MUSIC RECOMMENDER USING CONTENT-BASED AND SOCIAL INFORMATION Paulo Chiliguano, Gyorgy Fazekas Queen Mary, University of London School of Electronic Engineering and Computer Science Mile End Road,
More informationarxiv: v1 [cs.ce] 9 Jan 2018
Predict Forex Trend via Convolutional Neural Networks Yun-Cheng Tsai, 1 Jun-Hao Chen, 2 Jun-Jie Wang 3 arxiv:1801.03018v1 [cs.ce] 9 Jan 2018 1 Center for General Education 2,3 Department of Computer Science
More informationRandom Walk with Restart for Automatic Playlist Continuation and Query-Specific Adaptations
Random Walk with Restart for Automatic Playlist Continuation and Query-Specific Adaptations Master s Thesis Timo van Niedek Radboud University, Nijmegen timo.niedek@science.ru.nl 2018-08-22 First Supervisor
More informationSINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING DEEP RECURRENT NEURAL NETWORKS
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING DEEP RECURRENT NEURAL NETWORKS Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis Department of Electrical and Computer Engineering,
More informationBackground Dirty Paper Coding Codeword Binning Code construction Remaining problems. Information Hiding. Phil Regalia
Information Hiding Phil Regalia Department of Electrical Engineering and Computer Science Catholic University of America Washington, DC 20064 regalia@cua.edu Baltimore IEEE Signal Processing Society Chapter,
More informationVocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA
Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA ECE-492/3 Senior Design Project Spring 2015 Electrical and Computer Engineering Department Volgenau
More informationPredicting Video Game Popularity With Tweets
Predicting Video Game Popularity With Tweets Casey Cabrales (caseycab), Helen Fang (hfang9) December 10,2015 Task Definition Given a set of Twitter tweets from a given day, we want to determine the peak
More informationMachine Learning Practical Part 2: Group Projects. MLP Lecture 11 MLP Part 2: Group Projects 1
Machine Learning Practical Part 2: Group Projects MLP Lecture 11 MLP Part 2: Group Projects 1 MLP Part 2: Group Projects Steve Renals Machine Learning Practical MLP Lecture 11 24 January 2018 http://www.inf.ed.ac.uk/teaching/courses/mlp/
More informationProf. Maria Papadopouli
Lecture on Positioning Prof. Maria Papadopouli University of Crete ICS-FORTH http://www.ics.forth.gr/mobile 1 Roadmap Location Sensing Overview Location sensing techniques Location sensing properties Survey
More informationA Network-based End-to-End Trainable Task-oriented Dialogue System
A Network-based End-to-End Trainable Task-oriented Dialogue System Deep Learning Summer school, 05 Aug 2016 Tsung-Hsien (Shawn) Wen Dialogue Systems Group Outline 2 Intro Neural Dialogue System Wizard-of-Oz
More informationGoogle DeepMind s AlphaGo vs. world Go champion Lee Sedol
Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Review of Nature paper: Mastering the game of Go with Deep Neural Networks & Tree Search Tapani Raiko Thanks to Antti Tarvainen for some slides
More informationMUSICAL GENRE CLASSIFICATION OF AUDIO DATA USING SOURCE SEPARATION TECHNIQUES. P.S. Lampropoulou, A.S. Lampropoulos and G.A.
MUSICAL GENRE CLASSIFICATION OF AUDIO DATA USING SOURCE SEPARATION TECHNIQUES P.S. Lampropoulou, A.S. Lampropoulos and G.A. Tsihrintzis Department of Informatics, University of Piraeus 80 Karaoli & Dimitriou
More informationReinforcement Learning Agent for Scrolling Shooter Game
Reinforcement Learning Agent for Scrolling Shooter Game Peng Yuan (pengy@stanford.edu) Yangxin Zhong (yangxin@stanford.edu) Zibo Gong (zibo@stanford.edu) 1 Introduction and Task Definition 1.1 Game Agent
More informationSIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS
SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,
More informationTiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems
Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems Emeric Stéphane Boigné eboigne@stanford.edu Jan Felix Heyse heyse@stanford.edu Abstract Scaling
More informationEnergy Consumption Prediction for Optimum Storage Utilization
Energy Consumption Prediction for Optimum Storage Utilization Eric Boucher, Robin Schucker, Jose Ignacio del Villar December 12, 2015 Introduction Continuous access to energy for commercial and industrial
More informationUNEQUAL POWER ALLOCATION FOR JPEG TRANSMISSION OVER MIMO SYSTEMS. Muhammad F. Sabir, Robert W. Heath Jr. and Alan C. Bovik
UNEQUAL POWER ALLOCATION FOR JPEG TRANSMISSION OVER MIMO SYSTEMS Muhammad F. Sabir, Robert W. Heath Jr. and Alan C. Bovik Department of Electrical and Computer Engineering, The University of Texas at Austin,
More informationConvolutional Networks for Image Segmentation: U-Net 1, DeconvNet 2, and SegNet 3
Convolutional Networks for Image Segmentation: U-Net 1, DeconvNet 2, and SegNet 3 1 Olaf Ronneberger, Philipp Fischer, Thomas Brox (Freiburg, Germany) 2 Hyeonwoo Noh, Seunghoon Hong, Bohyung Han (POSTECH,
More informationSound is the human ear s perceived effect of pressure changes in the ambient air. Sound can be modeled as a function of time.
2. Physical sound 2.1 What is sound? Sound is the human ear s perceived effect of pressure changes in the ambient air. Sound can be modeled as a function of time. Figure 2.1: A 0.56-second audio clip of
More informationCP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS
CP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS Hamid Eghbal-Zadeh Bernhard Lehner Matthias Dorfer Gerhard Widmer Department of Computational
More informationSELECTING RELEVANT DATA
EXPLORATORY ANALYSIS The data that will be used comes from the reviews_beauty.json.gz file which contains information about beauty products that were bought and reviewed on Amazon.com. Each data point
More informationINTRODUCTION TO DEEP LEARNING. Steve Tjoa June 2013
INTRODUCTION TO DEEP LEARNING Steve Tjoa kiemyang@gmail.com June 2013 Acknowledgements http://ufldl.stanford.edu/wiki/index.php/ UFLDL_Tutorial http://youtu.be/ayzoubkuf3m http://youtu.be/zmnoatzigik 2
More informationCS 229, Project Progress Report SUNet ID: Name: Ajay Shanker Tripathi
CS 229, Project Progress Report SUNet ID: 06044535 Name: Ajay Shanker Tripathi Title: Voice Transmogrifier: Spoofing My Girlfriend s Voice Project Category: Audio and Music The project idea is an easy-to-state
More informationArtificial Neural Networks. Artificial Intelligence Santa Clara, 2016
Artificial Neural Networks Artificial Intelligence Santa Clara, 2016 Simulate the functioning of the brain Can simulate actual neurons: Computational neuroscience Can introduce simplified neurons: Neural
More informationAre there alternatives to Sigmoid Hidden Units? MLP Lecture 6 Hidden Units / Initialisation 1
Are there alternatives to Sigmoid Hidden Units? MLP Lecture 6 Hidden Units / Initialisation 1 Hidden Unit Transfer Functions Initialising Deep Networks Steve Renals Machine Learning Practical MLP Lecture
More informationPrediction of Cluster System Load Using Artificial Neural Networks
Prediction of Cluster System Load Using Artificial Neural Networks Y.S. Artamonov 1 1 Samara National Research University, 34 Moskovskoe Shosse, 443086, Samara, Russia Abstract Currently, a wide range
More informationConvNets and Forward Modeling for StarCraft AI
ConvNets and Forward Modeling for StarCraft AI Alex Auvolat September 15, 2016 ConvNets and Forward Modeling for StarCraft AI 1 / 20 Overview ConvNets and Forward Modeling for StarCraft AI 2 / 20 Section
More informationAudio Effects Emulation with Neural Networks
Escola Tècnica Superior d Enginyeria Informàtica Universitat Politècnica de València Audio Effects Emulation with Neural Networks Trabajo Fin de Grado Grado en Ingeniería Informática Autor: Omar del Tejo
More informationA simple RNN-plus-highway network for statistical
ISSN 1346-5597 NII Technical Report A simple RNN-plus-highway network for statistical parametric speech synthesis Xin Wang, Shinji Takaki, Junichi Yamagishi NII-2017-003E Apr. 2017 A simple RNN-plus-highway
More information