Audio Similarity. Mark Zadel, MUMT 611, March 8, 2004

Transcription:

Audio Similarity Mark Zadel MUMT 611 March 8, 2004 Audio Similarity p.1/23

Overview
- MFCCs
- Foote, "Content-Based Retrieval of Music and Audio" (1997)
- Logan and Salomon, "A Music Similarity Function Based On Signal Analysis" (2001)

Motivation
- MP3s, music on the Internet
- Large collections of songs
- How to search?
- Digital music libraries
- Commercial applications

MFCCs
- Mel-frequency cepstral coefficients
- Popular in the speech analysis community
- Feature vector characterizing one frame of audio
- Gives the spectral envelope for the frame
- Emphasizes perceptual aspects: mel frequency scale, logarithmic amplitude

Computing MFCCs

Cepstrum (from http://www.cs.biu.ac.il/~aronowc/speech/features.pdf)

Mel Frequency

Mel Spectra
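The MFCC slides above name the main stages (spectrum, mel frequency scale, logarithmic amplitude, cepstrum) without giving formulas. The following is a minimal sketch of one common way to chain them for a single frame, using only the Python standard library. The mel formula, the Hamming window, the triangular filterbank, the DCT-II, and all parameter values (`num_filters`, `num_coeffs`, the 8000 Hz test tone) are assumptions chosen for illustration and are not taken from the slides.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # One widely used mel-scale formula (assumed; the slides do not give one).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Apply a Hamming window, then return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    """MFCCs of one frame: mel spectrum -> log amplitude -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # A small floor avoids log(0) for empty bands.
    log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
               for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_mel))
            for c in range(num_coeffs)]


# Usage: one 256-sample frame of a 440 Hz tone sampled at 8000 Hz.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(256)]
coeffs = mfcc(frame, sample_rate)
```

The naive DFT keeps the sketch dependency-free; a real system would use an FFT routine and process many overlapping frames.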

Foote: Content-Based Retrieval of Music and Audio
- Assess acoustic similarity of audio segments; use this to search a database
- System trained by human input
- Uses vector quantization
- Extract feature vectors, quantize, generate template
- Compute distance between templates

Foote: Procedure

Foote: Training
- Give a set of labeled examples to the system
- These labels drive tree-based quantization
- Training deemphasizes irrelevant information

Foote: Tree-Based Quantizer
- Feature space partitioned into cells
- Cells have maximally different class populations
- Recursively split space along each dimension
- Maximize mutual information (the probability that the different cells contain different classes)
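As a rough illustration of the split criterion described above, the sketch below picks a single axis-aligned threshold that maximizes the mutual information between the class label and the side of the split. It shows one split only; a tree quantizer would apply it recursively to each resulting cell. The function names, the candidate thresholds (midpoints between sorted values), and the toy data are assumptions for illustration, not details from Foote's paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def mutual_information(values, labels, threshold):
    """I(class; side of split) for a threshold on one feature dimension."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional


def best_split(vectors, labels):
    """Return (mutual information, dimension, threshold) of the best split."""
    best = (0.0, None, None)
    for dim in range(len(vectors[0])):
        values = [v[dim] for v in vectors]
        unique = sorted(set(values))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for a, b in zip(unique, unique[1:]):
            threshold = (a + b) / 2.0
            mi = mutual_information(values, labels, threshold)
            if mi > best[0]:
                best = (mi, dim, threshold)
    return best


# Toy data: dimension 0 separates the two classes, dimension 1 does not.
vectors = [(0.1, 5.0), (0.2, 1.0), (0.9, 4.0), (0.8, 2.0)]
labels = ["speech", "speech", "music", "music"]
mi, dim, threshold = best_split(vectors, labels)
```

Here the best split lands on dimension 0, which is the sense in which training can deemphasize dimensions that carry no class information.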

Foote: Tree-Based Quantizer

Foote: Comparing Templates
- Make a template (histogram) based on the frequency of each cell
- Similar templates will be close to each other
- Define distance: Euclidean distance, cosine distance
- Search: compute distance to audio samples in database, sort
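The template comparison above can be sketched in a few lines. The example assumes that each audio file has already been quantized into a sequence of cell indices, and that the histogram is normalized to relative frequencies; both points, along with the function names and the toy database, are assumptions made for illustration.

```python
import math


def make_template(cell_indices, num_cells):
    """Histogram of relative cell frequencies for one audio file."""
    counts = [0] * num_cells
    for c in cell_indices:
        counts[c] += 1
    total = len(cell_indices)
    return [c / total for c in counts]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between the two templates."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query, database, distance):
    """Return database names sorted from most to least similar."""
    return sorted(database, key=lambda name: distance(query, database[name]))


# Toy database of three files quantized into 4 cells (hypothetical data).
database = {
    "a": make_template([0, 0, 1, 3], 4),
    "b": make_template([2, 2, 2, 3], 4),
    "c": make_template([0, 0, 0, 1], 4),
}
query = make_template([0, 0, 1, 1], 4)
ranking = search(query, database, euclidean_distance)
```

Swapping `euclidean_distance` for `cosine_distance` in the call to `search` changes only the metric, not the retrieval procedure.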

Foote: Comparing Templates

Foote: Performance

Logan: Music Similarity
- Automatically determine music similarity
- Builds on the work of Foote. Differences:
  - Histogram bins local to each song
  - Uses Earth Mover's Distance

Logan: Procedure
- Compute signature based on spectral features
- Generate MFCCs
- Cluster using K-means technique
- Set of clusters (mean, covariance, weight) is the song's signature
- NB: clustering is local to each song
- Compare signatures using EMD

Logan: K-means Clustering
- Randomly assign MFCCs to K clusters
- For each point:
  - Calculate distance to the centroid of each cluster
  - Move it to the closest cluster
- Sum of distances is smaller at each step
- Stop when no other moves are required
- Clusters are non-hierarchical and non-overlapping
- Every member is closest to its own cluster
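The K-means procedure listed above can be sketched as follows. This is a batch variant (all points are reassigned after the centroids are recomputed) using squared Euclidean distance. The fixed random seed, the iteration cap, and the re-seeding of empty clusters are assumptions added to make the sketch deterministic and robust; they are not described in the slides.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (assignments, centroids) after K-means clustering."""
    rng = random.Random(seed)
    # Step 1: randomly assign every point to one of the K clusters.
    assign = [rng.randrange(k) for _ in points]
    centroids = [None] * k
    for _ in range(max_iter):
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            # Assumption: an empty cluster is re-seeded with a random point.
            centroids[c] = centroid(members) if members else list(rng.choice(points))
        # Step 2: move each point to the cluster with the closest centroid.
        new_assign = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Step 3: stop when no point changes cluster.
        if new_assign == assign:
            break
        assign = new_assign
    return assign, centroids


# Two well-separated groups of 2-D points standing in for MFCC vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assign, centroids = kmeans(points, k=2)
```

For a song signature as described in the procedure slide, each cluster's mean, covariance, and weight (its share of the frames) would then be computed from the final assignment.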

Logan: Earth Mover's Distance
- Calculates the minimum amount of work required to transform one signature into the other
- Cluster $p_i$ expressed as $(\mu_{p_i}, \Sigma_{p_i}, w_{p_i})$
- Uses distance $d_{p_i q_j}$ (Kullback-Leibler) and flow $f_{p_i q_j}$ between clusters
- Solve for flow subject to constraints
- Minimize

$$W = \sum_{i=1}^{m} \sum_{j=1}^{n} d_{p_i q_j} f_{p_i q_j}$$

$$\mathrm{EMD}(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} d_{p_i q_j} f_{p_i q_j}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{p_i q_j}}$$
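To make the quantities in the EMD definition concrete, the sketch below computes a Kullback-Leibler ground distance between clusters and evaluates the EMD ratio for a given flow. It rests on several assumptions not stated in the slides: clusters are Gaussians with diagonal covariance (stored as a list of variances), the KL divergence is symmetrized by summing both directions, and the flow is supplied by the caller. Finding the minimizing flow is a transportation problem that needs a linear-programming solver, which is not shown here.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized KL between clusters given as (mean, variances, weight)."""
    mu_p, var_p, _ = p
    mu_q, var_q, _ = q
    return kl_diag_gauss(mu_p, var_p, mu_q, var_q) + kl_diag_gauss(mu_q, var_q, mu_p, var_p)


def ground_distances(P, Q):
    """Matrix d[i][j] of distances between clusters of signatures P and Q."""
    return [[symmetric_kl(p, q) for q in Q] for p in P]


def emd_from_flow(dist, flow):
    """Total work divided by total flow, for a flow chosen by the caller."""
    work = sum(d * f for d_row, f_row in zip(dist, flow)
               for d, f in zip(d_row, f_row))
    total_flow = sum(f for f_row in flow for f in f_row)
    return work / total_flow


# Signature P has one cluster; Q has two clusters of weight 0.5 each.
P = [([0.0], [1.0], 1.0)]
Q = [([0.0], [1.0], 0.5), ([2.0], [1.0], 0.5)]
dist = ground_distances(P, Q)
# With a single source cluster the only feasible flow sends 0.5 to each target.
flow = [[0.5, 0.5]]
emd = emd_from_flow(dist, flow)
```

In this toy case the first target cluster is identical to the source (distance 0) and the second has a symmetrized KL of 4, so the ratio works out to 2.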

Logan: Performance

Logan: Performance

Further Reading
- Logan, "Mel Frequency Cepstral Coefficients for Music Modeling" (2000)
- Logan, "Toward Evaluation Techniques for Music Similarity" (2003)
- Liu and Huang, "Content-Based Indexing and Retrieval-By-Example in Audio" (2000)