
(19) United States
(12) Patent Application Publication (10) Pub. No.: US 2004/ A1
Li et al. (43) Pub. Date:

(54) MUSIC FEATURE EXTRACTION USING WAVELET COEFFICIENT HISTOGRAMS

(76) Inventors: Tao Li, Rochester, NY (US); Qi Li, Newark, DE (US); Mitsunori Ogihara, Pittsford, NY (US)

Correspondence Address: BLANK ROME LLP, 600 NEW HAMPSHIRE AVENUE, N.W., WASHINGTON, DC (US)

(21) Appl. No.: 10/777,222

(22) Filed: Feb. 13, 2004

Related U.S. Application Data

(60) Provisional application No. 60/447,312, filed on Feb. 14, 2003.

Publication Classification

(51) Int. Cl. G10H 1/36; G10H 7/00

(52) U.S. Cl. /634

(57) ABSTRACT

A music classification technique computes histograms of Daubechies wavelet coefficients at various frequency subbands with various resolutions. The coefficients are then used as an input to a machine learning technique to identify the genre and emotional content of music.

[Front-page figure: the flow chart of FIG. 1, showing, for both the music signal to be classified and the training set: wavelet decomposition; histogram of each subband; first three moments of all histograms; subband energy for each subband; timbral features; and formation of the feature set, followed by forming a classifier and classifying the music sample with the classifier.]

Patent Application Publication Sheet 1 of 5 US 2004/ A1

[FIG. 1: flow chart of the preferred embodiment. Two parallel paths, one for the music signal to be classified (102) and one for the received training set (116), each comprising wavelet decomposition (104), histogram of each subband (106), first three moments of all histograms (108), subband energy for each subband (110), timbral features (112), and formation of the feature set (114); the training path then forms the classifier (118), with which the music sample is classified (120).]

Patent Application Publication Sheet 2 of 5 US 2004/ A1

[FIG. 2: plots of the DWCH feature sets of music sounds drawn from ten different genres; the legible panel labels include Classical, Jazz, Metal, Pop, and Reggae.]

Patent Application Publication Sheet 3 of 5 US 2004/ A1

[FIG. 3: plots of the DWCH feature sets of pieces of classical music; the panels are labeled Classical No. 1, Classical No. 2, and so on.]

Patent Application Publication Sheet 4 of 5 US 2004/ A1

[FIG. 4: plots of the DWCH feature sets of pieces of blues music; the panels are labeled Blues No. 1, Blues No. 2, and so on.]

Patent Application Publication Sheet 5 of 5 US 2004/ A1

[FIG. 5: schematic diagram of a system on which the preferred embodiment can be implemented.]

MUSIC FEATURE EXTRACTION USING WAVELET COEFFICIENT HISTOGRAMS

REFERENCE TO RELATED APPLICATION

[0001] The present application claims the benefit of U.S. Provisional Patent Application No. 60/447,312, filed Feb. 14, 2003, whose disclosure is hereby incorporated by reference in its entirety into the present disclosure.

STATEMENT OF GOVERNMENT INTEREST

[0002] The work leading to the present invention was supported in part by NSF grants EIA and DUE and by NIH grants RO1-AG18231 and P30-AG. The government has certain rights in the invention.

FIELD OF THE INVENTION

[0003] The present invention is directed to the automated classification of music by genre and more particularly to such automated classification by use of a wavelet transform.

DESCRIPTION OF RELATED ART

[0004] Music is used not only for entertainment and pleasure, but also for a wide range of purposes due to its social and physiological effects. At the beginning of the 21st century, the world is facing ever-increasing growth of on-line music information, empowered by the permeation of the Internet into daily life. Efficient and accurate automatic music information processing (accessing and retrieval, in particular) will be an extremely important issue, and it has been enjoying a growing amount of attention.

[0005] Music can be classified based on its style, and the styles have a hierarchical structure. A currently popular topic in automatic music information retrieval is the problem of organizing, categorizing, and describing music contents on the web. Such endeavors can be found in on-line music databases such as mp3.com and Napster. One important aspect of the genre structures in these on-line databases is that the genre is specified by human experts as well as amateurs (such as the users) and that the labeling process is time-consuming and expensive.

[0006] Currently, music genre classification is done mainly by hand, because giving a precise definition of a music genre is extremely difficult and, in addition, many music sounds sit on boundaries between genres. These difficulties are due to the fact that music is an art that evolves, where performers and composers have been influenced by music in other genres.

[0007] However, it has been observed that audio signals (digital or analog) of music belonging to the same genre share certain characteristics, because they are composed of similar types of instruments and have similar rhythmic patterns and similar pitch distributions [7] (numbers in brackets refer to publications listed at the end of this section). This suggests the feasibility of automatic musical genre classification, which is a fundamental component of music information retrieval systems. The process of genre categorization in music is divided into two steps: feature extraction and multi-class classification. In the feature extraction step, information representing the music is extracted from the music signals. The features extracted should be comprehensive (representing the music very well), compact (requiring a small amount of storage), and effective (not requiring much computation for extraction). To meet the first requirement, the design has to be made so that both low-level and high-level information of the music is included.
In the second step, a mechanism (an algorithm and/or a mathematical model) is built for identifying the labels from the representation of the music sounds with respect to their features.

[0008] There has been a considerable amount of work in extracting features for speech recognition and music-speech discrimination, but much less work has been reported on the development of descriptive features specifically for music signals. Currently the most influential approach to direct modeling of music signals for automatic genre classification is due to Tzanetakis and Cook [29], where timbral texture, rhythm, and pitch content features are explicitly developed. With these features, however, only 61% classification accuracy is achieved on their ten-genre sound dataset. This raises the question of whether there are different features that are more useful in music classification and whether the use of statistical or machine learning techniques (e.g., discriminant analysis and support vector machines) can improve the accuracy.

[0009] Many different features can be used for music classification, e.g., reference features including title and composer, content-based acoustic features including tonality, pitch, tempo, and beat, symbolic features extracted from the scores, and text-based features extracted from the song lyrics. The content-based acoustic features are classified into timbral texture features, rhythmic content features, and pitch content features [29]. Timbral features mostly originate from traditional speech recognition techniques. They are usually calculated for every short-time frame of sound based on the Short Time Fourier Transform (STFT) [22]. Typical timbral features include Spectral Centroid, Spectral Rolloff, Spectral Flux, Energy, Zero Crossings, Linear Prediction Coefficients, and Mel-Frequency Cepstral Coefficients (MFCCs) (see [22] for more detail). Among these timbral features, MFCCs have been dominantly used in speech recognition. Logan [18] examines MFCCs for music modeling and music/speech discrimination. Rhythmic content features contain information about the regularity of the rhythm and about the beat and tempo. Tempo and beat tracking from acoustic musical signals has been explored in [13], [15], and [24]. Foote and Uchihashi [10] use the beat spectrum to represent rhythm. Pitch content features deal with the frequency information of the music bands and are obtained using various pitch detection techniques.

[0010] Much less work has been reported on music genre classification. Tzanetakis and Cook propose a comprehensive set of features for direct modeling of music signals and explore the use of those features for musical genre classification using K-Nearest Neighbors and Gaussian Mixture models. Lambrou et al. [14] use statistical features in the temporal domain as well as three different wavelet transform domains to classify music into rock, piano, and jazz. Deshpande et al. [5] use Gaussian Mixtures, Support Vector Machines, and Nearest Neighbors to classify music into rock, piano, and jazz based on timbral features. The problem of discriminating music and speech has been investigated by Saunders [23] and by Scheirer and Slaney [25]. Zhang and Kuo [32] propose a heuristic rule-based system to segment and classify audio signals from movies or TV programs.

In [31], audio contents are divided into instrument sounds, speech sounds, and environment sounds using automatically extracted features. Foote [9] constructs a learning tree vector quantizer using twelve MFCCs plus energy as audio features for retrieval. Li and Khokhar [16] propose nearest feature line methods for content-based audio classification and retrieval. Pye [21] investigates the use of Gaussian Mixture Modeling (GMM) and Tree-Based Vector Quantization in music genre classification. Soltau et al. [26] propose an approach to representing the temporal structure of an input signal; they show that this new set of abstract features can be learned via artificial neural networks and can be used for music genre identification.

[0011] The four types of content features mentioned above will now be described in detail.

[0012] Timbral texture features are used to differentiate mixtures of sounds that possibly have the same or similar rhythmic and pitch contents. The use of these features originates from speech recognition. To extract timbral features, the sound signals are first divided into frames that are statistically stationary, usually by applying a windowing function at fixed intervals. The window function, typically a Hamming window, removes edge effects. Timbral texture features are then computed for each frame, and the statistical values (such as the mean and the variance) of those features are calculated.

[0013] Mel-Frequency Cepstral Coefficients (MFCCs) are designed to capture short-term spectral features. After taking the logarithm of the amplitude spectrum based on the STFT for each frame, the frequency bins are grouped and smoothed according to Mel-frequency scaling, which is designed to agree with perception. MFCCs are generated by decorrelating the Mel-spectral vectors using the discrete cosine transform.

[0014] Spectral Centroid is the centroid of the magnitude spectrum of the STFT and is a measure of spectral brightness.

[0015] Spectral Rolloff is the frequency below which 85% of the magnitude distribution is concentrated. It measures the spectral shape.

[0016] Spectral Flux is the squared difference between the normalized magnitudes of successive spectral distributions. It measures the amount of local spectral change.

[0017] Zero Crossings is the number of time-domain zero crossings of the signal. It measures the noisiness of the signal.

[0018] Low Energy is the percentage of frames that have energy less than the average energy over the whole signal. It measures the amplitude distribution of the signal.
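Purely as an illustration of the frame-based definitions in paragraphs [0014] through [0018], the following is a minimal sketch of how such features can be computed; it assumes NumPy, and the frame length, hop size, and windowing choices are illustrative rather than values taken from the patent. The centroid and rolloff are reported in FFT-bin units here.

```python
# Minimal sketch of the frame-based timbral features defined above.
# Assumes NumPy; frame/hop sizes are illustrative choices.
import numpy as np

def timbral_features(signal, frame=512, hop=256):
    feats = []
    window = np.hamming(frame)  # Hamming window to remove edge effects
    prev = None
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame] * window
        mag = np.abs(np.fft.rfft(x))            # magnitude spectrum of the STFT frame
        bins = np.arange(mag.size)
        centroid = (bins * mag).sum() / (mag.sum() + 1e-12)
        # Rolloff: bin below which 85% of the magnitude lies.
        rolloff = np.searchsorted(np.cumsum(mag), 0.85 * mag.sum())
        norm = mag / (mag.sum() + 1e-12)
        flux = 0.0 if prev is None else ((norm - prev) ** 2).sum()
        prev = norm
        zc = np.count_nonzero(np.diff(np.sign(x)))  # time-domain zero crossings
        energy = (x ** 2).mean()
        feats.append((centroid, rolloff, flux, zc, energy))
    feats = np.array(feats)
    # Low Energy: fraction of frames with energy below the signal average.
    low_energy = (feats[:, 4] < feats[:, 4].mean()).mean()
    # Per paragraph [0012], the mean and variance of each per-frame
    # feature are kept as the timbral statistics.
    return np.concatenate([feats.mean(axis=0), feats.var(axis=0), [low_energy]])
```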
[0019] Rhythmic content features characterize the movement of music signals over time and contain such information as the regularity of the rhythm, the beat, the tempo, and the time signature. The feature set for representing the rhythm structure is based on detecting the most salient periodicities of the signal, and it is usually extracted from a beat histogram. To construct the beat histogram, the music signal is first decomposed into a number of octave frequency bands, and the time-domain amplitude envelope of each band is extracted. The envelopes of the bands are then summed together, followed by the computation of the autocorrelation of the resulting sum envelope. The dominant peaks of the autocorrelation function, corresponding to the various periodicities of the signal's envelope, are accumulated over the whole sound file into a beat histogram in which each bin corresponds to a peak lag. The rhythmic content features are then extracted from the beat histogram; generally they comprise the relative amplitudes of the first and second histogram peaks, the ratio of the amplitude of the second peak to the amplitude of the first peak, the periods of the first and second peaks, and the overall sum of the histogram.

[0020] The pitch content features describe the melody and harmony information of music signals and are extracted based on various pitch detection techniques. Basically, the dominant peaks of the autocorrelation function, calculated via the summation of envelopes for each frequency band obtained by decomposing the signal, are accumulated into pitch histograms, and the pitch content features are then extracted from the pitch histograms. The pitch content features typically include the amplitudes and periods of the maximum peaks in the histogram, the pitch intervals between the two most prominent peaks, and the overall sums of the histograms.
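Both the beat histogram and the pitch histogram described above rest on the same primitive: accumulating the dominant autocorrelation peaks of envelope signals into a histogram over lags. The sketch below illustrates that primitive for a single envelope; it assumes NumPy, the window and hop sizes are arbitrary, and a real system would first derive the octave-band envelopes as the text describes.

```python
# Sketch of the autocorrelation-peak histogram underlying the beat and
# pitch content features. Assumes NumPy; a full system would first split
# the signal into octave bands and extract amplitude envelopes.
import numpy as np

def periodicity_histogram(envelope, win=2048, hop=1024, peaks_per_win=3):
    hist = np.zeros(win)  # one bin per candidate lag
    for start in range(0, len(envelope) - win, hop):
        seg = envelope[start:start + win]
        seg = seg - seg.mean()
        # Full autocorrelation; keep the non-negative lags only.
        ac = np.correlate(seg, seg, mode="full")[win - 1:]
        ac[0] = 0.0  # ignore the trivial zero-lag peak
        # Accumulate the dominant peaks into the histogram over lags.
        for lag in np.argsort(ac)[-peaks_per_win:]:
            hist[lag] += ac[lag]
    return hist
```

The peak amplitudes, their ratio, the peak periods, and the overall sum of this array correspond to the feature list given above.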

[0021] It is not difficult to see that the traditional feature extraction described above captures more or less incomplete information about music signals. Timbral texture features are standard features used in speech recognition and are calculated for every short-time frame of sound, while rhythmic and pitch content features are computed over the whole file. In other words, timbral features capture the statistics of local information of music signals from a global perspective, but do not adequately represent the global information of the music. Moreover, as indicated by the experiments described below, the rhythm and pitch content features do not seem to capture enough information content for classification purposes.

[0022] Prior art related to music classification includes the following:

[1] E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. In Proc. 17th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, Calif.
[2] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines.
[3] I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia.
[4] A. David and S. Panchanathan. Wavelet histogram method for face recognition. Journal of Electronic Imaging, 9(2).
[5] H. Deshpande, R. Singh, and U. Nam. Classification of music signals in the visual domain. In Proceedings of the COST-G6 Conference on Digital Audio Effects.
[6] T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 1995.
[7] W. J. Dowling and D. L. Harwood. Music Cognition. Academic Press, Inc.
[8] P. Flandrin. Wavelet analysis and synthesis of fractional Brownian motion. IEEE Transactions on Information Theory, 38(2).
[9] J. Foote. Content-based retrieval of music and audio. In Multimedia Storage and Archiving Systems II, Proceedings of SPIE.
[10] J. Foote and S. Uchihashi. The beat spectrum: a new approach to rhythm analysis. In IEEE International Conference on Multimedia & Expo 2001.
[11] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, New York, 2nd edition.
[12] G. Fung and O. L. Mangasarian. Multicategory proximal support vector machine classifiers. Technical Report 01-06, University of Wisconsin at Madison.
[13] M. Goto and Y. Muraoka. A beat tracking system for acoustic signals of music. In ACM Multimedia.
[14] T. Lambrou, P. Kudumakis, R. Speller, M. Sandler, and A. Linney. Classification of audio signals using statistical features on time and wavelet transform domains. In Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP-98), volume 6.
[15] J. Laroche. Estimating tempo, swing and beat locations in audio recordings. In Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA01).
[16] G. Li and A. A. Khokhar. Content-based indexing and retrieval of audio data using wavelets. In IEEE International Conference on Multimedia and Expo (II).
[17] T. Li, Q. Li, S. Zhu, and M. Ogihara. A survey on wavelet applications in data mining. SIGKDD Explorations, 4(2):49-68.
[18] B. Logan. Mel frequency cepstral coefficients for music modeling. In Proc. Int. Symposium on Music Information Retrieval (ISMIR).
[19] T. M. Mitchell. Machine Learning. The McGraw-Hill Companies, Inc.
[20] D. Perrot and R. R. Gjerdigen. Scanning the dial: an exploration of factors in the identification of musical style. In Proceedings of the 1999 Society for Music Perception and Cognition, page 88.
[21] D. Pye. Content-based methods for managing electronic music. In Proceedings of the 2000 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] L. Rabiner and B. Juang. Fundamentals of Speech Recognition. Prentice-Hall, N.J.
[23] J. Saunders. Real-time discrimination of broadcast speech/music. In Proc. ICASSP 96.
[24] E. Scheirer. Tempo and beat analysis of acoustic musical signals. Journal of the Acoustical Society of America, 103(1).
[25] E. Scheirer and M. Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. In Proc. ICASSP'97, Munich, Germany.
[26] H. Soltau, T. Schultz, and M. Westphal. Recognition of music types. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing.
[27] M. Swain and D. Ballard. Color indexing. Int. J. Computer Vision, 7:11-32.
[28] G. Tzanetakis and P. Cook. MARSYAS: a framework for audio analysis. Organized Sound, 4(3).
[29] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), July.
[30] V. N. Vapnik. Statistical Learning Theory. Wiley, New York.
[31] E. Wold, T. Blum, D. Keislar, and J. Wheaton. Content-based classification, search and retrieval of audio. IEEE Multimedia, 3(2):27-36.
[32] T. Zhang and C.-C. J. Kuo. Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing, 3(4).

SUMMARY OF THE INVENTION

It will be readily apparent from the above that a need exists in the art to address the above concerns.
To achieve the above and other objects, the present invention is directed to a feature extraction technique for music genre classification based on a wavelet histogram, which captures local and global information of music signals simultaneously.

The wavelet transform is a synthesis of ideas emerging over many years from different fields, such as mathematics and image and signal processing, and has been widely used in information retrieval and data mining. A complete survey of wavelet applications in data mining can be found in [17]. Generally speaking, the wavelet transform, providing good time and frequency resolution, is a tool that divides up data, functions, or operators into different frequency components and then studies each component with a resolution matched to its scale [3]. Straightforwardly, a wavelet coefficient histogram is the histogram of the (rounded) wavelet coefficients obtained by convolving a wavelet filter with an input music signal (details on histograms and on wavelet filters/analysis can be found in [27] and [3], respectively).

Several favorable properties of wavelets, such as compact support, vanishing moments, and decorrelated coefficients, make them useful tools for signal representation and transformation. Generally speaking, wavelets are designed to give good time resolution at high frequencies and good frequency resolution at low frequencies. Compact support guarantees the localization of wavelets; the vanishing moment property allows wavelets to focus on the most important information and to discard noisy signals; and the decorrelated coefficients property enables the wavelet transform to reduce temporal correlation, so that the correlation of the wavelet coefficients is much smaller than that of the corresponding temporal process [8].
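As a toy illustration of the wavelet coefficient histogram just defined, the following snippet decomposes a signal with a Daubechies filter bank and histograms the rounded coefficients of one subband. It assumes the PyWavelets package (pywt) and NumPy, and a random vector stands in for a music signal; it is an illustration, not any implementation used by the inventors.

```python
# Toy illustration of a wavelet coefficient histogram: decompose a signal
# with a Daubechies wavelet and histogram the rounded coefficients of a
# subband. Assumes PyWavelets (pywt) and NumPy.
import numpy as np
import pywt

rng = np.random.default_rng(0)
signal = rng.standard_normal(2 ** 14)           # stand-in for a music signal

coeffs = pywt.wavedec(signal, "db8", level=7)   # Db8 filter, seven levels
detail = coeffs[1]                              # one frequency subband
hist, edges = np.histogram(np.round(detail), bins=50)
print(hist)  # the distribution whose moments become DWCH features
```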

Because of these properties, after the wavelet transform the complex signal in the time domain is reduced to a much simpler process in the wavelet domain. Computing the histograms of the wavelet coefficients allows a good estimation of the probability distribution over time, and the good probability estimation in turn leads to a good feature representation.

[0058] A preferred embodiment, called DWCHs (Daubechies wavelet coefficient histograms), to be disclosed below, captures the local and global information of music signals simultaneously by computing histograms of their Daubechies wavelet coefficients. The effectiveness of this new feature and of previously studied features is compared using various machine learning classification algorithms, including Support Vector Machines and Linear Discriminant Analysis. It is demonstrated that the use of DWCHs significantly improves the accuracy of music genre classification. As will be explained below, wavelet techniques other than Daubechies wavelet coefficients can be used as an alternative.

[0059] DWCHs represent music signals by computing histograms of Daubechies wavelet coefficients at various frequency bands at different resolutions, and they have significantly improved the accuracy of music genre classification.

[0060] The following publications of the inventors are hereby incorporated by reference in their entireties into the present disclosure:

[0061] "A Comparative Study of Content-Based Music Genre Classification," by Tao Li, Mitsunori Ogihara, and Qi Li, Proceedings of the Annual ACM Conference on Research and Development in Information Retrieval, Jul. 28-Aug. 1, 2003 (SIGIR 2003).

[0062] "Content-Based Music Similarity Search and Emotion Detection," by Tao Li and Mitsunori Ogihara, to appear in Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004).

[0063] "Detecting Emotion in Music," by Tao Li and Mitsunori Ogihara, Proceedings of the Fourth International Conference on Music Information Retrieval (ISMIR 2003).

BRIEF DESCRIPTION OF THE DRAWINGS

[0064] A preferred embodiment of the present invention will be disclosed in detail below with reference to the drawings, in which:

[0065] FIG. 1 is a flow chart showing the operations of the preferred embodiment;

[0066] FIG. 2 is a plot of the feature sets of ten different categories of music;

[0067] FIG. 3 is a plot of the feature sets of ten pieces of classical music;

[0068] FIG. 4 is a plot of the feature sets of ten pieces of blues music; and

[0069] FIG. 5 is a schematic diagram of a system on which the preferred embodiment can be implemented.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0070] A preferred embodiment of the present invention and variations thereon will be set forth in detail with reference to the drawings, in which like reference numerals refer to like elements or steps throughout.

[0071] A sound file is a kind of oscillation waveform in the time domain and can be considered a two-dimensional entity of amplitude over time, in the form M(t) = D(A, t), where A is the amplitude and generally ranges over [-1, 1]. The distinguishing characteristics are contained in the amplitude variation; in consequence, identifying the amplitude variation is essential for music categorization.

[0072] On one hand, the histogram technique is an efficient technique for distribution estimation.
However, the raw signal in the time domain is not a good representation, particularly for content-based categorization, since the most distinguishing characteristics are hidden in the frequency domain. On the other hand, the sound frequency spectrum is generally divided into octaves, each having a unique quality. An octave is the interval between any two frequencies having a tonal ratio of 2 to 1, a logarithmic relation in the frequency band. The wavelet decomposition scheme matches this model of octave division for perceptual scales and provides good time and frequency resolution [16]. In other words, the decomposition of an audio signal using wavelets produces a set of subband signals at different frequencies corresponding to different characteristics. This motivates the use of wavelet histogram techniques for feature extraction. The wavelet coefficients are distributed in various frequency bands at different resolutions.

[0073] The process according to the preferred embodiment will be explained with reference to the flow chart of FIG. 1. In step 102, a file or other electronic signal representing the piece of music to be classified is received. The file or other electronic signal can be retrieved from persistent storage, either locally or remotely, e.g., over the Internet. Alternatively, it can be received from a non-persistent source, e.g., in real time.

[0074] The signal is subjected to a wavelet decomposition in step 104. There are many kinds of wavelet filters, including Daubechies wavelet filters and Gabor filters. Daubechies wavelet filters are the ones commonly used in image retrieval (more details on wavelet filters can be found in [3]). The preferred embodiment uses the Daubechies wavelet filter Db8 with seven levels of decomposition.

[0075] After the decomposition, the histogram of the wavelet coefficients is constructed at each subband in step 106. The coefficient histogram provides a good approximation of the waveform variations at each subband. From probability theory, a probability distribution is uniquely characterized by its moments; hence, if the waveform distribution is interpreted as a probability distribution, it can be characterized by its moments. To characterize the waveform distribution, the first three moments of each histogram are used [4]. The first three moments are the average, the variance, and the skewness of each subband, and they are calculated in step 108. In addition, the subband energy, defined as the mean of the absolute values of the coefficients, is computed for each subband in step 110. The final DWCHs feature set also includes the traditional timbral features used for speech recognition, which are determined in step 112. The final feature set is formed from all of the above information in step 114.
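Steps 104 through 110 can be sketched compactly as follows. This is an illustrative rendering assuming PyWavelets (pywt) and NumPy, not the inventors' Matlab implementation; the moments are computed directly from the coefficients, which matches the moments of the coefficient histogram in the limit of fine binning.

```python
# Minimal sketch of DWCH extraction (steps 104-110), assuming PyWavelets
# (pywt) and NumPy. The timbral features of step 112 would be appended
# to this vector to form the full feature set of step 114.
import numpy as np
import pywt

def dwch(signal, wavelet="db8", levels=7):
    # Step 104: seven-level Daubechies (Db8) wavelet decomposition,
    # yielding one approximation subband and seven detail subbands.
    subbands = pywt.wavedec(signal, wavelet, level=levels)
    features = []
    for band in subbands:
        band = np.asarray(band, dtype=float)
        # Steps 106 and 108: the coefficient histogram estimates the
        # subband distribution; its first three moments (average,
        # variance, skewness) characterize that distribution.
        mean = band.mean()
        var = band.var()
        skew = ((band - mean) ** 3).mean() / (band.std() ** 3 + 1e-12)
        # Step 110: subband energy, the mean absolute coefficient value.
        energy = np.abs(band).mean()
        features += [mean, var, skew, energy]
    return np.array(features)
```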

[0076] Each music file in the datasets used in the experiments is a 30-second signal, which would first be converted into an extremely long vector. Based on an intuition of the "self-similarity" of a piece of music, i.e., its repeated theme, the DWCHs feature can instead be extracted from a small slice of the input music signal; in the experiments, sound clips of three seconds in duration are used. In summary, the algorithm of DWCHs extraction contains the following steps:

[0077] 1. The wavelet decomposition of the music signal is obtained (step 104).

[0078] 2. The histogram of each subband is constructed (step 106).

[0079] 3. The first three moments of all histograms are computed (step 108).

[0080] 4. The subband energy of each subband is computed (step 110).

[0081] The algorithm is very easy to implement in Matlab, which contains a complete wavelet package.

[0082] Concrete examples will make the DWCHs features and their putative advantages clearer. In FIG. 2, from left to right and top to bottom, the DWCHs features of ten music sounds drawn from the different genres of the ten-genre dataset are shown. The feature representations of different music genres show characteristics of those genres. For example, the magnitudes of Rock and Hiphop are the largest among all music genres, and Blues has the smallest DWCHs magnitude. The magnitudes are quite consistent with the impression that each musical genre makes on human listeners. FIG. 3 and FIG. 4 show the DWCHs features of ten classical and ten blues music sounds taken from Dataset A, respectively. Since similar features are present within a single music genre, a unique DWCHs feature pattern exists in each music genre, and the use of DWCHs will improve the classification of music genre.

[0083] Once the feature set has been extracted, the music genre classification problem is reduced to a multi-class classification problem, which will be described with reference to FIG. 1. The problem can be formally defined as follows: the input to the problem, received in step 116, is a set of training samples of the form <x, l>, where x is a data point and l is its label, chosen from a finite set of labels {C1, C2, ..., CK}. In the present case, the labels are music genres. The raw data from the musical signals in the training set is processed in the same steps described above to produce feature sets.

[0084] The goal, represented by step 118, is to infer a function f that well approximates the mapping of the x's to their labels. Once such a function f is obtained, it can be used to classify music signals in step 120. Generally speaking, approaches to multi-class classification problems can be roughly divided into two groups. The first group consists of those binary classification algorithms that can be naturally extended to handle multi-class cases; this group contains such algorithms as Discriminant Analysis, K-Nearest Neighbors, regression, and decision trees including C4.5 and CART. The second group consists of methods that involve decomposition of a multi-class classification problem into a collection of binary ones. Many decomposition techniques exist, including such popular methods as the one-versus-the-rest method, pairwise comparison, Error-Correcting Output Coding (ECOC), and multi-class objective functions.

[0085] The idea of the one-versus-the-rest method is as follows: to get a K-class classifier, first construct a set of binary classifiers C1, C2, ..., CK; each binary classifier is trained to separate one class from the rest, and the multi-class classification is then carried out according to the maximal output of the binary classifiers. In pairwise comparison, a classifier is trained for each possible pair of classes; for K classes, this results in K(K-1)/2 binary classifiers. Given a new instance, the multi-class classification is executed by evaluating all K(K-1)/2 individual classifiers and assigning the instance to the class that gets the highest number of votes.
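The two reduction schemes just described can be sketched as follows with a generic binary learner. The fit_binary factory and its score-function interface are hypothetical stand-ins for any binary classifier, such as an SVM; this is a schematic illustration, not the patent's code.

```python
# Schematic one-versus-the-rest and pairwise reductions over a generic
# binary learner. fit_binary(X, y) is a hypothetical stand-in that
# returns a scoring function with score(x) > 0 for the positive class.
import itertools
import numpy as np

def one_vs_rest(X, labels, classes, fit_binary):
    # One binary classifier per class: class k versus everything else.
    scorers = {k: fit_binary(X, (labels == k).astype(int)) for k in classes}
    # Predict by the maximal output among the binary classifiers.
    return lambda x: max(classes, key=lambda k: scorers[k](x))

def pairwise(X, labels, classes, fit_binary):
    # K(K-1)/2 classifiers, one per unordered pair of classes.
    duels = {}
    for a, b in itertools.combinations(classes, 2):
        mask = (labels == a) | (labels == b)
        duels[(a, b)] = fit_binary(X[mask], (labels[mask] == a).astype(int))
    def predict(x):
        votes = {k: 0 for k in classes}
        for (a, b), clf in duels.items():
            votes[a if clf(x) > 0 else b] += 1  # majority vote over duels
        return max(votes, key=votes.get)
    return predict
```

Either reduction yields a classifier usable in step 120; ECOC, described next, generalizes both.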
Error-Correcting Output Coding (ECOC) [6], roughly speaking, generates a number of binary classification problems by intelligently splitting the original set of classes into two sets: each class is assigned a unique binary string of length l (these strings are regarded as codewords), and l classifiers are then trained to predict each bit of the string. For a new instance, the predicted class is the one whose codeword is closest (in Hamming distance) to the codeword produced by the classifiers. One-versus-the-rest and pairwise comparison can be regarded as two special cases of ECOC with specific coding schemes. Multi-class objective functions aim to directly modify the objective function of binary SVMs in such a way that it simultaneously allows the computation of a multi-class classifier.

[0086] An illustrative example of a system in which the present invention can be implemented is shown in FIG. 5. The system 500 includes an input 502 for supplying the music files to a computer 504. As noted above, the input 502 may or may not include persistent storage and may be local or remote. The computer 504 is sufficiently powerful to run software such as Matlab or an equivalent software package and includes a CPU 506 and such interface elements as a display 508 and a keyboard.

[0087] Experiments were conducted with the following additional multi-class classification approaches (see [19] for more information about the methods):

[0088] Support Vector Machines (SVMs) [30] have shown superb performance in binary classification tasks. Basically, Support Vector Machines search for a hyperplane that separates the positive data points from the negative data points with maximum margin. To extend SVMs to multi-class classification, we use one-versus-the-rest, pairwise comparison, and multi-class objective functions.

[0089] K-Nearest Neighbors (KNN) is a nonparametric classifier. It has been proved that the error of KNN is asymptotically at most twice the Bayesian error rate. KNN has been applied in various musical analysis problems. The basic idea is to allow a small number of neighbors to influence the decision on a point.

[0090] Gaussian Mixture Models (GMM) model the distribution of acoustic features and have been widely used in music information retrieval. For each class, the existence of a probability density function (pdf) expressible as a mixture of a number of multidimensional Gaussian distributions is assumed. The iterative EM algorithm is then used to estimate the parameters of each Gaussian component and the mixture weights.
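A per-class GMM classifier of the kind just described can be sketched as follows, assuming scikit-learn's GaussianMixture (which performs the EM estimation internally); it is an illustration rather than the experimental code, and the three-component choice mirrors the experimental setup reported below.

```python
# Sketch of a per-class Gaussian Mixture classifier: fit one GMM per
# genre by EM and label a sample by the highest likelihood. Assumes
# scikit-learn; three components per class follows the experiments.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_classifier(X, labels, n_components=3):
    models = {}
    for genre in np.unique(labels):
        gmm = GaussianMixture(n_components=n_components, random_state=0)
        gmm.fit(X[labels == genre])  # EM parameter estimation per class
        models[genre] = gmm
    def predict(x):
        x = np.atleast_2d(x)
        # score_samples gives the log-likelihood under each class model.
        return max(models, key=lambda g: models[g].score_samples(x)[0])
    return predict
```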

[0091] Linear Discriminant Analysis (LDA): in the statistical pattern recognition literature, discriminant analysis approaches are known to learn discriminative feature transformations very well, and the approach has been successfully used in many classification tasks [11]. The basic idea of LDA is to find a linear transformation that best discriminates among the classes and to perform classification in the transformed space based on some metric such as Euclidean distance. Fisher discriminant analysis finds a discriminative feature transform as the eigenvectors of the matrix

T = Sw^-1 Sb,

[0092] where Sw is the intra-class covariance matrix and

[0093] Sb is the inter-class covariance matrix. This matrix T captures both the compactness of each class and the separations between the classes, so the eigenvectors corresponding to the largest eigenvalues of T are expected to constitute a discriminative feature transform.
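In matrix terms, the Fisher transform just described amounts to an eigendecomposition. The following sketch, assuming NumPy and adding a small ridge term for numerical stability, is a generic rendering of that computation rather than the exact code used in the experiments.

```python
# Sketch of the Fisher discriminant transform: eigenvectors of
# T = Sw^-1 Sb, with Sw the intra-class and Sb the inter-class
# scatter (covariance) matrix. Assumes NumPy.
import numpy as np

def fisher_transform(X, labels, n_dims, ridge=1e-6):
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                # intra-class scatter
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)              # inter-class scatter
    T = np.linalg.solve(Sw + ridge * np.eye(d), Sb)  # Sw^-1 Sb
    eigvals, eigvecs = np.linalg.eig(T)
    order = np.argsort(eigvals.real)[::-1]           # largest eigenvalues
    return eigvecs[:, order[:n_dims]].real           # projection matrix
```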
[0094] We used two datasets for our experiments. The first dataset, Dataset A, contains 1000 songs over ten genres, with 100 songs per genre. This dataset was used in [29]. The ten genres are Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, and Rock. The excerpts of the dataset were taken from radio, compact disks, and MP3-compressed audio files. The second dataset, Dataset B, contains 756 sounds over five genres: Ambient, Classical, Fusion, Jazz, and Rock. This dataset was constructed for this work from the CD collection of the second author. The collection of 756 sound files was created from 189 music albums as follows: from each album, the first four music tracks were chosen (three tracks from albums with only three music tracks); then, from each music track, the sound signals over a period of 30 seconds after the initial 30 seconds were extracted in MP3. The distribution of the different genres is: Ambient (109 files), Classical (164 files), Fusion (136 files), Jazz (251 files), and Rock (96 files). For both datasets, the sound files were converted to 16-bit, mono audio files.

[0095] We used MARSYAS, a public software framework for computer audition applications, to extract the features proposed in [29]: Mel-frequency Cepstral Coefficients (denoted by MFCC), the timbral texture features excluding MFCCs (denoted by FFT), the rhythmic content features (denoted by Beat), and the pitch content features (denoted by Pitch). The MFCC feature vector consists of the mean and variance of each of the first five MFCC coefficients over the frames; the FFT feature vector consists of the means and variances of spectral centroid, rolloff, flux, and zero crossings, and of low energy; the Beat feature vector consists of six features from the rhythm histogram; and the Pitch feature vector consists of five features from the pitch histograms. More information on the feature extraction can be found in [28]. Our original DWCH feature set contains four features for each of seven frequency subbands along with nineteen traditional timbral features. However, we found that not all the frequency subbands are informative, so we use only selected subbands, resulting in a shorter feature vector.

[0096] For classification, we use three different reduction methods to extend SVMs to multi-class problems: pairwise, one-against-the-rest, and multi-class objective functions. For the one-against-the-rest and pairwise methods, our SVM implementation was based on LIBSVM [2], a library for support vector classification and regression. For multi-class objective functions, our implementation was based on [12]. For experiments involving SVMs, we tested them with linear, polynomial, and radial basis kernels, and the best results are reported in the tables below. For Gaussian Mixture Models, we used three Gaussian mixtures to model each music genre. For K-Nearest Neighbors, a fixed small k was used.

Table 1 shows the accuracy of the various classification algorithms on Dataset A. The bottom four rows show how the classifiers performed on a single set of features proposed in [29]. The experiments verify that each of the traditional features contains useful yet incomplete information characterizing music signals: the classification accuracy on any single feature set is significantly better than random guessing (the accuracy of random guessing on Dataset A is 10%). The performance with either FFT or MFCC was significantly higher than that with Beat or Pitch for each of the methods tested. This naturally raises the question of whether FFT and MFCC are each more suitable than Beat or Pitch for music genre classification. We combined the four sets of features in every possible way to examine the accuracy. The accuracy with only Beat and Pitch is significantly smaller than the accuracy with any combination that includes either FFT or MFCC. Indeed, the accuracy with only FFT and MFCC is almost the same as that with all four feature sets for all methods. This seems to answer our question positively.

TABLE 1
Classification accuracy of the learning methods tested on Dataset A using various combinations of features. The accuracy values are calculated via ten-fold cross validation. The numbers within parentheses are standard deviations. SVM1 and SVM2 respectively denote the pairwise SVM and the one-versus-the-rest SVM.

Features                    SVM1         SVM2         MPSVM        GMM          LDA          KNN
DWCHs                       74.9(4.97)   78.5(4.07)   68.3(4.34)   63.5(4.72)   71.3(6.10)   62.1(4.54)
Beat + FFT + MFCC + Pitch   70.8(5.39)   71.9(5.09)   66.2(5.23)   61.4(3.87)   69.4(6.93)   61.3(4.85)
Beat + FFT + MFCC           71.2(4.98)   72.1(4.68)   64.6(4.16)   60.8(3.25)   70.2(6.61)   62.3(4.03)
Beat + FFT + Pitch          65.1(4.27)   67.2(3.79)   56.0(4.67)   53.3(3.82)   61.1(6.53)   51.8(2.94)
Beat + MFCC + Pitch         64.3(4.24)   63.7(4.27)   57.8(3.82)   50.4(2.22)   61.7(5.23)   54.0(3.30)
FFT + MFCC + Pitch          70.9(6.22)   72.2(3.90)   64.9(5.06)   59.6(3.22)   69.9(6.76)   61.0(5.40)
Beat + FFT                  61.7(5.12)   62.6(4.83)   50.8(5.16)   48.3(3.82)   56.0(6.73)   48.8(5.07)
Beat + MFCC                 60.4(3.19)   60.2(4.84)   53.5(4.45)   47.7(2.24)   59.6(4.03)   50.5(4.53)
Beat + Pitch                42.7(5.37)   41.1(4.68)   35.6(4.27)   34.0(2.69)   36.9(4.38)   35.7(3.59)
FFT + MFCC                  70.5(5.98)   71.8(4.83)   63.6(4.71)   59.1(3.20)   66.8(6.77)   61.2(7.12)
FFT + Pitch                 64.0(5.16)   68.2(3.79)   55.1(5.82)   53.7(3.15)   60.0(6.68)   53.8(4.73)
MFCC + Pitch                60.6(4.54)   64.4(4.37)   53.3(2.95)   48.2(2.71)   59.4(4.50)   54.7(3.50)
Beat                        26.5(3.30)   21.5(2.71)   22.1(3.04)   22.1(1.91)   24.9(2.99)   22.8(5.12)
FFT                         61.2(6.74)   61.8(3.39)   50.6(5.76)   47.9(4.91)   56.5(6.90)   52.6(3.81)
MFCC                        58.4(3.31)   58.1(4.72)   49.4(2.27)   46.4(3.09)   55.5(3.57)   53.7(4.11)
Pitch                       36.6(2.95)   33.6(3.23)   29.9(3.76)   25.8(3.02)   30.7(2.79)   33.3(3.20)

On the other hand, the use of DWCHs further improved the accuracy for all methods. In particular, there is a significant jump in accuracy when Support Vector Machines are used with either the pairwise or the one-versus-the-rest approach. The accuracy of the one-versus-the-rest SVM is 78.5% on average in the ten-fold cross validation, and for some of the cross validation tests the accuracy went beyond 80%. This is a remarkable improvement over Tzanetakis and Cook's 61%. Perrot and Gjerdigen [20] report a human study in which college students were trained to learn a music company's genre classification on a ten-genre data collection, and about 70% accuracy was achieved. Although these results are not directly comparable because of the different dataset collections, this clearly implies that automatic content-based genre classification could achieve accuracy similar to human performance; in fact, the performance of our best method seems to go well beyond that.

There are papers reporting better accuracy of automatic music genre recognition on smaller datasets. Pye [21] reports 90% on a total set of 175 songs covering six genres (Blues, Easy Listening, Classical, Opera, Dance, and Indie Rock). Soltau et al. [26] report 80% accuracy on four classes (Rock, Pop, Techno, and Classical). For the sake of comparison, we show in Table 2 the performance of the best classifier (DWCHs with SVM) in one-versus-all tests on each of the ten music genres in Dataset A; the performance of these classifiers is extremely good. Also, in Table 3 we show the performance of the multi-class classification for distinctions among smaller numbers of classes. The accuracy gradually decreases as the number of classes increases.

TABLE 2
Genre-specific accuracy of SVM1 on DWCHs. The results are calculated via ten-fold cross validation, and each entry in the table is of the form accuracy (standard deviation).

Number   Genre       Accuracy
1        Blues       95.49(1.27)
2        Classical   98.89(1.10)
3        Country     94.29(2.49)
4        Disco       92.69(2.54)
5        Jazz        97.90(0.99)
6        Metal       95.29(2.18)
7        Pop         95.80(1.69)
8        Hiphop      96.49(1.28)
9        Reggae      92.30(2.49)
10       Rock        91.29(2.96)

TABLE 3
Accuracy on various subsets of Dataset A using DWCHs. The class numbers correspond to those of Table 2. The accuracy values are calculated via ten-fold cross validation. The numbers in the parentheses are the standard deviations.

Classes         SVM1          SVM2          MPSVM         GMM           LDA           KNN
1 & 2           (3.50)        98.00(2.58)   99.00(2.11)   98.00(3.22)   99.00(2.11)   97.5(2.64)
1, 2 & 3        92.33(5.46)   92.67(4.92)   93.33(3.51)   91.33(3.91)   94.00(4.10)   87.00(5.54)
1 through 4     90.5(4.53)    90.00(4.25)   89.75(3.99)   85.25(5.20)   89.25(3.92)   83.75(5.92)
1 through 5     88.00(3.89)   86.80(4.54)   83.40(5.42)   81.2(4.92)    86.2(5.03)    78.00(5.89)
1 through 6     (4.81)        86.67(5.27)   81.0(6.05)    73.83(5.78)   82.83(6.37)   73.5(6.01)
1 through 7     (4.26)        84.43(3.53)   78.85(3.67)   74.29(6.90)   81.00(5.87)   73.29(5.88)
1 through 8     (4.56)        83.00(3.64)   75.13(4.84)   72.38(6.22)   79.13(6.07)   69.38(5.47)
1 through 9     (4.83)        79.78(2.76)   70.55(4.30)   68.22(7.26)   74.47(6.22)   65.56(4.66)

[0101] Table 4 presents the results on our own dataset, Dataset B. This dataset was generated with little control, by blindly taking the 30 seconds after the introductory 30 seconds of each piece of music, and it covers many different albums, so the performance was anticipated to be lower than that for Dataset A. Also, Dataset B includes the genre of Ambient, which covers music bridging Classical and Jazz; the difficulty of classifying such borderline cases is compensated for by the reduction in the number of classes. The overall performance was only 4 to 5% lower than that for Dataset A.

TABLE 4
Classification accuracy of the learning methods tested on Dataset B using various combinations of features, calculated via ten-fold cross validation. The numbers within parentheses are standard deviations.

Features                    SVM1          SVM2          MPSVM         GMM           LDA           KNN
DWCHs                       71.48(6.84)   (4.65)        67.16(5.60)   64.77(6.15)   65.74(6.03)   61.84(4.88)
Beat + FFT + MFCC + Pitch   68.65(3.90)   69.19(4.32)   (3.63)        63.08(5.89)   66.00(5.57)   60.59(5.43)
FFT + MFCC                  66.67(4.40)   70.63(4.13)   64.29(4.54)   61.24(6.29)   65.35(4.86)   60.78(4.30)
Beat                        43.37(3.88)   44.52(4.14)   (4.46)        37.95(5.21)   40.87(4.50)   41.27(2.96)
FFT                         61.65(5.57)   62.19(5.26)   54.76(2.94)   50.80(4.89)   57.94(5.11)   57.42(5.64)
MFCC                        60.45(5.12)   67.46(3.57)   57.42(4.67)   53.43(5.64)   59.26(4.77)   59.93(3.49)
Pitch                       37.56(4.63)   39.37(3.88)   36.49(5.12)   29.62(5.89)   37.82(4.67)   38.89(5.04)

[0102] We observe that SVMs are always the best classifiers for content-based music genre classification. However, the choice of the reduction method from multi-class to binary seems to be problem-dependent, and there is no clear overall winner; it is fair to say that probably no reduction method generally outperforms the others. Feature extraction is crucial for music genre classification, and the choice of features is more important than the choice of classifiers: the variations in classification accuracy across different classification techniques are much smaller than those across different feature extraction techniques.

[0103] The technique disclosed above is not limited in applicability to classifying music into genres such as classical and blues. It can be expanded to the classification of music in accordance with the emotions that it provokes in listeners, e.g., cheerful or dreamy.
Of course, like the genres discussed above, a piece of music may fall into more than one emotional category.

[0104] While a preferred embodiment of the present invention has been set forth above, those skilled in the art who have reviewed the present disclosure will readily appreciate that other embodiments can be realized within the scope of the present invention. For example, more or fewer than ten genres of music can be used. Also, the present invention is not limited to the use of Matlab; instead, any other suitable software can be used. Therefore, the present invention should be construed as limited only by the appended claims.

We claim:

1. A method for automatically forming a feature set describing an electronic signal representing a piece of music, the method comprising:
(a) receiving the electronic signal into a computing device;
(b) performing a wavelet decomposition of the electronic signal to obtain a plurality of wavelet coefficients in a plurality of subbands;
(c) forming a histogram of the wavelet coefficients in each of the subbands;
(d) calculating an average, variance and skewness of each of the histograms;
(e) calculating a subband energy of each of the histograms; and
(f) forming the feature set such that the feature set comprises the average, variance, skewness, and subband energy of at least some of the subbands.

2. The method of claim 1, wherein step (f) comprises forming the feature set such that the feature set comprises the average, variance, skewness, and subband energy of fewer than all of the subbands.

3. The method of claim 2, further comprising extracting timbral features from the electronic signal, and wherein the feature set further comprises the timbral features.

4. The method of claim 1, wherein step (b) comprises convolving at least part of the electronic signal with a Daubechies wavelet filter.

5. The method of claim 4, wherein step (b) is performed with less than all of the electronic signal.

6. The method of claim 1, further comprising (g) using the feature set to classify the piece of music into at least one of a plurality of categories of music.

7. The method of claim 6, wherein step (g) is performed using a multi-class classification algorithm.

8. The method of claim 6, wherein step (g) is performed using a plurality of binary classification algorithms.

9. The method of claim 8, wherein the binary classification algorithms comprise support vector machine classification algorithms.

10. A method for automatically forming a classifier algorithm for classifying a piece of music represented by an electronic signal into one or more of a plurality of musical genres, the method comprising:
(a) receiving into a computing device a plurality of classified electronic signals, each of the classified electronic signals representing a known piece of music which has already been classified into one or more of the plurality of musical genres;
(b) for each of the classified electronic signals:
(i) performing a wavelet decomposition of the classified electronic signal to obtain a plurality of wavelet coefficients in a plurality of subbands;
(ii) forming a histogram of the wavelet coefficients in each of the subbands;
(iii) calculating an average, variance and skewness of each of the histograms;
(iv) calculating a subband energy of each of the histograms; and
(v) forming a feature set such that the feature set comprises the average, variance, skewness, and subband energy of at least some of the subbands; and
(c) automatically forming the classifier algorithm from the feature sets such that the classifier algorithm properly classifies the known pieces of music.

11. The method of claim 10, wherein step (b)(v) comprises forming each feature set such that the feature set comprises the average, variance, skewness, and subband energy of fewer than all of the subbands.

12. The method of claim 11, further comprising extracting timbral features from each classified electronic signal, and wherein each feature set further comprises the timbral features.

13. The method of claim 10, wherein step (b)(i) comprises convolving at least part of the classified electronic signal with a Daubechies wavelet filter.

14. The method of claim 13, wherein step (b)(i) is performed with less than all of the electronic signal.

15. The method of claim 10, wherein the classifier algorithm comprises a multi-class classification algorithm.

16. The method of claim 15, wherein the classifier algorithm comprises a plurality of binary classification algorithms.

17. The method of claim 16, wherein the binary classification algorithms comprise support vector machine classification algorithms.
18. A device for automatically forming a feature set describing an electronic signal representing a piece of music, the device comprising:
an input for receiving the electronic signal; and
a computing device, in communication with the input, for:
performing a wavelet decomposition of the electronic signal to obtain a plurality of wavelet coefficients in a plurality of subbands;
forming a histogram of the wavelet coefficients in each of the subbands;
calculating an average, variance and skewness of each of the histograms;
calculating a subband energy of each of the histograms; and
forming the feature set such that the feature set comprises the average, variance, skewness, and subband energy of at least some of the subbands.

19. The device of claim 18, wherein the computing device forms the feature set such that the feature set comprises the average, variance, skewness, and subband energy of fewer than all of the subbands.

20. The device of claim 19, wherein the computing device also extracts timbral features from the electronic signal, and wherein the feature set further comprises the timbral features.

21. The device of claim 18, wherein the computing device performs the wavelet decomposition by convolving at least part of the electronic signal with a Daubechies wavelet filter.

22. The device of claim 21, wherein the wavelet decomposition is performed with less than all of the electronic signal.

23. The device of claim 18, wherein the computing device uses the feature set to classify the piece of music into at least one of a plurality of categories of music.

24. The device of claim 23, wherein the computing device classifies the piece of music using a multi-class classification algorithm.

25. The device of claim 23, wherein the computing device classifies the piece of music using a plurality of binary classification algorithms.

26. The device of claim 25, wherein the binary classification algorithms comprise support vector machine classification algorithms.

27. A device for automatically forming a classifier algorithm for classifying a piece of music represented by an electronic signal into one or more of a plurality of musical genres, the device comprising:
an input for receiving a plurality of classified electronic signals, each of the classified electronic signals representing a known piece of music which has already been classified into one or more of the plurality of musical genres; and
a computing device, in communication with the input, for forming the classifier algorithm by, for each of the classified electronic signals:
(i) performing a wavelet decomposition of the classified electronic signal to obtain a plurality of wavelet coefficients in a plurality of subbands,


More information

Change Point Determination in Audio Data Using Auditory Features

Change Point Determination in Audio Data Using Auditory Features INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 0, VOL., NO., PP. 8 90 Manuscript received April, 0; revised June, 0. DOI: /eletel-0-00 Change Point Determination in Audio Data Using Auditory Features

More information

Environmental Sound Recognition using MP-based Features

Environmental Sound Recognition using MP-based Features Environmental Sound Recognition using MP-based Features Selina Chu, Shri Narayanan *, and C.-C. Jay Kuo * Speech Analysis and Interpretation Lab Signal & Image Processing Institute Department of Computer

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

(12) Patent Application Publication (10) Pub. No.: US 2008/ A1. Kalevo (43) Pub. Date: Mar. 27, 2008

(12) Patent Application Publication (10) Pub. No.: US 2008/ A1. Kalevo (43) Pub. Date: Mar. 27, 2008 US 2008.0075354A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2008/0075354 A1 Kalevo (43) Pub. Date: (54) REMOVING SINGLET AND COUPLET (22) Filed: Sep. 25, 2006 DEFECTS FROM

More information

Sound Recognition. ~ CSE 352 Team 3 ~ Jason Park Evan Glover. Kevin Lui Aman Rawat. Prof. Anita Wasilewska

Sound Recognition. ~ CSE 352 Team 3 ~ Jason Park Evan Glover. Kevin Lui Aman Rawat. Prof. Anita Wasilewska Sound Recognition ~ CSE 352 Team 3 ~ Jason Park Evan Glover Kevin Lui Aman Rawat Prof. Anita Wasilewska What is Sound? Sound is a vibration that propagates as a typically audible mechanical wave of pressure

More information

(12) Patent Application Publication (10) Pub. No.: US 2003/ A1

(12) Patent Application Publication (10) Pub. No.: US 2003/ A1 US 2003O108129A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2003/0108129 A1 Voglewede et al. (43) Pub. Date: (54) AUTOMATIC GAIN CONTROL FOR (21) Appl. No.: 10/012,530 DIGITAL

More information

Content Based Image Retrieval Using Color Histogram

Content Based Image Retrieval Using Color Histogram Content Based Image Retrieval Using Color Histogram Nitin Jain Assistant Professor, Lokmanya Tilak College of Engineering, Navi Mumbai, India. Dr. S. S. Salankar Professor, G.H. Raisoni College of Engineering,

More information

Speech and Music Discrimination based on Signal Modulation Spectrum.

Speech and Music Discrimination based on Signal Modulation Spectrum. Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we

More information

(12) Patent Application Publication (10) Pub. No.: US 2011/ A1

(12) Patent Application Publication (10) Pub. No.: US 2011/ A1 US 2011 0029.108A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2011/0029.108A1 Lee et al. (43) Pub. Date: Feb. 3, 2011 (54) MUSIC GENRE CLASSIFICATION METHOD Publication Classification

More information

AUTOMATED MUSIC TRACK GENERATION

AUTOMATED MUSIC TRACK GENERATION AUTOMATED MUSIC TRACK GENERATION LOUIS EUGENE Stanford University leugene@stanford.edu GUILLAUME ROSTAING Stanford University rostaing@stanford.edu Abstract: This paper aims at presenting our method to

More information

Keywords: spectral centroid, MPEG-7, sum of sine waves, band limited impulse train, STFT, peak detection.

Keywords: spectral centroid, MPEG-7, sum of sine waves, band limited impulse train, STFT, peak detection. Global Journal of Researches in Engineering: J General Engineering Volume 15 Issue 4 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.

More information

Electric Guitar Pickups Recognition

Electric Guitar Pickups Recognition Electric Guitar Pickups Recognition Warren Jonhow Lee warrenjo@stanford.edu Yi-Chun Chen yichunc@stanford.edu Abstract Electric guitar pickups convert vibration of strings to eletric signals and thus direcly

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

FPGA implementation of DWT for Audio Watermarking Application

FPGA implementation of DWT for Audio Watermarking Application FPGA implementation of DWT for Audio Watermarking Application Naveen.S.Hampannavar 1, Sajeevan Joseph 2, C.B.Bidhul 3, Arunachalam V 4 1, 2, 3 M.Tech VLSI Students, 4 Assistant Professor Selection Grade

More information

Feature Analysis for Audio Classification

Feature Analysis for Audio Classification Feature Analysis for Audio Classification Gaston Bengolea 1, Daniel Acevedo 1,Martín Rais 2,,andMartaMejail 1 1 Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos

More information

Campus Location Recognition using Audio Signals

Campus Location Recognition using Audio Signals 1 Campus Location Recognition using Audio Signals James Sun,Reid Westwood SUNetID:jsun2015,rwestwoo Email: jsun2015@stanford.edu, rwestwoo@stanford.edu I. INTRODUCTION People use sound both consciously

More information

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

Long Range Acoustic Classification

Long Range Acoustic Classification Approved for public release; distribution is unlimited. Long Range Acoustic Classification Authors: Ned B. Thammakhoune, Stephen W. Lang Sanders a Lockheed Martin Company P. O. Box 868 Nashua, New Hampshire

More information

Rhythm Analysis in Music

Rhythm Analysis in Music Rhythm Analysis in Music EECS 352: Machine Perception of Music & Audio Zafar Rafii, Winter 24 Some Definitions Rhythm movement marked by the regulated succession of strong and weak elements, or of opposite

More information

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Author Shannon, Ben, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium

More information

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar Biomedical Signals Signals and Images in Medicine Dr Nabeel Anwar Noise Removal: Time Domain Techniques 1. Synchronized Averaging (covered in lecture 1) 2. Moving Average Filters (today s topic) 3. Derivative

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

(12) Patent Application Publication (10) Pub. No.: US 2010/ A1

(12) Patent Application Publication (10) Pub. No.: US 2010/ A1 (19) United States US 2010O2O8236A1 (12) Patent Application Publication (10) Pub. No.: US 2010/0208236A1 Damink et al. (43) Pub. Date: Aug. 19, 2010 (54) METHOD FOR DETERMINING THE POSITION OF AN OBJECT

More information

Speech/Music Change Point Detection using Sonogram and AANN

Speech/Music Change Point Detection using Sonogram and AANN International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 6, Number 1 (2016), pp. 45-49 International Research Publications House http://www. irphouse.com Speech/Music Change

More information

Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes

Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes 216 7th International Conference on Intelligent Systems, Modelling and Simulation Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes Yuanyuan Guo Department of Electronic Engineering

More information

EVALUATION OF MFCC ESTIMATION TECHNIQUES FOR MUSIC SIMILARITY

EVALUATION OF MFCC ESTIMATION TECHNIQUES FOR MUSIC SIMILARITY EVALUATION OF MFCC ESTIMATION TECHNIQUES FOR MUSIC SIMILARITY Jesper Højvang Jensen 1, Mads Græsbøll Christensen 1, Manohar N. Murthi, and Søren Holdt Jensen 1 1 Department of Communication Technology,

More information

Cepstrum alanysis of speech signals

Cepstrum alanysis of speech signals Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP

More information

Gammatone Cepstral Coefficient for Speaker Identification

Gammatone Cepstral Coefficient for Speaker Identification Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia

More information

Nonlinear Audio Recurrence Analysis with Application to Music Genre Classification.

Nonlinear Audio Recurrence Analysis with Application to Music Genre Classification. Nonlinear Audio Recurrence Analysis with Application to Music Genre Classification. Carlos A. de los Santos Guadarrama MASTER THESIS UPF / 21 Master in Sound and Music Computing Master thesis supervisors:

More information

An Improved Voice Activity Detection Based on Deep Belief Networks

An Improved Voice Activity Detection Based on Deep Belief Networks e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.

More information

Multiresolution Analysis of Connectivity

Multiresolution Analysis of Connectivity Multiresolution Analysis of Connectivity Atul Sajjanhar 1, Guojun Lu 2, Dengsheng Zhang 2, Tian Qi 3 1 School of Information Technology Deakin University 221 Burwood Highway Burwood, VIC 3125 Australia

More information

HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM

HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM DR. D.C. DHUBKARYA AND SONAM DUBEY 2 Email at: sonamdubey2000@gmail.com, Electronic and communication department Bundelkhand

More information

Audio Classification by Search of Primary Components

Audio Classification by Search of Primary Components Audio Classification by Search of Primary Components Julien PINQUIER, José ARIAS and Régine ANDRE-OBRECHT Equipe SAMOVA, IRIT, UMR 5505 CNRS INP UPS 118, route de Narbonne, 3106 Toulouse cedex 04, FRANCE

More information

Short Time Energy Amplitude. Audio Waveform Amplitude. 2 x x Time Index

Short Time Energy Amplitude. Audio Waveform Amplitude. 2 x x Time Index Content-Based Classication and Retrieval of Audio Tong Zhang and C.-C. Jay Kuo Integrated Media Systems Center and Department of Electrical Engineering-Systems University of Southern California, Los Angeles,

More information

Rhythm Analysis in Music

Rhythm Analysis in Music Rhythm Analysis in Music EECS 352: Machine Perception of Music & Audio Zafar RAFII, Spring 22 Some Definitions Rhythm movement marked by the regulated succession of strong and weak elements, or of opposite

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems

More information

5. 5. EEN - INTERPICTURE -- HISTOGRAM.H.A.)

5. 5. EEN - INTERPICTURE -- HISTOGRAM.H.A.) USOO6606411B1 (12) United States Patent (10) Patent No.: US 6,606,411 B1 Louiet al. (45) Date of Patent: Aug. 12, 2003 (54) METHOD FOR AUTOMATICALLY 5,751,378 A 5/1998 Chen et al.... 348/700 CLASSIFYING

More information

A new quad-tree segmented image compression scheme using histogram analysis and pattern matching

A new quad-tree segmented image compression scheme using histogram analysis and pattern matching University of Wollongong Research Online University of Wollongong in Dubai - Papers University of Wollongong in Dubai A new quad-tree segmented image compression scheme using histogram analysis and pattern

More information

Wavelet Transform. From C. Valens article, A Really Friendly Guide to Wavelets, 1999

Wavelet Transform. From C. Valens article, A Really Friendly Guide to Wavelets, 1999 Wavelet Transform From C. Valens article, A Really Friendly Guide to Wavelets, 1999 Fourier theory: a signal can be expressed as the sum of a series of sines and cosines. The big disadvantage of a Fourier

More information

Image Extraction using Image Mining Technique

Image Extraction using Image Mining Technique IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719 Vol. 3, Issue 9 (September. 2013), V2 PP 36-42 Image Extraction using Image Mining Technique Prof. Samir Kumar Bandyopadhyay,

More information

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

More information

Advanced Music Content Analysis

Advanced Music Content Analysis RuSSIR 2013: Content- and Context-based Music Similarity and Retrieval Titelmasterformat durch Klicken bearbeiten Advanced Music Content Analysis Markus Schedl Peter Knees {markus.schedl, peter.knees}@jku.at

More information

Lecture 6. Rhythm Analysis. (some slides are adapted from Zafar Rafii and some figures are from Meinard Mueller)

Lecture 6. Rhythm Analysis. (some slides are adapted from Zafar Rafii and some figures are from Meinard Mueller) Lecture 6 Rhythm Analysis (some slides are adapted from Zafar Rafii and some figures are from Meinard Mueller) Definitions for Rhythm Analysis Rhythm: movement marked by the regulated succession of strong

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

Monophony/Polyphony Classification System using Fourier of Fourier Transform

Monophony/Polyphony Classification System using Fourier of Fourier Transform International Journal of Electronics Engineering, 2 (2), 2010, pp. 299 303 Monophony/Polyphony Classification System using Fourier of Fourier Transform Kalyani Akant 1, Rajesh Pande 2, and S.S. Limaye

More information

Multiple Sound Sources Localization Using Energetic Analysis Method

Multiple Sound Sources Localization Using Energetic Analysis Method VOL.3, NO.4, DECEMBER 1 Multiple Sound Sources Localization Using Energetic Analysis Method Hasan Khaddour, Jiří Schimmel Department of Telecommunications FEEC, Brno University of Technology Purkyňova

More information

Evaluation of MFCC Estimation Techniques for Music Similarity Jensen, Jesper Højvang; Christensen, Mads Græsbøll; Murthi, Manohar; Jensen, Søren Holdt

Evaluation of MFCC Estimation Techniques for Music Similarity Jensen, Jesper Højvang; Christensen, Mads Græsbøll; Murthi, Manohar; Jensen, Søren Holdt Aalborg Universitet Evaluation of MFCC Estimation Techniques for Music Similarity Jensen, Jesper Højvang; Christensen, Mads Græsbøll; Murthi, Manohar; Jensen, Søren Holdt Published in: Proceedings of the

More information

ANALYSIS OF ACOUSTIC FEATURES FOR AUTOMATED MULTI-TRACK MIXING

ANALYSIS OF ACOUSTIC FEATURES FOR AUTOMATED MULTI-TRACK MIXING th International Society for Music Information Retrieval Conference (ISMIR ) ANALYSIS OF ACOUSTIC FEATURES FOR AUTOMATED MULTI-TRACK MIXING Jeffrey Scott, Youngmoo E. Kim Music and Entertainment Technology

More information

Classification of Digital Photos Taken by Photographers or Home Users

Classification of Digital Photos Taken by Photographers or Home Users Classification of Digital Photos Taken by Photographers or Home Users Hanghang Tong 1, Mingjing Li 2, Hong-Jiang Zhang 2, Jingrui He 1, and Changshui Zhang 3 1 Automation Department, Tsinghua University,

More information

Real-time beat estimation using feature extraction

Real-time beat estimation using feature extraction Real-time beat estimation using feature extraction Kristoffer Jensen and Tue Haste Andersen Department of Computer Science, University of Copenhagen Universitetsparken 1 DK-2100 Copenhagen, Denmark, {krist,haste}@diku.dk,

More information

A New Scheme for No Reference Image Quality Assessment

A New Scheme for No Reference Image Quality Assessment Author manuscript, published in "3rd International Conference on Image Processing Theory, Tools and Applications, Istanbul : Turkey (2012)" A New Scheme for No Reference Image Quality Assessment Aladine

More information

DERIVATION OF TRAPS IN AUDITORY DOMAIN

DERIVATION OF TRAPS IN AUDITORY DOMAIN DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.

More information

(12) Patent Application Publication (10) Pub. No.: US 2001/ A1

(12) Patent Application Publication (10) Pub. No.: US 2001/ A1 US 2001 004.8356A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2001/0048356A1 Owen (43) Pub. Date: Dec. 6, 2001 (54) METHOD AND APPARATUS FOR Related U.S. Application Data

More information

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate

More information

Upgrading pulse detection with time shift properties using wavelets and Support Vector Machines

Upgrading pulse detection with time shift properties using wavelets and Support Vector Machines Upgrading pulse detection with time shift properties using wavelets and Support Vector Machines Jaime Gómez 1, Ignacio Melgar 2 and Juan Seijas 3. Sener Ingeniería y Sistemas, S.A. 1 2 3 Escuela Politécnica

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

Target detection in side-scan sonar images: expert fusion reduces false alarms

Target detection in side-scan sonar images: expert fusion reduces false alarms Target detection in side-scan sonar images: expert fusion reduces false alarms Nicola Neretti, Nathan Intrator and Quyen Huynh Abstract We integrate several key components of a pattern recognition system

More information

(12) Patent Application Publication (10) Pub. No.: US 2016/ A1

(12) Patent Application Publication (10) Pub. No.: US 2016/ A1 US 2016O2.91546A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2016/0291546 A1 Woida-O Brien (43) Pub. Date: Oct. 6, 2016 (54) DIGITAL INFRARED HOLOGRAMS GO2B 26/08 (2006.01)

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet

An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet Journal of Information & Computational Science 8: 14 (2011) 3027 3034 Available at http://www.joics.com An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet Jianguo JIANG

More information

(12) United States Patent (10) Patent No.: US 6,436,044 B1

(12) United States Patent (10) Patent No.: US 6,436,044 B1 USOO643604.4B1 (12) United States Patent (10) Patent No.: Wang (45) Date of Patent: Aug. 20, 2002 (54) SYSTEM AND METHOD FOR ADAPTIVE 6,282,963 B1 9/2001 Haider... 73/602 BEAMFORMER APODIZATION 6,312,384

More information

Wavelet Transform. From C. Valens article, A Really Friendly Guide to Wavelets, 1999

Wavelet Transform. From C. Valens article, A Really Friendly Guide to Wavelets, 1999 Wavelet Transform From C. Valens article, A Really Friendly Guide to Wavelets, 1999 Fourier theory: a signal can be expressed as the sum of a, possibly infinite, series of sines and cosines. This sum is

More information

(12) Patent Application Publication (10) Pub. No.: US 2003/ A1

(12) Patent Application Publication (10) Pub. No.: US 2003/ A1 (19) United States US 2003O132800A1 (12) Patent Application Publication (10) Pub. No.: US 2003/0132800 A1 Kenington (43) Pub. Date: Jul. 17, 2003 (54) AMPLIFIER ARRANGEMENT (76) Inventor: Peter Kenington,

More information

(12) Patent Application Publication (10) Pub. No.: US 2015/ A1

(12) Patent Application Publication (10) Pub. No.: US 2015/ A1 US 201502272O2A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2015/0227202 A1 BACKMAN et al. (43) Pub. Date: Aug. 13, 2015 (54) APPARATUS AND METHOD FOR Publication Classification

More information

CHORD DETECTION USING CHROMAGRAM OPTIMIZED BY EXTRACTING ADDITIONAL FEATURES

CHORD DETECTION USING CHROMAGRAM OPTIMIZED BY EXTRACTING ADDITIONAL FEATURES CHORD DETECTION USING CHROMAGRAM OPTIMIZED BY EXTRACTING ADDITIONAL FEATURES Jean-Baptiste Rolland Steinberg Media Technologies GmbH jb.rolland@steinberg.de ABSTRACT This paper presents some concepts regarding

More information

SGN Audio and Speech Processing

SGN Audio and Speech Processing Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations

More information

Audio and Speech Compression Using DCT and DWT Techniques

Audio and Speech Compression Using DCT and DWT Techniques Audio and Speech Compression Using DCT and DWT Techniques M. V. Patil 1, Apoorva Gupta 2, Ankita Varma 3, Shikhar Salil 4 Asst. Professor, Dept.of Elex, Bharati Vidyapeeth Univ.Coll.of Engg, Pune, Maharashtra,

More information

(12) Patent Application Publication (10) Pub. No.: US 2011/ A1

(12) Patent Application Publication (10) Pub. No.: US 2011/ A1 (19) United States US 2011 O273427A1 (12) Patent Application Publication (10) Pub. No.: US 2011/0273427 A1 Park (43) Pub. Date: Nov. 10, 2011 (54) ORGANIC LIGHT EMITTING DISPLAY AND METHOD OF DRIVING THE

More information

SIGNAL PROCESSING OF POWER QUALITY DISTURBANCES

SIGNAL PROCESSING OF POWER QUALITY DISTURBANCES SIGNAL PROCESSING OF POWER QUALITY DISTURBANCES MATH H. J. BOLLEN IRENE YU-HUA GU IEEE PRESS SERIES I 0N POWER ENGINEERING IEEE PRESS SERIES ON POWER ENGINEERING MOHAMED E. EL-HAWARY, SERIES EDITOR IEEE

More information

COLOR IMAGE SEGMENTATION USING K-MEANS CLASSIFICATION ON RGB HISTOGRAM SADIA BASAR, AWAIS ADNAN, NAILA HABIB KHAN, SHAHAB HAIDER

COLOR IMAGE SEGMENTATION USING K-MEANS CLASSIFICATION ON RGB HISTOGRAM SADIA BASAR, AWAIS ADNAN, NAILA HABIB KHAN, SHAHAB HAIDER COLOR IMAGE SEGMENTATION USING K-MEANS CLASSIFICATION ON RGB HISTOGRAM SADIA BASAR, AWAIS ADNAN, NAILA HABIB KHAN, SHAHAB HAIDER Department of Computer Science, Institute of Management Sciences, 1-A, Sector

More information

Isolated Digit Recognition Using MFCC AND DTW

Isolated Digit Recognition Using MFCC AND DTW MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics

More information

Design and Implementation of an Audio Classification System Based on SVM

Design and Implementation of an Audio Classification System Based on SVM Available online at www.sciencedirect.com Procedia ngineering 15 (011) 4031 4035 Advanced in Control ngineering and Information Science Design and Implementation of an Audio Classification System Based

More information

IDENTIFICATION OF SIGNATURES TRANSMITTED OVER RAYLEIGH FADING CHANNEL BY USING HMM AND RLE

IDENTIFICATION OF SIGNATURES TRANSMITTED OVER RAYLEIGH FADING CHANNEL BY USING HMM AND RLE International Journal of Technology (2011) 1: 56 64 ISSN 2086 9614 IJTech 2011 IDENTIFICATION OF SIGNATURES TRANSMITTED OVER RAYLEIGH FADING CHANNEL BY USING HMM AND RLE Djamhari Sirat 1, Arman D. Diponegoro

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information