An Improved Feature Extraction and Combination of Multiple Classifiers for Query-by-Humming

The International Arab Journal of Information Technology, Vol. 11, No. 1, January 2014

Nattha Phiwma 1 and Parinya Sanguansat 2
1 Department of Computer Science, Suan Dusit Rajabhat University, Thailand
2 Faculty of Engineering and Technology, Panyapiwat Institute of Management, Thailand

Abstract: In this paper, we propose new methods for feature extraction and soft majority voting to improve the efficiency and accuracy of music retrieval. The input is a humming sound, which is a sound wave, and Musical Instrument Digital Interface (MIDI) files are used as the reference songs in the database. A critical issue with humming sounds is their variation in duration, sound, tempo and key, together with noise interference from both the environment and the acquisition instruments. In addition, the humming sound and the MIDI data are in different domains, which makes the two difficult to compare with each other. To bring them into the same domain, we convert both into the frequency domain. Our approach starts from pre-processing, which uses features for note segmentation of the humming sound. The process consists of four steps. Firstly, the MIDI is already a sequence of pitches, while the pitch of the humming sound must be extracted using the Subharmonic-to-Harmonic Ratio (SHR). Subsequently, the extracted pitch is used to calculate the feature attributes, and multiple classifiers are applied to classify the multiple subsets of these features. Afterwards, when a subset contains multiple attributes, Multi-Dimensional Dynamic Time Warping (MD-DTW) is used for similarity measurement. Finally, Nearest Neighbours (NN) and soft majority voting are used to obtain the retrieval results, with soft majority voting deciding cases of equal scores. In the experiments, the feature set that reaches a 100% accuracy rate at the earliest top-n rank consists of five classifiers.

Keywords: Query-by-humming, feature extraction, majority voting, multiple classifiers, MD-DTW, SHR.
Received February 8, 2012; accepted May 22, 2012; published online January 29, 2013

1. Introduction

At present, music has become part of our lives: we listen and sing to entertain and relax ourselves. A prevalent problem is that users forget the name of a song that they want to find for listening or singing. Traditional approaches to retrieving music data were based on textual information such as titles, composers, file names or singers. Because this information is often incomplete, it is difficult to satisfy the particular requirements of applications. Therefore, many researchers have proposed techniques to query a song based on humming, which is called a Query-By-Humming (QBH) system [2, 13, 14, 20, 21, 26]. A QBH system allows the user to retrieve an intended song by humming some part of it. The general framework of a QBH system contains three main components: the query processing module, the melody database and the matching engine [2]. Firstly, the system handles the Musical Instrument Digital Interface (MIDI) files in the database. Subsequently, the system processes the user's input humming signal and extracts its fundamental frequency, so that the humming query is converted into a melody representation. Finally, when a search is initiated, the melody representation is matched against the melodies in the feature database according to their similarity, and a ranked list of songs is returned.

Natural sounds are normally a composition of a fundamental frequency with a set of harmonics. The frequency that the human ear interprets as the pitch of a sound is this fundamental frequency, even if it is absent from the sound. The pitch of natural sounds is important in many contexts. Pitch is the perception of how high or low a musical note sounds, and it can be considered a frequency that corresponds closely to the fundamental frequency or main repetition rate of the signal [15]. It is one of the most important parameters in voice signal analysis and can be determined from the fundamental frequency of the unit frame [6]. For a QBH system, pitch is the key feature of melody.
Because the humming sound contains noise, the pitch needs to be extracted in order to obtain the most significant information.

2. Related Works

In early work, pitch was used as a feature in QBH [2, 8, 14]. There are many techniques to analyse and extract the pitch contour, pitch interval and duration from a hummed voice query [2]. The traditional methods proposed for detecting pitch can be divided roughly into two domains: time-domain based and frequency-domain based [11, 15]. Pitch and fundamental frequency are important features, so the pitch must be extracted. A Pitch Determination Algorithm (PDA) based on the Subharmonic-to-Harmonic Ratio (SHR) was developed in the frequency domain; SHR describes the amplitude ratio between subharmonics and harmonics [23, 24]. In our system, we implement pitch tracking using SHR.

Mel Frequency Cepstral Coefficients (MFCC) have been adopted in many speech analysis applications. This type of feature extraction is widely used in robust speech recognition systems; it is inspired by human auditory perception and focuses on effective signal processing in the ear using cochlear filterbanks [1]. MFCCs have also been used as features in [7, 12]. Those experiments show that MFCC with 13 dimensions gives better audio recognition results than other dimensions. MFCC is used in our pre-processing.

Generally, using all attributes at once might not give a good result: some features are appropriate and some are not. We therefore need several classifiers to help with the result. To improve system performance, many studies have used a range of information, such as pitch, duration, rhythm, inter-onset interval, and start and end time, considered jointly to build features [10, 14]. The classifier combination approaches most often used in a Multiple Classifiers System (MCS) include classifier selection, majority voting, weighted combination (weighted averaging), probabilistic schemes and various rank-ordered rules [4]. In addition, MCS with majority voting has been applied to off-line Arabic handwriting recognition, where its accuracy is higher than that of an individual classifier [9].
Therefore, MCS is used to find the result, and the easiest way to combine the classifiers is majority voting. However, the feature still has variable length in the form of a melody contour, hence the traditional Dynamic Time Warping (DTW) cannot be used for this feature. The Multi-Dimensional Dynamic Time Warping (MD-DTW) algorithm was proposed for DTW on multi-dimensional time series; it utilises all dimensions to find the best synchronization [3]. Multi-dimensional (time) series are series in which multiple measurements are made simultaneously. MD-DTW has been applied to image texture [19], gesture recognition [3] and time series [25], which gave us the idea of applying it to QBH.

The segment of a note in the humming waveform is modelled by a Hidden Markov Model (HMM), while the pitch of the note is modelled by a pitch model using a Gaussian mixture model. Frame-based analysis is performed on a note segment, which usually has several frames. Multiple frames of a segmented note are used for pitch model analysis. After autocorrelation is applied to those frames, pitch features are extracted. The first stage of the proposed algorithm is note segmentation, in which the notes of a hummed piece are segmented. First, a feature set that can characterize a note is chosen. Next, the HMM definition is chosen before training. During the training phase, note phone-level HMMs are trained using the selected feature set. The trained note models are then used by the note decoder for note segmentation. Finally, the duration of a segmented note is labelled according to its relative duration change [5, 18, 19, 22].

3. Materials and Methods

3.1. Melody Contour Extraction Algorithm

The following algorithm describes how to extract pitch from the humming sound to obtain the melody contour. We proposed Melody Contour Extraction in [16]. Let m represent the melody contour and let p be the pitch. The variables of the algorithm are as follows: s is the size of the window for filtering, g is the gap of pitch difference, T is the threshold of standard deviation, and v is the variance of pitch interval.
The algorithm proceeds as follows:

Require: p, g, T, s
Ensure: m
Step 1: Smooth p by median filter.
Step 2: Initialize m_1 ← p_1
Step 3: N ← length of p
Step 4: j ← 1
Step 5: While i ≤ N do
          d = |p_i − p_{i−1}|
          Y ← {p_{i−v}, p_{i−v+1}, ..., p_{i+v−1}, p_{i+v}}
          S_Y ← standard deviation of Y
          If d > g and S_Y < T then
            m_j ← p_i
          End if
          i ← i + s
          j ← j + 1
        End while
Step 6: Return m

The first step of this technique passes the pitch through a noise filtering process that uses a median filter to smooth the signal. Then the difference between consecutive values of p is compared with the defined value g, and only the values that exceed it are selected. The value of s is chosen in order to find the range of the signal that changes little during that period of time. In other words, the algorithm discards the signal that changes rapidly in a short time compared with this interval. The signal has some spread around it, and only the group of significant signals is needed. Hence, the algorithm finds the range of the signal whose spread is small compared with the threshold of standard deviation (T), as shown in Figure 1. The output of the algorithm, the melody contour, contains the significant pitches. Finally, when this technique is applied to the retrieval task, the result is more accurate than with the traditional method.

Figure 1. Example of pitch extraction by melody contour extraction (vertical axis: pitch in Hz).

3.2. Note Segmentation by Humming Sound

For this paper, we use the method of note segmentation by humming sound that we proposed in [17]. It differentiates the sound parts from the silent parts so that the most important part, the sound part, is used in the next process. From the sound wave in Figure 2, the silence interval is removed manually as pre-processing before the signal is fed to the HMM.

Figure 2. Sound wave from humming the standard notes of the C major scale (do, re, mi, ..., do).

Figure 3. Note model and silence model.

As shown in Figure 3, the HMMs contain 3 states with left-to-right topology using 2 Gaussian mixture distributions. Both the note and the silence are used to train these HMMs.

3.3. Multi-Dimensional Dynamic Time Warping

Multi-dimensional series consist of a number of measurements made at each instance. The number of measurements is the dimensionality of the series, and the number of time instances is its length. Note that multi-dimensional series need not be time signals: any situation in which several measurements are made simultaneously depending on one variable gives a multi-dimensional series. The measurements are assumed to be stored in a matrix in which columns are features and rows are time instances.

MD-DTW was proposed in [3] as an approach to calculating DTW by synchronizing multi-dimensional series. It is basically an extension of the original DTW in which the matrix D is created by computing the distance between k-dimensional points (where, differently from the original approach, k can be larger than 1). This approach pre-processes the multi-dimensional series, which must have the same number of dimensions. The last step of this algorithm is the execution of the traditional DTW.
However, in many cases all dimensions contain information needed for synchronisation, and [3] therefore proposes MD-DTW for synchronising such series. The MD-DTW algorithm runs in 4 steps. Let A, B be two series of dimension K and length M, N, respectively.

Step 1: Normalize each dimension of A and B separately to zero mean and unit variance.
Step 2: If desired, smooth each dimension with a Gaussian filter.
Step 3: Fill the M by N distance matrix D according to:

$$D(i, j) = \sum_{k=1}^{K} \left| A(i, k) - B(j, k) \right|$$

Step 4: Use this distance matrix to find the best synchronization with the regular DTW algorithm.

Take two series A and B. DTW involves the creation of a matrix in which the distance between every possible combination of time instances A(i), B(j) is stored. This distance is calculated in terms of the feature values of the points. Various norms are possible. In 1D-DTW, the distance is usually calculated by taking the absolute or the squared distance between the feature values of each combination of points. For MD-DTW, a distance measure for two K-dimensional points must be calculated. This distance can be any p-norm. In [3], the 1-norm is used, i.e., the sum of the absolute differences in all dimensions. To combine different dimensions in this way, it is necessary to normalize each dimension to zero mean and unit variance. For this, the dimensions must be comparable. If, for instance, one dimension contains real-valued measurements and one is binary, comparing them directly is not possible and a more sophisticated distance measure must be found [3, 19].
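The four MD-DTW steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `zscore` and `md_dtw` are hypothetical, the optional Gaussian smoothing of Step 2 is left out, constant dimensions are left unscaled to avoid division by zero, and the "regular DTW" of Step 4 is taken to be the common three-neighbour recursion without any window constraint.

```python
import numpy as np

def zscore(x):
    """Step 1: normalize each column (dimension) to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # assumption: leave constant dimensions unscaled
    return (x - x.mean(axis=0)) / std

def md_dtw(a, b):
    """Accumulated MD-DTW cost between series a (M x K) and b (N x K).

    Rows are time instances and columns are features, as in the text.
    """
    a, b = zscore(a), zscore(b)
    # Step 2 (optional Gaussian smoothing) is omitted in this sketch.
    # Step 3: D[i, j] = sum over k of |A(i, k) - B(j, k)| (1-norm).
    d = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)
    m, n = d.shape
    # Step 4: regular DTW on the distance matrix.
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = d[i - 1, j - 1] + best_prev
    return acc[m, n]
```

A series compared with itself gives a cost of zero, and a copy in which every time instance is repeated also aligns at (numerically) zero cost, which is the time-warping behaviour that retrieval relies on when a hum is slower or faster than the reference.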

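The melody contour extraction of Section 3.1 can be sketched in the same way. The listing in the source leaves several details open, so the following is only one possible reading, with every gap filled by a labelled assumption: the names `median_smooth`, `melody_contour` and the parameter `width` (median window) are hypothetical, v is used as the half-width of the window Y, the loop starts and stops where the window fits inside the signal, and the output index advances only when a pitch is accepted.

```python
import numpy as np

def median_smooth(p, width=5):
    """Median-filter p with an odd window; edges are padded by repetition."""
    half = width // 2
    padded = np.pad(np.asarray(p, dtype=float), half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(p))])

def melody_contour(p, g, T, s, v, width=5):
    """Keep pitches that jump by more than g inside a window of spread below T."""
    p = median_smooth(p, width)      # Step 1: smooth p by median filter
    m = [p[0]]                       # Step 2: m_1 <- p_1
    n = len(p)                       # Step 3: N <- length of p
    i = max(1, v)                    # assumption: first index with a full left window
    while i < n - v:                 # assumption: stop before the window leaves p
        d = abs(p[i] - p[i - 1])                 # pitch difference to the previous frame
        spread = np.std(p[i - v:i + v + 1])      # S_Y over Y = p[i-v .. i+v]
        if d > g and spread < T:
            m.append(p[i])           # assumption: j advances only on acceptance
        i += s                       # move on by the step size s
    return np.array(m)
```

For a pitch track that holds 100 Hz and then steps to 110 Hz, calling `melody_contour(p, g=5, T=6, s=1, v=2)` keeps the starting pitch and the pitch at the step, while an isolated one-frame spike is removed by the median filter before the loop runs.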
4. Our Approach

Our approach consists of two steps: pre-processing and processing. Pre-processing is note segmentation by humming sound. Processing consists of feature extraction and soft majority voting.

Figure 4. Block diagram of our approach.

We propose two methods in our framework, as shown in Figure 4: feature extraction (sections 4.1, 4.2 and 4.3) and soft majority voting (section 4.4). Our framework starts from pre-processing, using a feature to facilitate note segmentation by humming sound. The process consists of four steps. Firstly, the MIDI is already a sequence of pitches, while the pitch of the humming sound must be extracted by SHR [23, 24]. Consequently, the pitch is processed by our new feature extraction method, and multiple classifiers are then applied to classify the multiple subsets of these features. Afterwards, MD-DTW is used for similarity measurement. Finally, Nearest Neighbors (NN) and soft majority voting are used to obtain the retrieval results, with soft majority voting deciding cases of equal scores.

4.1. Feature Extraction

This process defines the principal functions used for feature extraction from the input (humming sound) and from the reference songs in the MIDI database.

F1. Normalized pitch:

$$N_1(p) = \frac{\log p - \log \bar{p}}{\log \sigma_p} \qquad (1)$$

F2. Normalized duration of time:

$$N_{time}(T) = \frac{t}{T} \qquad (2)$$

where t represents note durations in seconds and T is the summation of duration time.

F3. Melody contour extraction (Melslope):

$$M(p) = \mathrm{melslope}(p) \qquad (3)$$

F4. String numeric relative (UDR):

$$\mathrm{udr}(p) = \begin{cases} 0, & |p_i - p_{i-1}| < \varepsilon \\ 1, & p_i - p_{i-1} > \varepsilon \\ 2, & p_i - p_{i-1} < -\varepsilon \end{cases} \qquad (4)$$

4.2. Feature Extraction of MIDI

Feature extraction of MIDI uses the following four approaches.

Normalized pitch:

$$P_1^{MIDI}(p) = N_1(p) \qquad (5)$$

Normalized duration of time:

$$P_2^{MIDI}(T) = N_{time}(T) \qquad (6)$$

Normalized duration of pitch:

$$P_3^{MIDI}(p) = N_{time}(p) \qquad (7)$$

String numeric relative (UDR):

$$P_4^{MIDI}(p) = \mathrm{udr}(p) \qquad (8)$$

4.3. Feature Extraction of Input

Feature extraction of the hum uses the following six approaches.

Normalized pitch:

$$P_1(p) = N_1(p) \qquad (9)$$

Normalized duration of time:

$$P_2(T) = N_{time}(T) \qquad (10)$$

Normalized duration of pitch:

$$P_3(p) = N_{time}(p) \qquad (11)$$

String numeric relative (UDR):

$$P_4(p) = \mathrm{udr}(p) \qquad (12)$$

Melody contour extraction (Melslope):

$$P_7(p) = M(p) \qquad (13)$$

Melslope of pitch passed through note segmentation:

$$P_8(p_{seg}) = M(p_{seg}) \qquad (14)$$

where p_seg represents the pitch that has passed note segmentation.

The MIDI is already a sequence of pitches, while the pitch of the humming sound must be extracted by SHR [23, 24]. Because MIDI and humming sound have different characteristics, the feature extractions P_1 to P_6 are performed on both of them, whereas P_7 to P_10 are used only for the humming sound. P_7, P_8, P_9 and P_10 are compared with P_1^{MIDI}, P_1^{MIDI}, P_5^{MIDI} and P_6^{MIDI}, respectively. Afterwards, the extracted pitch is used to calculate all of the above attributes, and multiple classifiers are applied to classify the multiple subsets of these features. When a subset contains multiple attributes, MD-DTW is used instead of DTW for similarity measurement. Finally, NN and soft majority voting are used to obtain the retrieval results.

4.4. Soft Majority Voting

Majority voting is widely used in many classification tasks. Voting is a method for a group to make a decision. By the principle of voting, the final decision is generally based on the highest score. Nevertheless, when votes are equal, there are many ways of making the decision, depending on the particular situation. Thus, we propose to decide tied votes based on the principle of minimum distance, which we call soft majority voting.

Figure 5. Soft majority voting method.

From Figure 5, if there is only one highest score, the class that has the highest score is returned as the result. If there are multiple highest scores, all members are reconsidered by minimum distance using the soft majority voting method.

5. Results

We conducted extensive experiments to measure retrieval performance in terms of accuracy. To assess the effectiveness of the system, the measurements were set up to explore various conditions, such as variation in the number of songs in the database, feature extraction, top-n rank and combination of features.

5.1. Dataset

The dataset contains 500 songs in MIDI format, divided into three subsets of 100, 300 and 500 songs. We used 100 test humming sounds to query songs in the database. Each test query is a humming sound consisting of tunes hummed with "Da Da Da". We used 100 humming sounds from different people to test our system. The recordings were made at an 8 kHz sampling rate, in mono, with a duration of 10 seconds, starting at the beginning of the song. The results show that the smaller the number of MIDI songs in the database, the higher the accuracy rate.

5.2. Variation of Feature Sets

In this paper, some single attributes, such as P1, P2, P4 and P7, are used for creating multiple attributes, as described in Tables 1 and 2.

Table 1. List of single attributes.

Name   Description
P1     Normalized Pitch
P2     Normalized Duration Time
P3     Normalized Duration of Pitch
P4     String Numeric Relative (UDR)
P7     Melody Contour Extraction (Melslope)
P8     Melslope of Pitch Pass Note Segmentation

Table 2. List of multiple attributes.

Name   Description
P5     P1, P2
P6     P1, P4
P9     P2, P7
P10    P4, P7

Table 3. List of features.

Feature  Attributes    Feature  Attributes    Feature  Attributes
1        P1            14       P2 P3 P8      27       P1 P3 P6 P8
2        P2            15       P2 P4         28       P1 P4 P6 P8
3        P3            16       P2 P4 P6      29       P1 P6 P8
4        P4            17       P2 P6         30       P2 P3 P4 P6
5        P5            18       P2 P8         31       P2 P5 P6 P8
6        P6            19       P3 P6 P8      32       P2 P4 P8
7        P7            20       P4 P6 P8      33       P3 P4 P5 P6
8        P8            21       P1 P2 P4      P4 P5 P6 P8 34 P6
9        P9            22       P1 P2 P3      P3 P4 P8 35 P8
10       P10           23       P1 P4 P6      P2 P4 P5 P6 36 P8 P8
11       P2 P4         24       P2 P4 P8      P3 P4 P5 P6 37
12       P2 P6         25       P2 P7 P8      P1 P2 P3 P4 38 P5 P6 P8
13       P1 P6         26       P3 P5 P8      P2 P3 P4 P6 39 P8

In this experiment, the number of classifiers was varied from two to ten. The feature sets for each number of classifiers are defined in Table 4. Feature P7 is fixed in every classifier set, because our experiments found that combining attribute P7 with other features reaches a 100% accuracy rate faster than using a random method. The details of the classifiers used in the experiments are shown in Tables 3 and 4. Table 3 shows the attributes of each feature. Table 4 contains the feature set that is used for each number of classifiers.

Table 4. List of classifiers.

# Classifier   Features Set
2              7 20
3              7 25 27
4              7 21 24 37
5              7 14 15 16 28
6              7 17 18 19 29 33
7              7 11 12 13 14 23 33
8              7 11 12 13 14 22 23 33
9              7 11 12 13 14 22 23 32 33
10             7 6 13 26 30 31 35 36 38 39

The performance evaluations vary top-n from top-1 to top-60. The experiments are shown in Tables 5, 6 and 7 for each dataset.

Table 5. Test results of the experiment with 100 MIDI songs with variations of feature sets (accuracy rate in %, by number of classifiers).

Top-n     2    3    4    5    6    7    8    9   10
1        56   67   74   73   71   72   72   71   71
5        91   90   94   96   91   87   91   91   94
10       97   96   97  100   94   96   96   96   97
15       97   97   99  100   97   97   97   98   97
20       97   99  100  100   99   99   99   99   99
25       97  100  100  100  100  100  100  100   99
30       97  100  100  100  100  100  100  100  100
35       98  100  100  100  100  100  100  100  100
40       98  100  100  100  100  100  100  100  100
45       99  100  100  100  100  100  100  100  100
50       99  100  100  100  100  100  100  100  100
55       99  100  100  100  100  100  100  100  100
60      100  100  100  100  100  100  100  100  100

Table 6. Test results of the experiment with 300 MIDI songs with variations of feature sets (accuracy rate in %, by number of classifiers).

Top-n     2    3    4    5    6    7    8    9   10
1        50   63   70   71   67   66   67   68   61
5        81   84   90   85   85   85   84   84   85
10       95   93   94   98   93   90   90   91   94
15       97   94   95   98   93   93   95   95   96
20       97   97   97  100   95   97   96   96   97
25       97   98   98  100   98   98   96   97   97
30       97   99   99  100   99   98   99   98   97
35       97   99   99  100  100  100  100  100   97
40       97   99  100  100  100  100  100  100   98
45       97   99  100  100  100  100  100  100   99
50       97  100  100  100  100  100  100  100  100
55       97  100  100  100  100  100  100  100  100
60       97  100  100  100  100  100  100  100  100

Table 7. Test results of the experiment with 500 MIDI songs with variations of feature sets (accuracy rate in %, by number of classifiers).

Top-n     2    3    4    5    6    7    8    9   10
1        42   58   65   64   63   62   65   61   56
5        78   82   86   84   80   78   76   78   81
10       86   91   91   89   87   86   87   87   88
15       93   94   92   94   93   90   91   92   92
20       95   95   93   97   93   93   93   95   94
25       96   95   93   98   93   95   96   95   94
30       96   97   93   99   96   97   97   97   94
35       96   97   95  100   98   97   97   98   94
40       97   97   97  100   98   97   98   98   95
45       97   97   98  100  100   99  100   99   96
50       97   97   99  100  100  100  100  100   97
55       97   97  100  100  100  100  100  100   99
60       97   98  100  100  100  100  100  100  100

Using five classifiers gives the best performance on all datasets. That is, it reaches a 100% accuracy rate at top-10 with 100 MIDI songs, at top-20 with 300 MIDI songs and at top-35 with 500 MIDI songs. The feature set of five classifiers consists of features 7, 14, 15, 16 and 28, as shown in Table 4. These are the sets of attributes {P7}, {P2, P3, P8}, {P2, P4, }, {P2, P4, P6} and {P1, P4, P6, P8, }, as shown in Table 3. From this result, we found that P5 is not included in this set, while P8, which is processed by note segmentation, is employed. With 100 MIDI songs in the database, the feature set of two classifiers reaches a 100% accuracy rate at top-60 and the feature set of ten classifiers reaches it at top-30, as shown in Table 5. When the number of MIDI songs in the database increases, the feature set of 2 classifiers reaches only a 97% accuracy rate, while the feature set of 10 classifiers reaches a 100% accuracy rate at top-50 and top-60 for 300 and 500 MIDI songs in the database, respectively, as shown in Tables 6 and 7 and Figures 6, 7 and 8.

Figure 6. The performance of feature sets with 100 MIDI songs (accuracy rate in % against top-n rank).

Figure 7. The performance of feature sets with 300 MIDI songs (accuracy rate in % against top-n rank).

Figure 8. The performance of feature sets with 500 MIDI songs (accuracy rate in % against top-n rank).

In addition, query time is used to measure the complexity of our proposed technique, as shown in Table 8. We performed all the tests on a notebook with an Intel Core 2 Duo 2.26 GHz processor and 2 GB of RAM. Normally, an MCS with more classifiers takes more query time.

Table 8. Test results of query time (seconds).

# Classifier   100 MIDI Songs   300 MIDI Songs   500 MIDI Songs
2              1.39             5.84             10.00
3              2.63             7.99             13.73
4              3.33             10.14            17.50
5              4.10             12.24            21.14
6              4.73             14.46            24.90
7              5.42             16.69            28.76
8              6.15             18.86            32.58
9              6.77             20.89            36.16
10             7.50             22.97            39.84

6. Conclusions

In this paper, we propose a new method for feature extraction and soft majority voting, which makes the decision when the vote is equal, for QBH applications. Our approach consists of two processes: the humming sound goes through note segmentation, and features are then extracted to create many feature sets, using six approaches for the input and four approaches for MIDI. The main feature used in each set is obtained from the melody contour extraction algorithm, which we proposed earlier. Next, soft majority voting is used to choose the best result. The advantage of our approach is increased efficiency and accuracy of retrieval. In the multiple classifiers system with soft majority voting that we propose, if the scores are equal, all members are reconsidered by finding the minimum distance, which we regard as an advantage. Moreover, using more than one feature achieves a better accuracy rate than using one feature, because more information is included and multiple aspects of it are captured. In the experiments, the feature set consisting of 5 classifiers reaches 100% accuracy at the earliest top-n rank in retrieval. Nevertheless, a greater number of classifiers gives the system higher complexity and a longer query time.

Acknowledgements

This work was assisted by Suan Dusit Rajabhat University through support with a scholarship and by Rangsit University, which provided the laboratory room for data processing.
We would like to thank all the people who willingly hummed a lot of tunes for us. Additionally, the invaluable recommendations and supervision from the anonymous reviewers are much appreciated.

References

[1] Behroozmand R. and Almasganj F., "Comparison of Neural Networks and Support Vector Machines Applied to Optimized Features Extracted from Patients' Speech Signal for Classification of Vocal Fold Inflammation," in Proceedings of the 5th IEEE International Symposium on Signal Processing and Information Technology, Athens, pp. 844-849, 2005.
[2] Ghias A., Logan J., Chamberlin D., and Smith B., "Query by Humming: Musical Information Retrieval in an Audio Database," in Proceedings of the 3rd ACM International Conference on Multimedia, New York, pp. 231-236, 1995.
[3] Holt A., Reinders T., and Hendriks A., "Multi-Dimensional Dynamic Time Warping for Gesture Recognition," in Proceedings of the 13th Annual Conference of the Advanced School for Computing and Imaging, Holland, pp. 23-32, 2007.
[4] Jiangtao H. and Minghui W., "Dynamic Combination of Multiple Classifiers Based on Normalizing Decision Space," in Proceedings of WASE International Conference on Information Engineering, Beidaihe, vol. 1, pp. 149-153, 2010.
[5] Jing Q., Wang X., Zhou M., and Liu X., "A Novel MIR Approach Based on Dynamic Thresholds Segmentation and Weighted Synthesis Matching," in Proceedings of IET Conference on Wireless, Mobile and Sensor Networks, Shanghai, pp. 1017-1020, 2007.
[6] Jun B., Rho S., and Hwang E., "An Efficient Voice Transcription Scheme for Music Retrieval," in Proceedings of International Conference on Multimedia and Ubiquitous Engineering, Seoul, pp. 366-371, 2007.
[7] Kim H. and Sikora T., "Audio Spectrum Projection Based on Several Basis Decomposition Algorithms Applied to General Sound Recognition and Audio Segmentation," in Proceedings of the 13th European Signal Processing Conference, Austria, pp. 1047-1050, 2004.
[8] Kosugi N., Nishihara Y., Sakata T., Yamamuro M., and Kushima K., "A Practical Query-By-Humming System for a Large Music Database," in Proceedings of the 8th ACM International Conference on Multimedia, New York, pp. 333-342, 2000.
[9] Leila C., Maamai K., and Salim C., "Combining Neural Networks for Arabic Handwriting Recognition," International Arab Journal of Information Technology, vol. 9, no. 6, pp. 588-595, 2011.
[10] Lemstrom K., Laine P., and Perttu S., "Using Relative Interval Slope in Music Information Retrieval," in Proceedings of the International Computer Music Association, China, pp. 317-320, 1999.
[11] Li P., Zhou M., Wang X., and Li N., "A Novel MIR System Based on Improved Melody Contour Definition," in Proceedings of International Conference on Multimedia and Information Technology, China, pp. 409-412, 2008.

The International Arab Journal of Information Technology, Vol. 11, No. 1, January 2014

[12] Liu Y., Xu J., Wei L., and Tian Y., "The Study of the Classification of Chinese Folk Songs by Regional Style," in Proceedings of International Conference on Semantic Computing, Irvine, pp. 657-662, 2007.
[13] Lu L., You H., and Zhang H., "A New Approach to Query by Humming in Music Retrieval," in Proceedings of International Conference on Multimedia and Expo, Tokyo, pp. 595-598, 2001.
[14] McNab R., Smith L., Witten I., Henderson C., and Cunningham S., "Towards the Digital Music Library: Tune Retrieval from Acoustic Input," in Proceedings of the 1st ACM International Conference on Digital Libraries, New York, pp. 11-18, 1996.
[15] McNab R., Smith L., and Witten I., "Signal Processing for Melody Transcription," in Proceedings of the 19th Australasian Computer Science Conference, Australia, pp. 301-307, 1996.
[16] Phiwma N. and Sanguansat P., "A Music Information System Based on Improved Melody Contour Extraction," in Proceedings of International Conference on Signal Acquisition and Processing, Bangalore, pp. 85-89, 2010.
[17] Phiwma N. and Sanguansat P., "An Improved Note Segmentation and Normalization for Query-by-Humming," Rangsit Journal of Arts and Sciences, vol. 1, no. 2, pp. 139-148, 2011.
[18] Raphael C., "Automatic Segmentation of Acoustic Musical Signals using Hidden Markov Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 4, pp. 360-370, 1998.
[19] De-Mello R. and Gondra I., "Multi-Dimensional Dynamic Time Warping for Image Texture Similarity," in Proceedings of the 19th Brazilian Symposium on Artificial Intelligence, Salvador, Brazil, vol. 5249, pp. 23-32, 2008.
[20] Ryynänen M. and Klapuri A., "Query by Humming of MIDI and Audio using Locality Sensitive Hashing," in Proceedings of International Conference on Acoustics, Speech and Signal Processing, Las Vegas, pp. 2249-2252, 2008.
[21] Shih H., Narayanan S., and Kuo C., "A Statistical Multidimensional Humming Transcription using Phone Level Hidden Markov Models for Query by Humming Systems," in Proceedings of International Conference on Multimedia and Expo, USA, vol. 1, pp. 61-64, 2003.
[22] Shih H., Narayanan S., and Kuo C., "Multidimensional Humming Transcription using A Statistical Approach for Query by Humming Systems," in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, China, vol. 5, pp. 541-544, 2003.
[23] Sun X., "A Pitch Determination Algorithm Based on Subharmonic-to-Harmonic Ratio," in Proceedings of the 6th International Conference of Spoken Language Processing, USA, pp. 676-679, 2000.
[24] Sun X., "Pitch Determination and Voice Quality Analysis using Subharmonic-to-Harmonic Ratio," in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Orlando, vol. 1, pp. 333-336, 2002.
[25] Vlachos M., Hadjieleftheriou M., Gunopulos D., and Keogh E., "Indexing Multi-Dimensional Time-Series with Support for Multiple Distance Measures," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Springer-Verlag, pp. 216-225, 2003.
[26] Zhu Y., Xu C., and Kankanhalli M., "Melody Curve Processing for Music Retrieval," in Proceedings of International Conference on Multimedia and Expo, Tokyo, pp. 285-288, 2001.

Nattha Phiwma received her PhD degree in information technology from Rangsit University, Thailand, in 2011. She is an assistant professor in the Department of Computer Science at Suan Dusit Rajabhat University, Thailand. Her research areas are music information retrieval and digital signal processing.

Parinya Sanguansat received his B.Eng., M.Eng. and PhD degrees in electrical engineering from Chulalongkorn University, Thailand, in 2001, 2004 and 2007, respectively. He is an assistant professor in the Faculty of Engineering and Technology, Panyapiwat Institute of Management, Thailand. His research areas are digital signal processing in pattern recognition, including on-line handwritten recognition, face and automatic target recognition.
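
Supplementary sketch. The tie-breaking rule summarized in the Conclusions (when the votes are equal, the tied members are reconsidered by finding the minimum distance) can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name soft_majority_vote, the (song_id, distance) input format, and the choice of each song's smallest reported distance as the tie-breaker are all hypothetical.

```python
from collections import defaultdict


def soft_majority_vote(predictions):
    """Pick a song from per-classifier predictions.

    predictions: list of (song_id, distance) pairs, one per classifier,
    where a smaller distance means a closer match (assumed format).
    """
    votes = defaultdict(int)   # number of classifiers voting for each song
    best_distance = {}         # smallest distance reported for each song

    for song_id, distance in predictions:
        votes[song_id] += 1
        if song_id not in best_distance or distance < best_distance[song_id]:
            best_distance[song_id] = distance

    top_votes = max(votes.values())
    tied = [song for song, count in votes.items() if count == top_votes]

    # With a clear majority, `tied` holds one song and it wins outright.
    # With equal votes, the tied songs are compared by minimum distance.
    return min(tied, key=lambda song: best_distance[song])


if __name__ == "__main__":
    # Two songs receive two votes each; the smaller distance decides.
    print(soft_majority_vote([("a", 0.4), ("b", 0.2), ("a", 0.5), ("b", 0.3)]))
```

In this sketch a clear majority always wins regardless of distance, and the distance is consulted only when the vote counts are equal, which mirrors the behaviour described in the Conclusions.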