Discrete Word Speech Recognition Using Hybrid Self-adaptive HMM/SVM Classifier


Journal of Technical Engineering, Islamic Azad University of Mashhad

Discrete Word Speech Recognition Using Hybrid Self-adaptive HMM/SVM Classifier

Saeid Rahati Quchani (1), Kambiz Rahbar (2)
(1) Assistant Professor, Islamic Azad University of Mashhad, Iran
(2) M.S., Sashiraz Electro-Optic and Laser Technology Research Center, Shiraz, Iran

Received: 3 June 2006; Reviewed: 10 July 2007; Accepted: 12 October 2007

Abstract

This research addresses speaker-independent discrete word speech recognition (DWSR) using a hybrid Self-adaptive Hidden Markov Model/Support Vector Machine (SA-HMM/SVM) classifier. Our proposed method includes two main units: a preprocessing unit and a classification unit. The first unit frames the speech wave into proper segments and extracts relevant time-frequency features in a way that maximizes the relative entropy of the time-frequency energy distribution among segments. The second unit classifies words into the proper classes. To this end, each SA-HMM calculates the word's likelihood for its corresponding class, and finally a Support Vector Machine (SVM) classifies the word using all class likelihoods as an input vector. To validate our proposed method, we test it on our IAUM dataset, which contains Persian digits uttered by Persian speakers. Comparing the results with the outcomes of a similar method based on the original HMM shows around 1.2% improvement.

Key Words: Discrete Word Speech Recognition, Local Orthogonal Discriminant Bases, Hybrid SVM/Self-adaptive HMM classifier

Corresponding Author: Address: Engineering Department, Islamic Azad University of Mashhad, Iran. Tel: 0511-6613000. Email: rahati@mshdiau.ac.ir

1. Introduction

Speech recognition is among the subjects that have received special attention in recent decades. Generally, speech recognition methods fall into two main classes: first, methods employing meaningful linguistic parts such as words, syllables and phonemes, and second, methods based on signal processing. Each category has its own advantages and disadvantages. Since there are many different views on how to break words into syllables or phonemes, depending greatly on linguistic features, and since these units are not identical across languages, our proposed method employs signal processing techniques to prepare relevant feature vectors.

In the classification process, modern speech recognition systems usually use statistical approaches based on the Bayes rule. HMM, one of these statistical approaches, is among the most powerful tools for signal modeling, because it uses the probability distribution associated with each state to model the temporal variability that occurs in speech across speakers or phonetic contexts via an underlying Markov process [1], [5]. The shortcoming of systems like HMM is that the complexity of the system is typically predefined or chosen through a cross-validation process. In other methods like SVM, the dataset itself defines how complex the classifier needs to be. SVM is a newer approach to pattern recognition problems with a clear connection to the underlying statistical learning theory. Although SVM cannot model the temporal structure of speech efficiently [1], it always finds a global minimum [2]. Here, our proposed method employs a hybrid HMM/SVM for the classification process.

However, to achieve good classification, extracting relevant features from the signal is very important. The performance of a statistical classifier like HMM improves with methods that reduce the dimensionality of the problem without losing important information. So, for the problem at hand, just like Shao et al. [12], we try to select the feature bases in a way that maximizes the relative entropy of the time-frequency energy distribution among classes. We try to achieve this goal by using basis functions that are well localized in the time-frequency plane as feature extractors, namely the modified best basis mentioned by Saito et al. [6].

This paper is organized as follows: Section 2 presents the methodology of the proposed method. Sub-section 2.1 describes local orthogonal discriminant bases. Sub-sections 2.2 and 2.3 present the SA-HMM and SVM respectively. Section 3 reports and analyzes the experimental results, and finally, Section 4 lists the conclusions.

2. Methodology

Fig. 1 summarizes our proposed DWSR. As in our previous work [4], [10], in the first unit silence is eliminated from the speech signal. To fulfill this goal, two criteria, i.e. energy and zero crossing, should be met. Within this system, the mean and standard deviation of the amplitude and of the zero-crossing rate are estimated in order to calculate the statistical characteristics of the background noise. The energy thresholds and the zero-crossing-rate threshold are calculated from these statistical characteristics and from the maximum mean amplitude over the interval; the threshold mean is used to find the interval in which the threshold is frequently exceeded. It is assumed that the starting and terminating points lie outside this interval [5], [11]. In fact, these statistical characteristics are used to find the low and high thresholds (ITL, ITU). Therefore, if the signal energy is higher than ITU, or lies between ITL and ITU, and the zero-crossing rate is below the IZCT threshold, the segment is treated as speech. Otherwise, it is treated as silence.

After silence is eliminated, the speech is framed in such a way that each frame can be processed as a speech segment with constant properties. Overlapping is used in order to keep the information at the start and end of each frame. The frame length should be chosen so that the signal can reasonably be treated as stationary within a frame. In other words, it should be small enough to preserve the quasi-stationary properties of the signal parameters; if it falls within the limits of the pitch period, however, the energy of these quasi-stationary properties is lost. On the other hand, the frame should be long enough to capture the pitch harmonics. For example, if it is chosen much larger than the pitch period, parameters such as energy change smoothly and therefore cannot reflect the properties of the speech signal.

Fig. 1. The general diagram of DWSR. Preprocessing unit: speech voice, silence deletion, framing, pre-emphasis, Hamming windowing, local discriminant bases, segment mean. Classification unit: SA-HMM (0) to SA-HMM (9), followed by the support vector machine.
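The silence test and the overlapped framing described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the reading of the decision rule (energy above ITL together with a zero-crossing rate below IZCT), and all threshold values in the usage note are assumptions made for illustration only.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into frames; hop < frame_len gives overlap."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_speech(frame, itl, itu, izct):
    """One reading of the rule in the text (an assumption): the frame is
    speech if its energy exceeds ITU, or lies between ITL and ITU, and its
    zero-crossing rate is below IZCT. With ITL <= ITU the energy condition
    reduces to energy > ITL."""
    if itl > itu:
        raise ValueError("ITL must not exceed ITU")
    energy = short_time_energy(frame)
    return energy > itl and zero_crossing_rate(frame) < izct
```

For example, `frame_signal(list(range(10)), 4, 2)` yields four frames that overlap by two samples each. In practice the thresholds would be estimated from the background-noise statistics as the text describes; the sketch simply takes them as given.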

After framing, the frames are passed through a pre-emphasis filter to eliminate high frequencies of the speech signal, drop the properties of the spectrum boundaries and make the frames smooth. Then each frame is multiplied by a Hamming window to minimize the effect of undesirable speech signal boundaries. The window should be as short as possible so that it responds quickly to amplitude variations. But if the window is too short, it does not provide a suitable means of producing a smooth energy function. Now, we can calculate time-frequency features for each frame using the Local Discriminant Bases (LDB) unit. This unit is described in Section 2.1. In the Segment Mean unit, to decrease the input vector size, we represent each frame by its mean in order to prepare a proper feature vector for the classification unit. Finally, in the classification unit, ten trained SA-HMMs represent the ten classes in our dataset. Each SA-HMM calculates the likelihood of the input features, and finally, the SVM classifies the word using all class likelihoods as an input vector.

2.1. Local Orthogonal Discriminant Bases (LDB)

The wavelet packet method is a generalization of wavelet decomposition (see Fig. 2) which offers a richer signal analysis. Wavelet packet atoms are waveforms indexed by three naturally interpreted parameters: position and scale, as in wavelet decomposition, and frequency. For a given orthogonal wavelet function, a library of wavelet packet bases is generated. Each of these bases offers a particular way of coding signals, preserving global energy and reconstructing exact features. The wavelet packets can then be used for numerous expansions of a given signal. In the orthogonal wavelet decomposition procedure, the generic step splits the approximation coefficients into two parts. After splitting, we obtain a vector of approximation coefficients and a vector of detail coefficients, both at a coarser scale. The information lost between two successive approximations is captured in the detail coefficients. The next step consists of splitting the new approximation coefficient vector; successive details are never re-analyzed.

Fig. 2. Single-level discrete 1-D wavelet transform. The signal passes through a low-pass filter and down-sampling by 2 to give the approximation coefficients, and through a high-pass filter and down-sampling by 2 to give the detail coefficients.

The original best-basis method introduced by Coifman et al. [7] extracts relevant features from a signal for signal compression purposes in two stages: first by expanding the signal into an orthogonal bases dictionary (i.e., a redundant set of wavelet packet bases or local sine/cosine bases having a binary tree structure) and second by minimizing a certain information cost function through searching this binary tree. Consider the one-dimensional case, starting with the root node. The best tree is calculated using the following scheme. A node N is split into two nodes N1 and N2 if and only if the sum of the entropies of N1 and N2 is lower than the entropy of N. This is a local criterion based only on the information available at the node N. For instance, Fig. 3 shows the wavelet packet decomposition of signal X at depth three, with the best tree selected based on the entropy criterion. Several entropy-type criteria can be used. If the entropy function is an additive function along the wavelet packet coefficients, this algorithm leads to the best tree [9].

Fig. 3. Wavelet packet decomposition of signal X at depth 3 with the best selected tree (nodes X00; X10, X11; X20 to X23; X30 to X37).

For classification, Saito et al. [6] substitute the information cost function with the symmetric relative entropy, which is defined for two classes as follows:

$$J(p, q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} + \sum_{i=1}^{n} q_i \log \frac{q_i}{p_i} \qquad (1)$$

where $p = \{p_i\}_{i=1}^{n}$ and $q = \{q_i\}_{i=1}^{n}$ are the sequences of normalized energy distributions of the signals belonging to each class. The local discriminant basis (LDB) algorithm as described in [6] is as follows:

1. Select an orthogonal bases dictionary, which specifies QMFs for a wavelet packet dictionary, or select the local cosine or sine dictionary.
2. For each class, construct a time-frequency energy map by:
   a. normalizing each signal by the total energy of all signals of that class,
   b. expanding that signal into the tree-structured subspaces and accumulating the signal energy in each coordinate.
3. Compute the discriminant measure, the symmetric relative entropy J, among the L time-frequency energy maps for each node.
4. Prune the binary tree by eliminating children nodes where the sum of their discriminant measures is smaller than or equal to the discriminant measure of their parent.
5. Order the basis vectors by their discrimination power and select the most discriminant ones for constructing classifiers.

2.2. Self-Adaptive Hidden Markov Model (SA-HMM)

There are different models for signal modeling, such as DFA, Mealy, Poisson, etc., which are classified into two general categories: statistical and deterministic models [5]. HMM (π, A, B) is one of the statistical signal-modeling schemes and is characterized as follows:

π: The vector of the initial state probabilities, which contains the probability of the (hidden) model being in a particular hidden state at time t = 1.
A = {a_ij}: The state transition matrix, holding the probability of a hidden state given the previous hidden state.
B = {b_j(k)}: The confusion matrix, containing the probability of observing a particular observable state given that the hidden model is in a particular hidden state.

An important point to remember is that the number of states in classical HMMs was usually predefined and fixed during training. The basic philosophy of HMM is that the signal can be well modeled if its parameters are carefully and correctly chosen, i.e., successfully trained. We train the HMM with known samples and finally obtain a model that is the nearest to the signal source in the sense of a predefined criterion, e.g., maximum likelihood (ML) in the Baum-Welch training algorithm. In pattern recognition applications, different signal sources probably have different numbers of states and therefore cannot be well modeled by HMMs with a fixed number of states.
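To make the role of π, A and B concrete, the following minimal sketch computes the likelihood of a discrete observation sequence with the standard forward algorithm and stacks the per-class likelihoods into one vector, which is the kind of input the text says is passed to the SVM. The function names and the tiny two-state model in the usage note are hypothetical, and the paper does not state which likelihood routine was actually used.

```python
def forward_likelihood(pi, A, B, observations):
    """P(observations | model) for a discrete HMM via the forward algorithm.

    pi[j]   : probability of starting in hidden state j
    A[i][j] : probability of moving from state i to state j
    B[j][k] : probability of emitting symbol k while in state j
    """
    if not observations:
        raise ValueError("observations must not be empty")
    n_states = len(pi)
    # Initialization: start in state j and emit the first symbol.
    alpha = [pi[j] * B[j][observations[0]] for j in range(n_states)]
    # Induction: propagate through the transition matrix, then emit.
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
            for j in range(n_states)
        ]
    # Termination: sum over all possible final states.
    return sum(alpha)


def likelihood_vector(models, observations):
    """One likelihood per class model; models is a list of (pi, A, B)."""
    return [forward_likelihood(pi, A, B, observations) for pi, A, B in models]
```

With the hypothetical model `pi = [0.6, 0.4]`, `A = [[0.7, 0.3], [0.4, 0.6]]`, `B = [[0.5, 0.5], [0.1, 0.9]]`, the sequence `[0, 1]` has likelihood 0.2156. For long sequences a scaled or log-domain variant would be needed to avoid numerical underflow; that refinement is omitted here.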
If the number of predefined states is greater than the real one, then training takes more time and needs more samples to train the additional states. On the other hand, if the number of states is less than the real one, then the signal cannot be well modeled [3], [8]. According to the self-adaptive HMM (N, π, A, B) design, an HMM automatically matches its number of states (N) to the real number of states of the signal source being modeled. The idea behind this design is that the true number of states has less entropy than other, false numbers of states [3]. The entropy (H) of the model can be calculated approximately as the sum of partial entropies, i.e.:

$$H(\lambda) \approx H(\pi) + H(A) + H(B) \qquad (2)$$

where λ denotes the model (N, π, A, B) and:

$$H(\pi) = -\sum_{j=1}^{N} \pi_j \log \pi_j \qquad (3)$$

$$H(A) = -\sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} \log a_{ij} \qquad (4)$$

$$H(B) = -\sum_{j=1}^{N} \sum_{k=1}^{M} b_j(v_k) \log b_j(v_k) \qquad (5)$$

where M is the number of observable symbols and $V = \{v_1, v_2, \ldots, v_M\}$ is the symbol set.

2.3. Support Vector Machine (SVM)

SVM is a learning system that uses a hypothesis space of linear functions in a high-dimensional feature space to estimate decision surfaces directly rather than modeling a probability distribution across the training data. It uses a support vector (SV) kernel to map the data from the input space to a high-dimensional feature space, which allows the problem to be processed in linear form. SVs are the samples that have non-zero multipliers at the end of the optimization process of Equation (7). SVM always finds a global minimum because it tries to minimize a bound on the structural risk, rather than the empirical risk [2]. Empirical risk is defined as the measured mean error rate on the training set, as below:

$$R_{emp}(\alpha) = \frac{1}{2l} \sum_{i=1}^{l} \left| y_i - f(x_i, \alpha) \right| \qquad (6)$$

where l is the number of observations, $y_i$ is the class label and $x_i$ is the sample vector. Structural risk minimization divides the entire class of functions into nested subsets and finds the subset of functions that minimizes the bound on the actual risk. SVM achieves this goal by minimizing the following Lagrangian formulation:

$$L_P = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i y_i (x_i \cdot w + b) + \sum_{i=1}^{l} \alpha_i \qquad (7)$$

where the $\alpha_i$ are positive Lagrange multipliers.

3. Experimental Results

The proposed method is validated using the IAUM Persian digits dataset. This dataset contains one thousand digits uttered by one hundred Persian speakers. The dataset has ten classes labeled from zero to nine. Here, we present some outcomes of each of the two main units: the preprocessing unit and the classification unit.
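As a small numerical aid for the model entropy of Eqs. (2)-(5) in Section 2.2, the sketch below sums the entropies of π, of the rows of A and of the rows of B. It assumes the standard Shannon form with the natural logarithm and the convention 0 log 0 = 0; the text does not specify the logarithm base or any normalization, so the absolute values are illustrative only. The function names are hypothetical.

```python
import math


def shannon_entropy(probs):
    """-sum p log p over one distribution, with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def model_entropy(pi, A, B):
    """Approximate model entropy H(pi) + H(A) + H(B).

    H(A) and H(B) are taken as the sum of the entropies of the rows of the
    transition and confusion matrices (an assumed reading of Eqs. 4 and 5).
    """
    h_pi = shannon_entropy(pi)
    h_a = sum(shannon_entropy(row) for row in A)
    h_b = sum(shannon_entropy(row) for row in B)
    return h_pi + h_a + h_b
```

Under the SA-HMM idea described in Section 2.2, one would compare this quantity across trained candidate models with different numbers of states N and prefer a lower value. A fully deterministic model has entropy 0, while a two-state, two-symbol model with all probabilities equal to 0.5 has entropy 5 log 2.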
In the first unit, as described in Section 2, silence is eliminated from the speech signal of every item in the dataset by applying two criteria: energy and zero crossing. Then, the speech signals are framed into proper units. Each frame is passed through the pre-emphasis filter and multiplied by a Hamming window to eliminate high frequencies of speech and minimize the effect of undesirable speech signal boundaries. At the end, LDB time-frequency features are extracted from each frame (Fig. 4), and each frame is represented by its mean, which is fed forward to the second main unit as a proper feature vector for classification.

Fig. 4. Wavelet packet decomposition and best selected tree for the Persian digit 2 uttered by a Persian speaker. (a) Wavelet packet decomposition, (b) LDB (best) selected tree; the remaining panels show the wavelet approximation and detail coefficients at nodes (0,0) to (3,7).


In the classification unit, ten trained SA-HMMs represent the ten classes in our dataset. Each SA-HMM calculates the likelihood of the input features, and finally the SVM classifies the word using all class likelihoods as an input vector. We used part of this dataset for training the HMMs, SA-HMMs and SVM separately. After each HMM and SA-HMM was trained, we trained the SVM with an RBF kernel on a dataset obtained from the main dataset by applying the HMMs and SA-HMMs. Tables 1, 2 and 3 show the classification error rates when the fixed and adaptive numbers of states were identified in advance, while the LDB tree depth varies from three to five.

Table 1. Comparison of HMM-based classification error (LDB tree depth = 3)

Method                                        | Error rate on training dataset | Error rate on test dataset
Proposed method                               | 12.5%                          | 13.6%
Proposed method using HMM instead of SA-HMM   | 13.1%                          | 16.2%

Table 2. Comparison of HMM-based classification error (LDB tree depth = 4)

Method                                        | Error rate on training dataset | Error rate on test dataset
Proposed method                               | 9.3%                           | 10%
Proposed method using HMM instead of SA-HMM   | 8.9%                           | 11%

Table 3. Comparison of HMM-based classification error (LDB tree depth = 5)

Method                                        | Error rate on training dataset | Error rate on test dataset
Proposed method                               | 8.5%                           | 9.1%
Proposed method using HMM instead of SA-HMM   | 8.3%                           | 11.3%

These tables show that when the model entropy decreases, the recognition power increases because of better modeling: on the test dataset, the SA-HMM-based method has the lower error rate at every tree depth. Additionally, using LDB with a deeper tree structure decreases the error rate.

4. Conclusions

In this paper we studied speaker-independent DWSR based on a hybrid SA-HMM/SVM classifier. Our method includes two main units: a) a preprocessing unit, which frames the speech into proper segments and extracts relevant time-frequency features by maximizing the relative entropy of the time-frequency energy distribution among segments, and b) a classification unit, which classifies words into the proper classes by calculating the words' likelihoods with SA-HMMs and classifying them with an SVM that uses all class likelihoods as an input vector. We validated this method on the IAUM dataset and found that LDB with a deeper tree structure provides better feature vectors for classification. Additionally, by decreasing the model entropy, the recognition power increases.

References

1. A. Ganapathiraju, J.E. Hamaker and J. Picone, Application of Support Vector Machines to Speech Recognition, IEEE Trans. Signal Processing, vol. 52, no. 8, Aug. (2004).
2. C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Knowledge Discovery Data Mining, vol. 2, no. 2, pp. 121-167, (1998).
3. J. Li, J. Wang, Y. Zhao, and Z. Yang, Self-Adaptive Design of Hidden Markov Models, Pattern Recognition Letters 25, pp. 197-210, (2004).
4. K. Rahbar and M. Rahbar, Discrete Words Speech Recognition (DWR) Using Self-adaptive Hidden Markov Model (SAHMM), in Proc. Int. Conf. GSPx 2005 Pervasive Signal Processing, USA, (2005).
5. L.R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, in Proc. IEEE 77, no. 2, Feb. (1989).
6. N. Saito and R.R. Coifman, On Local Orthogonal Bases for Classification and Regression, in Proc. IEEE Int. Conf. ICASSP-95 Acoustics, Speech, and Signal Processing, vol. 3, pp. 1529-1532, May (1995).
7. R.R. Coifman and M.V. Wickerhauser, Entropy-Based Algorithms for Best Basis Selection, IEEE Trans. Information Theory, vol. 38, Issue 2, Part 2, pp. 713-718, Mar. (1992).
8. S. Kwong, Q.H. He, K.W. Ku, T.M. Chan, K.F. Man and K.S. Tang, A Genetic Classification Error Method for Speech Recognition, Signal Processing, no. 82, pp. 737-748, (2002).
9. S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, (1998).
10. S. Rahati, K. Rahbar, Local Orthogonal Discriminate Bases to Hybrid SVM/Self-Adaptive HMM Classifier for Discrete Word Speech Recognition, IEEE Int. Symp. ISSPIT 2006, Vancouver, Canada, August (2006).
11. V. Digalakis, S. Tsakalidis, C. Harizakis and L. Neumeyer, Efficient Speech Recognition using Subvector Quantization and Discrete-Mixture HMMs, Computer Speech and Language, no. 14, pp. 33-46, (2000).
12. Y. Shao and C.H. Chang, Wavelet Transform to Hybrid Support Vector Machine and Hidden Markov Model for Speech Recognition, in Proc. IEEE Int. Symp. ISCAS 2005 Circuits and Systems, vol. 4, pp. 3833-3836, May (2005).