SPEECH ENHANCEMENT WITH KALMAN FILTERING THE SHORT-TIME DFT TRAJECTORIES OF NOISE AND SPEECH

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

Esfandiar Zavarehei, Saeed Vaseghi, and Qin Yan
School of Design and Engineering, Brunel University
Uxbridge, UB8 3PH, London, UK
email: esfandiar.zavarehei@brunel.ac.uk
web: http://dea.brunel.ac.uk/cmsp/home_esfandiar/home.htm

ABSTRACT

This paper presents a time-frequency estimator for the enhancement of noisy speech in the DFT domain. The time-varying trajectories of the DFT of speech and noise in each channel are modelled by low-order autoregressive (AR) processes, which are incorporated in the state equation of Kalman filters. The parameters of the Kalman filters are estimated recursively from the signal and noise in the DFT channels. The paper also addresses the convergence of the Kalman filters to the noise statistics during noise-dominated periods. To handle this, a method is incorporated for restarting the Kalman filters after long periods of noise-dominated activity in each DFT channel. The performance of the proposed method is compared with cases where the noise trajectories are not explicitly modelled, and the sensitivity of the method to the voice activity detector is evaluated. Evaluations show that the proposed method results in a substantial improvement in the perceived quality of speech.

1. INTRODUCTION

Speech enhancement improves the quality and intelligibility of voice communication in a range of applications, including mobile phones, teleconference systems, hearing aids, voice coders and automatic speech recognition. Among the solutions proposed for the enhancement of noisy speech, restoration of the short-time speech spectrum has been studied extensively [1][2]. This approach is normally based on estimating the short-time spectral amplitude (STSA) of the clean speech from an estimate of the signal-to-noise ratio (SNR) at each frequency. The effect of phase distortion is assumed to be inaudible.
An alternative to estimating the STSA is to estimate the real and imaginary components of the DFT of the clean speech. The MMSE estimation of the DFT components with Gaussian priors leads to the well-known Wiener filter solution [3]. Under the same Gaussian assumptions, the MMSE estimation of the STSA results in Ephraim's noise suppression method [1]. In recent years Martin has proposed the use of Gamma and Laplacian distributions for modelling the real and imaginary components of the DFT of speech [3].

Speech enhancement methods often assume that the spectral samples are independent and identically distributed (IID) across the frequency and time dimensions. However, there seems to be an apparent contradiction [4]. The same methods that start from the IID assumption often also rely on the dependency of successive frames to calculate and smooth key speech parameters such as the SNRs [1][3][5][6].

The application of the Kalman filter to speech enhancement has been explored extensively during the past few decades [7][8][9]. These methods are mostly concerned with estimating the speech signal in the presence of noise using an AR model of speech for each frame. However, most of them overlook the interframe correlation of speech signals, which has been shown to be of great importance. The modelling and use of the time-varying trajectories of the speech and noise spectra is the main focus of this paper.

In this paper, temporal trajectory models of the DFT of speech and noise are used within a more rigorous mathematical framework to obtain more reliable estimates of speech spectra. The use of Gaussian priors lends itself to the application of the Kalman filter for modelling the temporal trajectories of the DFT of speech. A set of AR models is incorporated in Kalman filters for the adaptive estimation and modelling of the temporal trajectories of the DFT of the speech and noise signals. The rest of this paper is organized as follows.
Section 2 discusses the modelling of the temporal trajectories of DFT components. Section 3 introduces the Kalman estimator of DFT trajectories. Section 4 discusses the empirical issues and the parameter estimation of the new estimator. Section 5 compares the evaluation results with other methods of speech enhancement. Conclusions are drawn in Section 6.

2. MODELLING DFT TRAJECTORIES

In this section the temporal dependency and predictability of the trajectories of the DFT components are examined. The level of correlation between successive temporal samples of the DFT components varies across frequencies as well as across phonemes (i.e. along both frequency and time). Moreover, the probability distributions of the DFT components depend strongly on the frequency channel and on the phoneme under study. Figure 1 illustrates the distribution of the DFT components of channel 26 (1000 Hz) for the phoneme /ah/. The data are obtained from 130 sentences spoken by a male speaker, selected randomly from the Wall Street Journal (WSJ) database. It is evident from Figure 1 that the peak of the histogram is modelled better by a Gamma distribution, while the sides tend to fit a Gaussian distribution.

[Figure 1. Normalized histogram of ST-DFT components for channel 26 (1 kHz), phoneme /ah/, compared with Gaussian, Laplacian and Gamma fits.]

Table 1 shows the average symmetric Kullback-Leibler distance (SKLD) [10] between the histograms and the parametric distributions. The SKLD values for speech show that, on average, the Gamma distribution models the distribution of the speech DFT components better than the Gaussian distribution does. Most noise types, in contrast, have a reasonably low SKLD with the Gaussian distribution. However, as is often the case, a compromise between the complexity and the mathematical tractability of the model suggests the use of the Gaussian distribution and of Kalman filters for modelling the temporal trajectories of the DFT.

The real part of the DFT of clean speech, $S_r(n)$, can be modelled using an AR process:

$$S_r(n) = \sum_{k=1}^{N} a_k(n)\, S_r(n-k) + e_r(n) \qquad (1)$$

In Equation (1):

- $S_r(n)$ is the real part of the DFT of clean speech at frame $n$ of an arbitrary frequency channel.
- $a_k(n)$ is the $k$th AR coefficient at the $n$th frame of the same frequency channel.
- $e_r(n)$ is the corresponding estimation error.
- $N$ is the model order.

It is also assumed that $S_r(n)$ is stationary within the prediction period. Assuming Gaussian distributions for the DFT components, the MMSE linear predictor (LP) coefficients of Equation (1) can be obtained from the Yule-Walker equation:

$$\mathbf{a}(n) = \mathbf{R}_s^{-1}(n)\, \mathbf{r}_s(n) \qquad (2)$$

Here $\mathbf{R}_s(n)$ and $\mathbf{r}_s(n)$ are, respectively, the autocorrelation matrix and vector of the real part of the speech DFT. Both are computed from the $L$ most recent samples $S_r(n), \ldots, S_r(n-L+1)$, where $L$ is the number of samples used to obtain the autocorrelation. $\mathbf{a}(n) = [a_1(n), \ldots, a_N(n)]^T$ is the AR coefficient vector at frame $n$.
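Equation (2) is the Yule-Walker solution for the AR coefficients of a single DFT-component trajectory. A minimal NumPy sketch follows. It rests on assumptions the text does not fix: the biased autocorrelation estimator is used (which keeps $\mathbf{R}_s$ positive definite), and the trajectory is treated as zero-mean. The function name `yule_walker_ar` is hypothetical.

```python
import numpy as np


def yule_walker_ar(trajectory, order):
    """Fit S(n) = sum_{k=1}^{order} a_k S(n-k) + e(n) to one DFT-component trajectory.

    Returns (a, sigma_e2), where a[k-1] = a_k and sigma_e2 is the
    prediction-error variance implied by the Yule-Walker solution.
    """
    x = np.asarray(trajectory, dtype=float)
    L = len(x)
    # Biased autocorrelation estimates r(0), ..., r(order) over the window.
    r = np.array([x[: L - k] @ x[k:] / L for k in range(order + 1)])
    # Toeplitz autocorrelation matrix R_s and vector r_s of Equation (2).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    r_vec = r[1:]
    a = np.linalg.solve(R, r_vec)      # a = R_s^{-1} r_s
    sigma_e2 = r[0] - a @ r_vec        # residual (prediction-error) variance
    return a, sigma_e2


if __name__ == "__main__":
    # Synthetic AR(2) trajectory standing in for the real part of one channel.
    rng = np.random.default_rng(0)
    s = np.zeros(20000)
    e = rng.standard_normal(s.size)
    for n in range(2, s.size):
        s[n] = 0.6 * s[n - 1] - 0.3 * s[n - 2] + e[n]
    a, sigma_e2 = yule_walker_ar(s, order=2)
    print(a, sigma_e2)  # approximately [0.6, -0.3] and 1.0
```

With the paper's settings ($L = 8$ past samples, $N = 4$), the same computation runs on very short windows. The estimates are therefore much noisier than in this long synthetic example.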
A similar equation holds for the imaginary component of the DFT. The speech frame length, the overlap size and the LP order should be chosen carefully to comply with the stationarity assumption of Equation (1), that is, to stay within, say, 20-40 ms.

Table 1. Average SKLD between the histograms and different parametric distributions for speech (averaged over all phonemes/frequency channels for 130 labeled sentences spoken by a male speaker) and for different noise types

Distribution              Gaussian   Laplacian   Gamma
Speech                    0.81       ?           0.56
Car noise                 0.04       0.10        0.85
Train noise               0.15       0.05        0.22
Babble noise              0.69       0.51        0.46
Helicopter fly-by noise   ?          0.15        0.59
White Gaussian noise      0.01       0.22        0.83

[Figure 2. Averaged absolute correlation in ST-DFT trajectories versus time lag (in units of 5 ms) for car noise, train noise, white noise and speech.]

Figure 2 illustrates the correlation coefficients between delayed samples of the DFT of the noise and speech signals, averaged over all frequency channels. Although the correlation coefficient may be negative, it is its absolute value that shows the level of correlation. Because of the frame overlap, successive samples of the DFT of noise are correlated. However, this correlation does not vary much with the noise type and is lower than that of speech. Figure 2 uses a shift size of 5 ms and a frame size of 25 ms, which experimentally proved to give good noise reduction.

3. KALMAN DFT TRAJECTORY RESTORATION

This section presents the formulation of Kalman filters for the restoration of DFT trajectories. The clean speech signal $s(t)$ is assumed to be contaminated by additive background noise $d(t)$ that is uncorrelated with the speech signal. The noisy speech signal $x(t)$ is modelled as:

$$x(t) = s(t) + d(t) \qquad (3)$$

where $t$ denotes time. For each frequency channel, Equation (3) is rewritten in the DFT domain as:

$$X_r(n) + jX_i(n) = \big(S_r(n) + D_r(n)\big) + j\big(S_i(n) + D_i(n)\big) \qquad (4)$$

where the subscripts $r$ and $i$ denote the real and imaginary parts of the DFT, respectively, and $n$ denotes the frame index. The real and imaginary parts of the DFT are assumed to be independent and to have Gaussian distributions.
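Equation (4) follows from the linearity of the short-time DFT. Framing, windowing and the DFT are applied identically to $s(t)$ and $d(t)$, so each channel's noisy trajectory is the sum of the speech and noise trajectories. The sketch below extracts per-channel trajectories using the 25 ms Hamming window and 5 ms shift quoted above. It also computes an averaged absolute lag correlation of the kind shown in Figure 2. It is an illustration only. The sampling rate is a parameter, and the 16 kHz used in the example is an assumption. Only the real parts are correlated. Both function names are hypothetical.

```python
import numpy as np


def stft_trajectories(x, fs, frame_ms=25.0, shift_ms=5.0):
    """Short-time DFT of x: row n is frame n, column k is the trajectory of channel k."""
    frame = int(round(fs * frame_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    window = np.hamming(frame)
    n_frames = 1 + (len(x) - frame) // shift
    frames = np.stack([x[m * shift : m * shift + frame] * window
                       for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def mean_abs_lag_correlation(traj, max_lag=10):
    """|correlation| between real-part samples `lag` frames apart, averaged over channels."""
    re = traj.real
    result = []
    for lag in range(1, max_lag + 1):
        rhos = [abs(np.corrcoef(re[:-lag, k], re[lag:, k])[0, 1])
                for k in range(re.shape[1])]
        result.append(np.mean(rhos))
    return np.array(result)


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(1)
    s = rng.standard_normal(fs)          # placeholder "speech" (1 s)
    d = 0.5 * rng.standard_normal(fs)    # additive white noise
    X = stft_trajectories(s + d, fs)
    # Equation (4): the noisy trajectory equals speech plus noise, bin by bin.
    print(np.allclose(X, stft_trajectories(s, fs) + stft_trajectories(d, fs)))
    print(mean_abs_lag_correlation(stft_trajectories(d, fs)))
```

With a 25 ms frame and a 5 ms shift, successive frames share 20 ms of signal. Even white noise therefore shows a clearly non-zero lag-1 correlation. From a lag of five frames onward, the windows no longer overlap and the white-noise correlation drops to the sampling-noise level. This is the overlap effect mentioned in the discussion of Figure 2.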
The independence assumption is verified by studying scatter plots of the real and imaginary parts of the DFT coefficients of clean speech [3][11]. The real part of the DFT of noise, $D_r(n)$, is modelled using an AR model as:

$$D_r(n) = \sum_{k=1}^{M} b_k(n)\, D_r(n-k) + g_r(n) \qquad (5)$$

In Equation (5):

- $D_r(n)$ is the real part of the DFT of noise at frame $n$ of an arbitrary frequency channel.
- $b_k(n)$ is the $k$th AR coefficient at the $n$th frame of the same frequency channel.
- $g_r(n)$ is the corresponding estimation error, with variance $\sigma_g^2$.
- $M$ is the model order.

After straightforward algebraic manipulation, Equations (1), (4) and (5) for the real part can be represented in canonical form. In this form, the noisy observation $X_r(n)$ of Equation (4) is obtained from the state vector $\mathbf{X}_r(n)$:

$$\mathbf{X}_r(n) = \mathbf{A}(n)\,\mathbf{X}_r(n-1) + \mathbf{G}_c\,\mathbf{E}_r(n) \qquad (6)$$

$$X_r(n) = \mathbf{H}_c\,\mathbf{X}_r(n) \qquad (7)$$

where the state vector $\mathbf{X}_r(n)$ is defined as:

$$\mathbf{X}_r(n) = \begin{bmatrix} \mathbf{S}_r(n) \\ \mathbf{D}_r(n) \end{bmatrix} \qquad (8)$$

$$\mathbf{S}_r(n) = [\,S_r(n-N+1)\ \cdots\ S_r(n)\,]^T \qquad (9)$$

$$\mathbf{D}_r(n) = [\,D_r(n-M+1)\ \cdots\ D_r(n)\,]^T \qquad (10)$$

and $\mathbf{S}_r(n)$ and $\mathbf{D}_r(n)$ are the speech and noise state vectors, respectively. The transition matrix $\mathbf{A}(n)$ is given by:

$$\mathbf{A}(n) = \begin{bmatrix} \mathbf{F}(n) & \mathbf{0} \\ \mathbf{0} & \mathbf{B}(n) \end{bmatrix} \qquad (11)$$

where $\mathbf{F}(n)$ and $\mathbf{B}(n)$ are the speech and noise transition matrices, respectively:

$$\mathbf{F}(n) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_N(n) & a_{N-1}(n) & a_{N-2}(n) & \cdots & a_1(n) \end{bmatrix} \qquad (12)$$

$$\mathbf{B}(n) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ b_M(n) & b_{M-1}(n) & b_{M-2}(n) & \cdots & b_1(n) \end{bmatrix} \qquad (13)$$

$\mathbf{E}_r(n)$ is the AR error vector of speech and noise. $\mathbf{G}_c$ and $\mathbf{H}_c$ are constant matrices. They are defined as follows:

$$\mathbf{E}_r(n) = [\,e_r(n)\ \ g_r(n)\,]^T \qquad (14)$$

$$\mathbf{G}_c = \begin{bmatrix} \mathbf{U}(N) & \mathbf{0} \\ \mathbf{0} & \mathbf{U}(M) \end{bmatrix} \qquad (15)$$

$$\mathbf{H}_c = [\,\mathbf{U}(N)^T\ \ \mathbf{U}(M)^T\,] \qquad (16)$$

where $\mathbf{U}(N)$ is the $N \times 1$ vector:

$$\mathbf{U}(N) = [\,\underbrace{0\ \cdots\ 0}_{N-1}\ \ 1\,]^T \qquad (17)$$

A prediction of the state vector is obtained from the previous state estimate using the transition matrix $\mathbf{A}(n)$:

$$\hat{\mathbf{X}}_r(n\,|\,n-1) = E\{\mathbf{X}_r(n)\,|\,\hat{\mathbf{X}}_r(n-1)\} = \mathbf{A}(n)\,\hat{\mathbf{X}}_r(n-1) \qquad (18)$$

where $\hat{\mathbf{X}}_r(n-1)$ is the estimate of $\mathbf{X}_r(n-1)$. Since $e_r(n)$ and $g_r(n)$ are orthogonal to $\hat{\mathbf{X}}_r(n-1)$ and to each other, the prediction error covariance matrix is:

$$\mathbf{P}_c(n\,|\,n-1) = \mathbf{A}(n)\,\mathbf{P}_c(n-1)\,\mathbf{A}^T(n) + \mathbf{G}_c\,\Lambda(n)\,\mathbf{G}_c^T \qquad (19)$$

where $\Lambda(n)$ is the matrix:

$$\Lambda(n) = \begin{bmatrix} \sigma_e^2(n) & 0 \\ 0 & \sigma_g^2(n) \end{bmatrix} \qquad (20)$$

and $\mathbf{P}_c(n-1)$ is the state estimation error covariance matrix at frame $n-1$. According to Equation (7), no observation noise is added to $\mathbf{H}_c\mathbf{X}_r(n)$. The innovation is therefore simply the difference between the observed noisy signal and the predicted noisy signal. Incorporating the innovation of the current noisy observation, the optimum estimate of the state vector is:

$$\hat{\mathbf{X}}_r(n) = \hat{\mathbf{X}}_r(n\,|\,n-1) + \mathbf{K}_c(n)\left[X_r(n) - \mathbf{H}_c\,\hat{\mathbf{X}}_r(n\,|\,n-1)\right] \qquad (21)$$

where $\mathbf{K}_c(n)$ is the Kalman gain vector:

$$\mathbf{K}_c(n) = \mathbf{P}_c(n\,|\,n-1)\,\mathbf{H}_c^T \left[\mathbf{H}_c\,\mathbf{P}_c(n\,|\,n-1)\,\mathbf{H}_c^T\right]^{-1} \qquad (22)$$

Here $\mathbf{H}_c\mathbf{P}_c(n|n-1)\mathbf{H}_c^T$ is a scalar. The estimation error covariance of this estimate, $\mathbf{P}_c(n)$, is:

$$\mathbf{P}_c(n) = \left[\mathbf{I} - \mathbf{K}_c(n)\,\mathbf{H}_c\right]\mathbf{P}_c(n\,|\,n-1) \qquad (23)$$

The same set of equations holds for the imaginary component of all frequency channels with nonzero imaginary parts. The estimated clean speech DFT is a by-product of the estimated state vector $\hat{\mathbf{X}}_r(n)$ of Equation (21). It is the $N$-th element of $\hat{\mathbf{X}}_r(n)$, i.e. the newest entry of the speech sub-vector.

4. PARAMETER ESTIMATION

The autocorrelation of the clean-speech DFT trajectories, needed for the AR parameters in Equation (2), is not available. The autocorrelation obtained from the past restored samples is used instead:

$$\hat{\mathbf{a}}(n) = \hat{\mathbf{R}}_s^{-1}(n-1)\,\hat{\mathbf{r}}_s(n-1) \qquad (24)$$

The autocorrelation vector and matrix are calculated from the past $L = 8$ samples, which corresponds to 40 ms at a shift size of 5 ms.

Feeding restored speech back into Equation (24) to compute the AR parameters raises an implementation issue. During long noise-only periods (typically longer than 200 ms), the variance of the noisy signal equals that of the noise. Over such periods, the recursion of Equations (19) and (22) drives the output of Equation (21) towards zero. This in turn drives the prediction error variance $\sigma_e^2$ towards zero. In other words, the speech output of the Kalman filters converges to zero during noise-only periods.

The problem appears when speech starts just after a long noise-only period. Because the noise has been suppressed and no speech was present, the predicted DFT trajectories are very small, and so is the prediction error variance $\sigma_e^2$. This gives a high weight to the prediction of the state vector (a very small Kalman gain) and zeroes the output speech signal. To prevent this, $\sigma_e^2$ must be revived from zero at the beginning of speech-active periods. This is done by never letting $\sigma_e^2$ fall below a dynamic threshold, set as a fraction of the noisy signal energy in each time-frequency bin:

$$\sigma_e^2(n) = \max\left(\hat{\sigma}_e^2(n),\ \alpha\,|X(n)|^2\right) \qquad (25)$$

This lower-bounds the prediction error variance by a small portion of the instantaneous power spectrum of the noisy speech. Equation (25) implies that the DFT trajectories can only be predicted with limited precision. That is, the prediction error variance cannot be smaller than a threshold proportional to the variance of the noisy speech.
Very small values of $\alpha$ (e.g. $\alpha = 0.07$) proved sufficient to revive the converged trajectories of $\sigma_e^2$ and of the signal at the beginning of speech activity.
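The recursion of Equations (6)-(25) for the real part of one channel can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The following are not specified in the text and are assumptions:

- The noise AR coefficients $b$ and the innovation variance $\sigma_g^2$ are given and fixed, as if estimated beforehand from noise-only periods.
- $\hat{\sigma}_e^2(n)$ is the Yule-Walker residual of the past $L$ restored samples.
- The state covariance is initialised from the first bin power.
- The helper names `companion`, `ar_from_past` and `dft_kalman_channel` are hypothetical.

```python
import numpy as np


def companion(coeffs):
    """Transition matrix of Equations (12)/(13) for coeffs = [c_1, ..., c_P]."""
    coeffs = np.asarray(coeffs, dtype=float)
    F = np.eye(len(coeffs), k=1)   # superdiagonal ones: shift the state by one frame
    F[-1, :] = coeffs[::-1]        # last row: c_P, ..., c_1
    return F


def ar_from_past(past, order):
    """Yule-Walker fit on past restored samples, as in Equation (24).

    Returns (a, sigma_e2); using the Yule-Walker residual as sigma_e2 is an assumption.
    A tiny diagonal load keeps R solvable when the window is all zeros.
    """
    L = len(past)
    r = np.array([past[: L - k] @ past[k:] / L for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-12 * np.eye(order), r[1:])
    return a, max(r[0] - a @ r[1:], 0.0)


def dft_kalman_channel(x_r, b, sigma_g2, bin_power, N=4, L=8, alpha=0.07):
    """Restore the real-part trajectory of one DFT channel (Equations (6)-(25)).

    x_r       : noisy real-part trajectory X_r(n)
    b         : noise AR coefficients [b_1, ..., b_M] (assumed known and fixed)
    sigma_g2  : noise innovation variance; must be > 0 so the gain denominator is positive
    bin_power : |X(n)|^2 of the noisy bin, used by the floor of Equation (25)
    """
    x_r = np.asarray(x_r, dtype=float)
    M = len(b)
    B = companion(b)
    Gc = np.zeros((N + M, 2))                 # Equation (15)
    Gc[N - 1, 0] = Gc[N + M - 1, 1] = 1.0
    Hc = np.zeros(N + M)                      # Equation (16): observation = S_r(n) + D_r(n)
    Hc[[N - 1, N + M - 1]] = 1.0
    state = np.zeros(N + M)
    P = np.eye(N + M) * bin_power[0]          # assumed initial state uncertainty
    s_hat = np.zeros(len(x_r))
    for n, obs in enumerate(x_r):
        if n >= L:
            a, sigma_e2 = ar_from_past(s_hat[n - L:n], N)    # Equation (24)
        else:
            a, sigma_e2 = np.zeros(N), 0.0                   # warm-up: no speech model yet
        sigma_e2 = max(sigma_e2, alpha * bin_power[n])       # Equation (25)
        A = np.block([[companion(a), np.zeros((N, M))],
                      [np.zeros((M, N)), B]])                # Equation (11)
        Lam = np.diag([sigma_e2, sigma_g2])                  # Equation (20)
        pred = A @ state                                     # Equation (18)
        P_pred = A @ P @ A.T + Gc @ Lam @ Gc.T               # Equation (19)
        K = P_pred @ Hc / (Hc @ P_pred @ Hc)                 # Equation (22), scalar denominator
        state = pred + K * (obs - Hc @ pred)                 # Equation (21)
        P = (np.eye(N + M) - np.outer(K, Hc)) @ P_pred       # Equation (23)
        s_hat[n] = state[N - 1]                              # S_r(n): newest speech entry
    return s_hat
```

Equation (7) has no additive observation term, so $\mathbf{H}_c\mathbf{K}_c(n) = 1$ at every frame. In this sketch, the restored speech and noise entries therefore always sum exactly to the noisy observation; the filter only decides how to split it between the two AR models. The floor of Equation (25) is what lets $\sigma_e^2$ recover after a noise-only stretch. Without it, the Yule-Walker residual of near-zero restored samples would stay near zero. For a complex channel trajectory `Xk`, the real part would be restored with `dft_kalman_channel(Xk.real, b, sigma_g2, np.abs(Xk) ** 2)`, and the imaginary part analogously.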

To obtain the AR models of the DFT trajectories of noise for each frequency channel, the autocorrelations of the DFT trajectories are computed and smoothed during the noise-only periods. These autocorrelation vectors are computed from $L$ samples of the real and imaginary components separately and then averaged at each time step. The same AR model is thus used for the real and imaginary components of each noise channel.

5. EVALUATION RESULTS

The DFT-Kalman filter with correlated noise model (DFKCN) described in Section 3 is evaluated for the enhancement of speech signals corrupted by background noise, using both subjective and objective measures. Various types and levels of noise are added to speech signals selected from the WSJ speech database. The noisy signals are segmented using 25 ms Hamming windows with a shift size of 5 ms. The car noise was recorded by our colleagues in a 3-series BMW at 70 mph on a rainy day, and the train noise was recorded in a moving train. The parameters of the Kalman method are:

- autocorrelation length $L = 8$
- LP orders $N = 4$ and $M = 2$
- $\alpha = 0.07$

5.1. Mean Opinion Score (MOS)

Twenty sample sentences are drawn from the WSJ database and contaminated by car noise and train noise at two SNRs, 0 dB and 10 dB. The resulting noisy sentences are then de-noised using four methods:

(i) parametric spectral subtraction (PSS) [2],
(ii) MMSE log-STSA [5],
(iii) the DFT-Kalman filter with uncorrelated noise model (DFKUN) [12], and
(iv) DFKCN.

In the first two methods, the decision-directed method is used for tracking the a priori SNR [1]. Ten trained listeners were asked to score the quality of the output signals from 1 to 5. The scores were based on the perceptual ease of understanding (intelligibility) and the comfort of listening (less annoying noise). The mean opinion scores are presented in Table 2. They show that the listeners prefer the Kalman filter outputs, with DFKCN obtaining the highest score in every condition.
As is often the case, the validity of these results is limited by the number of listeners and test sentences used.

5.2. Objective Evaluation

Several speech quality and distortion measures were applied to the restored sentences of Section 5.1. Table 3 summarizes the correlation coefficients between MOS and six of the most popular of these measures. The three measures most correlated with MOS were chosen for further objective evaluation of the different methods: the Itakura-Saito distance (ISD), the log-likelihood ratio (LLR) [13] and the Perceptual Evaluation of Speech Quality (PESQ) score. These three measures are used to evaluate the performance of DFKCN in the presence of car and train noise. One hundred sentences spoken by 20 speakers (10 females and 10 males) are randomly selected from the WSJ database and contaminated by train and car noise at different noise levels. The noisy signals are then de-noised using the PSS, MMSE, DFKUN and DFKCN methods, and the distortion measures of the outputs are computed. The averaged results are summarized in Table 4.

Table 2. Mean opinion score results

SNR     Noise   DFKUN   DFKCN   MMSE   PSS   Wiener
0 dB    Car     3.7     3.8     3.5    3.4   3.2
0 dB    Train   2.7     2.9     2.0    2.0   2.1
10 dB   Car     4.5     4.7     4.6    4.4   4.2
10 dB   Train   3.7     3.9     3.7    3.3   3.5

Table 3. Correlation coefficient ρ between MOS and objective evaluation results

     PESQ   LLR     ISD     Kullback   SegSNR   SNR
ρ    0.86   -0.69   -0.61   -0.45      ?        0.07

5.3. Sensitivity to VAD

As described in Section 4, the estimator is revived at the beginning of the speech signal after long noise-only periods. In addition, the noise statistics are estimated and averaged during noise-only periods. Many sophisticated methods have been proposed in the literature for robust estimation of noise statistics that track or detect noise nonstationarity.
Although these methods provide better estimates of the noise statistics, this work uses a simple method based on a voice activity detector (VAD), to keep the focus on how the speech signal is estimated. The first 200 ms of each signal are assumed to contain no speech, which is consistent with the database used in the experiments. This part of the signal is used to derive a noise model, including the averaged spectrum of the signal, its variance, and the AR models of the DFT progressions. After this initialization, the spectrum of each frame is compared with that of the noise. If the energy of their difference is less than 3 dB, the frame is flagged as noise. After 16 successive noise frames, the VAD starts updating the noise model until a non-noise frame is detected.

This noise estimation procedure has two types of error. First, frames may be misclassified. Second, the method cannot detect or track the statistics of fast-changing non-stationary noises. The sensitivity of the methods to these errors is evaluated, and the results are shown in Table 5.

Table 4. PESQ, LLR and ISD scores for various noise levels and types, obtained using different de-noising methods

                   Car noise SNR (dB)          Train noise SNR (dB)
Measure  Method    -5    0     5     10        -5    0     5     10
PESQ     DFKUN     2.41  2.80  3.13  3.43      1.81  2.22  ?     2.98
         DFKCN     2.51  2.90  ?     3.49      1.90  2.30  2.69  3.05
         MMSE      2.39  2.75  3.10  3.38      1.78  ?     2.58  2.89
         PSS       2.44  2.79  3.08  ?         1.65  ?     2.51  2.84
LLR      DFKUN     1.59  ?     0.95  0.75      2.22  1.74  1.35  1.03
         DFKCN     ?     1.18  0.90  0.68      2.09  1.68  1.31  1.00
         MMSE      1.60  1.26  1.01  0.91      2.53  2.07  1.61  1.19
         PSS       1.59  ?     1.01  0.87      2.64  2.17  1.67  ?
ISD      DFKUN     1.08  0.78  0.58  0.44      2.63  ?     ?     0.81
         DFKCN     1.15  0.85  0.64  0.49      2.56  1.75  1.17  0.80
         MMSE      ?     0.93  0.71  0.54      3.07  2.33  1.61  1.08
         PSS       1.41  1.04  0.77  0.59      3.43  2.71  1.89  1.19

Table 5. PESQ scores of enhanced speech signals when (A) the VAD is used to detect noise frames and (B) the correct label of each frame (noise / speech and noise) is provided to the system

                                   SNR (dB)
Noise   Method                     -5    0     5     10
Car     DFKUN   A                  2.41  2.80  3.13  3.43
                B                  2.41  2.81  3.15  3.45
        DFKCN   A                  2.51  2.90  ?     3.49
                B                  2.55  2.92  ?     3.50
        Misclassification %        7.07  ?     3.80  2.76
Train   DFKUN   A                  1.81  2.22  ?     2.98
                B                  1.79  ?     2.61  2.98
        DFKCN   A                  1.90  2.30  2.69  3.05
                B                  1.98  2.33  2.72  3.06
        Misclassification %        ?     7.07  5.75  4.17

Table 5 shows that, in general, the PESQ scores change little when the system is given the speech activity labels (B) instead of relying on the VAD (A). In train noise, which is more nonstationary, specifying the exact noise frames degrades the performance of DFKUN but improves that of DFKCN. Our explanation is as follows. DFKUN uses only the variance of the noise, not the AR model of the DFT trajectories. It would therefore perform better if the abrupt changes of the train noise, which are the frames most likely to be misclassified, were excluded from the variance estimates. For DFKCN, by contrast, using these frames to estimate the noise AR models helps the system separate noise and speech, because it tracks the noise trajectories with more accurate models. In car noise, which is more stationary, providing the speech activity labels slightly improves the performance of the system.

5.4. Discussion

Informal listening tests comparing the output quality of DFKUN and DFKCN with that of the MMSE log-STSA method reveal some major differences. The residual noise of the DFT-Kalman methods is much lower than that of MMSE. However, DFKUN slightly distorts the low-energy portions of the speech spectra, because the signal converges to small values.
As a result, at lower SNRs the harmonics of the speech are well restored, while the non-harmonic portions of the speech spectrum are relatively suppressed. DFKCN mitigates this effect while keeping a similar or lower level of residual noise. DFKCN also produces much less echo than DFKUN, giving a more natural-sounding speech signal. The residual noise of spectral subtraction is musical (short bursts of narrowband energy). The residual noise of the DFT-Kalman methods, in contrast, seems to keep the perceptual character of the original noise.

6. CONCLUSION

A method is proposed for the enhancement of speech signals corrupted by background noise. The proposed method is shown to outperform both the MMSE log-STSA estimator and parametric spectral subtraction. Listening tests show that the residual noise of the DFT-Kalman methods does not consist of annoying narrowband noise bursts (musical tones). Informal experiments show that DFKCN produces de-noised speech of exceptionally high quality when the system is given the AR models of the clean-speech DFT trajectories. This holds even when averaged noise models obtained from noise-only periods are used. This suggests that more sophisticated methods for estimating the speech AR models can be expected to yield further performance gains for the DFT-Kalman methods. The application of Expectation-Maximization (EM) methods for this purpose is being studied [8].

REFERENCES

[1] Ephraim, Y., Malah, D., "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
[2] Sim, B., Tong, Y., Chang, J., Tan, C., "A parametric formulation of the generalized spectral subtraction method," IEEE Trans. on Speech and Audio Processing, vol. 6, no. 4, pp. 328-337, July 1998.
[3] Martin, R., "Speech enhancement using MMSE short time spectral estimation with Gamma distributed speech priors," IEEE ICASSP'02, Orlando, Florida, May 2002.
[4] Cohen, I., "On the decision-directed estimation approach of Ephraim and Malah," ICASSP'04, Montreal, Canada, 17-21 May 2004, pp. I-293-296.
[5] Ephraim, Y., Malah, D., "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-33, pp. 443-445, Apr. 1985.
[6] Cohen, I., "Relaxed statistical model for speech enhancement and a priori SNR estimation," IEEE Trans. on Speech and Audio Processing, vol. 13, no. 5, part 2, pp. 870-881, Sept. 2005.
[7] Paliwal, K. K., Basu, A., "A speech enhancement method based on Kalman filtering," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1987, pp. 177-180.
[8] Gannot, S., Burshtein, D., Weinstein, E., "Iterative and sequential Kalman filter-based speech enhancement algorithms," IEEE Trans. on Speech and Audio Processing, vol. 6, no. 4, pp. 373-385, Jul. 1998.
[9] Ma, N., Bouchard, M., Goubran, R. A., "Speech enhancement using a masking threshold constrained Kalman filter and its heuristic implementations," IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 19-32, Jan. 2006.
[10] Kullback, S., Leibler, R. A., "On information and sufficiency," Ann. Math. Stat., vol. 22, pp. 79-86, 1951.
[11] Brillinger, D. R., Time Series: Data Analysis and Theory, Holden-Day, 1981.
[12] Zavarehei, E., Vaseghi, S., "Speech enhancement in temporal DFT trajectories using Kalman filters," Interspeech 2005, pp. 2077-2080.
[13] Hansen, J., Pellom, B., "An effective quality evaluation protocol for speech enhancement algorithms," Proc. of ICSLP 1998, Sydney.