AUDIO QUALITY MEASUREMENTS IN COMMUNICATION SYSTEMS

Ivo Mateljan
Faculty of Electrical Engineering, University of Split
R. Boskovica bb, 21000 Split, Croatia (ivo.mateljan@fesb.hr)

Abstract: ITU gives P-class recommendations for the measurement of audio quality in communication systems. The recommendations are based on system transfer function estimation, nonlinear distortion measurement and perceptual evaluation of audio signal degradation. This work shows and discusses the implementation of these methods in real-time "point-to-point" testing of communication systems. A refinement of the existing Fourier analyzer technique, multitone distortion testing, and modulation quality evaluation in speech testing are presented. Modern digital communications introduce new types of signal degradation: coding distortions, time-varying delays, time variance of system parameters (gain and noise floor) and even time clipping in VoIP systems. Echo canceling systems impose restrictions on the selection of excitation signals in distortion testing, as well as on the selection of the minimal length of the signal analysis window. After analysis of these factors, the work suggests the use of excitation signals that are slightly different from those proposed in ITU-T recommendations. The work also presents a new perceptual method for the evaluation of speech quality called MQE. It is based on speech modulation quality evaluation and can be used in real time.

Key words: audio quality measurements, point-to-point audio testing in communications.

1. INTRODUCTION

This paper presents various measurement methods for the estimation of audio quality in communication systems, with emphasis given to speech transmission systems. The estimation of audio quality can be done with objective methods and with subjective testing of audio quality. Perceptual methods are a special class of objective methods: they are based on perceptual and cognitive hearing models and give a quality rating expressed as an equivalent subjective rating.
The following measurements were considered important: measurement of frequency response, impulse response and input/output delay; nonlinear distortion measurements with sine and multitone signals; perceptual evaluation of speech quality. From the results of these measurements, other system characteristics can be estimated (SLR, RLR, T60, STI, ...). For subjective and objective quality estimation ITU gives recommendations in ITU-T P-class documents. Following these recommendations, a measurement system for "point-to-point" communication system testing has been made for the Croatian Telecommunication Agency. Emphasis is given to methods that can be realized in a real-time measurement system.

In this work it was found that many of the ITU proposed methods need refinement and validation. This is especially true for the measurement of frequency response, where ITU recommends one method (I/O autospectrum with composite speech-noise signal excitation) but also allows other measurement methods (Fourier analyzer with random noise excitation, swept-sine measurement, MLS and crosscorrelation) [1]. After trying all these methods it was found that they give different results. The differences were consequences of the following communication system characteristics: automatic gain control, automatic noise reduction, echo suppression, coding distortions, time-variant system characteristics, and speech activation. After analysis of the various methods we propose a method for measuring frequency and impulse response with interrupted periodic noise. The same method is used for the determination of system I/O delay.
Besides ITU-T recommendations, solutions from known measurement systems (e.g. Rohde & Schwarz, Neutrik, and Micronix) have been applied, notably the multitone signal for measurement of frequency response and total distortion [4]. Measurements with multitone signals gain more and more attention [5], but there is still no standardized test signal for measurement of total distortion. In this work we propose an excitation signal for measuring the total distortion of coded speech systems.

Finally, a method for perceptual evaluation of speech quality called MQE (modulation quality evaluation) [6] is presented. We offer it as a replacement for the ITU-T recommended method PESQ [7]. It has a clear theoretical foundation and a simple implementation algorithm.

2. POINT-TO-POINT TESTING

All measurements in this work refer to "point-to-point" testing of a communication system. Figure 1 shows the input/output connections for testing a classical phone system, ISDN or GSM system.

Fig. 1. Input/output connections in point-to-point testing

All measurements were done with a PC soundcard and custom-built software. The interface of the soundcard with a mobile phone is made using the phone headset input and an attenuator V/5mV. The interface of the soundcard to a standard 600-Ω phone line is made with the circuit shown in Figure 2.

Fig. 2. Interface of standard telephone line to PC soundcard

The measurement configuration can also be applied to testing a VoIP system. In that case, the use of an ISDN phone with a dedicated audio input/output connection is recommended.

3. DISTORTION MEASUREMENTS

It is usual to express nonlinear system distortions as total harmonic distortion (THD), total harmonic distortion plus noise (THD+N) and intermodulation distortion (IMD). In the first two cases the excitation signal is a sine; in the third case it is the sum of two sine signals. Distortion is expressed in percent, as the square root of the ratio of distortion (+noise) power to signal power. For testing distortion in coded systems it is common practice to use a multitone signal [4], [5]. The multitone belongs to the class of multisine signals, defined with the following equation:

$$m(t) = \sum_{k=1}^{M} A \cos(2\pi f_k t + \varphi_k) \qquad (1)$$

It is composed of a sum of M sine signals with amplitude A and predefined phases $\varphi_k$ that are optimized to give the lowest crest factor. In the special case when the phase $\varphi_k$ is a random variable, the resulting signal is periodic noise with normal amplitude distribution. All multitone components were generated digitally with the inverse DFT and sampling frequency $f_s$, so that each sine frequency $f_k$ coincides with the frequency of a DFT bin; the bins are Δf apart. That is why, in analyzing the response to a multitone, we do not need to analyze a long signal sequence or apply signal windowing to get high resolution of distortion components.

The following multitone signals were used in the analysis of system response:
1. Wideband multitone: 1/3-octave spaced sine signals (from 0Δf to fs/2). Crest factor ± dB.
2. Speech multitone: linearly spaced sine signals from 00Hz to 500Hz, plus 1/3-octave spaced sine signals from 500Hz to 8 kHz. Phases optimized for crest factor 0 ± dB.
3. ITU-T O.8: 39 sine signals with frequencies spaced 00Hz (from 00 Hz to 3800Hz). Crest factor 0 ± dB.

For testing speech transmission channels we defined and used the "speech multitone" rather than the ITU-T O.8 signal. The reason is simple: in a system like GSM, that
uses automatic echo suppression, a multitone signal with linearly spaced components can push the system into positive feedback and oscillation, as demonstrated in Fig. 3. To quantify the nonlinear distortions of a multitone signal we used the total distortion measure TD+N (total distortion + noise), defined as the percentage of the square root of the ratio of the power of distortion+noise to the power of the multitone signal.

Fig. 3. Time record of oscillation build-up in a GSM system (upper trace), for excitation with the multitone signal ITU-T O.8 (lower trace).

Figures 4 and 5 show the distortion spectrum of a sine signal in the phone system PABX Ericsson MD-110 and in a GSM system. Surprisingly, GSM has lower THD than MD-110. Figures 6 and 7 show the distortion spectrum of the speech multitone signal in the phone system MD-110 and in a GSM system. Note that the GSM system has total distortion TD+N twenty times larger than the phone system MD-110. Obviously, TD+N is a better distortion measure than THD, because subjectively the speech quality in the MD-110 system is much better than in the GSM system.

TD+N has not been accepted in any standard yet. The problem is that the level of distortion depends on system bandwidth, so it is not easy to define a TD+N measure that has equal meaning for wideband and narrowband systems. In a low-bit-rate coded system TD+N can be higher than 100%, although the speech quality can be fair. Obviously, it will be a problem to accept such a distortion measure. An alternative to the percentage measure of TD+N is a dB ratio.

Fig. 4. Spectrum of a sine signal passed through the phone system PABX Ericsson MD-110. Distortions: THD =.090%, THD+N =.504%.

Fig. 5. Spectrum of a sine signal of frequency 1000 Hz passed through a GSM system and two mobile phones, Sony J70 and Motorola V300. Distortions: THD = 0.6%, THD+N = 0.655%.

Fig. 6. Spectrum of the "speech multitone" passed through the phone system PABX Ericsson MD-110. Total distortion: TD+N =.409%.

Fig. 7. Spectrum of the "speech multitone" passed through a GSM system and two mobile phones, Sony J70 and Motorola V300. Total distortion: TD+N = 58.775%.

4. FREQUENCY RESPONSE MEASUREMENTS

Frequency response measurement can be done in various ways:
1. by swept-sine or stepped-sine response measurement,
2. by estimating the magnitude of the frequency response from the multitone response,
3. by using a Fourier analyzer and a variety of excitation signals,
4. by using a maximum-length sequence (MLS) signal and input/output crosscorrelation.
In the first two cases, usually only the magnitude of the frequency response is estimated.
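Method 2 of the list above, together with the TD+N measure defined in Section 3, can be illustrated with a short sketch. It is not the measurement software used in this work: the tone positions, the random (not crest-optimized) phases and the toy "device" with a cubic nonlinearity are assumptions chosen only for illustration. Because every tone lies exactly on a DFT bin, one period can be analyzed without windowing, and all power outside the tone bins counts as distortion plus noise.

```python
import numpy as np

def make_multitone(n_fft, tone_bins, rng):
    """One period of a multitone built by inverse DFT, so every tone sits on a DFT bin."""
    spectrum = np.zeros(n_fft // 2 + 1, dtype=complex)
    # Random phases give periodic noise; crest-factor optimization is not included here.
    spectrum[tone_bins] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, len(tone_bins)))
    x = np.fft.irfft(spectrum, n=n_fft)
    return x / np.max(np.abs(x))  # normalize the peak value to 1

def tone_gain_and_tdn(x, y, tone_bins):
    """Return |Y|/|X| at the tone bins and TD+N in percent (DC and Nyquist bins ignored)."""
    X = np.fft.rfft(x)
    Y = np.fft.rfft(y)
    gain = np.abs(Y[tone_bins]) / np.abs(X[tone_bins])
    power = np.abs(Y) ** 2
    residual = np.ones(len(power), dtype=bool)  # bins that carry distortion + noise
    residual[tone_bins] = False
    residual[[0, -1]] = False
    tdn_percent = 100.0 * np.sqrt(power[residual].sum() / power[tone_bins].sum())
    return gain, tdn_percent

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bins = np.array([10, 14, 20, 28, 40, 56, 80])  # hypothetical tone positions
    x = make_multitone(1024, bins, rng)
    y = 0.5 * x + 0.05 * x ** 3                    # toy device: gain plus cubic distortion
    gain, tdn = tone_gain_and_tdn(x, y, bins)
    print(np.round(gain, 3), round(float(tdn), 3))
```

For a purely linear device the residual bins stay at the numerical noise floor and TD+N is practically zero; with the cubic term, intermodulation products appear between the tones and TD+N rises.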
The MLS signal is usually used for the estimation of impulse response from input/output crosscorrelation. It has been popular in acoustical measurement, but it cannot be applied in speech-activated systems. Measurement with swept-sine or stepped-sine does not give reliable results in GSM and some other coded systems at higher frequencies [4]. Some manufacturers (Neutrik, Rohde & Schwarz) use the multitone response for estimation of frequency response. But, as shown in Fig. 7, in coded systems the multitone response is close to the noise floor at the passband margins. The best choice seems to be the Fourier analyzer method, but with the necessary modifications to account for speech activation and time-variant behavior. A Fourier analyzer estimates the frequency response and also gives the impulse response (as the inverse Fourier transform of the frequency response).

Fig. 8. Block diagram of the measuring system in a Fourier analyzer: x is the input signal, y is the output signal, n is noise, h is the impulse response and H is the frequency response.

Fig. 8 shows a typical measuring system. The computer-generated signal g, after D/A filtering with transfer function D, is applied to the test system that has transfer function H. Note that H represents the best linear fit of a possibly nonlinear transfer function. The generator noise is neglected. The output from the test device, together with additive system noise n, is acquired by the computer as a discrete signal sequence y. The input to the test device is acquired by the computer as a discrete signal sequence x. The acquisition process implies the use of an antialiasing filter that has transfer function A.

4.1. Measurement with ITU CSS composite signal excitation

ITU-T recommendation P.501 defines the composite source signal (CSS), shown in Fig. 9. It is a periodic signal that consists of three parts. The first part, the speech activation signal, is a speech-like signal, defined in P.501, with duration 50 ms. It activates and stabilizes the automatic gain control. The second part is a periodic noise signal of duration 200 ms. The third part is a pause of duration at least 100 ms. According to P.501: "The basic idea for using such a signal is to place the device under test in a well-defined, reproducible state for the period of measurement and to secure that the transfer functions of the device do not change appreciably during the actual measurement (quasistationarity)".

Fig. 9. ITU-T P.501 composite CSS signal (speech activation, periodic noise, pause; amplitude in % versus time in s)

This signal assures that the communication system is in the active state during the periodic noise excitation and that the coding algorithm for that signal always has the same characteristics (quasi-stationarity). The measurement must take place during the periodic noise excitation. Then, according to P.501, the magnitude of the frequency response can be estimated from the input and output autospectra:

$$|H(f)| = \frac{|Y(f)|}{|X(f)|} \quad \text{(magnitude estimator)} \qquad (2)$$

4.2. Fourier analyzer with continuous and periodic noise excitation

In a classical Fourier analyzer the excitation is random noise and the frequency response is estimated by dividing the averaged cross-spectrum $X^*Y$ by the averaged autospectrum $X^*X$ of the input and output discrete signal sequences $x_i$ and $y_i$. We define the $H_1$ estimator as:

$$H_e(f) = \frac{\sum_{i=1}^{N} Y_i(f)\, X_i^*(f)}{\sum_{i=1}^{N} X_i(f)\, X_i^*(f)} \quad (H_1 \text{ estimator}) \qquad (3)$$

where $H_e(f)$ denotes the estimated frequency response and N is the number of averages. The $H_1$ estimator gives a biased estimate of the real transfer function H(f); the bias depends on noise, distortion and the delay between the input and output channels. When only noise contributes to the bias, the effect of averaging can be expressed by the equation:

$$H_e(f) = H(f) + \frac{\langle N(f)\, A(f)\, X^*(f) \rangle}{\langle X(f)\, X^*(f) \rangle} = H(f) + \frac{\langle N(f) \rangle}{G(f)\, D(f)} \qquad (4)$$
where the brackets $\langle\,\rangle$ denote the averaged value. Note that the signal term is summed coherently, while the stochastic part of the noise is power summed. The conclusion is that averaging lowers the noise level proportionally with the square root of the number of averages N, thus improving the measurement S/N by 10 log(N) dB.

We can have better insight into the quality of measurements if we analyze the coherence function, defined as:

$$\gamma^2(f) = \frac{\text{Output power due to input}}{\text{Total output power}} = \frac{|S_{xy}(f)|^2}{S_{xx}(f)\, S_{yy}(f)} \qquad (5)$$

The coherence function is a measure of the proportion of the power in the output signal y that is due to linear operations on the input signal x. The maximum value of coherence is 1. When estimating transfer functions, the coherence function is a useful check on the quality of the data used. Values of the coherence function less than one are possible if some of the following situations occur:
- non-correlated noise is present,
- an additional external signal source exists,
- the system has nonlinearity,
- additional inputs are present in the system,
- leakage error is not reduced with windowing.

To obtain good accuracy of coherence function measurements, it is necessary to use linear averaging in the frequency domain. From the definition of the coherence function, the term $\gamma^2(f)\, S_{yy}(f)$ is the output power related to the input signal, and the term $[1 - \gamma^2(f)]\, S_{yy}(f)$ is the noise component of the output power. Therefore, the signal-to-noise ratio of the system under test can be computed as:

$$\frac{S}{N}(f) = \frac{\gamma^2(f)}{1 - \gamma^2(f)} \qquad (6)$$

Note that a coherence value less than 0.5 means that the noise (or distortion) is higher than the measurement signal.

In a system with a large delay between input and output (see Fig. 10), i.e. when measuring the response of communication systems with high delay, there will be low correlation between the measured input and output signals. It is possible to delay the acquisition of the input channel, so this kind of error can be eliminated. The biggest problem is in systems with voice activation and time-varying coding algorithms; there, continuous noise excitation cannot give reliable results.

Fig. 10. Illustration of uncorrelated estimation in a classical Fourier analyzer. "FFT block" denotes the parts of the input and output signal used in the estimation of autospectrum and cross-spectrum.

The problem can be eliminated by using the interrupted periodic noise excitation (Fig. 11), which always keeps the coded communication channel in the active state. For the correct implementation of interrupted noise excitation the following conditions must be met:
- The start of the acquisition must be after a preaveraging cycle that is necessary to activate the system and reach the steady-state response.
- After every acquired block, signal generation must be stopped, and a new periodic noise sequence generated. The pause must have a duration of at least 100 ms.
- The length of the FFT block must be equal to the length of the generated periodic noise sequence. This guarantees that the generated and acquired signals are always correlated, so there will be no bias due to the input/output delay.

Fig. 11. Signal generation and acquisition in the interrupted noise method.

Fig. 12. Frequency response and coherence function of phone system MD-110.

The excitation with interrupted periodic noise is the best choice for measurements of frequency response in communication systems that are voice activated and have time-variant signal processing (automatic gain control and noise reduction). Interrupted noise keeps the communication channel in the "active" state, while measurements are taken in a small interval of time to assure system stationarity.
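The averaged estimators of this section can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the device is a hypothetical short FIR filter with additive noise, each block is one period of a freshly generated noise sequence, and the steady-state response to the periodic excitation is emulated with circular convolution, so that generated and acquired blocks stay correlated. The function names are illustrative, not part of any measurement software.

```python
import numpy as np

def simulate_blocks(n_avg, n_fft, h, noise_std, rng):
    """Toy device: every block is one period of fresh noise; steady state = circular convolution."""
    x = rng.standard_normal((n_avg, n_fft))
    H = np.fft.rfft(h, n_fft)
    y = np.fft.irfft(np.fft.rfft(x, axis=1) * H, n=n_fft, axis=1)
    y += noise_std * rng.standard_normal((n_avg, n_fft))  # additive system noise
    return x, y

def h1_coherence_snr(x_blocks, y_blocks):
    """H1 estimate, coherence and S/N in dB from time-aligned blocks of shape (n_avg, n_fft)."""
    X = np.fft.rfft(x_blocks, axis=1)
    Y = np.fft.rfft(y_blocks, axis=1)
    s_xx = np.mean(np.abs(X) ** 2, axis=0)      # averaged input autospectrum
    s_yy = np.mean(np.abs(Y) ** 2, axis=0)      # averaged output autospectrum
    s_xy = np.mean(np.conj(X) * Y, axis=0)      # averaged cross-spectrum
    h1 = s_xy / s_xx
    coherence = np.abs(s_xy) ** 2 / (s_xx * s_yy)
    noise_part = np.maximum(1.0 - coherence, 1e-12)  # guard against division by zero
    snr_db = 10.0 * np.log10(coherence / noise_part)
    return h1, coherence, snr_db

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h = np.array([1.0, 0.5, 0.25])              # hypothetical impulse response
    x, y = simulate_blocks(64, 256, h, 0.05, rng)
    h1, coh, snr_db = h1_coherence_snr(x, y)
    print(float(np.max(np.abs(h1 - np.fft.rfft(h, 256)))), float(coh.min()))
```

In a real measurement the blocks would come from the soundcard after the preaveraging cycle; a coherence that drops well below 1 in the passband then points to noise, distortion or loss of correlation between the channels.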
Fig. 13. Frequency response and coherence function of GSM system.

Fig. 12 shows the measured frequency response of the phone system MD-110. A high coherence value in the system passband indicates a system with low distortion. In the GSM system (Fig. 13) the coherence function is low (0.4-0.8) because part of the coding distortion is correlated with the coded signal.

4.3. Fourier analyzer with swept-sine signal excitation

It was shown [3] that in acoustical measurements swept-sine excitation gives excellent results in a time-varying environment. The principle of system excitation and measurement is shown in Fig. 14. The swept sine, prefixed with the P.501 speech activation signal, is treated as a nonperiodic signal. The time of measurement is much longer than the excitation signal to account for system delay, echoes and reverberation.

Fig. 14. Signal generation and acquisition of nonperiodic swept-sine (excitation and measured output within the FFT block).

Fig. 15. Frequency response of GSM system measured with interrupted pink noise at relative input levels from 0 dB down to -20 dB.

Fig. 16. Frequency response of GSM system measured with swept-sine at relative input levels from 0 dB down to -20 dB.

Figures 15 and 16 illustrate the effect of automatic gain control in a GSM system on the measurement results. Measurement with interrupted noise gives the same response pattern for different input levels, while measurement with swept-sine shows a distorted frequency response. Obviously, swept-sine excitation gives bad results in communication systems with automatic gain and noise reduction control.

5. IMPULSE RESPONSE AND DELAY

Methods for direct measurement of impulse response (MLS and direct impulse excitation) are not suitable for measurement of the impulse response of a communication system. Impulse response and input/output delay can be estimated more reliably from the inverse Fourier transform of the frequency response. Equation (4) shows that there will be a high level of noise at frequencies near fs/2; that is why it is necessary to apply an antialiasing filter to the impulse response.

6. PERCEPTUAL EVALUATION OF SPEECH QUALITY

Methods for perceptual evaluation of audio quality are a relatively new type of measurement in which original and degraded speech signals are compared using perceptual and cognitive models of hearing. The result is the quality rating on an equivalent subjective scale [6]. ITU proposed various perceptual methods: PSQM, MNB and PESQ [7], [8], and we have proposed the method called Modulation Quality Evaluation (MQE) [9]. We believe that the MQE method has a clearer theoretical concept than the PESQ method. It is based on the paradigm that the most important speech characteristic is the modulation. Perceptual sensitivity to a change of modulation depends on internal noise. This effect is modeled by using just noticeable differences [10]. Fig. 17 shows the main components of a system for perceptual evaluation of speech quality. The system analyzes and compares the original and degraded speech signals in overlapped time frames of length 40 ms. When testing GSM and VoIP speech transmission, proper delay estimation and frame synchronization are applied.
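The delay estimation and framing step can be sketched as follows. This is a minimal illustration, not the procedure used in the measurement system: the delay is found here by FFT-based cross-correlation of equal-length signals, and the 50% frame overlap is an assumption (the text only states that 40 ms frames overlap). All function names are hypothetical.

```python
import numpy as np

def estimate_delay(reference, degraded):
    """Lag in samples at which the degraded signal best matches the reference."""
    n = len(reference) + len(degraded) - 1
    n_fft = 1 << (n - 1).bit_length()           # next power of two, avoids circular wrap
    R = np.fft.rfft(reference, n_fft)
    D = np.fft.rfft(degraded, n_fft)
    xcorr = np.fft.irfft(np.conj(R) * D, n_fft)
    lag = int(np.argmax(xcorr))
    return lag - n_fft if lag > n_fft // 2 else lag   # map wrapped indices to negative lags

def align(reference, degraded, delay):
    """Drop the leading samples so that both signals start at the same instant."""
    if delay >= 0:
        degraded = degraded[delay:]
    else:
        reference = reference[-delay:]
    n = min(len(reference), len(degraded))
    return reference[:n], degraded[:n]

def split_frames(signal, fs, frame_ms=40.0, overlap=0.5):
    """Split a signal into overlapped frames; the 50% overlap is an assumed value."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(frame_len * (1.0 - overlap)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    fs = 8000
    ref = rng.standard_normal(fs)                        # stand-in for a speech signal
    deg = 0.8 * np.concatenate([np.zeros(123), ref])[:fs]
    delay = estimate_delay(ref, deg)
    ref_al, deg_al = align(ref, deg, delay)
    print(delay, split_frames(ref_al, fs).shape)
```

In systems with time-varying delay, such as VoIP, a single global delay is not sufficient and the estimate would have to be repeated along the signal.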
Fig. 17. The system for perceptual evaluation of speech quality (original and degraded signals, delay estimation and alignment, perceptual models, cognitive model; output is the listening quality rating as equivalent MOS: Excellent (5), Good (4), Fair (3), Poor (2), Bad (1))

Perceptual modeling means that the signal is transformed from the physical domain (sound intensity I) to the excitation of the basilar membrane (excitation intensity E) and finally to the compressed neural excitation domain (loudness N) [10]. Cognitive modeling gives proper weight and a logistic transformation to the differences of original and degraded speech in order to get the listening quality rating on the scale 1 to 4.5, as close as possible to the mean opinion score (MOS) obtained from subjective listening tests [6]. Next, a description of the perceptual distance measure and the cognitive model of the MQE method is given.

6.1. Perceptual distance measure

The perceptual distance measure, or frame distortion, is defined for the n-th speech frame in the frequency-warped bark-loudness domain as a p-norm measure:

$$D_p(n) = C \left( \sum_{i} W(i)\, \left| S_{or,n}(i) - S_{deg,n}(i) \right|^{p} \right)^{1/p} \qquad (7)$$

where:
i is the index of the bark band,
n is the index of the speech frame,
W(i) is the weighting factor for band i,
S_or(i) is the perceptual value of the original speech in band i,
S_deg(i) is the perceptual value of the degraded speech in band i,
C is an arbitrary constant.

The total distance measure (or total distortion) is expressed as the average value of all $D_p(n)$.

In the PESQ method and other ITU-T approved methods, the perceptual values are the specific loudness of the original speech and the specific loudness of the degraded speech, normalized to the same total loudness. The main problem in PESQ is the assumption that the degradation of speech quality is proportional to the difference of loudness regardless of the loudness value. This assumption is not correct, as it excludes the influence of internal noise.

The MQE method analyzes internal hearing noise using the concept of just noticeable difference (JND). A fundamental postulate of psychophysics is that all decision variables are random variables, drawn from some probability density function. From the signal detection theory premise, the JND of loudness is $\Delta N_{JND} = d'\sigma_N$ [11], where $\sigma_N$ is the standard deviation of N, and d' is a discrimination constant. Now we define the signal-to-noise ratio in the loudness domain as:

$$SNR(N) = \frac{N}{\sigma_N} = \frac{d' N}{\Delta N_{JND}} \qquad (8)$$

If we suppose that between two successive overlapped speech frames the loudness change is equal to ΔN, then the perceptually significant value of this change can be expressed as an increment of the signal-to-noise ratio:

$$S = \frac{\Delta N}{\Delta N_{JND}} \qquad (9)$$

It is usual to define the loudness relative JND function J(N):

$$J(N) = \frac{\Delta N_{JND}}{N} \qquad (10)$$

then:

$$S = \frac{\Delta N}{N\, J(N)} \qquad (11)$$

The first factor, ΔN/N, represents the loudness modulation; the second factor, 1/J(N), is the reciprocal of the relative JND (proportional to SNR(N)). Experimental results [11] show that J(N) is constant for SPL above 60 dB. That is why the dominant value used in the perceptual distance measure is the loudness modulation.

To apply this perceptual model it is necessary to express the perceptual value S as a function of the excitation E and the relative JND function $J(E) = \Delta E_{JND}/E$, which is usually called the Weber fraction. Allen and Neely [11] show that the relative JND functions in the intensity and loudness domains are related as:

$$J(N) = \upsilon\, J(E) \qquad (12)$$

where υ is a function that is equal to the slope of the log-loudness vs. log-intensity curve:

$$\upsilon = \frac{d(\log N)}{d(\log E)} = \frac{E}{N}\frac{dN}{dE} \qquad (13)$$

They took the approximation ΔN/ΔE = dN/dE, as υ is a slowly varying function of log E. Then, substitution of (12) and (13) in (11) gives:
$$S = \frac{\Delta E}{E\, J(E)} \qquad (14)$$

This equation shows that the relevant perceptual value can be estimated in the excitation domain. We get the excitation E by summing the power spectrum (intensity) in each critical band and applying the outer-inner ear filter:

$$\mathrm{EarFilter}(f) = -0.6 \cdot 3.64 \left(\frac{f}{1000}\right)^{-0.8} + 6.5 \exp\!\left(-0.6 \left(\frac{f}{1000} - 3.3\right)^{2}\right) - 0.001 \left(\frac{f}{1000}\right)^{3.6} \ \text{(dB)} \qquad (15)$$

The MQE method uses equation (14) as the perceptual value in the distance measure (7), in the following form:

$$S_n = \frac{\Delta E_n}{\bar{E}_n\, \bar{J}(\bar{E}_n)} \qquad (16)$$

where n is the frame index, $\Delta E_n$ is the excitation difference between the n-th and (n-1)-th speech frames, $\bar{E}_n$ is the average value of the excitation in the n-th and (n-1)-th frames, and $\bar{J}(\bar{E}_n)$ is the normalized relative JND function ($\bar{J} = J / J_{min}$). To get the normalized relative JND function, the experimental data of Riesz [11] are used to set the following function:

$$\bar{J}(E) = 1 + \left(\frac{10000\, E_{th}}{E}\right)^{1/3} \qquad (17)$$

where $E_{th}$ is the excitation at the threshold of hearing corrected with the outer-middle ear filter. The choice of Riesz data for tone-like signals is supported by experimental work showing that most speech frames have a high tonality factor [14].

6.2. Cognitive model

The cognitive model of MQE determines the weighting factors W(i) and constant C of the distance measure (7), the total distortion over all frames, and the transform of the total distortion to an equivalent MOS score. The constant C is chosen as C=0.45. The weighting factors are chosen to account for higher sensitivity to modulation change in the frequency range from 1 kHz to 2 kHz [10]. For center bark frequencies: 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500 and 2900 Hz, weighting factors are: 0.7, 0.8, 0.9,.0,.06,.5,.5,.5,.5,.5,.5,.06 and.0.

Two types of frames are analyzed: active frames and silent frames. Active frames have an energy level higher than 0dB below maximum energy, and silent frames have an energy level higher than 30dB below the level of active frames. Distortions of silent frames are scaled with factor 0.. Total distortion is the sum of the average distortion in active frames and the scaled average distortion of silent frames. The quality score, called MQEscore, is defined as:

$$\mathrm{MQE}_{score} = 4.5 - \mathrm{total\_distortion} \qquad (18)$$

The PESQ score is defined the same way, to give the maximum quality score equal to 4.5. Finally, the equivalent MOS score is calculated using a logistic function that is similar to the PESQ logistic function [8]:

$$\mathrm{MQE}_{MOS} = 1 + \frac{4}{1 + \exp(A \cdot \mathrm{MQE}_{score} + B)} \qquad (19)$$

where the constants A and B are chosen as: A=.35, B=4.9. The choice of this logistic function is quite arbitrary, just to get the equivalent MOS as close as possible to the PESQ method.

Fig. 18 shows the equivalent MOS for noise-modulated speech degradation of a male voice. The Pearson correlation coefficient between the results obtained with PESQ and MQE is high: r(PESQ, MQE) = 0.999. Similar results are obtained for a female voice. The speech modulation with noise is generated by the ITU MNRU method (modulated noise reference unit) [13].

Fig. 18. Equivalent MOS for male speech degraded with noise modulation as a function of MNRU S/N, for PESQ (o) and MQE.

We applied the MQE method to various coded speech signals. Subjective tests have shown that the MQE method is better than PESQ [9]. The MQE method is not suitable for analysis of distortions that can be present during speech silence, as it is a modulation-based method. That is why it cannot be treated as a general method for perceptual evaluation of speech quality; rather, as the name suggests, it is a method for modulation quality evaluation.
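The core of the perceptual distance computation, equations (7), (15) and (16), can be sketched as below. This is only an illustration under explicit assumptions: the ear-filter constants follow the commonly published form of this weighting (including the 3.3 kHz centre), the relative JND function defaults to 1 (its high-level limit after normalization), and the defaults c=1.0 and p=2.0 are placeholders, not the values used in MQE. The function names are hypothetical.

```python
import numpy as np

def ear_filter_db(f_hz):
    """Outer/inner-ear weighting in dB; constants are an assumed reading of Eq. (15)."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return (-0.6 * 3.64 * f_khz ** -0.8
            + 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            - 1e-3 * f_khz ** 3.6)

def modulation_value(e_prev, e_curr, rel_jnd=None):
    """Per-band S_n = dE_n / (mean(E_n) * J(mean E_n)); J defaults to 1 (assumption)."""
    e_prev = np.asarray(e_prev, dtype=float)
    e_curr = np.asarray(e_curr, dtype=float)
    e_mean = 0.5 * (e_prev + e_curr)            # average excitation of the two frames
    j = np.ones_like(e_mean) if rel_jnd is None else rel_jnd(e_mean)
    return (e_curr - e_prev) / (e_mean * j)

def frame_distortion(s_orig, s_deg, weights, c=1.0, p=2.0):
    """Weighted p-norm distance between original and degraded perceptual values."""
    diff = np.abs(np.asarray(s_orig, dtype=float) - np.asarray(s_deg, dtype=float))
    return c * np.sum(np.asarray(weights, dtype=float) * diff ** p) ** (1.0 / p)

if __name__ == "__main__":
    weights = np.ones(3)                         # placeholder band weights
    e_orig = np.array([[1.0, 2.0, 4.0], [2.0, 2.0, 3.0]])   # excitation, frames n-1 and n
    e_deg = np.array([[1.0, 2.0, 4.0], [1.5, 2.0, 3.5]])
    s_or = modulation_value(e_orig[0], e_orig[1])
    s_dg = modulation_value(e_deg[0], e_deg[1])
    print(float(ear_filter_db(1000.0)), float(frame_distortion(s_or, s_dg, weights)))
```

Because only excitation differences between successive frames enter the computation, no loudness model is needed.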
7. CONCLUSION

This paper gives a survey of fundamental measurement methods for testing audio quality in communication systems. The primary interests were systems for speech transmission and point-to-point testing of such systems. The following measurements were considered important: measurement of frequency response, impulse response and input/output delay; nonlinear distortion measurements with sine and multitone signals; perceptual evaluation of speech quality. All other system parameters can be estimated from these measurements.

These measurements have a lot in common with acoustical measurements, as we deal with systems that are not time-invariant and have large delays. But there is one big difference: an acoustical system is independent of the excitation signal, while communication system characteristics depend on the excitation signal (through automatic gain and noise reduction control). This leads to some fundamental differences from acoustical measurement. For example, it is not recommended to use swept-sine excitation of a Fourier analyzer in frequency response measurements of a communication system, although it is the signal of choice in acoustical measurements.

This work shows that there is no ideal system for measuring frequency and impulse response in communication systems, but preference is given to the Fourier analyzer with interrupted periodic noise excitation. It allows the use of the coherence function to monitor the measurement S/N ratio. It also satisfies the ITU requirement for the excitation signal, which has to keep the communication channel in an active and quasi-stationary state.

Classical methods of measurement of nonlinear distortion with THD and IMD are useless in coded systems. A much better way is the estimation of the total distortion of a multitone signal. We defined a "speech multitone" signal which has equally spaced tonal components on the bark scale. A total distortion measure still needs to be standardized.

Finally, a method for perceptual evaluation of speech quality called modulation quality evaluation (MQE) is presented.
It is based on a simple paradigm that the quality of degraded speech signals can be predicted from changes of speech loudness modulation in critical bands. Theoretical analysis has shown that the MQE perceptual distance measure can be estimated in the excitation domain. There is no need to estimate the loudness. This results in a simple implementation of a fast MQE algorithm that can be used in real time. MQE can be used as a replacement for the ITU recommended method PESQ in standard and GSM phone systems.

REFERENCES

[1] ITU-T Recommendation P.501: Test signals for use in telephonometry, ITU, 1996.
[2] ITU-T Supplement: The principles of a composite source signal as an example of a measurement signal to determine the transfer characteristics of terminal equipment, ITU, 1993.
[3] I. Mateljan, K. Ugrinović: The Comparison of Room Impulse Response Measuring Systems, Proceedings of AAAA Congress 2003, Portoroz, 2003.
[4] Rohde & Schwarz Application Notes: Acoustic Measurements on GSM Mobile Phones with Audio Analyzer UPL and Digital Radiocommunication Tester CMD, Application Note 1GA39_0D, 2004.
[5] Tan, Moore, Zacharov: The Effect of Nonlinear Distortion on Perceived Quality of Music and Speech Signals, JAES, vol. 51, November 2003.
[6] ITU-T P.800: Methods for subjective determination of transmission quality - MOS, ITU, 1996.
[7] ITU-T P.862: Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, ITU, 2001.
[8] ITU-T P.862.1: Mapping function for transforming P.862 raw result scores to MOS-LQO, ITU, 2003.
[9] I. Mateljan: The Modulation Approach in Perceptual Evaluation of Speech Quality, Proceedings of SoftCOM 2004, Split-Dubrovnik-Venice, 2004.
[10] E. Zwicker, H. Fastl: Psycho-acoustics, Facts and Models, Springer Verlag, Berlin, 1999.
[11] Jont B. Allen, Stephen T. Neely: Modeling the relation between the intensity just-noticeable difference and loudness for pure tones and wideband noise, JASA, vol. 102, December 1997.
[12] A. Farina: Simultaneous measurement of impulse response and distortion with a swept sine technique, 108th AES Convention, Paris, 2000.
[13] ITU-T P.810: Modulated noise reference unit, ITU, 1996.
[14] E. Terhardt, G. Stoll and M. Seewann: "Algorithm for extraction of pitch and pitch salience from complex tonal signals", J. Acoust. Soc. Am., vol. 71(3), March 1982.