Evaluation of short-time speech-based intelligibility metrics


Communication: 9th International Congress on Noise as a Public Health Problem (ICBEN) 2008, Foxwoods, CT

Evaluation of short-time speech-based intelligibility metrics

Karen L. Payton*, Mona Shrestha
University of Massachusetts Dartmouth, 285 Old Westport Rd, N. Dartmouth, MA 02747
*corresponding author: e-mail: kpayton@umassd.edu

INTRODUCTION

The Speech Transmission Index (STI) is based on acoustic measurements in environments and has been shown to be correlated with speech intelligibility under a wide range of acoustic conditions (Houtgast & Steeneken 1984). It is a weighted average of metrics derived from envelope signals in multiple frequency bands spanning the speech spectrum. A variety of methods have been proposed to compute the STI (Houtgast & Steeneken 1971; Steeneken & Houtgast 1980; Ludvigsen 1987; Drullman et al. 1994a, b; Payton et al. 1994; Drullman 1995; IEC 1998; Payton & Braida 1999; Payton et al. 2002; Goldsworthy & Greenberg 2004). Some of these methods use speech as the test stimulus rather than artificially modulated noise as originally proposed by Houtgast and Steeneken (1985). Many of the speech-based techniques have been shown to provide the same result as the traditional STI (Ludvigsen et al. 1990; Payton et al. 2002), which is based on modulation reductions in intensity-modulated noise, and as a theoretically derived STI, which is obtained from weighted signal-to-noise ratios (SNRs) in seven octave bands and room reverberation time (RT) (Houtgast & Steeneken 1985).

To date, all speech-based approaches have used speech materials lasting at least a minute or two to generate metrics correlated with long-term speech intelligibility. Consequently, they have not been used to predict short-time changes in intelligibility due to time-varying environments such as fluctuating background noise. The current work investigates the ability of two speech-based methods to track short-term STI results by using speech segments of various lengths to compute results for environments with stationary speech-shaped noise, speech-shaped noise plus reverberation, or multi-talker babble. The methods that will be evaluated are the Envelope Regression (ER) and the Normalized Correlation (NC) methods. The ER method is based on the speech-based STI method proposed by Ludvigsen et al. (1990). The NC method was proposed by Goldsworthy and Greenberg (2004), who also analyzed the long-term characteristics of both metrics.

METHODS

Figure 1 depicts a block diagram of the signal processing steps used to obtain the results for the speech-based algorithms. Specifically, for both the ER and NC techniques, the clean and the degraded signals, originally digitized at kHz with a 95 kHz antialiasing filter, were digitally filtered using a bank of 6th-order octave-wide Butterworth band-pass filters with center frequencies from 125 Hz to 4 kHz and a 6th-order Butterworth high-pass filter with a cutoff frequency of 6 kHz. For each band, i, the clean and the degraded signals were then squared and low-pass filtered with a cutoff frequency of 5 Hz. The low-pass filter impulse response was a ms Hamming window. The intensity envelopes, x_i(t) and y_i(t), were down-sampled to 34 Hz (a factor of 49) to reduce computation time without risking aliasing. Next, for each octave band, a modulation metric, M_i, was calculated from the intensity envelopes. Each approach used a different algorithm to compute this modulation metric.
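The envelope-extraction stage described above (square, low-pass with a Hamming-window impulse response, down-sample) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the causal convolution, and the `window_len` and `decimation` arguments are assumptions, and the octave-band Butterworth filter bank is taken as already applied to `band_signal`.

```python
import math


def hamming_window(length):
    """Hamming window scaled to unit sum, so the filter has a DC gain of 1."""
    if length == 1:
        return [1.0]
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
         for n in range(length)]
    total = sum(w)
    return [v / total for v in w]


def intensity_envelope(band_signal, window_len, decimation):
    """Intensity envelope of one band-filtered signal.

    Steps: square the signal, smooth it with a causal FIR low-pass filter
    whose impulse response is a Hamming window, then keep every
    `decimation`-th sample. `window_len` (in samples) and `decimation`
    are hypothetical parameters chosen by the caller.
    """
    squared = [s * s for s in band_signal]
    h = hamming_window(window_len)
    smoothed = []
    for n in range(len(squared)):
        acc = 0.0
        for j, hj in enumerate(h):
            if n - j >= 0:  # causal convolution: ignore samples before t = 0
                acc += hj * squared[n - j]
        smoothed.append(acc)
    return smoothed[::decimation]
```

For a constant input of amplitude 2, the envelope settles at the intensity 4 once the window is fully inside the signal, which is a quick sanity check on the unit-gain normalization.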

Figure 1: Block diagram of signal processing steps necessary to compute speech-based intelligibility metrics. (Stages shown: octave band-pass filters and a high-pass filter applied to the clean and degraded signals; envelope extraction by squaring and low-pass filtering to give X_i and Y_i; modulation metric M_i; aSNR_i; TI_i; STI.)

For the Envelope Regression (ER) method, the modulation metric for each band was computed from the envelope signals using Eqn (1):

$$ M_i = \frac{\mu_x}{\mu_y} \cdot \frac{E\{(x_i(k) - \mu_x)(y_i(k) - \mu_y)\}}{E\{(x_i(k) - \mu_x)^2\}} \tag{1} $$

where \mu_x and \mu_y are the means of x_i(t) and y_i(t) respectively. For the Normalized Correlation (NC) method, M_i was computed using Eqn (2):

$$ M_i = \frac{E\{x_i(k)\, y_i(k)\}^2}{E\{x_i^2(k)\}\, E\{y_i^2(k)\}} \tag{2} $$

(Goldsworthy & Greenberg 2004). Once the modulation metrics were computed, the apparent signal-to-noise ratio in each band, aSNR_i, was computed as

$$ aSNR_i = 10 \log_{10}\!\left(\frac{M_i}{1 - M_i}\right) \tag{3} $$

and then clipped to the range of -15 to +15 dB. The apparent SNR in each band was converted to a transmission index, TI_i, according to Eqn (4):

$$ TI_i = \frac{aSNR_i + 15}{30} \tag{4} $$

Finally, the overall STI value (ranging from 0 to 1) was calculated as a weighted sum of the TI values:

$$ STI = \sum_{i=1}^{7} \alpha_i\, TI_i - \sum_{i=1}^{6} \beta_i \sqrt{TI_i\, TI_{i+1}} \tag{5} $$

where the \alpha_i represent the octave weighting factors and the \beta_i represent the redundancy correction factors given in the IEC standard (IEC 1998).

Short-Time Implementation Issues

For both the ER and NC methods, sample means of the windowed envelope signals were calculated. Correlations were calculated as biased estimates:

$$ E\{x_i(k)\, y_i(k)\} = \frac{1}{N} \sum_{k=1}^{N} x_i(k)\, y_i(k) \quad \text{and} \quad E\{x_i^2(k)\} = \frac{1}{N} \sum_{k=1}^{N} x_i^2(k) \tag{6} $$

where N was the window length (in samples). These correlation values were used directly in Eqn (2) for the NC method. The cross- and auto-covariances needed for the ER method were calculated from the correlation estimates of Eqns (6) as

$$ E\{(x_i(k) - \mu_x)(y_i(k) - \mu_y)\} = E\{x_i(k)\, y_i(k)\} - \mu_x \mu_y \quad \text{and} \quad E\{(x_i(k) - \mu_x)^2\} = E\{x_i^2(k)\} - \mu_x^2 \tag{7} $$

and used in Eqn (1). Window lengths were adjusted from 7 sec (length of 5 concatenated sentences) down to 78 ms for the analyses presented below. Windows were overlapped by 75 %.

Theoretical STI

In order to compare the short-time metrics with the true STI, the theoretical STI was also calculated over the same time windows as the short-time metrics. The speech and the noise (as opposed to the degraded speech) were separately passed through the octave-band filter bank shown in Figure 1, and within-band powers were used to get the signal-to-noise ratio (S_i/N_i in Eqn (8)) in each band. The modulation index in each band, M_i(F), was then calculated as specified by Steeneken and Houtgast (1980):

$$ M_i(F) = \frac{1}{\sqrt{1 + \left(\dfrac{2\pi F T}{13.8}\right)^2}} \cdot \frac{1}{1 + \left(S_i/N_i\right)^{-1}} \tag{8} $$

The first term in Eqn (8) estimates the modulation reduction due to reverberation. The variable F corresponds to modulation frequency (between 0.63 and 12.5 Hz) and T corresponds to the reverberation time of the environment (T_60). The second term estimates the reduction due to additive noise. The theoretical STI was computed by substituting M_i(F) for M_i in Eqn (3); the variable aSNR_i(F) was averaged across F after clipping to obtain aSNR_i.
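As a rough illustration of how Eqns (1) to (8) fit together for a single analysis window, the following sketch computes the ER and NC metrics from biased moment estimates, converts a metric to a transmission index, combines band values into an STI, and evaluates the theoretical modulation index. All function names are hypothetical, the small `eps` guard against M = 0 or M = 1 is an added assumption, and the weights `alpha` and `beta` must be supplied by the caller (the IEC values are not reproduced here).

```python
import math


def biased_moments(x, y):
    """Sample means and biased (1/N) correlation estimates, as in Eqn (6)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    e_xy = sum(a * b for a, b in zip(x, y)) / n
    e_xx = sum(a * a for a in x) / n
    e_yy = sum(b * b for b in y) / n
    return mu_x, mu_y, e_xy, e_xx, e_yy


def er_metric(x, y):
    """Envelope Regression metric, Eqn (1), via the identities of Eqn (7)."""
    mu_x, mu_y, e_xy, e_xx, _ = biased_moments(x, y)
    cov_xy = e_xy - mu_x * mu_y   # cross-covariance of clean and degraded
    var_x = e_xx - mu_x ** 2      # variance of the clean envelope
    return (mu_x / mu_y) * cov_xy / var_x


def nc_metric(x, y):
    """Normalized Correlation metric, Eqn (2)."""
    _, _, e_xy, e_xx, e_yy = biased_moments(x, y)
    return e_xy ** 2 / (e_xx * e_yy)


def transmission_index(m, eps=1e-12):
    """Eqns (3) and (4): apparent SNR clipped to +/-15 dB, mapped to [0, 1]."""
    m = min(max(m, eps), 1.0 - eps)   # assumption: guard the logarithm
    asnr = 10.0 * math.log10(m / (1.0 - m))
    asnr = min(max(asnr, -15.0), 15.0)
    return (asnr + 15.0) / 30.0


def sti_from_ti(ti, alpha, beta):
    """Eqn (5): weighted sum of band TIs minus the redundancy correction."""
    weighted = sum(a * t for a, t in zip(alpha, ti))
    redundancy = sum(b * math.sqrt(ti[i] * ti[i + 1])
                     for i, b in enumerate(beta))
    return weighted - redundancy


def theoretical_m(f_mod, t60, snr_linear):
    """Eqn (8): reverberation term times additive-noise term."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * t60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 1.0 / snr_linear)
    return reverb * noise


def window_bounds(n_samples, win_len, overlap=0.75):
    """Start/end sample indices of analysis windows with fractional overlap."""
    hop = max(1, int(round(win_len * (1.0 - overlap))))
    return [(s, s + win_len) for s in range(0, n_samples - win_len + 1, hop)]
```

Note that `er_metric` divides by the variance of the clean envelope and by the mean of the degraded envelope, so a window in which the clean envelope is constant (for example, pure silence) needs separate handling; the sketch leaves that choice to the caller.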

Stimuli

The stimuli used in this study were 5 concatenated nonsense sentences, spoken conversationally by a male talker, totaling 7 s of speech (Payton et al. 1994). These nonsense sentences are grammatically correct but do not provide any semantic context to help word identification, e.g., "His guests could teach his turnpike." Each sentence consists of four to eight key words (underlined in the original example), where the key words consist of the nouns, adjectives, verbs and adverbs in the sentence.

Degradation Conditions

Three environmental degradations were evaluated: stationary speech-shaped noise, stationary noise plus simulated reverberation, and multi-talker babble. The speech-shaped noise was generated by filtering white Gaussian noise to approximate the average long-term spectra of speech (Payton et al. 1994). The noise was added to the speech at an average SNR of 0 dB. For the noise plus reverberation condition, speech plus noise at 0 dB SNR was convolved with a simulated conference room impulse response (Peterson 1986; Payton et al. 1994). The multi-talker babble was taken from a recording of restaurant noise. The babble also was added to the speech at 0 dB SNR.

RESULTS

Results from both the ER and NC methods were compared with the theoretical STI for each degradation condition as functions of window length. Linear regression analyses also were carried out for the metrics and theoretical STI results. For the regression analyses, results for two window lengths are presented. The 3 s window results are typical of all the longer windows. The 78 ms window is presented to show a window for which the metrics deviate from the theoretical STI during silent intervals.

Zero dB SNR with Stationary Speech-Shaped Noise

The results for each method over the length of one sentence are plotted as functions of time in Figure 2.

Figure 2: Metric results vs. window length: (top) theoretical STI, (center) ER method and (bottom) NC method for the 0 dB SNR stationary speech-shaped noise condition. Different curve types represent results with different window lengths as given in the legend. The black dotted line in each plot represents the long-term STI. (Axes: STI vs. time in s.)
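The degradation conditions described above add noise or babble to the speech at a fixed average SNR. One common way to do this, shown here only as a sketch under the assumption that SNR is defined from the mean power of the whole speech and noise signals, is to scale the noise before adding it. The function names are hypothetical and the paper does not state how its scaling was implemented.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a signal."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`,
    then add it to `speech`. Returns (mixture, scaled_noise).

    Assumes both inputs have the same length and nonzero power.
    """
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled
```

Keeping the scaled noise alongside the mixture is convenient for the theoretical STI, which needs the speech and the noise passed through the filter bank separately.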

For visual reference, an SNR of 0 dB corresponds to an STI value of about 0.5 (the exact value depends on the spectral characteristics of the speech and noise). Both the ER and NC metrics (center and bottom plots respectively) generally matched local fluctuations in the theoretical STI (top plot) for each window length, and the ER result for the entire corpus (blue line in center plot) matched the long-term STI (black dotted line) exactly. The ER method tracked the theoretical STI more closely than the NC method for all window lengths analyzed. For all window lengths, the NC method predicted slightly higher values than either the ER method or the theoretical STI, in agreement with the long-term results of Goldsworthy and Greenberg (2004). Once the window length was decreased to 78 ms (tan dashed lines), both the ER and NC methods deviated greatly from the theoretical STI at the beginnings and ends of sentences. Where the theoretical STI was zero because only noise was present (SNR = -∞ dB), both metrics often generated non-zero results.

Figure 3 plots linear regression analyses of metric results versus theoretical STI for two window lengths: 3 s (top row) and 78 ms (bottom row). Each data point represents the results for a single window. Regression lines and the goodness-of-fit (R²) statistics are also shown for each window length. As can be seen from the figure, the ER method results (left column) closely match the theoretical STI for the 3 s window, indicated by the R² statistic of 0.99. The results are also close for the 78 ms window (R² = 9). However, for the 78 ms window, some of the ER results were above zero on the y-axis, which means that, during the silent intervals, when the theoretical STI was zero, the ER method sometimes generated values greater than zero (up to 4).

Figure 3: Metrics computed from ER (left column) and NC (right column) methods vs. theoretical STI for 0 dB SNR using 3 s windows (top row) and 78 ms windows (bottom row). The solid lines represent best linear fits to the data. (Axes: speech-based metric vs. theoretical STI.)

The NC method regression analysis results are shown in the right column of Figure 3. This method predicted higher values than the theoretical STI for all window lengths, as can be seen by the upward shift of the linear regression lines from the main diagonal. The R² statistic of 0.96 for the 3 s window shows that, despite this shift, the NC method followed the theoretical STI quite closely. For the 78 ms window, the metric did not perform as well. The R² statistic is also reduced (6), in part because, when the theoretical STI was zero, the NC method generated values ranging from to 8.
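The regression analyses above pair each window's metric value with the theoretical STI for the same window, fit a straight line, and report R². A minimal sketch of an ordinary least-squares fit with its R² statistic follows; the function name is hypothetical and the paper does not specify the fitting software used.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2.

    `x` holds the theoretical STI per window and `y` the speech-based
    metric per window. Assumes at least two distinct x values and a
    non-constant y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

A metric that tracks the theoretical STI with only a constant upward offset, as described for the NC method, would show a slope near 1, a positive intercept, and an R² that stays high despite the shift.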

In order to study how well, on average, the short-time metrics match the long-term theoretical STI over the range of window lengths, the metrics and theoretical STI were averaged over the entire speech corpus (7 s) for each window length. The averages are plotted in Figure 4 as functions of window length. In the figure, the solid red line represents the ER method averages, the blue dash-dot line represents the NC method averages and the black dotted line represents the theoretical STI. It can be seen that the ER method produced the same average value as the theoretical STI over virtually the entire window range studied. The averages for all metrics decreased as the window was decreased. This is because voiced speech segments dominated the metric results, and when the windows were shortened to the point that some windows contained primarily unvoiced and/or silent intervals, the results for those windows were significantly reduced. The leftmost data points are for the 78 ms window. For that window length, the ER method did not decrease quite as much as the theoretical STI and the NC method actually increased slightly.

Figure 4: Metric averages computed over the entire speech corpus for speech in 0 dB stationary speech-shaped noise, as functions of window length. (Axes: metric average vs. length of window in s; curves: ER method, NC method, theoretical STI.)

Zero dB SNR Plus Reverberation

When reverberation was added to the noisy speech, the metrics generated values that varied more widely when compared to the theoretical STI. In Figure 5, metrics are plotted (ER on the left and NC on the right) versus the theoretical STI for the two window lengths. The 3 s window results are plotted in the top row and the 78 ms results in the bottom row. As before, each symbol corresponds to a single window result, linear regression lines are overlaid on the data and the goodness-of-fit statistics are shown.

Figure 5: Metrics computed from ER (left column) and NC (right column) methods vs. theoretical STI for 0 dB SNR plus reverberation using 3 s windows (top row) and 78 ms windows (bottom row). The solid lines represent best linear fits to the data. (Axes: speech-based metric vs. theoretical STI.)

It can be seen from Figure 5 that, for the 3 s window, the results from both methods tracked the theoretical STI fairly closely, although the ER method predicted values that were, on average, slightly lower than the theoretical STI across the range. The NC method predicted higher values than the theoretical STI at the low STI end and lower values at the high STI end. The corresponding R² statistics are 0.79 and 0.69 for the ER and NC methods respectively. For the 78 ms window, the results are much more divergent (R² = 0.35 and 8 respectively, indicating very poor correlations). In particular, when the theoretical STI was zero, both metrics generated results that varied over a wide range ( to 4 for the ER method and to 8 for the NC method). Furthermore, there appears to be a nonlinear relation such that the metric values deviated from the linear regression line more at the higher STI values.

Averages for both methods and the theoretical STI as functions of window length are given in Figure 6. The solid red line plots the ER method averages, the blue dash-dot line shows the NC method and the black dotted line represents the theoretical STI.

Figure 6: Metric averages computed over the entire speech corpus for speech in 0 dB stationary speech-shaped noise plus reverberation, as functions of window length. (Axes: metric average vs. length of window in s; curves: ER method, NC method, theoretical STI.)

It can be seen from Figure 6 that, for the noise plus reverberation condition, the ER method generated values that paralleled but were consistently less than the theoretical STI for all window lengths. It should also be noted that, as for the speech plus noise condition, the NC method actually increased for the shortest windows while the ER and theoretical STI continued to decrease.

Zero dB SNR with Multi-Talker Babble

As for the prior two conditions, metric results are plotted against the theoretical STI in Figure 7 and a linear regression analysis is performed for each plot. It can be seen from the left column in the figure that the STI from the ER method is highly correlated with the theoretical STI for the 3 s window, where R² = 0.93, while the data are much more scattered for the 78 ms window, for which R² = 0.84. As was observed for the other conditions, when the theoretical STI produced values near zero, the ER values covered a wide range, in this case from to 8.

Figure 7: Metrics computed from ER (left column) and NC (right column) methods vs. theoretical STI for 0 dB SNR multi-talker babble using 3 s windows (top row) and 78 ms windows (bottom row). The solid lines represent best linear fits to the data. (Axes: speech-based metric vs. theoretical STI.)

Regression analysis results for the NC method are shown in the right column of Figure 7. The R² statistic of 0.93 for the 3 s window indicates that the NC method followed the theoretical STI fairly closely, although the values it generated were consistently greater than the theoretical STI. For the 78 ms window, when the theoretical STI generated values below, the NC method results ranged from up to 8 and R² = 0.74.

When the asymptotic behavior of the metrics was analyzed for speech plus multi-talker babble, the plots were identical in shape to Figure 4, just shifted up slightly to asymptote at 6 for the theoretical STI and ER method and 8 for the NC method (plot not shown due to space constraints).

CONCLUSIONS

The data presented have demonstrated the ability of two short-time, speech-based metrics to accurately track short-term fluctuations in STI down to window lengths of 3 s for two different noise environments and a noise plus reverberation environment. Because these metrics are speech based, they have the potential to be used in a wide variety of settings to estimate speech intelligibility under conditions not amenable to standard intelligibility measurement techniques, such as during live performances. Further investigation is underway to analyze the 78 ms window results more thoroughly.

ACKNOWLEDGEMENTS

This work was supported by NIDCD grant RO-DC75.

REFERENCES

Drullman R (1995). Temporal envelope and fine structure cues for speech intelligibility. J Acoust Soc Am 97: 585-592.
Drullman R, Festen JM, Plomp R (1994a). Effect of reducing slow temporal modulations on speech reception. J Acoust Soc Am 95: 2670-2680.
Drullman R, Festen JM, Plomp R (1994b). Effect of temporal envelope smearing on speech reception. J Acoust Soc Am 95: 1053-1064.
Goldsworthy RL, Greenberg JE (2004). Analysis of speech-based speech transmission index methods with implications for nonlinear operations. J Acoust Soc Am 116: 3679-3689.
Houtgast T, Steeneken HJM (1971). Evaluation of speech transmission channels by using artificial signals. Acustica 25: 355-367.
Houtgast T, Steeneken HJM (1984). A multi-language evaluation of the RASTI-method for estimating speech intelligibility in auditoria. Acustica 54: 185-199.
Houtgast T, Steeneken HJM (1985). A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. J Acoust Soc Am 77: 1069-1077.
Houtgast T, Steeneken HJM, Plomp R (1980). Predicting speech intelligibility in rooms from the Modulation Transfer Function. I. General room acoustics. Acustica 46: 60-72.
IEC (1998). Sound system equipment - Part 16: Objective rating of speech intelligibility by speech transmission index. 2nd Ed., Internat. Standard No. 60268-16. International Electrotechnical Commission.
Ludvigsen C (1987). Prediction of speech intelligibility for normal-hearing and cochlearly hearing-impaired listeners. J Acoust Soc Am 82: 1162-1171.
Ludvigsen C, Elberling C, Keidser G, Poulsen T (1990). Prediction of intelligibility of non-linearly processed speech. Acta Otolaryngol Suppl 469: 190-195.
Payton KL, Braida LD (1999). A method to determine the speech transmission index from speech waveforms. J Acoust Soc Am 106: 3637-3648.
Payton KL, Uchanski RM, Braida LD (1994). Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing. J Acoust Soc Am 95: 1581-1592.
Payton KL, Braida LD, Chen S, Rosengard P, Goldsworthy R (2002). Computing the STI using speech as a probe stimulus. In: v. Wijngaarden SJ (ed.): Past, present and future of the Speech Transmission Index (pp 97-9). TNO Human Factors, The Netherlands.
Peterson PM (1986). Simulating the response of multiple microphones to a single acoustic source in a reverberant room. J Acoust Soc Am 80: 1527-1529.
Steeneken HJM, Houtgast T (1980). A physical method for measuring speech-transmission quality. J Acoust Soc Am 67: 318-326.