Speech Enhancement using Temporal Masking and Fractional Bark Gammatone Filters

Teddy Surya Gunawan, Eliathamby Ambikairajah
School of Electrical Engineering and Telecommunications
The University of New South Wales, NSW 2052, Australia
tsgunawan@ee.unsw.edu.au; abi@ee.unsw.edu.au

Macquarie University, Sydney, December 8 to 10, 2004. Copyright, Australian Speech Science & Technology Association Inc.

Abstract

A speech enhancement technique based on the temporal masking properties of the human auditory system is presented. The noisy signal is divided into a number of sub-bands with fractional bark accuracy, and the sub-band signals are individually and adaptively weighted in the time domain according to a short-term temporal masking threshold to noise ratio estimate in each sub-band. Objective measures and informal listening tests demonstrate significant improvements over three well-known existing methods when tested with speech signals corrupted by various noises at signal-to-noise ratios of 0, 10, and 20 dB.

1. Introduction

The purpose of speech enhancement is to improve the performance of speech communication systems in noisy environments. Speech enhancement can be applied in many applications, such as mobile communication systems, speech recognition, or hearing aids. The additive noise source may be wideband noise, in the form of white or colored noise, or a periodic signal, such as hum noise or room reverberations. Single channel speech enhancement is a more difficult task than multiple channel enhancement, since there is no independent source of information with which to help separate the speech and noise signals.

The spectral subtraction algorithm is a well-known approach to speech enhancement (Boll 1979; Gustafsson, Nordholm, and Claesson 2001; Martin 1994; Tsoukalas, Mourjopoulos, and Kokkinakis 1997), in which noise is usually estimated during speech pauses. Spectral subtraction is widely known to suffer from perceptible artifacts resulting from the musical residual noise that the method introduces into the enhanced speech. In order to reduce the musical noise, various algorithms have been developed (Gustafsson et al. 2001; Tsoukalas et al. 1997; Virag 1999). In (Virag 1999) and (Tsoukalas et al. 1997), human auditory masking properties, i.e. simultaneous masking, were used to reduce the musical noise.

Recently, a new speech enhancement method known as speech boosting has been reported (Westerlund 2003). Instead of focusing on suppressing the noise, the method increases the relative power of the speech, thus acting as a speech booster. It is only active when speech is present, and remains idle when only noise is present. As stated in (Westerlund 2003), the algorithm has proven to be robust, flexible, and versatile.

Functional models of the temporal masking effect of the human auditory system have recently been used with success in speech and audio coding to provide more efficient signal compression (Gunawan, Ambikairajah, and Sen 2003; Sinaga, Gunawan, and Ambikairajah 2003). Furthermore, a fractional bark filterbank resolution, i.e. 0.25 and 0.5 bark (Basic and Advanced Version), has been reported in (ITU 1998) to provide more accurate objective measurement of perceived audio quality (PEAQ). Therefore, it is expected that the use of fractional bark accuracy will provide a more accurate temporal masking calculation in speech enhancement.

In this paper, we propose a novel speech enhancement method that combines a functional model of temporal masking with a fractional bark gammatone filterbank, based upon modifications to the speech boosting technique (Westerlund 2003). To evaluate the performance of our algorithm, three other algorithms were implemented: spectral subtraction (Boll 1979), spectral subtraction with minimum statistics (Martin 1994), and speech boosting (Westerlund 2003). The PESQ (Perceptual Evaluation of Speech Quality, ITU-T P.862) measure was used to benchmark the various methods.

2. Proposed Speech Enhancement Algorithm

Speech that has been contaminated by noise can be expressed as

$$ x(n) = s(n) + v(n) \tag{1} $$

where $x(n)$ is the noisy speech signal, $s(n)$ is the clean speech signal and $v(n)$ is the additive noise source, all in the discrete time domain. As mentioned in Section 1, the objective in speech enhancement is to suppress the noise, resulting in an output signal $y(n)$ with a higher signal-to-noise ratio (SNR).

We propose a new speech enhancement algorithm that incorporates temporal masking, as shown in Fig. 1. By filtering the input signal $x(n)$ using a bank of $M$ analysis filters, $h_m(n)$, the signal is divided into $M$ sub-bands, each denoted by $x_m(n)$, where $m$ is the sub-band index.

Figure 1: Speech enhancement using temporal masking

This filtering operation can be described in the time domain as

$$ x_m(n) = x(n) * h_m(n) \tag{2} $$

where $m = 1, \ldots, M$. The global temporal masking threshold, $GTM(n)$, and the temporal masking threshold in each sub-band, $TM_m(n)$, are calculated from the noisy speech signal $x(n)$ and the sub-band signal $x_m(n)$, respectively. $GTM(n)$ and $TM_m(n)$ are used to calculate the gain $\Gamma_m(n)$ in each sub-band. The gain, $\Gamma_m(n)$, is a weighting function that amplifies the signal in band $m$ during speech activity. The enhanced speech, $y(n)$, is then obtained by applying the synthesis filters, $g_m(n)$, and compensating the delay in each sub-band as follows

$$ y(n) = \sum_{m=1}^{M} \left[ \Gamma_m(n)\, x_m(n) \right] * g_m(n) \tag{3} $$

Our objective is now to find a gain function, $\Gamma_m(n)$, that weighs the input signal sub-bands, $x_m(n)$, based on the temporal masking threshold to noise ratio (MNR). The MNR in each sub-band can be calculated as the ratio of a short-term average temporal masking threshold, $P_m(n)$, and an estimate of the noise floor level, $Q_m(n)$, as given in equation (6). The short-term average temporal masking threshold in sub-band $m$ is calculated as

$$ P_m(n) = (1 - \alpha_m)\, P_m(n-1) + \alpha_m\, TM_m(n) \tag{4} $$

where $\alpha_m$ is a small positive constant (i.e. $\alpha_m = 0.0042$ for all $m$) that controls the sensitivity of the algorithm to changes in the temporal masking threshold and acts as a smoothing factor. The slowly varying noise floor estimate for the $m$th sub-band, $Q_m(n)$, is calculated as

$$ Q_m(n) = \begin{cases} (1 + \beta_m)\, Q_m(n-1), & Q_m(n-1) \le P_m(n) \\ P_m(n), & Q_m(n-1) > P_m(n) \end{cases} \tag{5} $$

where $\beta_m$ is a small positive constant (i.e. $\beta_m = 0.05$ for all $m$) controlling how fast the noise floor level estimate in sub-band $m$ adapts to changes in the noise environment. The variables $P_m(n)$, $Q_m(n)$, $TM_m(n)$ and $GTM(n)$ are used to calculate the gain function $\Gamma_m(n)$ as follows,

$$ \Gamma_m(n) = \gamma_m \frac{TM_m(n)}{GTM(n)} + (1 - \gamma_m) \frac{P_m(n)}{Q_m(n)} \tag{6} $$

where $0 \le \gamma_m \le 1$ is a positive constant controlling the relative contribution of the temporal masking threshold ratio and the short-term MNR. Hence, the proposed algorithm still acts as a speech booster, but the gain calculation $\Gamma_m(n)$ differs from (Westerlund 2003), which calculates the gain function from the short-term SNR.

In order to find the optimum $\gamma$, we evaluated the average quality improvement (see the $\delta$ calculation in equation (18)) for a speech file (female English speaker) contaminated with car noise at 0, 10, and 20 dB SNRs for various $\gamma$. From the results of this experiment, shown in Figure 2, we found the optimum value to be $\gamma_m = 0.8$ for all $m$.
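The per-sample recursion described by equations (4) to (6) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `update_gain` is hypothetical, the gain is assumed to take the form Γ = γ·TM/GTM + (1 − γ)·P/Q, and the default constants are simply the values as they read in the text.

```python
def update_gain(tm, gtm, p_prev, q_prev, alpha=0.0042, beta=0.05, gamma=0.8):
    """One-sample update for a single sub-band (hypothetical helper).

    tm, gtm : sub-band and global temporal masking thresholds (both > 0)
    p_prev  : previous short-term average masking threshold P(n-1)
    q_prev  : previous noise floor estimate Q(n-1), assumed > 0
    Returns the updated (P, Q, gain) triple.
    """
    # Eq. (4): exponential smoothing of the temporal masking threshold.
    p = (1.0 - alpha) * p_prev + alpha * tm
    # Eq. (5): the noise floor grows slowly while it is below P,
    # and drops to P as soon as it would exceed it.
    if q_prev <= p:
        q = (1.0 + beta) * q_prev
    else:
        q = p
    # Eq. (6), assumed form: blend the ratio TM/GTM with the short-term MNR P/Q.
    gain = gamma * tm / gtm + (1.0 - gamma) * p / q
    return p, q, gain
```

Calling the function once per sample and per sub-band, and feeding the returned P and Q back in as `p_prev` and `q_prev`, gives the running gain that weights each sub-band signal.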

Since the calculation of $\Gamma_m(n)$ involves a division, care must be taken to ensure that the quotient does not become excessively large due to a small $Q_m(n)$. In a situation with a very high MNR, $\Gamma_m(n)$ will become very large if no limit is imposed on this function.

Figure 2: Quality improvement for various $\gamma$

Therefore, a limiter can be applied on $\Gamma_m(n)$ as follows:

$$ \Gamma_m(n) = \begin{cases} \Gamma_m(n), & \Gamma_m(n) \le C_m \\ C_m, & \Gamma_m(n) > C_m \end{cases} \tag{7} $$

where $C_m$ is some positive constant. Using the same experiment as for the optimum $\gamma$, we found that setting $C_m = 8$ dB ($\approx 2.5$) provides a suitable limiter for the gain function.

3. Fractional Bark Gammatone Filterbank

In this paper, a fractional bark gammatone filterbank was employed to filter the signal $x(n)$ into its sub-band signals $x_m(n)$. A DC rejection filter was applied to remove the subsonic components of the input signals. In addition, the optimum number of filter coefficients required was evaluated and the delay compensation for each sub-band was calculated.

3.1. DC Rejection Filter

We designed a fourth order Butterworth high pass filter with a cut-off frequency of 20 Hz to remove the subsonic components of the input signals. The filter was implemented as a cascade of two second order IIR filters,

$$ H_{DC}(z) = \frac{1 - 2z^{-1} + z^{-2}}{1 + a z^{-1} + b z^{-2}} \cdot \frac{1 - 2z^{-1} + z^{-2}}{1 + c z^{-1} + d z^{-2}} \tag{8} $$

where $a = -1.9878047$, $b = 0.98804997$, $c = -1.97114861$, and $d = 0.97139181$, for $f_s = 8000$ Hz.

3.2. Gammatone Filters

For the analysis filters, we used gammatone filters as they resemble the shape of human auditory filters (Kubin and Kleijn 1999). These were implemented using FIR filters. To achieve perfect reconstruction, the synthesis filters, $g_m(n)$, are the time reverse of the analysis filters, $h_m(n)$. The analysis filter for each sub-band is obtained using the following expression,

$$ h_m(n) = a_m (nT)^{N-1} e^{-\pi b\, BW_m nT} \cos(2\pi f_{c,m} nT + \varphi) \tag{9} $$

where $f_{c,m}$ is the centre frequency for each sub-band, $T$ is the sampling period, and $N$ is the gammatone filter order ($N = 4$). For $f_s = 8000$ Hz, the total number of sub-bands, $M$, is dependent on the bark resolution, $d$. The parameter $n$ is the discrete time sample index, with $n = 0, \ldots, Nf_m$, where $Nf_m$ is the length of each filter within the filterbank.
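As a rough illustration of the gammatone expression above, the following sketch builds one FIR analysis filter and its time-reversed synthesis filter. It assumes the impulse response has the form h(n) = a (nT)^(N−1) exp(−π b BW nT) cos(2π f_c nT + φ) as the equation reads here; many gammatone formulations use 2π in the decay term instead, so that constant should be treated as an assumption. The function names, the default decay factor `b=1.65`, the default length, and the choice of `nf` taps indexed from 0 are all illustrative.

```python
import cmath
import math


def gammatone_fir(fc, bw, fs=8000.0, nf=256, order=4, b=1.65, phase=0.0):
    """Return nf FIR taps of a gammatone filter centred at fc Hz (hypothetical helper).

    bw is the bandwidth in Hz used in the decay term; b is an assumed decay factor.
    The taps are scaled so that the magnitude response at fc is 1 (0 dB).
    """
    t_s = 1.0 / fs  # sampling period T
    h = [
        (n * t_s) ** (order - 1)
        * math.exp(-math.pi * b * bw * n * t_s)
        * math.cos(2.0 * math.pi * fc * n * t_s + phase)
        for n in range(nf)
    ]
    # Evaluate the frequency response at fc and normalise the gain there to 0 dB.
    resp = sum(hn * cmath.exp(-2j * math.pi * fc * n * t_s) for n, hn in enumerate(h))
    a = 1.0 / abs(resp)
    return [a * hn for hn in h]


def synthesis_from_analysis(h):
    """Synthesis filter taken as the time reverse of the analysis filter."""
    return h[::-1]
```

For example, `gammatone_fir(1000.0, 160.0)` gives a 256-tap filter centred at 1 kHz; the bandwidth value here is only a placeholder for the critical bandwidth of that band.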
In equation (9), $BW_m$ is the critical bandwidth at a particular centre frequency, $b = 1.65$, and the $a_m$ were selected for each filter so as to normalize the filter gain to 0 dB.

3.3. Spacing of the Filters

The gammatone filters were spaced linearly on the Bark scale, or critical-band rate scale. The critical band number $z$ (in Bark) is related to the linear frequency $f$ (in Hz) as follows (Schroeder, Atal, and Hall 1979)

$$ z(f) = 7\,\mathrm{asinh}\!\left(\frac{f}{650}\right), \qquad f(z) = 650 \sinh\!\left(\frac{z}{7}\right) \tag{10} $$

The frequency borders of the filters range from $f_L = 80$ Hz to $f_U = 4000$ Hz. The widths and spacing of the filter bands correspond to a resolution of $d$ Bark. The number of sub-bands $M$ is then calculated as follows,

$$ M = \left\lceil \frac{z(f_U) - z(f_L)}{d} \right\rceil \tag{11} $$

A spacing of $d = 0.5$ Bark required 34 filters, while a spacing of $d = 0.25$ Bark required 68 filters in order to cover the frequency range of 0 to 4 kHz. The lower, upper, and centre frequency for each sub-band on the Bark scale can be calculated as follows,

$$ z_{l,m} = z(f_L) + (m-1)\,d, \qquad z_{u,m} = \min\big(z(f_L) + m\,d,\; z(f_U)\big), \qquad z_{c,m} = \frac{z_{l,m} + z_{u,m}}{2} \tag{12} $$

where $m = 1, \ldots, M$. Subsequently, the centre frequency and the bandwidth in Hz can be determined as follows,

$$ f_{c,m} = f(z_{c,m}), \qquad BW_m = f(z_{u,m}) - f(z_{l,m}) \tag{13} $$

In order to find the optimum value of $d$ for our speech enhancement method, we evaluated the average quality improvement and processing time for various $d$ values at 0, 10, and 20 dB SNRs, as seen in Figure 3. From Figure 3, we found that setting $d = 0.25$ provides the optimum value in terms of speech quality and processing time. Hence, $d = 0.25$ was used throughout our experiments. The frequency responses of the gammatone filters for $d = 0.25$ are shown in Figure 4.

Figure 3: Fractional bark spacing versus quality and processing time

Figure 4: ¼ Bark spacing (68 filters)

3.4. Optimum Number of Filter Coefficients ($Nf$)

The number of coefficients required to implement the analysis/synthesis filterbank depends on the impulse response of the gammatone filters. The low frequency filters need more coefficients than the high frequency filters. The length of each filter within the filterbank, $Nf_m$, can be optimised by evaluating the non-zero gammatone filter response in each sub-band. The optimum length of the filter $Nf_m$ in samples for each sub-band is given by

$$ Nf_m = \min\big(Nf_{max},\; \mathrm{round}(f_s / f_{c,m}) \cdot 25\big) \tag{14} $$

where $f_{c,m}$ is the centre frequency of the filter in Hz and $Nf_{max} = 1024$ is the maximum length of filter coefficients.

3.5. Delay Compensation

Because the optimum filter length $Nf_m$ is used in each sub-band, the amount of filter delay accumulated by each sub-band is different. Without compensation for this delay, the reconstruction of the sub-band signal components will lead to an incoherent output signal. The total amount of delay compensation necessary for sub-band $m$ is simply $Nf_m$, where $Nf_m$ is the optimum filter length calculated as in equation (14).

4. Temporal Masking

Temporal masking is a time domain phenomenon in which two stimuli occur within a small interval of time, and it plays an important role in human auditory perception. Forward temporal masking occurs when a masker precedes the signal in time, while backward masking occurs when the signal precedes the masker in time. Forward masking is the more important effect since the duration of the masking effect can be much longer, depending on the duration of the masker. The forward masking model used in this paper is based on (Jesteadt, Bacon, and Lehman 1982), and has been used and optimised in our previous papers for speech and audio coding (Gunawan et al. 2003; Sinaga et al. 2003).
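For reference, the filterbank layout of Sections 3.3 and 3.4 can be sketched in a few lines. The sketch assumes the Bark mapping z(f) = 7·asinh(f/650), that the sub-band count is rounded up (which reproduces the 34 and 68 filters quoted for 0.5 and 0.25 Bark), and that the filter length is min(Nf_max, round(fs/fc)·25). The function and key names are hypothetical.

```python
import math


def hz_to_bark(f):
    """Critical-band rate in Bark for a frequency in Hz."""
    return 7.0 * math.asinh(f / 650.0)


def bark_to_hz(z):
    """Inverse mapping: frequency in Hz for a critical-band rate in Bark."""
    return 650.0 * math.sinh(z / 7.0)


def filterbank_layout(d=0.25, f_low=80.0, f_high=4000.0, fs=8000.0, nf_max=1024):
    """List centre frequency, bandwidth and FIR length for each sub-band."""
    z_low, z_high = hz_to_bark(f_low), hz_to_bark(f_high)
    # Number of sub-bands, rounded up so the last band reaches f_high (assumption).
    m_total = math.ceil((z_high - z_low) / d)
    bands = []
    for m in range(1, m_total + 1):
        z_l = z_low + (m - 1) * d          # lower edge in Bark
        z_u = min(z_low + m * d, z_high)   # upper edge, clipped at f_high
        z_c = 0.5 * (z_l + z_u)            # centre in Bark
        fc = bark_to_hz(z_c)
        bw = bark_to_hz(z_u) - bark_to_hz(z_l)
        nf = min(nf_max, round(fs / fc) * 25)  # longer filters at low frequencies
        bands.append({"fc": fc, "bw": bw, "nf": nf})
    return bands
```

With the defaults, `len(filterbank_layout())` is 68, and the lowest bands are capped at the maximum length of 1024 taps.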
Based on the forward masking experiments carried out by (Jesteadt et al. 1982), the forward masking level $FM$ can be well-fitted to psychoacoustic data using the following equation:

$$ FM = a\,(b - \log_{10} \Delta t)(L - c) \tag{15} $$

where $FM$ is the amount of forward masking in dB, $\Delta t$ is the time difference between the masker and the maskee in milliseconds, $L$ is the masker level in dB, and $a$, $b$, and $c$ are parameters that can be derived from psychoacoustic data. To simplify the masking calculation, $a$, $b$, and $c$ were set to 0.7, 2.3, and 20, respectively. Note that these parameters can be further optimised.

To evaluate the amount of forward masking, the current frame of 32 ms was subdivided into four sub-frames as shown in Figure 5. The forward masking level $FM_j$ was calculated for the $j$th sub-frame using the energy, $L_j$, accumulated over the previous frame and all sub-frames up to the current sub-frame.

Figure 5: Calculation of forward masking

The temporal amount of masking $TM$ is then chosen as follows

$$ TM(n) = 10^{\max\{FM_1, FM_2, FM_3, FM_4\}/10} \tag{16} $$

Note that the calculation of a temporal masking threshold every 8 ms was considered adequate since this provides a good approximation to the decay effect that lasts around 200 ms. The temporal masking thresholds are calculated for each sub-band, $TM_1(n), \ldots, TM_M(n)$, from $x_m(n)$, and $GTM(n)$ is calculated from $x(n)$.

5. Performance Evaluation

In order to assess the performance of the proposed algorithm in enhancing noisy signals, a large number of simulations were performed. Six speech files were taken from the EBU SQAM data set, including English female and male speakers, French female and male speakers, and German female and male speakers. The length of the files is between 7 and 20 seconds. The sampling frequency was 8 kHz, and the frame size was 256 samples (32 ms). Several algorithms were implemented and compared, including spectral subtraction, SS, (Boll 1979), spectral subtraction with minimum statistics, SSMS, (Martin 1994), speech boosting, SB, (Westerlund 2003), and the proposed method, speech boosting exploiting temporal masking, SBTM.

5.1. Addition of Noise to Test Data

Different types of background noise from the NOISEX-92 database were used, including car noise, white noise, pink noise, F16 noise, factory noise, and babble noise. The variance of the noise was adjusted to obtain SNRs in the recorded signals ranging from 0 dB to 20 dB, as follows:

$$ x(n) = s(n) + \sqrt{\frac{\mathrm{Var}(s)}{\mathrm{Var}(v)\, 10^{SNR/10}}}\; v(n) \tag{17} $$

5.2. Objective Measures

The PESQ (Perceptual Evaluation of Speech Quality) measure (ITU 2001), which was recently adopted as an ITU-T recommendation (P.862), was utilised for the objective evaluation. Other objective measures such as Itakura-Saito distortion, Articulation Index, Segmental SNR, and SNR have been correlated to subjective tests at 59%, 67%, 77%, and 24%, respectively (Quackenbush, Barnwell, and Clements 1988), while the PESQ has a 93.5% correlation with subjective tests (ITU 2001), although these figures were obtained using different data sets and subjective experiments. To evaluate the performance of the speech enhancement algorithms, we developed a new measure to assess the improvement achieved.
Suppose that $PESQ_{ref}$ is the PESQ score measured between the reference clean speech, $s(n)$, and the corrupted speech, $x(n)$. The PESQ score between the clean speech and the enhanced speech, $y(n)$, was also measured and is denoted $PESQ_{proc}$. Therefore, we can derive a new value, $\delta$, which measures the PESQ improvement achieved by the algorithm as follows

$$ \delta = \frac{PESQ_{proc} - PESQ_{ref}}{PESQ_{ref}} \times 100\% \tag{18} $$

A total of 108 data sets from six speech files, six noises, and three SNRs were simulated for each method. The average quality improvement, $\delta$, achieved by the various speech enhancement methods is shown in Figure 6. Note that the $\delta$ results for the various speech files and noises were averaged for the 0, 10, and 20 dB SNRs. From these results, the proposed temporal masking-based speech boosting method outperforms the other methods at all SNRs.

Figure 6: Average $\delta$ (%) for various algorithms

In order to analyze the performance of our proposed method in more detail, the average quality improvement at 0, 10, and 20 dB SNRs for various noises is shown in Table 1.

Table 1: Average PESQ improvement $\delta$ (%) for various noise types using spectral subtraction (SS), spectral subtraction with minimum statistics (SSMS), speech boosting (SB), and speech boosting with temporal masking (SBTM).

Noise            SS      SSMS    SB      SBTM
Car noise        3.27    5.26    0.49    7.56
White noise      6.22    24.8    6.39    29.76
Pink noise       6.43    22.28   5.40    26.60
F16 noise        .2      6.23    2.8     22.5
Factory noise    2.70    .84     2.65    20.20
Babble noise     2.5     4.20    7.44    9.2

For each type of noise condition, our proposed method provides a better PESQ improvement than the three other methods. The best improvement is achieved for white noise, while the least improvement is achieved for babble noise. Babble noise is speech conversation in the background. Therefore, our algorithm might misclassify the babble noise as speech and boost it.

Table 2: Average PESQ improvement $\delta$ (%) for different speech files using spectral subtraction (SS), spectral subtraction with minimum statistics (SSMS), speech boosting (SB), and speech boosting with temporal masking (SBTM).

Speech           SS      SSMS    SB      SBTM
English male     8.66    2.69    9.78    20.70
English female   .7      5.6     .55     8.58
French male      3.82    7.3     .7      9.8
French female    0.09    3.42    9.35    4.42
German male      8.3     25.85   9.65    34.0
German female    9.93    9.73    3.4     8.5

Table 2 shows the average quality improvement at 0, 10, and 20 dB SNRs for the various speech files. While the table shows that our proposed algorithm outperforms the other algorithms, it also reveals that our algorithm improves male speech more than female speech.

6. Conclusion

We have presented a fractional bark gammatone filterbank for speech enhancement based on a short-term temporal masking threshold to noise ratio (MNR). The performance of our proposed algorithm was compared with three other standard speech enhancement methods over six different noise types and three SNRs. PESQ results reveal that the proposed algorithm outperforms the other algorithms by 7-5% depending on the SNR. In the particularly demanding 0 dB SNR condition, the new technique achieves at least a 40% relative improvement in delta PESQ over any of the existing methods compared. Hence, it appears that the temporal masking threshold based algorithm with fractional bark accuracy has good potential for speech enhancement applications across many types and intensities of environmental noise. Further research is required to fine tune the parameters for different speech and/or noise characteristics.

7. References

Beerends, J. G., Hekstra, A. P., Rix, A. W., & Hollier, M. P. (2002). Perceptual Evaluation of Speech Quality.

Boll, S. F. (1979). Suppression of Acoustic Noise in Speech Using Spectral Subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2), pp. 113-120.

Gunawan, T. S., Ambikairajah, E., & Sen, D. (2003, December). Comparison of Temporal Masking Models for Speech and Audio Coding Applications. Paper presented at the International Symposium on Digital Signal Processing and Communication Systems, pp. 99-103.

Gustafsson, H., Nordholm, S. E., & Claesson, I. (2001). Spectral Subtraction Using Reduced Delay Convolution and Adaptive Averaging. IEEE Transactions on Speech and Audio Processing, 9(8), pp. 799-807.

ITU. (1998). ITU-R BS.1387, Method for the Objective Measurements of Perceived Audio Quality. Geneva: International Telecommunications Union.

ITU. (2001). ITU-T P.862, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Geneva: International Telecommunication Union.

Jesteadt, W., Bacon, S. P., & Lehman, J. R. (1982). Forward masking as a function of frequency, masker level, and signal delay. Journal of Acoustic Society of America, 71(4), pp. 950-962.

Kubin, G., & Kleijn, W. B. (1999). On speech coding in a perceptual domain. Paper presented at the International Conference on Acoustic, Speech, and Signal Processing, pp. 205-208.

Martin, R. (1994). Spectral Subtraction Based on Minimum Statistics. Paper presented at the Europe Signal Processing Conference, Edinburgh, Scotland, pp. 1182-1185.

Quackenbush, S. R., Barnwell, T. P., & Clements, M. A. (1988). Objective Measures of Speech Quality. Englewood Cliffs: Prentice Hall.

Schroeder, M. R., Atal, B. S., & Hall, J. L. (1979). Optimizing digital speech coders by exploiting masking properties of the human ear. Journal of Acoustic Society of America, 66, pp. 1647-1652.

Sinaga, F., Gunawan, T. S., & Ambikairajah, E. (2003). Wavelet Packet Based Audio Coding Using Temporal Masking. Paper presented at the Int. Conf. on Information, Communications and Signal Processing, Singapore.

Tsoukalas, D. E., Mourjopoulos, J. N., & Kokkinakis, G. (1997). Speech Enhancement Based on Audible Noise Suppression. IEEE Transactions on Speech and Audio Processing, 5(6), pp. 497-514.

Virag, N. (1999). Single Channel Speech Enhancement Based on Masking Properties of the Human Auditory System. IEEE Transactions on Speech and Audio Processing, 7(2), pp. 126-137.

Westerlund, N. (2003). Applied Speech Enhancement for Personal Communication. PhD Thesis, Blekinge Institute of Technology.