Efficient Non-linear Changed Mel-filter Bank VAD Algorithm


Mathematical Models and Methods in Modern Science

DAMJAN VLAJ, ZDRAVKO KAČIČ, MARKO KOS
Faculty of Electrical Engineering and Computer Science
University of Maribor
Smetanova ulica 17, 2000 Maribor
SLOVENIA
damjan.vlaj@uni-mb.si, kacic@uni-mb.si, marko.kos@uni-mb.si

Abstract: - This paper introduces an efficient non-linear changed mel-filter bank (MFB) voice activity detection (VAD) algorithm. Non-linear changed mel-filter bank outputs improve the detection of those parts of the speech signal where vowels, diphthongs and semivowels are present. To make the voice activity detection of consonants as good as possible, the hangover and hangbefore criteria are used, and a phoneme duration analysis was made for this purpose. The duration of vowels, diphthongs and semivowels defines how many frames must be detected as speech before the hangover and hangbefore criteria are applied at all. The duration of consonants defines how many frames are used for the hangover and hangbefore criteria. Comparative tests were made between the MFB VAD algorithm with and without the non-linear function. Experiments were also made on four VAD algorithms used in the ITU G.729, ITU G.723.1, DSR ETSI ES 202 050, and DSR ETSI ES 202 211 standards. Introducing the non-linear function into the MFB VAD algorithm reduces the errors caused by incorrect voice activity detection.

Key-Words: - voice activity detection, signal processing, hangover criterion, hangbefore criterion

1 Introduction

An effective voice activity detection (VAD) algorithm can distinguish speech from silence in the captured speech signal. The speech regions can also contain background noise, which is present throughout the whole signal, whereas the silence parts contain only background noise. The VAD algorithm plays an important role in telecommunication systems.
The VAD algorithm and two other algorithms, Discontinuous Transmission (DTX) and Comfort Noise Generator (CNG) [1, 2], are used to reduce the transmission rate during the silent periods of the speech signal. When DTX is in operation, the transmitter is switched off if no speech is present. This increases the system capacity by reducing co-channel interference and also reduces transmitter power consumption (an important consideration for mobile phones). During a typical conversation, each speaker talks for less than 40% of the time, and it has been estimated that DTX with a VAD decision could approximately double the capacity of the transmission channels [3].

Another very important part of telecommunication systems is the automatic speech recognition (ASR) system, in which the VAD algorithm also plays an important role. A major cause of errors in ASR systems is the incorrect detection of the beginning and ending boundaries of the test and reference patterns [4]. Correct determination of the endpoints is fairly easy if the signal-to-noise ratio (SNR) is high (e.g., greater than 35 dB). Unfortunately, the majority of practical recognizers have to work with a much smaller SNR (typically between 25 dB and 15 dB, and as low as 5 dB). Under such conditions it becomes very difficult to detect some consonants, such as unvoiced fricatives, unvoiced stops, and nasals, at the beginning or the end of utterances [4].

In order to correct an incorrect VAD decision, hangover and hangbefore criteria can be added at the end of the basic VAD algorithm [5]. Using the hangover criterion, the problem of incorrect detection of consonants that occur at the end and in the middle of words can be solved. With the hangbefore criterion, the problem of incorrect detection of consonants that occur at the beginning and in the middle of words can be solved. In this paper a computationally efficient VAD algorithm based on mel-filter bank (MFB) outputs is used [6].
In order to improve the detection of voiced phonemes, a non-linear function based on minimum and maximum statistics is introduced into the mel-filter bank outputs of a speech signal. This improves the detection of vowels, diphthongs, and semivowels. The problem of consonant detection is solved by implementing the hangover and hangbefore criteria.

In the following section we give a short description of the MFB VAD algorithm [6]. This section also includes the newly presented non-linear function based on minimum and maximum statistics, which is introduced into the mel-filter bank outputs. Phoneme duration analysis, which is used for defining the constants for the hangover and hangbefore criteria, is also presented in Section 2. Section 3 presents the experiments, which were made on the Aurora 2 database [7] using different signal-to-noise ratios to ascertain how robust the tested VAD algorithms are to background noise, and to perform a comparative analysis. The results are presented and discussed in Section 4. Conclusions are given in Section 5.

ISBN: 978-1-6184-16-7

2 Voice activity detection algorithm

In the next three subsections an efficient non-linear changed mel-filter bank (MFB) voice activity detection (VAD) algorithm is presented.

2.1 MFB VAD algorithm

The MFB VAD algorithm was presented in [6] and is based on mel-filter bank outputs. The algorithm classifies each frame as speech, $\sigma[m] = 1$ (speech with background noise), or non-speech, $\sigma[m] = 0$ (background noise only), by comparing the SNR of the current frame to a threshold. The SNR of the current frame corresponds to the difference between the short-term and the long-term spectral energy estimates. The long-term estimate is updated when the VAD decides that the current frame corresponds to noise only, and the mel-filter bank output of the current frame is used as the short-term estimate.

An estimate of the short-term energy of the mel-filter bank outputs, $E_{est}[m]$, is calculated for the first 10 frames using the following equation:

$$E_{est}[m] = \begin{cases} \sum_{i=1}^{N} \ln f_{bank}[m,i], & m = 1 \\ \dfrac{E_{est}[m-1] + \sum_{i=1}^{N} \ln f_{bank}[m,i]}{2}, & 1 < m \le 10 \end{cases} \tag{1}$$

where $N = 23$ represents the number of channels (mel-filter bank outputs) and $f_{bank}[m,i]$ is the i-th mel-filter bank output of the m-th frame.
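The initial estimate of Equation (1) can be illustrated with a minimal Python sketch. It assumes that the first frame initialises the estimate and that each later frame is averaged with the previous estimate. It also assumes an initialisation length of 10 frames. The function name `initial_energy_estimate` and the list-of-lists input layout are hypothetical and chosen only for this illustration.

```python
import math


def initial_energy_estimate(frames, init_frames=10):
    """Running estimate E_est over the first `init_frames` frames.

    `frames` is a list of frames; each frame is a list of N positive
    mel-filter bank outputs. Assumed recursion: the first frame sets the
    estimate, every later frame is averaged with the previous estimate.
    """
    e_est = None
    for m, fbank in enumerate(frames[:init_frames], start=1):
        # Logarithmic sum of the mel-filter bank outputs of frame m.
        log_sum = sum(math.log(value) for value in fbank)
        e_est = log_sum if m == 1 else (e_est + log_sum) / 2.0
    return e_est
```

Because the recursion halves the weight of older frames at every step, the estimate is dominated by the most recent of the initial frames.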
The estimates of the short-term energy $E_{est}[m]$ are used to define the weighting factor $q$ of the input signal, which is given by the following equation:

$$q = \begin{cases} 32, & E_{est} < \frac{6}{9}\,\mathrm{MAX} \\ 64, & \frac{6}{9}\,\mathrm{MAX} \le E_{est} \le \frac{7}{9}\,\mathrm{MAX} \\ 128, & E_{est} > \frac{7}{9}\,\mathrm{MAX} \end{cases} \tag{2}$$

where $\mathrm{MAX}[m]$ represents the logarithmic sum of the maximum possible values of the mel-filter bank outputs in the frequency spectrum. For the m-th frame, $\mathrm{MAX}[m]$ is calculated as:

$$\mathrm{MAX}[m] = \sum_{k=1}^{N} \ln\left( Y_{MAX}[m,k] \cdot \frac{cbin_{k+1} - cbin_{k-1} + 2}{2} \right) \tag{3}$$

where $N$ is the number of mel-filter bank outputs and $Y_{MAX}[m,k]$ is the maximum possible value of the k-th bin of the frequency spectrum in the m-th frame. The maximum possible value of the frequency spectrum is $2^{Q}/2$, where $Q$ represents the resolution of the input signal quantization. In our case the quantization resolution is 16. The different values of the weighting factor $q$ were defined by considering the fact that a lower $q$ is more suitable for higher SNRs and vice versa. The interdependence between the total frame errors and the weighting factor $q$ can be found in [6].

The weighted short-term mel-filter bank output energy $E_f[m]$, which is used in the VAD algorithm, is defined by the following equation:

$$E_f[m] = q \cdot \ln\left( 1 + \frac{1}{w} \sum_{i=1}^{N} f_{bank}[m,i] \right) \tag{4}$$

where $N = 23$ represents the number of channels (mel-filter bank outputs) and $f_{bank}[m,i]$ is the i-th mel-filter bank output of the m-th frame. The weighting factor $q$ is used to increase the slope of the $E_f[m]$ function in the logarithmic domain. The constant $w$ is set to 1 and is used to change the shape of the logarithmic function, which gives optimal attenuation for the sum of the mel-filter bank outputs. The influence of the constant $w$ in the logarithmic domain can be found in [6].

The short-term mel-filter bank output energy of the current frame, $E_f[m]$, is used in the update of the long-term mean energy $E_m[m]$ as:

$$E_m[m] = \begin{cases} E_m[m-1] + \dfrac{E_f[m] - E_m[m-1]}{\mathrm{EnergyReduction}}, & \text{if } (E_f[m] - E_m[m-1]) < \mathrm{EnergyUpdate} \\ E_m[m-1], & \text{otherwise} \end{cases} \tag{5}$$

where the EnergyUpdate constant is set to 20 and the EnergyReduction constant to 10.
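The selection of the weighting factor in Equation (2) and the long-term mean update in Equation (5) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The boundary handling at 6/9 and 7/9 of MAX is assumed. The default constants (`energy_update=20.0`, `energy_reduction=10.0`) are a reading of the constants quoted in the text, and they are exposed as parameters so they can be changed. The function names are hypothetical.

```python
def weighting_factor(e_est, max_log_sum):
    """Pick q from the initial energy estimate (Equation (2)).

    A louder initial segment (estimate closer to MAX) suggests a lower SNR
    and therefore a larger q. Boundary inclusion is an assumption.
    """
    if e_est < 6.0 / 9.0 * max_log_sum:
        return 32
    if e_est <= 7.0 / 9.0 * max_log_sum:
        return 64
    return 128


def update_long_term_mean(e_f, e_mean_prev,
                          energy_update=20.0, energy_reduction=10.0):
    """Long-term mean energy update (Equation (5)).

    The mean follows the short-term energy only while the difference stays
    below `energy_update`; otherwise the previous mean is kept.
    """
    diff = e_f - e_mean_prev
    if diff < energy_update:
        return e_mean_prev + diff / energy_reduction
    return e_mean_prev
```

A smaller `energy_reduction` makes the long-term mean follow the short-term energy more quickly, which matches the remark that a smaller EnergyReduction gives a bigger modification.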
The EnergyReduction constant controls how strongly the long-term mean energy $E_m[m]$ is modified. If EnergyReduction is smaller than 10, the modification is bigger, and vice versa. EnergyReduction was set to 10 after several analyses, which can be found in [6].

The basic VAD decision procedure for the current frame can begin after the initial calculations on the first frame, which determine the short-term spectral energy $E_f[m]$ and the long-term mean spectral energy $E_m[m]$. Equation (6) shows the basic VAD algorithm used in this paper:

$$\mathrm{BasicVAD}[m] = \begin{cases} 1, & \text{if } (E_f[m] - E_m[m]) > \mathrm{EnergyRatio} \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

The constant EnergyRatio is set to the optimal value 4.5, the definition of which can be found in [6]. The constants EnergyUpdate and EnergyRatio were defined together. When the difference between the short-term spectral energy $E_f[m]$ and the long-term mean energy $E_m[m]$ is smaller than 4.5, only the long-term mean energy is updated. Between the two values, the long-term mean energy is updated and the frame is declared as speech. When the difference is bigger than 20, the current frame is declared as speech without updating the long-term mean energy.

In the next subsection, we introduce the non-linear function based on minimum and maximum statistics, which improves the detection of vowels, diphthongs and semivowels in the speech signal.

2.2 Non-linear changed MFB outputs

The detection of phonemes in the speech signal is accurate at high SNR, especially when the SNR is more than 25 dB. As soon as the SNR is reduced, the perception of consonants becomes much more difficult, and at very low SNR (less than 5 dB) it is almost impossible. Even the detection of vowels, diphthongs, and semivowels can then lead to errors in the VAD decision. To improve the detection of voiced phonemes, a non-linear function based on minimum and maximum statistics is introduced into the mel-filter bank outputs. The definition of the non-linear function is derived from the mel-filter bank outputs $f_{bank}[m,i]$.
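Returning briefly to the basic decision of Subsection 2.1, the interplay of the two thresholds can be sketched in a few lines of Python. The sketch assumes that the long-term mean is updated first (Equation (5)) and that the decision of Equation (6) then uses the updated mean. The default values of `energy_update` and `energy_reduction` are assumptions about the constants, and `basic_vad_step` is a hypothetical name.

```python
def basic_vad_step(e_f, e_mean_prev, energy_ratio=4.5,
                   energy_update=20.0, energy_reduction=10.0):
    """One frame of the basic VAD: returns (decision, updated long-term mean)."""
    diff = e_f - e_mean_prev
    # Equation (5): adapt the long-term mean only for moderate differences.
    if diff < energy_update:
        e_mean = e_mean_prev + diff / energy_reduction
    else:
        e_mean = e_mean_prev
    # Equation (6): speech if the short-term energy exceeds the mean by
    # more than EnergyRatio.
    decision = 1 if (e_f - e_mean) > energy_ratio else 0
    return decision, e_mean


# Three regimes: noise with update, speech with update, speech without update.
for e_f in (12.0, 20.0, 40.0):
    print(e_f, basic_vad_step(e_f, e_mean_prev=10.0))
```

Running the loop shows the three cases described in the text: a small difference only adapts the mean, an intermediate one adapts the mean and flags speech, and a large one flags speech while leaving the mean untouched.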
The first step is to calculate the mel-filter bank outputs' energy $\mathrm{FBankE}[m]$ for each frame of the speech signal, which is given by the following equation:

$$\mathrm{FBankE}[m] = \sum_{i=1}^{N} \ln f_{bank}[m,i] \tag{7}$$

where $N = 23$ represents the number of channels (mel-filter bank outputs) and $f_{bank}[m,i]$ is the i-th mel-filter bank output of the m-th frame.

[Fig. 1: Analysis of the words duration and the number of words with the same duration. Horizontal axis: duration of words [ms]; vertical axis: number of words with the same duration.]

The next step is to determine the minimum and maximum values of the mel-filter bank outputs' energy $\mathrm{FBankE}[m]$ for each frame. The minimum and maximum values were determined using the statistical approach presented in [8, 9]. The following two equations define the calculation of the minimum and maximum values:

$$\mathrm{minFBankE}[m] = \min_{I}\left( \mathrm{FBankE}[m], I, D \right) \tag{8}$$

$$\mathrm{maxFBankE}[m] = \max_{I}\left( \mathrm{FBankE}[m], I, D \right) \tag{9}$$

where $I$ represents the statistical interval and $D$ the number of frames over which the minimum or maximum value of the mel-filter bank outputs' energy is calculated. The optimal number of frames $D$ was based on the analysis of word durations in the Aurora 2 database [7] and was set to 90. Fig. 1 shows the duration of words and the number of words with the same duration. As can be seen in Fig. 1, the majority of words have a duration below 900 ms.

The length of the statistical interval for the minimum and maximum statistics is derived from the fact that there are short breaks between words, which makes it possible to determine the minimum value. The mel-filter bank outputs' energy of vowels, diphthongs and semivowels in most cases reaches the maximum value of energy, and in most cases vowels, diphthongs and semivowels alternate with consonants in the speech signal. In determining the optimal statistical interval, it was established that the interval chosen for the minimum statistics is also appropriate for the maximum statistics. Thus, we chose a statistical interval of 900 ms, or 90 frames of short-time analysis if the frames are shifted by 10 ms.
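The frame energy of Equation (7) and the minimum and maximum statistics of Equations (8) and (9) can be sketched as below. The exact placement of the statistical interval is not spelled out here, so the sketch assumes a trailing window covering the current frame and the preceding D - 1 frames. The function names and the `d=90` default are illustrative assumptions.

```python
import math
from collections import deque


def fbank_energy(fbank):
    """Equation (7): logarithmic sum of one frame's mel-filter bank outputs."""
    return sum(math.log(value) for value in fbank)


def sliding_min_max(energies, d=90):
    """Minimum and maximum of FBankE over a trailing window of d frames.

    Assumption: the window ends at the current frame. Early frames use as
    many frames as are available.
    """
    window = deque(maxlen=d)  # oldest frame is dropped automatically
    mins, maxs = [], []
    for energy in energies:
        window.append(energy)
        mins.append(min(window))
        maxs.append(max(window))
    return mins, maxs
```

With a 10 ms frame shift, `d=90` spans 900 ms, so the window is long enough to contain both an inter-word pause (for the minimum) and a voiced segment (for the maximum). Recomputing `min` and `max` for each frame costs O(d) per frame, which is acceptable for a sketch. A production version could use a monotonic queue instead.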
From the resulting minimum and maximum values of the mel-filter bank outputs' energy, we define the mean energy of the mel-filter bank outputs, $\mathrm{meanFBankE}[m]$, for each frame using the following equation:

$$\mathrm{meanFBankE}[m] = \frac{\mathrm{minFBankE}[m] + \mathrm{maxFBankE}[m]}{2} \tag{10}$$

The modification of the mel-filter bank outputs is made with the non-linear weight factor $\mathrm{weight}[m]$, which is defined for each frame. Fig. 2 shows the path of the non-linear weight factor, which is calculated by equation (11). The non-linear weight factor changes the mel-filter bank outputs so as to emphasize the larger and reduce the smaller values of the mel-filter bank outputs' energy.

$$\mathrm{weight}[m] = \left( \frac{\mathrm{FBankE}[m]}{\mathrm{meanFBankE}[m]} \right)^{\frac{\mathrm{meanFBankE}[m]}{\mathrm{FBankE}[m]}} \tag{11}$$

[Fig. 2: The path of non-linear weight factor. Horizontal axis: FBankE / meanFBankE; vertical axis: weight.]

Equation (12) shows how the non-linear weight factor is used to change the mel-filter bank outputs. The minimum values of the mel-filter bank outputs' energy, $\mathrm{minFBankE}[m]$, are included in the function in order to reduce the values of the basic mel-filter bank outputs $f_{bank}[m,i]$. With the use of the natural logarithm in equation (12), the small values of the basic mel-filter bank outputs are attenuated, whereas the higher absolute values are not affected. To avoid taking the natural logarithm of values smaller than one, the value 1 was added to the quotient in equation (12).

$$f_{nonlin}[m,i] = \exp\left( \mathrm{weight}[m] \cdot \ln\left( \frac{f_{bank}[m,i]}{\mathrm{minFBankE}[m]} + 1 \right) \right) \tag{12}$$

In the next subsection we present the hangover and hangbefore criteria, which correct an incorrect basic VAD decision, especially when low-amplitude consonants are present.

2.3 Definition of optimal hangover and hangbefore constants

As already mentioned, the VAD decision is not always correct, especially in the presence of unvoiced fricatives, unvoiced stops, and nasals at the beginning or the end of utterances. In order to correct an incorrect VAD decision, hangover and hangbefore criteria were added at the end of the basic VAD decision obtained by the MFB VAD algorithm presented in Subsection 2.1. Fig. 3 presents the use of the hangover and hangbefore criteria.
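Before moving on to the hangover details, the non-linear modification of Subsection 2.2 (Equations (10) to (12)) can be illustrated with a short Python sketch. It assumes the weight has the form x^(1/x) with x = FBankE / meanFBankE, which stays within the 0 to 1.5 range that Fig. 2 suggests for x between 0 and 2. It also assumes positive energies. All function names are hypothetical.

```python
import math


def mean_fbank_energy(min_e, max_e):
    """Equation (10): midpoint of the minimum and maximum statistics."""
    return (min_e + max_e) / 2.0


def nonlinear_weight(fbank_e, mean_e):
    """Equation (11), assumed form: x ** (1 / x) with x = FBankE / meanFBankE.

    Requires x > 0. Frames below the mean (x < 1) receive a weight well
    below 1, frames above the mean receive a weight above 1.
    """
    x = fbank_e / mean_e
    return x ** (1.0 / x)


def nonlinear_fbank(fbank, weight, min_e):
    """Equation (12): exp(weight * ln(f / minFBankE + 1)) for each channel.

    This equals (f / minFBankE + 1) ** weight; the exp/ln form mirrors the
    way the equation is written in the text.
    """
    return [math.exp(weight * math.log(value / min_e + 1.0)) for value in fbank]
```

For example, a frame whose energy is half the mean gets the weight 0.5 ** 2 = 0.25, while a frame at twice the mean gets 2 ** 0.5, about 1.41. This is the emphasis of larger values and the reduction of smaller ones described above.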
Using the hangover criterion, the problem of incorrect detection of consonants that occur at the end and in the middle of words has been solved. Using the hangbefore criterion, the problem of incorrect detection of consonants that occur at the beginning and in the middle of words has been solved. By introducing both criteria at the end of the basic VAD decision, we have managed to improve the final VAD decision.

[Fig. 3: The use of hangover and hangbefore criteria. Panels: speech signal in time domain; speech signal in frequency domain; without hangover and hangbefore; with hangover; with hangbefore; with hangover and hangbefore.]

Phoneme duration analysis, which is used for defining the constants for the hangover and hangbefore criteria, was made as presented in [5]. The classification of phonemes in American English was taken from [10]. Only phonemes present in the recorded words of the Aurora 2 database [7] were used for the phoneme duration analysis, and the analysis was carried out only on clean material from the Aurora 2 database. The duration of vowels, diphthongs, and semivowels is important for defining how many frames must be detected as speech before the hangover and hangbefore criteria are used at all. The duration of consonants is important for defining the constants that are used for the hangover and hangbefore criteria.

Phoneme duration analysis shows that the majority of vowels, diphthongs and semivowels have a duration of 80 milliseconds. This represents 8 frames which must be detected as speech before the hangover and hangbefore criteria can be used. The majority of consonants have a duration of 110 milliseconds. This represents 11 frames, which were used as the constants for the hangover and hangbefore criteria. The duration of vowels, diphthongs, and semivowels and the duration of consonants are presented in Fig. 4.

[Fig. 4: The duration of vowels, diphthongs, and semivowels and consonants. Panels: vowels, diphthongs and semivowels; consonants. Horizontal axis: durations of phonemes [ms]; vertical axis: number of iterations.]

3 Experiments

Experiments were made using the Aurora 2 database [7], which is designed to evaluate the performance of speech recognition algorithms under noisy conditions. The development of the VAD reference for the VAD experiments is described in [6]. The tests based on frame errors were made on the MFB VAD algorithm presented in this paper, with and without the non-linear function. In both cases the hangover and hangbefore criteria were used. The statistics were obtained for five types of errors [3]: clipping rate at the front of the speech segment (FEC), clipping in the middle of the speech segment (MSC), noise detected as speech in the silent region (NDS), the quantity of time during which the output of the tested VAD is on after the reference VAD has switched off (OVER), and the total number of all frame errors (Total).

The experiments were also made on four VAD algorithms used in the ITU G.729 [1], ITU G.723.1 [2], DSR ETSI ES 202 050 (Advanced front-end) [11], and DSR ETSI ES 202 211 (Extended front-end) [12] standards.

4 Results and discussion

The tests based on frame errors are presented in the following three tables. The percentage of frame errors obtained by both MFB VAD algorithms is presented in Tables 1 and 2.
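As a reminder of how the hangover and hangbefore criteria used in these tests operate, the following Python sketch post-processes a sequence of basic VAD decisions. It assumes one plausible reading of Subsection 2.3: a run of at least 8 consecutive speech frames enables the criteria, and the speech region is then extended by 11 frames backwards (hangbefore) and forwards (hangover). The function name and parameter names are hypothetical.

```python
def apply_hang_criteria(basic, min_speech=8, hang=11):
    """Extend sufficiently long speech runs of a 0/1 decision list.

    `basic` holds the basic VAD decisions per frame. Runs shorter than
    `min_speech` frames are left untouched. Longer runs are extended by
    `hang` frames on both sides, clipped to the signal boundaries.
    """
    n = len(basic)
    final = list(basic)
    start = None  # index where the current speech run began
    for m in range(n + 1):  # one extra step closes a run at the signal end
        is_speech = m < n and basic[m] == 1
        if is_speech and start is None:
            start = m
        elif not is_speech and start is not None:
            if m - start >= min_speech:
                for k in range(max(0, start - hang), start):  # hangbefore
                    final[k] = 1
                for k in range(m, min(n, m + hang)):          # hangover
                    final[k] = 1
            start = None
    return final
```

The hangbefore part needs the decisions of frames that have already passed, so a real-time system would have to buffer `hang` frames. That delay is the price for recovering weak consonants at word onsets.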
In both cases the hangover and hangbefore criteria were used as presented in Subsection 2.3. Table 3 gives an overview of the percentage of total frame errors obtained by the six tested algorithms on the Aurora 2 database.

Table 1 shows that for the MFB VAD algorithm without non-linear changed mel-filter bank outputs, the majority of errors in voice activity detection occur when noise is detected as speech (NDS) and when noise is recognized as speech after the end of the speech segment (OVER). These errors occur when, according to the reference, silence or noise should be detected, but the VAD algorithm decides that speech is present. For the MFB VAD algorithm with non-linear changed mel-filter bank outputs, the results of which are given in Table 2, many of the errors occur when speech is recognized as noise in the middle of the speech segment (MSC) and when noise is detected as speech (NDS).

Table 1: Percentage of frame errors obtained by the MFB VAD algorithm without non-linear changed mel-filter bank outputs (SNR in dB).

            FEC     MSC     NDS     OVER    Total
Clean       0.18    0.30    4.17    3.86     8.51
SNR 20      0.23    0.59    7.73    6.68    15.22
SNR 15      0.29    0.85    8.08    6.89    16.10
SNR 10      0.36    1.34    8.25    7.00    16.95
SNR 5       0.42    1.98    8.58    7.50    18.48
SNR 0       0.73    4.38    8.71    7.46    21.27
SNR -5      1.48    9.53    8.57    7.56    27.13
Average     0.53    2.71    7.72    6.71    17.67

Table 2: Percentage of frame errors obtained by the MFB VAD algorithm with non-linear changed mel-filter bank outputs (SNR in dB).

            FEC     MSC     NDS     OVER    Total
Clean       0.77    2.92    2.53    1.03     7.24
SNR 20      1.38    6.03    1.81    0.31     9.54
SNR 15      1.21    6.03    2.98    0.69    10.90
SNR 10      0.92    5.65    4.93    1.73    13.22
SNR 5       0.71    5.63    6.78    3.21    16.33
SNR 0       0.84    8.22    7.81    4.71    21.58
SNR -5      1.38   12.73    8.36    5.90    28.36
Average     1.03    6.74    5.03    2.51    15.31
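The four error categories reported in Tables 1 and 2 can be counted frame by frame from a reference labelling and a VAD output. The sketch below is one simplified interpretation of the definitions given in Section 3, not the evaluation code behind the tables. Missed speech frames count as FEC until the VAD first fires within a reference speech segment and as MSC afterwards. False alarms count as OVER while the VAD stays on directly after a reference speech segment and as NDS otherwise. The function name is hypothetical.

```python
def frame_errors(ref, vad):
    """Count FEC, MSC, NDS and OVER frames for equally long 0/1 sequences."""
    counts = {"FEC": 0, "MSC": 0, "NDS": 0, "OVER": 0}
    detected = False  # has the VAD fired inside the current speech segment?
    overhang = False  # is the VAD still on right after reference speech ended?
    prev_ref = 0
    for r, v in zip(ref, vad):
        if r == 1:
            if prev_ref == 0:
                detected = False  # a new reference speech segment starts
            if v == 1:
                detected = True
            else:
                counts["MSC" if detected else "FEC"] += 1
            overhang = False
        else:
            if prev_ref == 1:
                overhang = True  # reference speech has just ended
            if v == 1:
                counts["OVER" if overhang else "NDS"] += 1
            else:
                overhang = False  # VAD switched off, overhang is over
        prev_ref = r
    counts["Total"] = sum(counts.values())
    return counts


ref = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
vad = [1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1]
print(frame_errors(ref, vad))
```

Dividing each count by the number of frames and multiplying by 100 gives percentages comparable in form to those in the tables.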

Table 3: Percentage of total frame errors obtained by several VAD algorithms on the Aurora 2 database (SNR in dB).

            ITU      ITU       ETSI DSR      ETSI DSR      MFB without   MFB with
            G.729    G.723.1   Advanced FE   Extended FE   non-linear    non-linear
                                                           function      function
Clean       12.84    19.45     18.57         13.82          8.51          7.24
SNR 20      24.53    21.31     15.33         16.17         15.22          9.54
SNR 15      26.13    23.29     15.25         17.48         16.10         10.90
SNR 10      27.38    24.44     14.89         18.33         16.95         13.22
SNR 5       29.13    26.30     14.93         19.26         18.48         16.33
SNR 0       32.23    26.56     16.03         20.73         21.27         21.58
SNR -5      35.21    26.58     21.59         23.08         27.13         28.36
Average     26.78    23.99     16.66         18.41         17.67         15.31

From the results given in Table 3, we can conclude that the MFB VAD algorithm with non-linear changed mel-filter bank outputs achieved the smallest average percentage of frame errors.

5 Conclusion

This paper presents an efficient non-linear changed mel-filter bank (MFB) voice activity detection (VAD) algorithm. Introducing the non-linear function into the MFB VAD algorithm reduces the errors caused by incorrect voice activity detection. The paper also presents the total frame errors obtained by four VAD algorithms used in the ITU and ETSI standards. The best performance, with the smallest average percentage of frame errors, was achieved by the proposed MFB VAD algorithm with non-linear changed mel-filter bank outputs.

Acknowledgements: This work was partially funded by the Slovenian Research Agency (ARRS), under contract number: No. P2-0069.

References:
[1] ITU recommendation: Coding of speech at 8 kbit/s using conjugate structure algebraic-code-excited linear-prediction (CS-ACELP). Annex B: A silence compression scheme, G.729, Geneva, Switzerland, 1996.
[2] ITU recommendation: Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. Annex A: Silence compression scheme, G.723.1, Geneva, Switzerland, 1996.
[3] Freeman, D.K., Cosier, G., Southcott, C.B., and Boyd, I., The voice activity detector for the pan European digital cellular mobile telephone service, Proceedings of the International Conference on Acoustics, Speech and Signal Processing ICASSP'89, pp. 369-372, Glasgow, Scotland, 1989.
[4] Junqua, J.C. and Haton, J.P., Robustness in automatic speech recognition, Kluwer Academic Publishers, Norwell, Massachusetts, USA, p. 173, 1996.
[5] Vlaj, D., Kos, M., Grašič, M., and Kačič, Z., Influence of hangover and hangbefore criteria on automatic speech recognition, Proceedings of the 16th International Conference on Systems, Signals and Image Processing IWSSIP'09, Chalkida, Greece, 2009.
[6] Vlaj, D., Kotnik, B., Horvat, B., and Kačič, Z., A Computationally Efficient Mel-Filter Bank VAD Algorithm for Distributed Speech Recognition Systems, EURASIP Journal on Applied Signal Processing, 4, pp. 487-497, 2005.
[7] Hirsch, H.G. and Pearce, D., The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, Proceedings of the ISCA ITRW ASR2000, Paris, France, 2000.
[8] Kotnik, B., Vlaj, D., and Horvat, B., Efficient Noise Robust Feature Extraction Algorithms for Distributed Speech Recognition (DSR) Systems, International Journal of Speech Technology, Vol. 6, No. 3, pp. 205-219, 2003.
[9] Martin, R., Spectral subtraction based on minimum statistics, Proceedings of the EUSIPCO'94, pp. 1182-1185, Edinburgh, Scotland, UK, 1994.
[10] Yasmin, A., Speech enhancement using voice source models, PhD thesis, University of Waterloo, Ontario, Canada, 1998.
[11] ETSI standard document: Speech Processing, Transmission and Quality aspects (STQ), Distributed speech recognition, Advanced front-end feature extraction algorithm, Compression algorithm. ETSI ES 202 050 v1.1.1, Valbonne, France, 2002.
[12] ETSI standard document: Speech Processing, Transmission and Quality aspects (STQ), Distributed speech recognition, Extended front-end feature extraction algorithm, Compression algorithm, Back-end speech reconstruction algorithm. ETSI ES 202 211 V1.1.1, Valbonne, France, 2003.