Real time noise-speech discrimination in time domain for speech recognition application

University of Malaya
From the SelectedWorks of Mokhtar Norrima
January 4, 2011
Real time noise-speech discrimination in time domain for speech recognition application
Norrima Mokhtar, University of Malaya
Available at: https://works.bepress.com/mokhtar_norrima/6/

Scientific Research and Essays Vol. 6(1), pp. 18-22, 4 January, 2011
Available online at http://www.academicjournals.org/sre
DOI: 10.5897/SRE09.600
ISSN 1992-2248
2011 Academic Journals

Full Length Research Paper

Real time noise-speech discrimination in time domain for speech recognition application

N. Mokhtar*, H. Arof, F. R. Mahamd Adikan and M. Mubin

Department of Electrical Engineering, Faculty of Engineering, University of Malaya, 50603 Lembah Pantai, Kuala Lumpur, Malaysia.

Accepted 16 December, 2010

A simple noise-speech discrimination method in the time domain is presented. The characteristics of random signal noise were studied in the time domain. Using this information, a simple real-time algorithm that detects the starting and ending points of speech samples in the time domain is demonstrated. Out of 100 attempts, about 94% resulted in successful noise-speech discrimination against a white noise background by choosing the critical threshold values.

Key words: Noise-speech discrimination, noisy signal, local mean.

INTRODUCTION

In successful automatic speech recognition systems, it is essential to detect the starting and ending point samples of test and reference patterns (Junqua et al., 1994; Rabiner and Juang, 1993). To perform noise-speech discrimination in real time, the important criteria to consider are simplicity and reliability. Here, simplicity means minimizing unnecessary operations so that the analysis can run in real time in the time domain. Reliability is the ability to detect the beginning and ending points of speech samples in the time domain with high accuracy. The approaches used by Junqua et al. (1994) and Rahmani et al. (2009) perform speech-noise discrimination in the frequency domain by selecting the energy in the 250 to 3500 Hz frequency band.
Cohen obtained the noise estimate by recursively averaging past spectral power values, using a time-varying, frequency-dependent smoothing parameter to differentiate between speech absence and speech presence (Israel, 2003). Voice activity detection by Tanyer combined two methods, which also used an energy threshold and overlapping frames of 16 ms as the vital information for differentiating speech from noise (Gökhun and İzer, 2000). Online noise estimation by Zhao has also been conducted in the frequency domain using overlapping frames of 16 ms (Zhao et al., 2008). In this work, real time noise-speech discrimination is demonstrated in the time domain without using any overlapping frames. Noise characteristics in a noisy environment, such as the amplitude and the mean of the amplitudes, are studied in order to determine the threshold values. Based on these characteristics, an algorithm that performs noise-speech discrimination and detects the starting and ending points of speech samples is proposed. Speech samples with background noise are demonstrated, and the extracted speech samples are discussed in the experiments and results section.

*Corresponding author. E-mail: norrimamokhtar@um.edu.my. Tel: +603-79676806. Fax: +603-79675316.

Signal and noise models

The total input signal from the unidirectional microphone can be defined as:

\[ s(n) = y(n) + w(n) \tag{1} \]

where $y(n)$ is the clean speech samples and $w(n)$ is the background noise. Background noise can consist of white, pink, blue, red and other types of noise. White noise is defined as a stationary random process having a constant spectral density (Zhao et al., 2008). In this work, the background noise $w(n)$ is white noise, which has uniform amplitudes over 10,000 samples per second. Statistically, white noise has zero mean value over the samples taken in a frame (Brown, 1983).
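The signal model and the zero-mean property of white noise can be checked numerically. The following is a minimal sketch, not the authors' implementation (the paper reports using LabVIEW): it assumes the noise is uniformly distributed with an illustrative amplitude of 0.05, and the helper names `white_noise`, `mix` and `frame_mean` are hypothetical.

```python
import random

N = 10_000  # samples per second, the rate quoted in the paper


def white_noise(n, amplitude, seed=0):
    """Return n pseudo-random samples uniform in [-amplitude, amplitude].

    Assumption: a uniform distribution and a fixed seed are used here only
    to make the sketch reproducible; the paper does not specify either.
    """
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(n)]


def mix(speech, noise):
    """Additive model of Equation 1: s(n) = y(n) + w(n), sample by sample."""
    return [y + w for y, w in zip(speech, noise)]


def frame_mean(samples):
    """Signed mean of all samples in a frame."""
    return sum(samples) / len(samples)


if __name__ == "__main__":
    w = white_noise(N, amplitude=0.05)
    # The signed mean over a one-second frame should be close to zero.
    print(f"mean over {N} noise samples: {frame_mean(w):+.5f}")
```

With these assumptions the signed mean over a one-second frame is very small compared with the noise amplitude, which is the property the method relies on.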

\[ \mu_{wn} = \sum_{n=0}^{9999} \frac{s(n)}{N} = 0 \tag{2} \]

where $\mu_{wn}$ is the mean for white noise and $N$ is the number of samples per second, which was 10,000 in this experiment. The proposed method uses the white noise characteristics to differentiate between noise and speech signals and to capture the speech samples segment. However, characterization of white noise alone does not provide sufficient information. Therefore, an algorithm and a simple statistical method are proposed, as shown in Figure 1.

Figure 1. Proposed algorithm for noise-speech processing, and starting and ending point detection of speech samples. (Flowchart; the decision step compares $\mu_{wn}$ with $\mu_{threshold}$.)

METHODOLOGY

Figure 2 illustrates the characteristic of white noise, which has uniform amplitude across a 1 s frame. Using this characteristic, the amplitude threshold is set to 0.1. If the amplitude is greater than the threshold value, the program triggers the starting point of the speech samples. The threshold value was obtained in a noisy environment with air-conditioning and radio background. Theoretically, the mean of all samples in a frame is zero for white noise. Equation 2 is modified to Equation 3, which is used as the trigger for detecting the ending point of the speech samples:

\[ \mu_{wn} = \sum_{n=0}^{99} \frac{|s(n)|}{N}, \quad N = 100 \tag{3} \]

By taking the mean of every 100 samples locally, the ending point of the speech samples can be determined correctly in the time domain. $\mu_{threshold}$ is set to 0.05, a value obtained by testing under various noise conditions such as air-conditioning and radio background. Once noise-speech discrimination and starting point detection have succeeded, the 100 samples that pass the thresholds are saved in memory, and each subsequent block of 100 samples is appended to the previously saved speech samples until the ending point is detected.
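The procedure described above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' LabVIEW implementation: the function names are hypothetical, and it assumes that the ending point is declared at the first non-overlapping 100-sample frame whose mean of $|s(n)|$ is at or below $\mu_{threshold}$. The thresholds 0.1 and 0.05 and the frame length of 100 are the values quoted in the text.

```python
AMPLITUDE_THRESHOLD = 0.1  # start trigger, value quoted in the text
MEAN_THRESHOLD = 0.05      # end trigger (mu_threshold), value quoted in the text
FRAME_LEN = 100            # non-overlapping frame length in samples


def local_mean_abs(frame):
    """Local mean of |s(n)| over one frame (Equation 3)."""
    return sum(abs(x) for x in frame) / len(frame)


def extract_speech(samples,
                   amp_thr=AMPLITUDE_THRESHOLD,
                   mean_thr=MEAN_THRESHOLD,
                   frame_len=FRAME_LEN):
    """Return (start, end) sample indices of the speech segment, or None.

    Frames are scanned once, without overlap. A trailing partial frame
    is ignored.
    """
    start = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if start is None:
            # Starting point: some amplitude in the frame exceeds the threshold.
            if max(abs(x) for x in frame) > amp_thr:
                start = i
        elif local_mean_abs(frame) <= mean_thr:
            # Assumed end rule: local mean has dropped to the threshold or below.
            return start, i
    # Speech started but no ending frame was found before the data ran out.
    return (start, len(samples)) if start is not None else None
```

Because each sample is visited once and no frames overlap, the cost per frame is a single pass over 100 values, which is consistent with the paper's emphasis on simplicity for real time use. At 10,000 samples per second, a returned pair `(start, end)` corresponds to a duration of `(end - start) / 10000` seconds.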
A flowchart of the algorithm, describing the whole process, is shown in Figure 1.

EXPERIMENTS AND RESULTS

The experimental setup, test conditions and software information are shown in Table 1. Figure 2 shows the background noise under two conditions. Background noise with air-conditioning and radio background has about 15-20% higher amplitude than background noise from air-conditioning alone.

Figure 2. Observation of white noise with air-condition noise and radio background. (Panels: Background Noise: Air-condition; Background Noise: Air-condition and radio.)

Table 1. Input setup, test conditions and software information.

Input setup                        Test conditions                                               Software
Unidirectional microphone          Silence with air-condition noise                              LabVIEW version 8.2 by National Instruments
One channel, resolution: 16 bits   Silence with air-condition noise and slow radio background
Sampling rate: 10 kHz

Figure 3 shows examples of speech samples and background noise. Two speech samples were tested: "forward" and "stop". The "forward" speech samples have a duration of 0.35 s and were extracted correctly from the full waveform graph into the trimmed graph. The "stop" speech samples have a duration of 0.19 s and were also extracted correctly into the trimmed graph. The mean over all samples was -0.04. Taking the mean of the modulus $|s(n)|$ locally over every 100 samples, the last $\mu_{wn}$ after the detected ending point of the speech samples was 0.05 for the "forward" samples and 0.04 for the "stop" samples. These are the values at which the algorithm triggered the ending point of the speech samples.

Conclusion

Most speech processing techniques involving noise-speech discrimination work in the frequency domain and use overlapping frames. In this work, although the algorithm is simple, it has been demonstrated that this process can be done in the time domain without overlapping frames, which saves processing time and enables real time speech processing.

Figure 3. Examples of speech samples with air-condition and radio background. (Panels: speech samples "forward" and background noise, with white noise; cropped speech samples "forward", 0.35 s; speech samples "stop" and background noise, with radio background noise; cropped speech samples "stop", 0.19 s.)

Auto-thresholding and speech recognition are future projects of this work.

REFERENCES

Junqua J-C, Mak B, Reaves B (1994). A Robust Algorithm for Word Boundary Detection in the Presence of Noise. IEEE Trans. Speech Audio Processing, 2(3): 406-412.
Rahmani M, Yousefian N, Akbari A (2009). Energy-based speech enhancement technique for hands-free commun. Electron. Lett., 45(1): 1-2.
Israel C (2003). Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging. IEEE Trans. Speech Audio Processing, 11(5): 466-475.
Gökhun S, Tanyer Hİ (2000). Voice Activity Detection in Nonstationary Noise. IEEE Trans. Speech Audio Processing, 8(4): 478-482.
Zhao DY, Kleijn WB, Ypma A, De Vries B (2008). Online Noise Estimation Using Stochastic-Gain HMM for Speech Enhancement. IEEE Trans. on Speech Audio Processing, 16(4): 835-846.
Brown RG (1983). Introduction to Random Signal Analysis and Kalman Filtering. John Wiley & Sons.
Rabiner L, Juang B-H (1993). Fundamentals of Speech Recognition. Prentice-Hall International.