NOISE ESTIMATION IN A SINGLE CHANNEL

SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE

by Levent M. Arslan and John H.L. Hansen

Robust Speech Processing Laboratory, Department of Electrical Engineering, Box 99, Duke University, Durham, North Carolina 778-9

Technical Report DSPL-96-3

Abstract

Speech signals are often degraded by additive interference over single channel communication systems. For stationary and well defined noise sources, effective solutions exist. However, it is often difficult to formulate a model for non-stationary and speech-like noise sources such as cross-talk or multi-speaker babble, which exist in real scenarios. In this paper, we propose a solution to this problem under the assumption that we have access to the clean speech signal prior to transmission. A novel method for tracking transmission noise characteristics is described. Based on this noise estimate, a new speech enhancement technique is proposed. The enhancement method is evaluated for multi-speaker babble noise, and shown to substantially improve both the quality and intelligibility of the processed speech signal.

Mail All Correspondence To: Prof. John H.L. Hansen, Duke University, Robust Speech Processing Laboratory, Department of Electrical Engineering, Box 99, Durham, North Carolina 778-9, U.S.A. internet email: jhlh@ee.duke.edu Phone: 99-66-556 FAX: 99-66-593

IEEE SA EDICS Code: SPL.SA..5 Speech Enhancement

Submitted Jan. 9, 1996 to IEEE Signal Processing Letters. Revised July, 1996.

1 Introduction

In single channel voice communication systems, it is often difficult to characterize background interfering noise. The channel distortion normally possesses nonstationary statistics, and can contain correlated interference (e.g., another speaker's voice). However, most models developed for single channel speech enhancement systems assume that background noise is stationary and/or uncorrelated [1, 2, 4, 7]. Although the fundamental principles behind these enhancement methods are well defined, in practice the limitations set by their assumptions play a major role in their performance across actual distortions. These methods rely on a good estimate of the noise characteristics, so their performance can degrade seriously when the assumptions are violated. Another limitation of traditional methods is that they rely on a short-time stationarity assumption, which is not valid for some speech classes such as stop consonants. As a result, these enhancement algorithms can introduce artifacts which reduce overall speech intelligibility. Methods have been proposed which seek to preprocess clean speech prior to transmission across a channel in an effort to increase intelligibility [8, 9]. Unfortunately, these methods generally compromise overall speech quality as a result of their processing.

In this paper, we propose a time-division multiplexing based scheme to track the channel noise characteristics without imposing any constraints on the noise type. The proposed method is very simple; however, it requires access to the clean speech signal prior to transmission as in [8, 9]. The method is based on padding the signal with zeros at the transmitter prior to transmission across the channel, and estimating the noise characteristics from the original zero samples, which are degraded by the time they are collected at the receiver.
Since most noise signals have correlation between successive samples (especially speech-like interference), the noise samples that are added to a signal sample will be very similar to those noise samples that are added to the closest zero sample. Therefore, even if the degraded zero samples are simply subtracted from the neighboring signal samples at the output, the resulting speech will possess both higher quality and intelligibility.

The outline of this paper is as follows: In Sec. 2, the zero-padding procedure for channel noise estimation is presented. In Sec. 3, the evaluations including multi-speaker babble noise interference are presented. Finally, in Sec. 4, the conclusions are presented.

2 Zero-padding procedure

The procedure for obtaining the noise estimate is shown in Fig. 1, where the top plot shows the transmitted signal $s(n)$, which is padded with zeros at every other sample. The second plot corresponds to the interference signal $d(n)$, which is assumed to be an additive noise distortion due to the channel. The resulting signal at the receiver, $y(n)$, is shown in the bottom plot. In this procedure, the noise is estimated from the original zero samples, which are marked with dashed lines. One approach for enhancing the output signal is to simply subtract the noise estimate (marked with dashed lines) from the noisy speech signal (marked with solid lines in the bottom plot). This method will be referred to as sample subtraction. To improve upon the enhancement procedure, interpolation techniques can be applied in order to obtain a better estimate of the noise signal which interferes with non-zero samples.

It should be noted that the resampling process at the receiver should be synchronized to the transmitter sampling via a phase-locked loop. An error analysis of the phase estimation process is as follows. Suppose we have an amplitude-modulated signal of the form

$$s(t) = A(t)\cos(2\pi f_c t + \phi).$$

If we demodulate the signal by multiplying $s(t)$ with the carrier reference

$$c(t) = \cos(2\pi f_c t + \hat{\phi}),$$

we obtain

$$c(t)\,s(t) = \tfrac{1}{2}A(t)\cos(\phi - \hat{\phi}) + \tfrac{1}{2}A(t)\cos(4\pi f_c t + \phi + \hat{\phi}),$$

whose double-frequency term is removed by low-pass filtering, leaving $\tfrac{1}{2}A(t)\cos(\phi - \hat{\phi})$. Note that the effect of the phase error $\phi - \hat{\phi}$ is to reduce the signal level in voltage by the factor $\cos(\phi - \hat{\phi})$, and in power by the factor $\cos^2(\phi - \hat{\phi})$. Hence, a phase error of 10° results in a signal power loss of 0.13 dB, and a phase error of 30° results in a signal power loss of 1.25 dB in an amplitude modulated signal. As the above analysis suggests, the signal power loss for pulse amplitude modulated signals is not significant. However, the phase error can become a critical factor for quadrature amplitude modulation (QAM) and M-phase-shift keying (M-PSK) signals.

One disadvantage of the zero-padding sample subtraction procedure is that it requires twice the data rate, or two times the original channel bandwidth, for transmission.
In order to reduce the bandwidth requirement, zero padding can be based on the degree of correlation between successive noise samples, so that a zero is inserted after every second sample, third sample, etc. This reduces the bandwidth requirement from 2/1 times that of the original channel to 3/2 times, 4/3 times, etc., respectively. However, reducing the bandwidth will result in a less accurate estimate of the noise characteristics. Depending on the particular voice communication application and the available channel bandwidth, an appropriate padding interval can be determined experimentally.
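The zero-padding and sample subtraction steps described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names, the `period` parameter (number of speech samples between inserted zeros), and the choice to use the zero slot that follows each block as its noise estimate are all assumptions made for the example.

```python
import numpy as np

def zero_pad(speech, period=1):
    """Transmitter side: insert one zero after every `period` speech samples.
    (`period` is a hypothetical parameter; period=1 pads every other sample.)"""
    speech = np.asarray(speech, dtype=float)
    if len(speech) % period:
        raise ValueError("len(speech) must be a multiple of period")
    blocks = speech.reshape(-1, period)
    zeros = np.zeros((blocks.shape[0], 1))
    return np.hstack([blocks, zeros]).ravel()

def sample_subtract(received, period=1):
    """Receiver side: treat each degraded zero slot as a noise sample and
    subtract it from the speech samples of the same block."""
    blocks = np.asarray(received, dtype=float).reshape(-1, period + 1)
    noise_est = blocks[:, -1:]              # degraded zero slots
    return (blocks[:, :period] - noise_est).ravel()

def bandwidth_factor(period):
    """Channel rate relative to the unpadded signal: 2, 3/2, 4/3, ..."""
    return (period + 1) / period

# Toy check with a slowly varying (highly correlated) interference.
rng = np.random.default_rng(0)
speech = rng.standard_normal(600)           # stand-in for a speech segment
tx = zero_pad(speech, period=1)
t = np.arange(tx.size)
rx = tx + 5.0 * np.sin(2 * np.pi * 0.01 * t)
enhanced = sample_subtract(rx, period=1)
mse_before = np.mean((rx[0::2] - speech) ** 2)
mse_after = np.mean((enhanced - speech) ** 2)
print(f"bandwidth x{bandwidth_factor(1):.2f}, MSE {mse_before:.3f} -> {mse_after:.3f}")
```

Larger values of `period` lower the bandwidth factor but make each noise estimate serve samples that are further away in time, which is the accuracy trade-off noted above.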

Another issue is the non-ideal filter characteristics of the band limited channel. Under ideal conditions, the channel can be modeled as a filter with perfect pass-band/stop-band characteristics, and therefore each sampled pulse of speech spaced $1/f_s$ apart will produce a sinc function ($\sin(x)/x$) type response, but will still maintain a null at the intermediate point in time between samples. Since these null sample locations correspond to noise estimate samples in our formulation, the ideal channel filter characteristics will not result in distortion of the noise estimate. However, in practice, the channel filter characteristics may not be ideal. This will produce a smearing of the speech pulses, which would result in leakage into the zero valued samples reserved for the noise estimate. This problem can be resolved to some extent by employing an adaptive filter to remove the smeared component of the speech signal from the noise signal. A number of techniques for echo cancellation found in the literature [5, 6] could be employed to address this issue.

It is important to note that the sample subtraction method will be more effective if the successive noise samples are correlated. However, if the successive noise samples are uncorrelated, then speech enhancement could be performed in the frequency domain using one of the traditional approaches, such as Spectral Subtraction or Wiener filtering, on a frame-by-frame basis. Since both of these methods require a good noise estimate, the proposed noise estimation procedure will increase frequency domain speech enhancement performance as well. One of the most important advantages of the proposed method is that it does not require any stationarity assumption, which is a major problem for existing speech enhancement techniques. The reason for this is that the noise estimate is updated automatically for every other input sample.
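For the uncorrelated-noise case, one way the zero-slot noise estimate might feed a frame-by-frame spectral subtraction is sketched below. This is only an illustration under stated assumptions: zeros are padded at every other sample, frames are rectangular and non-overlapping, the noise magnitude spectrum is taken from the zero slots of the same frame, and the `floor` parameter and function names are hypothetical. A practical system would normally add windowing and overlap-add.

```python
import numpy as np

def spectral_subtract_frame(noisy_frame, noise_frame, floor=0.05):
    """Magnitude spectral subtraction for one frame.
    noisy_frame: samples from the speech slots (speech + noise)
    noise_frame: samples from the zero slots (noise only)
    floor: hypothetical spectral floor, as a fraction of the noisy magnitude"""
    Y = np.fft.rfft(noisy_frame)
    D = np.fft.rfft(noise_frame)
    # Subtract the noise magnitude, never going below the spectral floor.
    mag = np.maximum(np.abs(Y) - np.abs(D), floor * np.abs(Y))
    # Reuse the noisy phase, as is usual for spectral subtraction.
    S = mag * np.exp(1j * np.angle(Y))
    return np.fft.irfft(S, n=len(noisy_frame))

def enhance(received, frame_len=256, floor=0.05):
    """Split the received zero-padded stream into speech and noise slots
    and apply spectral subtraction to each full frame."""
    received = np.asarray(received, dtype=float)
    speech_slots = received[0::2]
    noise_slots = received[1::2]
    n_frames = min(len(speech_slots), len(noise_slots)) // frame_len
    out = np.zeros(n_frames * frame_len)
    for k in range(n_frames):
        sl = slice(k * frame_len, (k + 1) * frame_len)
        out[sl] = spectral_subtract_frame(speech_slots[sl], noise_slots[sl], floor)
    return out
```

Because the noise frame is measured during the same interval as the speech frame, the estimate tracks the noise without assuming that it stays stationary from one frame to the next.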
The degree of correlation between successive noise samples plays a major role in deciding between sample subtraction and traditional speech enhancement methods for receiver-end speech enhancement. As mentioned above, in either case zero-padding based noise estimation will improve the performance substantially. However, in order to achieve the highest level of performance, a decision mechanism between the two processing methods can be embedded in the speech enhancement structure at the receiver. The criterion for switching between the methods would depend on the degree of correlation between successive noise samples. In order to formulate a mathematical expression for the degree of correlation, we define the sequences $X$ and $Y$ as

$$X = [\,d_n,\ d_{n+1},\ d_{n+2},\ \ldots,\ d_{n+N}\,], \qquad Y = [\,d_{n-1},\ d_n,\ d_{n+1},\ \ldots,\ d_{n+N-1}\,], \tag{1}$$

where $N$ is a predefined frame length. Next, the correlation coefficient between $X$ and $Y$ is obtained using the expression

$$\rho = \frac{K_{XY}}{\sigma_X \sigma_Y}, \tag{2}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, and $K_{XY}$ is the covariance, defined as

$$K_{XY} = E[(X - m_X)(Y - m_Y)], \tag{3}$$

where $m_X$ and $m_Y$ are the means of $X$ and $Y$ respectively. The correlation coefficient can be updated every other input sample in order to direct the enhancement decision mechanism as needed.

Figure 1: Zero-padding procedure for accurate noise estimation, where s(n) is the transmitted zero-padded clean speech signal, d(n) is the interference signal that is added in the channel, and y(n) is the output noisy signal at the receiver. The noise estimate is taken from the zero-sample positions of y(n).

3 Evaluations

In these evaluations, speech consisted of continuous sentences from the TIMIT speech database, downsampled to an 8 kHz sample rate. For the first evaluation, the proposed speech enhancement method is applied to the problem of enhancing speech degraded by multi-speaker babble noise interference [3]. Fig. 2(a) shows the time waveform and corresponding spectrogram for the utterance "Often you'll", which is part of the TIMIT sentence "Often you'll get back more than you put in" spoken by a male speaker. Fig. 2(b) corresponds to the degraded waveform and its spectrogram with - dB SNR of babble noise. At this level of noise, the original speech signal is not distinguishable. Competing speaker formant tracks are also clearly visible in the adjoining speech spectrogram. However, after applying the proposed method of enhancement, the original clean signal is recovered with virtually no perceived residual noise. A portion of the recovered signal is shown in Fig. 2(c). The mean square error with respect to the original signal decreased from 653 for the degraded signal to 37 after applying the sample subtraction enhancement procedure.

Next, a degrading sinusoidal interference was considered. As previously mentioned, the effectiveness of the algorithm is more pronounced when the noise interference is more highly correlated, which is the case for the sinusoidal interference. Fig. 3(a) shows the original utterance "Often you'll". Fig. 3(b) corresponds to the degraded waveform and its spectrogram with -3 dB sinusoidal interference at 7 Hz. At this level of noise, listener evaluations indicate that only the single tone is heard, and virtually no speech signal can be perceived. However, after applying the proposed enhancement method, the original signal is completely recovered, as can be seen in Fig. 3(c). Here the mean square error drops from 4775 to 4 after applying the sample subtraction enhancement procedure.

Figure 2: The time waveforms and spectrograms for the utterance "Often you'll" from the TIMIT sentence "Often you'll get back more than you put in". (a) Original utterance, (b) degraded with - dB multi-speaker babble noise, (c) enhanced using the zero-padding procedure.
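The link between the correlation coefficient of successive noise samples and the benefit of sample subtraction can be illustrated with synthetic data. The sketch below is a toy analogue of the sinusoidal-interference experiment and does not use TIMIT speech or reproduce the reported error values: the speech is replaced by random samples, the tone frequency and amplitudes are arbitrary, and the function names are hypothetical. The correlation is computed on the zero-slot samples, taking X and Y as the same noise sequence offset by one sample.

```python
import numpy as np

def lag_one_correlation(d):
    """rho = K_XY / (sigma_X * sigma_Y), with X = d[1:] and Y = d[:-1]."""
    d = np.asarray(d, dtype=float)
    x, y = d[1:], d[:-1]
    k_xy = np.mean((x - x.mean()) * (y - y.mean()))   # covariance K_XY
    return k_xy / (x.std() * y.std())

def mse_before_after(speech, noise):
    """Zero-pad at every other sample, add channel noise, then subtract
    each degraded zero slot from the speech sample that precedes it."""
    rx = np.zeros(2 * len(speech))
    rx[0::2] = speech
    rx = rx + noise
    enhanced = rx[0::2] - rx[1::2]
    before = np.mean((rx[0::2] - speech) ** 2)
    after = np.mean((enhanced - speech) ** 2)
    return before, after

rng = np.random.default_rng(0)
n = 4000
speech = rng.standard_normal(n)                 # stand-in for speech samples
t = np.arange(2 * n)
tone = 3.0 * np.sin(2 * np.pi * 0.01 * t)       # highly correlated interference
white = 3.0 * rng.standard_normal(2 * n)        # uncorrelated interference

for name, noise in [("tone", tone), ("white", white)]:
    rho = lag_one_correlation(noise[1::2])      # noise seen in the zero slots
    before, after = mse_before_after(speech, noise)
    print(f"{name}: rho = {rho:.2f}, MSE {before:.2f} -> {after:.2f}")
```

In this toy setting the tone gives a correlation coefficient close to one and a large reduction in mean square error, while for white noise the coefficient is near zero and sample subtraction increases the error, which is the situation in which a receiver-side decision mechanism would switch to a frequency domain method.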

Figure 3: The time waveforms and spectrograms for the utterance "Often you'll" from the TIMIT sentence "Often you'll get back more than you put in". (a) Original utterance, (b) degraded with -3 dB sinusoidal interference, (c) enhanced using the zero-padding procedure.

4 Conclusions

In this paper, a new method for estimating degrading noise characteristics was proposed, and integrated into a speech enhancement scheme. Our proposed method assumed access to the clean speech signal prior to transmission. The method is based on simply padding the signal with zeros at every other sample in order to characterize the background noise in the communications system at the receiver. Using the proposed method, it has been shown that the original speech can be easily reconstructed in the presence of such noise sources as multi-speaker babble noise or sinusoidal interference. The usage of the method is illustrated here for nonstationary and correlated noise types, since these noise types normally cause traditional speech enhancement algorithms to fail. In closing, it should be mentioned that the method is flexible enough to accommodate many typical noise sources, and quite appropriate for real-time implementation.

References

[1] L.M. Arslan, A. McCree, and V. Viswanathan. "New Methods for Adaptive Noise Suppression". In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, volume, pages 8-85, Detroit, USA, May 1995.

[2] S.F. Boll. "Suppression of Acoustic Noise in Speech Using Spectral Subtraction". IEEE Trans. on Acoustics, Speech, and Signal Processing, pages 3-, 1979.

[3] J.H.L. Hansen and L.M. Arslan. "Robust Feature Estimation and Objective Quality Assessment for Noisy Speech Recognition using Credit Card Corpus". IEEE Trans. on Speech & Audio Proc., 3(3):69-84, 1995.

[4] J.H.L. Hansen and M.A. Clements. "Constrained iterative speech enhancement with application to speech recognition". IEEE Trans. on Signal Processing, 39(4):795-85, 99.

[5] S. Haykin. Adaptive Filter Theory (2nd Edition). Prentice-Hall, Englewood Cliffs, N.J., 99.

[6] J.S. Lim. Speech Enhancement. Prentice-Hall, Englewood Cliffs, N.J., 1983.

[7] J.S. Lim and A.V. Oppenheim. "All-pole modeling of degraded speech". IEEE Trans. on Acoust., Speech and Signal Processing, 6:97-, 1978.

[8] R.J. Niederjohn and J.H. Grotelueschen. "The enhancement of speech intelligibility in high noise levels by high-pass filtering followed by rapid amplitude compression". IEEE Trans. on Acoustics, Speech, and Signal Processing, 4(4), August 1976.

[9] I.B. Thomas and R.J. Niederjohn. "The Intelligibility of Filtered-Clipped Speech in Noise". The Journal of the Audio Engineering Society, 8(3):99-33, June 97.