SPEECH ENHANCEMENT WITH SIGNAL SUBSPACE FILTER BASED ON PERCEPTUAL POST FILTERING

K. Ramalakshmi
Assistant Professor, Dept of CSE, Sri Ramakrishna Institute of Technology, Coimbatore

R. N. Devendra Kumar
Assistant Professor, Dept of CSE, Sri Ramakrishna Institute of Technology, Coimbatore

Abstract- A novel technique is presented for signal subspace speech enhancement based on perceptual post filtering. First, the noisy speech is enhanced by a subspace filter. The underlying principle is to decompose the vector space of the noisy signal into a signal-plus-noise subspace and a noise subspace. The decomposition can theoretically be performed by applying the Karhunen-Loève transform to the noisy signal. Spectral subtraction is then used to reduce stationary noise added to speech in noisy environments. Finally, the enhanced speech is smoothed by a perceptual filter based on the auditory masking effect to obtain the clean speech.

Keywords- Karhunen-Loève transform, Spectral subtraction, Perceptual filter.

I. INTRODUCTION

In most speech enhancement systems, musical noise can be attributed to errors in measuring noise statistics. This auditory annoyance resembles a sum of sinusoids of changing frequencies, turning on and off over successive frames. Signal subspace techniques eliminate musical noise originating from fluctuating energy estimates by averaging over long windows. However, other artefact sources exist. These include rapid changes of model order and subspace swapping. The latter condition refers to noise basis vectors being incorrectly employed to describe the signal subspace.

This paper presents a methodology to suppress artefacts produced by signal subspace techniques. A perceptual post-filter is placed at the output of the signal subspace filter to smooth the enhanced signal. It will be shown that psychoacoustic knowledge can attenuate imperfections with minimal distortion to the speech signal being recovered.
Perception has been applied to the speech enhancement problem on several occasions. In [1, 2, 3], it was shown that exploiting properties of the human auditory system makes it possible to attenuate noise without distortion. Rezayee and Gazor [5] incorporated coloured noise handling into their algorithm by diagonalizing the noise correlation matrix using the estimated eigenvectors of the clean speech and nulling any off-diagonal elements. In addition, they incorporated subspace tracking using the projection approximation algorithm developed by Yang [6]. Jabloun showed in [8] that knowledge of the ear can improve parameter estimates for signal subspace techniques. In that work, filter coefficients are derived using eigenvalues which are calculated by projecting the excitation pattern of the noisy signal onto the squared magnitude of the individual eigenvectors.

Limiting the attenuation in an enhancement scheme can decrease distortion. In this application, the perceptual filter accomplishes this by attenuating artefacts until they lie close to the masking threshold. As such, the part of the artefact which is imperceptible is retained. By attenuating less, it is expected that fewer disturbances will be produced. Spectral averaging increases the width of tones within the noise residual according to the resolution of the ear. Temporal averaging, by limiting magnitude changes of the noise residual over several frames, effectively attenuates musical noise. Rapid frame-to-frame spectrum variations are, with high probability, the product of noise. By considering human perception, artefacts can be smoothed without noticeably altering the underlying speech signal.

Vol. 2 Issue 1 January 2013 28 ISSN: 2278-621X

[Fig 1.1 Proposed Hybrid System: signal subspace filter followed by perceptual post filter]

The proposed hybrid system is illustrated in Fig. 1.1. It is a two-stage approach used to enhance noisy speech recorded in real environments. The paper is structured as follows: Section 2 presents the principle of the method, Section 3 the methodology, Section 4 the experimental results, Section 5 the performance evaluation, and Section 6 the conclusion.

II. PRINCIPLE OF THE METHOD

2.1 SIGNAL SUBSPACE FILTER

Signal subspace based speech enhancement techniques decompose an M-dimensional space into two subspaces: a signal subspace and a noise subspace. It is assumed that the speech signal lies only within the signal subspace, while the noise spans the entire space. Only the contents of the signal subspace are used to estimate the original speech signal. This section describes the process of decomposing the complex space into orthogonal subspaces.

The speech enhancement problem is described as a speech signal x being transmitted through a distortionless channel and corrupted by additive noise w. The resulting noisy speech signal y can be expressed as

$$y = x + w$$

where $x = [x_1, \ldots, x_M]^T$, $w = [w_1, \ldots, w_M]^T$ and $y = [y_1, \ldots, y_M]^T$. The observation period is denoted by M. Henceforth, the vectors w, x and y are considered to be elements of $\mathbb{C}^M$. The speech enhancement system attempts to estimate the original signal using a single channel of received speech.

2.1.1 Karhunen-Loève Expansion

It has been shown in many applications that the KL expansion is an excellent basis for dimensionality reduction. The following definition is from Haykin:

Definition 1 (Karhunen-Loève Expansion) Let the M-by-1 vector u denote a data sequence drawn from a wide-sense stationary process of zero mean and correlation matrix $R$. Let $q_1, q_2, \ldots, q_M$ be eigenvectors associated with the M eigenvalues of the matrix $R$.
The vector u may be expanded as a linear combination of these eigenvectors as follows:

$$u = \sum_{i=1}^{M} c_i q_i$$

The coefficients of the expansion are zero-mean, uncorrelated random variables defined by the inner product

$$c_i = q_i^H u, \quad i = 1, \ldots, M$$

Using the spectral theorem, it can be shown that the KL expansion always exists for a WSS random process: all WSS processes have Hermitian correlation matrices, and Hermitian matrices are diagonalizable by a unitary matrix $Q = [q_1, \ldots, q_M]$. Even if the correlation matrix is singular, the KL expansion still exists. However, the eigenvectors associated with the zero eigenvalues are then not uniquely determined.

2.1.2 Subspace Decomposition Using Karhunen-Loève Expansion
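As a concrete preview of the decomposition developed in this subsection, the following minimal sketch computes Karhunen-Loève coefficients and a reduced-rank estimate for a real-valued 4-sample vector. It rests on several assumptions that are not in the paper: the orthonormal basis is a scaled 4x4 Hadamard matrix standing in for the eigenvectors of an estimated correlation matrix, the signal subspace dimension `K = 2` is picked by hand, and the helper names (`kl_coefficients`, `expand`, `project`) are hypothetical.

```python
# Hypothetical orthonormal basis q_1..q_4 (rows). In practice these would be
# the eigenvectors of an estimated correlation matrix; a scaled Hadamard
# matrix is used here only because its arithmetic is exact.
Q = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]
K = 2  # assumed signal-subspace dimension: q_1..q_K span the signal subspace


def dot(a, b):
    """Inner product of two real vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def kl_coefficients(u, basis):
    """c_i = q_i^T u for each basis vector (real-valued case)."""
    return [dot(q, u) for q in basis]


def expand(coeffs, basis):
    """Linear combination sum_i c_i q_i over the supplied basis vectors."""
    out = [0.0] * len(basis[0])
    for c, q in zip(coeffs, basis):
        for n, qn in enumerate(q):
            out[n] += c * qn
    return out


def project(u, basis):
    """Orthogonal projection of u onto the span of the basis vectors."""
    return expand(kl_coefficients(u, basis), basis)


x = expand([4.0, -2.0], Q[:K])           # clean vector inside the signal subspace
w = [1.0, 0.0, 0.0, 0.0]                 # noise that spans the whole space
y = [xn + wn for xn, wn in zip(x, w)]    # noisy observation y = x + w

x_hat = project(y, Q[:K])       # reduced-rank estimate kept by the filter
noise_part = project(y, Q[K:])  # component discarded with the noise subspace
```

In this toy case the projection keeps the clean vector untouched and discards the part of the noise that falls in the noise subspace; the noise component inside the signal subspace remains, which is why further filtering is applied to the retained coefficients.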

If an eigendecomposition is performed on the correlation matrix of the speech signal x, the following form is obtained:

$$R_x = Q \Lambda Q^H = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_1^H \\ Q_2^H \end{bmatrix}$$

The eigenvector matrix $Q$ has been partitioned into two sub-matrices, $Q_1$ and $Q_2$. The matrix $Q_1$ contains the K eigenvectors corresponding to non-zero eigenvalues. These eigenvectors form a basis for the signal subspace. Meanwhile, $Q_2$ contains the eigenvectors which span the noise subspace. The matrix $P = Q_1 Q_1^H$ is idempotent ($P^2 = P$) and Hermitian, and its column space is $\mathrm{span}(Q_1)$. Thus, $Q_1 Q_1^H$ is a projector onto the signal subspace. Similarly, $Q_2 Q_2^H$ is the projector onto the noise subspace. As the two subspaces together make up $\mathbb{C}^M$, any input vector can be represented as

$$u = Q_1 Q_1^H u + Q_2 Q_2^H u$$

The expected power of a Karhunen-Loève coefficient can be shown to be equal to the corresponding eigenvalue:

$$E\{|c_i|^2\} = \lambda_i$$

As the eigenvectors which make up $Q_2$ have null eigenvalues, they contribute no energy to the speech signal. As such, they can be omitted from a KL expansion without introducing error. The noise subspace eigenvectors, corresponding to a zero eigenvalue with multiplicity $M - K$, are arbitrary apart from being orthogonal to each other. Thus, a reduced-rank representation for the signal u has the form

$$u = \sum_{i=1}^{K} c_i q_i$$

2.2 PERCEPTUAL POST FILTERING

The subspace filter described above is effective in improving the Signal-to-Noise Ratio (SNR) of a speech signal. However, this method has also been found to introduce artefacts into the enhanced signal. These artefacts are known as musical noise and have often been judged to be more disturbing than the original corrupting noise. To remove these annoyances, a perceptual post-filter is employed.

2.2.1 Spectral Subtraction

An estimate of the clean speech signal is required for an accurate masking threshold. This coarse approximation is obtained from the generalized spectral subtraction algorithm. Spectral subtraction is based on the following relationship between the power spectra of the noisy speech, the clean speech and the noise, which holds for signals corrupted by uncorrelated noise:

$$S_y(\omega) = S_x(\omega) + S_w(\omega)$$

Clearly, the magnitude response of the speech signal can be estimated from power subtraction.
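The power subtraction step can be sketched per frequency bin as follows. This is an illustrative sketch, not the authors' implementation: the over-subtraction factor `alpha` and the spectral floor `beta` are assumed parameters of a generalized variant (the paper does not give their values), the function name is hypothetical, and the noise power estimate is taken as given. The noisy phase is reused for the output.

```python
import cmath
import math


def spectral_subtraction(noisy_spectrum, noise_power, alpha=1.0, beta=0.0):
    """Estimate clean-speech bins from S_x = S_y - S_w, keeping the noisy phase.

    noisy_spectrum: complex DFT bins of one noisy frame.
    noise_power:    estimated noise power per bin (assumed to be available).
    alpha, beta:    assumed over-subtraction factor and spectral floor.
    """
    enhanced = []
    for y_k, sw_k in zip(noisy_spectrum, noise_power):
        sy_k = abs(y_k) ** 2
        # Subtract the noise power; never go below the floor beta * S_y.
        sx_k = max(sy_k - alpha * sw_k, beta * sy_k)
        magnitude = math.sqrt(sx_k)
        enhanced.append(cmath.rect(magnitude, cmath.phase(y_k)))
    return enhanced


# Toy frame with three bins and a hand-picked noise power estimate.
Y = [3 + 0j, 0 + 2j, -1 + 0j]
S_w = [5.0, 0.0, 2.0]
X_hat = spectral_subtraction(Y, S_w)
```

With `beta = 0.0`, bins whose estimated noise power exceeds the observed power are set to zero; a small positive `beta` keeps a fraction of the noisy power instead, which is a common way to soften isolated spectral peaks.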
The noisy phase is retained in the enhancement system.

2.2.2 Masking threshold

Masking [10] is the phenomenon where the perception of one sound is obscured by the perception of another. A masker obscures a weaker signal known as the maskee. It is also common to refer to the maskee as the probe, target or signal. The threshold level above which a signal becomes audible in the presence of a masker is known as the masking threshold. Masking effects occur when two sounds occur at the same time or when they are separated by a small delay. The former is known as simultaneous masking, while the latter is known as temporal masking.

As the masking threshold is insensitive to phase, retaining the noisy phase in the spectral subtraction estimate should not affect the performance of the perceptual post-filter. This system smooths the output of the signal subspace filter and reduces the prominence of the musical noise. By utilizing properties of the human auditory system, the underlying speech signal should remain largely undistorted.
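The idea of attenuating a residual only as far as the masking threshold can be illustrated with a per-bin gain rule. The sketch below is an assumption made for illustration and is not taken from the paper: it presumes that a residual noise power and a masking threshold are already available for each bin (how the threshold is computed is outside this sketch), and the function name `perceptual_gain` is hypothetical.

```python
import math


def perceptual_gain(residual_power, mask_threshold):
    """Amplitude gain per bin that lowers the residual only to the threshold.

    Bins whose residual is already at or below the masking threshold are
    left untouched (gain 1.0), so inaudible residual is not attenuated.
    """
    gains = []
    for n_k, t_k in zip(residual_power, mask_threshold):
        if n_k <= t_k:
            gains.append(1.0)
        else:
            # After scaling the amplitude by g, the power becomes g^2 * n_k = t_k.
            gains.append(math.sqrt(t_k / n_k))
    return gains


# Toy values: only the first bin has a residual above its threshold.
residual = [4.0, 1.0, 0.5]
threshold = [1.0, 1.0, 2.0]
gains = perceptual_gain(residual, threshold)
```

Because the gain never drops below what is needed to reach the threshold, less attenuation is applied than in a rule that tries to remove the residual entirely, which is the behaviour the text associates with fewer audible disturbances.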

2.2.3 Psychoacoustic Filter

The psychoacoustic filter eliminates audible noise using a perceptual criterion. It is designed in the frequency domain so that the large body of knowledge on auditory perception can be applied. Incorporating the principle of masking into an auditory post-filter reduces the audible artefacts. An algorithm based on signal subspace methods with an auditory post-filter is outlined in the next section.

The goal of the perceptual post-filter is to remove all traces of musical noise. Its strengths are two-fold: (1) distortion is minimized by attenuating only what is audible, and (2) peaks within the noise residual are smoothed by spectral and temporal averaging. The underlying speech, however, should not be affected. Such systems have been used successfully for speech enhancement.

Limiting the attenuation in an enhancement scheme can decrease the production of artefacts. Perceptual filters accomplish this by suppressing the residual noise only until it lies below the masking threshold. As such, some noise which is imperceptible is retained. By attenuating less, it is expected that fewer disturbances will be produced. For the listener, there should not be a discernible increase in residual noise compared with conventional algorithms.

III. METHODOLOGY

The speech signal is sampled at a rate of 8000 Hz. The signal is divided into fixed-size frames. The values of each frame are transformed using the Karhunen-Loève transform, which decomposes the vector space into a signal subspace and a noise subspace. A rectangular analysis window is applied to the data prior to signal subspace filtering. After application of the post-filter, a sine-squared synthesis window is used for reconstruction. The signal subspace component is passed to the signal subspace filter to suppress the remaining noise.
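The framing scheme described above (rectangular analysis window, sine-squared synthesis window, overlap-add) can be sketched as follows. The frame length of 8 samples and the 50% overlap are assumptions chosen for a small demonstration; the paper does not state the frame length or overlap, and the function names are hypothetical. With 50% overlap, two shifted sine-squared windows sum to one, so unmodified frames are reconstructed exactly away from the signal edges.

```python
import math


def sine_squared_window(n_len):
    """Synthesis window w[n] = sin^2(pi * (n + 0.5) / N)."""
    return [math.sin(math.pi * (n + 0.5) / n_len) ** 2 for n in range(n_len)]


def split_frames(signal, n_len, hop):
    """Rectangular analysis window: frames are plain slices of the signal."""
    return [signal[s:s + n_len] for s in range(0, len(signal) - n_len + 1, hop)]


def overlap_add(frames, n_len, hop):
    """Weight each frame with the synthesis window and add it at its offset."""
    win = sine_squared_window(n_len)
    out = [0.0] * (hop * (len(frames) - 1) + n_len)
    for i, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[i * hop + n] += win[n] * value
    return out


# Demonstration with assumed sizes: 8-sample frames, hop of 4 (50% overlap).
FRAME_LEN, HOP = 8, 4
signal = [float(n % 5) for n in range(20)]
frames = split_frames(signal, FRAME_LEN, HOP)
# An enhancement filter would modify each frame here; frames are left as-is.
rebuilt = overlap_add(frames, FRAME_LEN, HOP)
```

The first and last `HOP` samples are covered by a single tapered window and are therefore attenuated; in a full system this is usually handled by padding the signal at both ends.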
The noise correlation matrix and the output of the signal subspace filter are given as input to the perceptual post-filter to attenuate the noise. The signal subspace filter is thus modified to suppress musical noise by appending a perceptual post-filter to its output. It should be stressed that this post-filter does not significantly attenuate the noise. Rather, it smooths its input in such a manner that musical noise is diminished and speech is unaffected.

[Fig. 3.1 Block diagram of the hybrid system]

A flow-chart describing the operation of the modified speech enhancement scheme can be found in Fig. 3.1. The signal subspace filter operates most effectively when using very short frames (< 15 ms). The psychoacoustic filter attempts to conceal the salient noise using the perceptual properties of the ear while minimizing the distortion to the underlying speech. This block is signal dependent, requiring an estimate of the noise correlation matrix and the masking threshold of the speech signal to calculate an appropriate gain. The input to the psychoacoustic filter consists of the output frames from the signal subspace filter. The frames are combined by the overlap-add block, which uses appropriate windows and overlap length. As the clean speech signal is unavailable, it is necessary to estimate the masking threshold of the speech signal from noisy data. Thus, the spectrum of the clean speech is estimated using the spectral subtraction technique.

IV. EXPERIMENTAL RESULTS

The proposed algorithm for signal subspace speech enhancement is implemented and tested using speech files sampled at 8 kHz with 16-bit resolution. The speech wave file is converted into 16-bit ASCII values. The raw values are passed to the Karhunen-Loève transform to separate the speech and noise signals. The sample input signal with speech and noise is shown in Fig 4.1 and the sample output signal is shown in Fig 4.2.

[Fig 4.1 Clean and Noisy Speech Signal]

[Fig 4.2 Output Signal]
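One way to quantify the difference between such input and output signals is the signal-to-noise ratio in decibels and its segmental variant, which averages the per-frame values. The sketch below is a hedged illustration under stated assumptions: the reference (clean) and processed signals are time-aligned and of equal length, every frame has non-zero signal and error energy, and the per-frame clamping often applied to segmental SNR is omitted. The function names and the toy data are hypothetical.

```python
import math


def snr_db(clean, processed):
    """10 * log10(signal energy / error energy) over the whole span."""
    signal_energy = sum(x * x for x in clean)
    error_energy = sum((x - y) ** 2 for x, y in zip(clean, processed))
    # Assumes both energies are non-zero; a real system would guard this.
    return 10.0 * math.log10(signal_energy / error_energy)


def segmental_snr_db(clean, processed, frame_len):
    """Mean of the per-frame SNR values over non-overlapping frames."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        stop = start + frame_len
        values.append(snr_db(clean[start:stop], processed[start:stop]))
    return sum(values) / len(values)


# Toy data: a four-sample reference and a slightly distorted version.
clean = [1.0, -1.0, 1.0, -1.0]
processed = [0.9, -1.0, 1.0, -0.8]
overall = snr_db(clean, processed)
segmental = segmental_snr_db(clean, processed, frame_len=2)
```

The two measures differ on this data because the error is concentrated in the second frame: the segmental value averages the per-frame ratios in decibels, whereas the overall value pools the energies first.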

V. PERFORMANCE EVALUATION

As objective measures, the segmental signal-to-noise ratio (SNRseg) and the weighted spectral slope (WSS) are used in the evaluation. The weighted spectral slope measure is calculated using the formula

$$d_{WSS} = K_{spl}\,(K - \hat{K}) + \sum_{k=1}^{36} w_a(k)\,[S(k) - \hat{S}(k)]^2$$

where $K$ and $\hat{K}$ are related to the overall sound pressure level of the original and enhanced utterances, and $K_{spl}$ is a parameter which can be varied to increase overall performance. $S(k)$ and $\hat{S}(k)$ are the spectral slopes of the original and enhanced utterances in critical band $k$, and $w_a(k)$ is the weight for that band.

The signal-to-noise ratio is used to evaluate the quality of random signal transmission. The signal-to-noise ratio in decibels can be calculated using the formula

$$SNR = 10 \log_{10} \frac{\sum_n x^2(n)}{\sum_n [x(n) - y(n)]^2}$$

where x(n) and y(n) are the original and processed speech signals in discrete time.

Both the KLT based signal subspace and the spectral subtraction noise reduction schemes were tested and compared in enhancing speech signals which had been degraded by computer-generated additive white Gaussian noise at different SNR levels (Table 5.1).

Table 5.1 Signal to noise ratio calculation

SNR (dB)    KLT based Signal Subspace    Spectral subtraction (SS)
0           10.44                        10.4
5           8.66                         7.91
10          6.95                         5.86
15          5.32                         4.20

VI. CONCLUSION

In this work, a framework to attenuate musical noise produced by signal subspace speech enhancement methods was presented. This speech restoration system incorporates the auditory concept of masking to smooth spectral parameters. Informal listening tests indicate that the algorithm is effective at attenuating musical noise while leaving speech relatively undistorted. The speech enhancement algorithm also appears well suited to many adverse noise environments. Its performance is evaluated using the segmental signal-to-noise ratio (SNRseg) and weighted spectral slope (WSS) measures.

REFERENCES

[1] M. Dendrinos, S. Bakamidis, and G. Carayannis, Speech enhancement from noise: A regenerative approach, Speech Communication, vol. 10, pp. 45-57, Feb. 1991.
[2] Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Trans. Speech and Audio Processing, vol. 3, pp. 251-266, July 1995.

[3] J. Huang and Y. Zhao, An energy-constrained signal subspace method for speech enhancement and recognition in colored noise, in Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, vol. 1, (Seattle, WA), pp. 377-380, May 1998.
[4] J. Huang and Y. Zhao, A DCT-based fast signal subspace technique for robust speech recognition, IEEE Trans. Speech and Audio Processing, vol. 8, pp. 747-751, Nov. 2000.
[5] A. Rezayee and S. Gazor, An adaptive KLT approach for speech enhancement, IEEE Trans. Speech and Audio Processing, vol. 9, pp. 87-95, Feb. 2001.
[6] B. Yang, Projection approximation subspace tracking, IEEE Trans. Signal Processing, vol. 43, pp. 95-107, Jan. 1995.
[7] U. Mittal and N. Phamdo, Signal/noise KLT based approach for enhancing speech degraded by colored noise, IEEE Trans. Speech and Audio Processing, vol. 8, pp. 159-167, Mar. 2000.
[8] F. Jabloun and B. Champagne, On the use of masking properties of the human ear in the signal subspace speech enhancement approach, in Int. Workshop on Acoustic Echo and Noise Control, (Darmstadt, Germany), Sept. 2001.
[9] G. A. Soulodre, Camera Noise from Film Soundtracks. Ph.D. thesis, McGill University, Department of Electrical Engineering, Nov. 1998.
[10] N. Virag, Single channel speech enhancement based on masking properties of the human auditory system, IEEE Trans. Speech and Audio Processing, vol. 7, pp. 126-137, Mar. 1999.