Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation

Takahiro FUKUMORI 1; Makoto HAYAKAWA 1; Masato NAKAYAMA 2; Takanobu NISHIURA 2; Yoichi YAMASHITA 2
1 Graduate School of Information Science and Engineering, Ritsumeikan University, Kusatsu, Japan
2 College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Japan

ABSTRACT
The development of communication systems allows people to easily record and distribute their speech. Clipping-noise, however, degrades the sound quality of a recording when the gain level of the input signal exceeds the maximum amplitude range. In this case, it is necessary to suppress the clipping-noise in the observed speech to improve its sound quality. Although a linear prediction method has conventionally been proposed for suppressing clipping-noise, its restoration performance degrades through accumulated prediction error when the speech contains a large amount of clipping-noise. This paper describes a method for clipping-noise suppression for stationary-noisy speech based on spectral compensation in a noisy environment. In this method, Gaussian mixture models are used to model the power spectral envelope of the speech in each frame in the lower frequency band, and clean speech signals from a database are used to restore the clipped speech in the higher frequency band. We carried out evaluation experiments on speech quality and confirmed the effectiveness of the proposed method for speech that contains a large amount of clipping-noise.

Keywords: Clipping-noise, Spectral envelope, Spectral compensation
I-INCE Classification of Subjects Number: 0.4

1. INTRODUCTION
Recent speech communication systems help people to easily record their speech with high quality. To record speech accurately, the gain level of the input signal must be set properly. In recording, clipping-noise is one of the problems that deteriorate the sound quality of a speech signal. It is generated when the amplitude of an input signal exceeds the maximum allowance range (MAR) of the amplitude. Clipping-noise is also generated when the rated current of an amplifier is smaller than its maximum allowable current. The noise makes listeners uncomfortable because the original amplitude is lost in the clipped speech signal. If a recorded speech signal was clipped, it should be re-recorded with a proper gain level. When re-recording is difficult, as in real-time speech communication systems, a clipping-noise suppression method must be applied instead.
A conventional method based on a linear prediction model has been proposed for suppressing clipping-noise (1). It restores clipped samples by linear prediction from past unclipped samples of the speech. In this method, however, the restoration performance degrades through accumulated prediction error when clipping occurs in two or more consecutive samples of the speech signal. Addressing this problem requires a method that does not rely on past speech samples. We have therefore proposed a clipping-noise suppression method based on spectral compensation that requires no past samples (2).
In this method, the spectral envelope of the target speech signal in each analysis frame is made to approximate that of the original speech signal, removing the influence of the clipping-noise. Specifically, the envelope in the higher frequency band, which carries static characteristics of the speaker, is replaced with that of an unclipped speech signal prepared in advance. The envelope in the lower frequency band, which carries the characteristics of the phoneme, is then approximated with Gaussian mixture models (3).

1 {cm0306,is033080}@ed.ritsumei.ac.jp
2 {mnaka@fc,nishiura@is,yama@media}.ritsumei.ac.jp

Figure 1: Waveforms of the original and clipped speech (MAR A_c = 600); amplitude versus time [msec].

In this paper, we evaluate the clipping-noise suppression method for stationary-noisy speech in a real noisy environment. We carry out experiments to evaluate the sound quality of the speech signals processed by the proposed method.

2. FORMULATION OF CLIPPED SPEECH SIGNAL
This section describes the effect of clipping-noise on speech. Clipped speech loses the higher or lower part of its amplitude wherever the absolute amplitude exceeds the maximum allowance range (MAR). The clipping process is given by Eq. (1):

$$
s_c(n) = \begin{cases} A_c & (s(n) > A_c) \\ s(n) & (|s(n)| \le A_c) \\ -A_c & (s(n) < -A_c), \end{cases} \tag{1}
$$

where s(n) and s_c(n) are the original and clipped speech signals at time n, respectively, and A_c indicates the MAR of the clipped speech signal. Clipping-noise is generated when the absolute value of the input speech signal s(n) exceeds the MAR A_c. Figure 1 shows an example of clipped speech under the condition that the MAR A_c is set to 600.

The clipping ratio (CR) has conventionally been used as an evaluation index for the amount of clipping-noise. The CR C_i of the clipped speech in each frame is given by Eq. (2):

$$
C_i = \frac{A_c}{\sqrt{\frac{1}{N_i} \sum_{n=0}^{N_i-1} s_i(n)^2}}, \tag{2}
$$

where s_i(n) is the original speech signal in the i-th frame before clipping and N_i is the number of samples in s_i(n). The CR expresses the ratio between the MAR and the root mean square of the speech signal before clipping; it becomes lower as the gain level, and hence the amount of clipping-noise, becomes larger.

3. CONVENTIONAL METHOD (LINEAR PREDICTION METHOD)
A linear prediction method (1) has been proposed as the conventional method for clipping-noise suppression. It uses the following linear prediction model:

$$
\hat{s}(n) = \sum_{i=1}^{p} a_i\, s(n-i) + \varepsilon(n), \tag{3}
$$

where s(n) is the input speech signal at time n and ε(n) is the difference between the original and predicted amplitudes of the speech signal; E[ε(n)] becomes zero when the original speech is a random signal. Equation (3) shows that the predicted amplitude ŝ(n) is obtained from the p past samples s(n−1) to s(n−p). The a_i (1 ≤ i ≤ p) are called prediction coefficients and are calculated so that the expected squared error E[ε(n)²] is minimized. The linear prediction method restores the clipped amplitudes with prediction coefficients calculated from unclipped sections of the speech. Its restoration performance, however, degrades through accumulated prediction error when clipping occurs in two or more consecutive samples of the speech signal.
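As a concrete illustration of Eqs. (1) and (2), the following minimal Python/NumPy sketch implements the hard-clipping model and the per-frame clipping ratio. The function names and the example values are ours, not the paper's.

```python
import numpy as np

def clip_signal(s, a_c):
    """Hard clipping to the maximum allowance range (MAR) A_c, Eq. (1)."""
    return np.clip(s, -a_c, a_c)

def clipping_ratio(s_frame, a_c):
    """Clipping ratio C_i of one pre-clipping frame, Eq. (2):
    the MAR divided by the RMS of the original frame."""
    rms = np.sqrt(np.mean(s_frame ** 2))
    return a_c / rms

# Example: a 200 Hz tone sampled at 16 kHz, clipped at A_c = 600 as in Fig. 1.
fs = 16000
t = np.arange(fs // 10) / fs
s = 1000.0 * np.sin(2.0 * np.pi * 200.0 * t)
s_c = clip_signal(s, 600.0)
print(clipping_ratio(s, 600.0))  # about 0.85: the lower the CR, the heavier the clipping
```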

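The conventional restoration of Section 3 can be sketched as follows, under our own simplifying assumptions: the prediction coefficients of Eq. (3) are estimated with the autocorrelation method on the unclipped samples, and each clipped sample is then extrapolated from the p preceding (already restored) samples. Reference (1) may differ in detail.

```python
import numpy as np

def lpc_coefficients(s, p):
    """Prediction coefficients a_1..a_p of Eq. (3), autocorrelation method."""
    r = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])  # solve the normal equations R a = r

def restore_clipped(s_c, a_c, p=16):
    """Replace clipped samples by forward linear prediction.
    The prediction error accumulates over runs of consecutive clipped
    samples, which is the weakness of this method noted in the paper."""
    a = lpc_coefficients(s_c[np.abs(s_c) < a_c], p)  # fit on unclipped samples only
    s_hat = s_c.astype(float).copy()
    for n in range(p, len(s_hat)):
        if abs(s_c[n]) >= a_c:  # clipped sample: predict from the past p samples
            s_hat[n] = np.dot(a, s_hat[n - p:n][::-1])
    return s_hat
```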
4. PROPOSED METHOD
This section describes a method for suppressing clipping-noise in an observed speech signal based on spectral compensation. The previous study (2) clarified some characteristics of the spectral envelope of clipped speech: new peaks appear in the spectral envelope in the lower frequency band (LFB), while in the higher frequency band (HFB) the power of the clipping-noise rises and the spectral envelope becomes flat. The proposed method suppresses the clipping-noise by transforming the spectral envelope in each frequency band according to these band-specific characteristics. Figure 2 shows the flowchart of the proposed method.

Figure 2: Flowchart of the proposed method. The input speech frame is analyzed by FFT into a power spectrum, which is separated into the spectral fine structure and the spectral envelope (phase information is kept aside). The envelope undergoes peak suppression based on a GMM in the LFB and compensation of the envelope in the HFB, both controlled by the estimated clipping ratio (CR); spectral synthesis and IFFT then produce the output speech frame.

4.1 Estimation of the clipping ratio
The CR is first estimated to control the compensation of the LFB and HFB ("CR estimation" in Fig. 2). Preliminary experiments confirmed a high correlation between the CR and the logarithmic clipping incidence (LCI). The LCI L_i logarithmically expresses the incidence of clipped samples in the speech:

$$
L_i = \log_e \frac{N_i}{D_i}, \tag{4}
$$

where D_i is the number of samples whose absolute amplitude equals A_c in the i-th analysis frame of the clipped speech signal. The LCI becomes lower as the CR becomes lower. The proposed method then estimates the CR from the LCI:

$$
\hat{C}_i = \alpha L_i, \tag{5}
$$

where Ĉ_i is the estimated CR and α is a regression coefficient. The compensation strength for the clipped speech signal is controlled on the basis of the estimated CR.
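A minimal sketch of the CR estimation of Eqs. (4) and (5). The regression coefficient α is fitted empirically in the paper; the default below is an arbitrary placeholder.

```python
import numpy as np

def logarithmic_clipping_incidence(frame, a_c):
    """LCI L_i of one clipped frame, Eq. (4): log_e of N_i over the number
    of samples D_i whose absolute amplitude is pinned at the MAR A_c."""
    d = np.count_nonzero(np.abs(frame) >= a_c)
    if d == 0:
        return np.inf  # no clipped samples in this frame
    return np.log(len(frame) / d)

def estimate_clipping_ratio(frame, a_c, alpha=0.1):
    """Estimated CR of Eq. (5); alpha is a placeholder regression coefficient."""
    return alpha * logarithmic_clipping_incidence(frame, a_c)
```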

4.2 Peak suppression of the spectral envelope in the lower frequency band
In "Peaks suppression based on GMM in the LFB" (Fig. 2), the peaks of the spectral envelope in the LFB are controlled through an approximation based on Gaussian mixture models (GMMs) (3):

$$
S_l(k) = \sum_{m=1}^{M} w_m\, N(k \mid \mu_m, \sigma_m^2) \qquad (w_1 > w_2 > \cdots > w_M), \tag{6}
$$

where S_l(k) is the normalized spectral envelope in the LFB, N(k | μ_m, σ_m²) is a Gaussian function, M is the number of mixture components, and w_m, μ_m, and σ_m² are the weight, mean, and variance of each Gaussian function, respectively. When the LFB envelope is approximated with the GMM, the first and second formants, which have large power, are captured by the two Gaussian functions with the highest weights. The proposed method therefore multiplies the spectral envelope of the clipped speech by a peak suppression function built from the M − 2 Gaussian functions with lower weights:

$$
W(k) = \prod_{m=3}^{M} \left[ 1 - \beta \exp\left\{ -\frac{(k - \mu_m)^2}{2\sigma_m^2} \right\} \right] \qquad (0 < \beta < 1), \tag{7}
$$

where W(k) is the peak suppression function and β is a suppression coefficient based on the estimated CR. The peaks generated by the clipping-noise are suppressed by multiplying the spectral envelope by W(k).

4.3 Spectral compensation with clean speech in the higher frequency band
In "Compensation of envelope in the HFB" (Fig. 2), the clipped spectral envelope in the HFB is compensated with that of clean speech prepared in advance:

$$
S_h(k) = \eta\, S_a(k) + (1 - \eta)\, S_c(k) \qquad (0 < \eta < 1), \tag{8}
$$

where S_h(k) is the spectral envelope in the HFB after compensation, S_a(k) is the spectral envelope of the clean speech, S_c(k) is the spectral envelope of the clipped speech, and η is a compensation coefficient based on the estimated CR; a higher CR gives a smaller compensation amount. The clean spectral envelope is prepared for each phoneme of the target speaker, because the characteristics of the HFB envelope depend strongly on the speaker and the phoneme.
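A hedged sketch of the two band-wise operations of Eqs. (7) and (8). Fitting the GMM of Eq. (6) to an envelope is an estimation problem of its own; here the parameters (w_m, μ_m, σ_m²) are assumed to be already available, and β and η would come from the estimated CR. All names are ours.

```python
import numpy as np

def peak_suppression_function(k, weights, means, variances, beta):
    """Eq. (7): product over the M-2 lowest-weight Gaussians of
    [1 - beta * exp(-(k - mu_m)^2 / (2 sigma_m^2))], with 0 < beta < 1.
    The two highest-weight components (first and second formants) are kept."""
    order = np.argsort(weights)[::-1]   # components by descending weight
    w = np.ones_like(k, dtype=float)
    for m in order[2:]:                 # skip the two formant Gaussians
        w *= 1.0 - beta * np.exp(-(k - means[m]) ** 2 / (2.0 * variances[m]))
    return w

def compensate_hfb(env_clean, env_clipped, eta):
    """Eq. (8): blend the clean and clipped HFB envelopes, 0 < eta < 1.
    A higher estimated CR (less clipping) implies a smaller eta."""
    return eta * env_clean + (1.0 - eta) * env_clipped

# Usage on one frame: suppress LFB peaks, then blend the HFB envelope.
# env_lfb *= peak_suppression_function(k_lfb, w, mu, var, beta)
# env_hfb = compensate_hfb(env_clean_hfb, env_clipped_hfb, eta)
```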
5. EVALUATION
Objective and subjective experiments were carried out to evaluate the performance of clipping-noise suppression using the proposed method for stationary-noisy speech in a noisy environment. The sound quality of the speech signals was evaluated under the conditions shown in Table 1. As the objective index of sound quality, the logarithmic spectral distance (LSD) (4) was employed:

$$
\mathrm{LSD} = \sqrt{ \frac{1}{K} \sum_{k=0}^{K-1} \left( 20 \log_{10} \frac{S_r(k)}{S_d(k)} \right)^2 }, \tag{9}
$$

where S_r(k) and S_d(k) are the spectra of the original and degraded speech, respectively, and k indicates the frequency bin index. In this evaluation, a higher LSD corresponds to higher sound quality. As the subjective index, the mean opinion score (MOS) (5) from five subjects was used. The subjects rated how degraded each speech signal was on five grades (5: imperceptible, 4: perceptible but not annoying, 3: slightly annoying, 2: annoying, 1: very annoying).

The experimental results are shown in Fig. 3. The horizontal axes of both panels represent the SNR between a clean speech sample and the stationary noise, and the vertical axes of Figs. 3(a) and 3(b) represent LSD and MOS, respectively. The proposed method achieved higher LSD and MOS than the conventional method under the higher SNR conditions (above 35 dB). These results indicate that the proposed method suppressed the clipping-noise more effectively than the conventional one. On the other hand, the performance of the proposed method degraded under the lower SNR conditions. This may be caused by the spectral compensation simultaneously suppressing the clipping-noise and the white noise. We consider that the suppression performance could be improved by switching between the conventional and proposed methods depending on the SNR condition.
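For reference, a direct transcription of the LSD of Eq. (9); the magnitude spectra are floored to keep the logarithm finite. This is our own utility, not the authors' evaluation code.

```python
import numpy as np

def log_spectral_distance(spec_ref, spec_deg, floor=1e-10):
    """Logarithmic spectral distance (LSD) between a reference and a
    degraded magnitude spectrum, Eq. (9), in dB."""
    ratio = np.maximum(np.abs(spec_ref), floor) / np.maximum(np.abs(spec_deg), floor)
    return np.sqrt(np.mean((20.0 * np.log10(ratio)) ** 2))
```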

Table 1: Experimental conditions

Number of speakers: Two female and three male speakers
Content of speech: Isolated vowels (/a/, /i/, /u/, /e/, /o/)
Sampling: 16 kHz / 16 bit
Clipping ratio: 0.5
FFT length: 1024 points
Frame length: 32 ms (512 points)
Shift length: 4 ms (64 points)
Noise: White noise
SNR: 5-60 dB

Figure 3: Experimental results for noisy speech. (a) Objective evaluation: LSD [dB] versus SNR [dB] (5-60 dB) for the clipped speech, the conventional method, and the proposed method. (b) Subjective evaluation: MOS (score 1-5, bad to good sound quality) versus SNR [dB] for the same three conditions.

6. CONCLUSIONS
In this paper, we evaluated a method for clipping-noise suppression of stationary-noisy speech based on spectral compensation in a noisy environment. We carried out evaluation experiments on the sound quality of the speech signals processed by the proposed method. As a result, we confirmed that the clipping-noise was efficiently suppressed by the proposed method, in comparison with the conventional one, under the higher SNR conditions. In the future, we intend to develop a method that switches between the conventional and proposed methods depending on the SNR condition.

ACKNOWLEDGEMENTS
This work was partly supported by a Grant-in-Aid for Scientific Research funded by MEXT and a Grant-in-Aid for JSPS Fellows funded by JSPS.

REFERENCES
1. A. Dahimene, M. Noureddine and A. Azrar, A simple algorithm for the restoration of clipped speech signal, Informatica, vol. 32, pp. 183-188, 2008.
2. M. Hayakawa, M. Morise, M. Nakayama and T. Nishiura, Restoring clipped speech signal based on spectral transformation of each frequency band, Proc. Acoustics 2012, Paper Number: 4aSP0, May 2012.
3. P. Zolfaghari and T. Robinson, Formant analysis using mixture of Gaussians, Proc. ICSLP, pp. 1229-1232, 1996.
4. T. T. Vu, M. Unoki and M. Akagi, An LP-based blind model for restoring bone-conducted speech, Proc. ICCE 2008, pp. 22-27, 2008.
5. ITU-T Recommendation P.800, Methods for subjective determination of transmission quality, 1996.