PROSE: Perceptual Risk Optimization for Speech Enhancement



Jishnu Sadasivan and Chandra Sekhar Seelamantula
Department of Electrical Communication Engineering, Department of Electrical Engineering
Indian Institute of Science, Bangalore 560012, India
Emails: jishnus@ece.iisc.ernet.in, chandra.sekhar@ee.iisc.ernet.in
Spectrum Lab

1. Overview

We address the problem of suppressing noise from noisy speech within a risk minimization framework. The clean signal is estimated by minimizing an unbiased estimate of the risk function. We develop unbiased estimates of perceptual distortion functions and minimize these risk estimates to obtain the optimal denoising functions. For input SNR greater than 5 dB, the proposed algorithms outperform three benchmarking algorithms in terms of PESQ and SSNR scores.

2. Risk estimation principle

Observation model: $x_n = s_n + w_n$, where $n$ indexes the $N$ samples.
Parameter estimation: Obtain an estimate $\hat{s}$ of the (non-random) parameter $s$ that minimizes the risk $R = E\{d(s, \hat{s})\}$, where $d$ measures the closeness between $s$ and $\hat{s}$.
Risk estimation approach: Since $R$ depends on $s$, we estimate $R$ and minimize the estimate.
SURE: An unbiased estimate of the MSE under the i.i.d. Gaussian assumption [1].
Our contribution: Assuming that the a priori SNR is high and that the additive noise follows a truncated Gaussian distribution, we develop perceptual risk estimates. Each perceptual risk estimate is minimized to obtain the optimum shrinkage estimator.

3. Perceptual risk estimation

Itakura-Saito distortion:
$$R_{IS} := E\{d_{IS}(s_k, \hat{s}_k)\}, \qquad |w_k| < |x_k|,$$
where
$$d_{IS}(s_k, \hat{s}_k) = \frac{\hat{s}_k}{s_k} - \log\frac{\hat{s}_k}{s_k} - 1 = \frac{\hat{s}_k}{x_k}\left(1 - \frac{w_k}{x_k}\right)^{-1} - \log(\hat{s}_k) + \log(s_k) - 1 = \frac{\hat{s}_k}{x_k}\sum_{n=0}^{\infty}\frac{w_k^n}{x_k^n} - \log(\hat{s}_k) + \log(s_k) - 1.$$

Shrinkage estimator: $\hat{s}_k = a_k x_k$. Truncating the series beyond $n = 4$ yields
$$R_{IS} \approx E\left\{a_k \sum_{n=0}^{4} \frac{w_k^n}{x_k^n}\right\} - E\{\log(a_k x_k)\} + \log(s_k) - 1.$$

Generalized Stein's lemma: Let $W$ be a real random variable with p.d.f.
$$p(w; c_1, c_2, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma K}\exp\left(-\frac{w^2}{2\sigma^2}\right)\mathbb{1}_{\{-c_1\sigma < w < c_2\sigma\}}, \qquad K = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-c_1\sigma}^{c_2\sigma}\exp\left(-\frac{u^2}{2\sigma^2}\right)du,$$
and let $f : \mathbb{R} \to \mathbb{R}$ be an $n$-fold indefinite integral of the Lebesgue measurable function $f^{(n)}$, which is the $n$th derivative of $f$. Suppose also that $E\{|W^{n-1} f^{(1)}(W)|\} < \infty$, that $c_1\sigma, c_2\sigma \gg \sigma$, and that $f^{(k)}(W)$ belongs to a class of functions such that
$$\sigma^2 f^{(k)}(w)\, p(w; c_1, c_2, \sigma)\Big|_{w=-c_1\sigma}^{c_2\sigma} \approx 0$$
for each derivative order $k$ up to $n$. Then
$$E\{W^n f(W)\} \approx \sigma^2 E\{f^{(1)}(W)\, W^{n-1}\} + \sigma^2 (n-1)\, E\{f(W)\, W^{n-2}\}.$$

Using the lemma, the risk $R_{IS}$ is
$$R_{IS} \approx E\left\{a_k\left(1 + \frac{60\,\sigma^6}{x_k^6} + \frac{840\,\sigma^8}{x_k^8}\right) - \log(a_k x_k) + \log(s_k) - 1\right\}.$$
The unbiased estimate of $R_{IS}$ is
$$\hat{R}_{IS} = a_k\left(1 + \frac{60\,\sigma^6}{x_k^6} + \frac{840\,\sigma^8}{x_k^8}\right) - \log(a_k x_k) + \log(s_k) - 1.$$
Differentiating $\hat{R}_{IS}$ with respect to $a_k$ and equating to zero, we get
$$a_{k,\mathrm{opt}} = \frac{1}{1 + \dfrac{60}{\xi_k^3} + \dfrac{840}{\xi_k^4}}, \qquad \xi_k = \frac{x_k^2}{\sigma^2}.$$

Table 1: Optimal shrinkage parameters corresponding to different perceptual risk estimates, where $[x]_+ = \max(0, x)$. Columns: risk, distortion $d(s_k, \hat{s}_k)$, and $a_{k,\mathrm{opt}}$. Rows include MSE, with $d = (\hat{s}_k - s_k)^2$ and $a_{k,\mathrm{opt}} = [1 - 1/\xi_k]_+$, as well as WE and IS-II.

Implementation details:
We apply the shrinkage estimator in the DCT domain.
Framewise processing: Frame length = ms, 75% overlap, Fs = 8 kHz.
Benchmarking denoising algorithms: [3], LMSE [4], and BNMF [5].

4. Performance comparison

Results are averaged over different speech files and 5 different noise realizations (NOIZEUS database).

Figure 1: Performance comparison of the denoising algorithms. (Panels: SSNR gain in dB and PESQ gain versus input SNR in dB, for white noise and train noise; curves: MSE, WE, IS, IS-II, BNMF, LMSE.)

Figure 2: Spectrograms of denoised speech signals, where the corrupting noise is train noise with dB input SNR. (Panels include clean speech, noisy speech, BNMF, and IS-II; axes: time in s, frequency in Hz, level in dB.) Demo available online at

5. Conclusion

Introduced the notion of risk estimation for single-channel speech enhancement.
Proposed unbiased estimates for perceptual distortion functions.
Minimized the risk estimates to obtain the optimum denoising functions.
For SNR greater than 5 dB, the proposed approach resulted in better denoising performance than the benchmarking techniques.

6. References

[1] C. M. Stein, "Estimation of the mean of a multivariate normal distribution," Ann. Stat., vol. 9, no. 6, pp. 1135-1151, Nov. 1981.
[2] R. M. Gray, A. Buzo, A. H. Gray, Jr., and Y. Matsuyama, "Distortion measures for speech processing," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, pp. 367-376, Aug. 1980.
[3] P. Scalart and J. V. Filho, "Speech enhancement based on a priori signal to noise estimation," Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 2, pp. 629-632, May 1996.
[4] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-squared error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985.
[5] N. Mohammadiha, P. Smaragdis, and A. Leijon, "Supervised and unsupervised speech enhancement using nonnegative matrix factorization," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2140-2151, Oct. 2013.
[6] ITU-T Rec. P.862, "Perceptual Evaluation of Speech Quality (PESQ): An objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs," International Telecommunication Union, Feb. 2001.

PROSE: Perceptual Risk Optimization for Speech Enhancement
Jishnu S.
Supervisor: Dr. Chandra Sekhar Seelamantula
Department of Electrical Communication Engineering, Department of Electrical Engineering
Indian Institute of Science, Bangalore 560012, India
jishnus@ece.iisc.ernet.in
April 7, 7

Outline
Problem statement
SURE
Perceptual risk estimation
Perceptual risk optimization for speech enhancement
Conclusions

Problem statement
Consider samples of a signal $s_n$ distorted by additive random noise $w_n$. The observation model is $x_n = s_n + w_n$, where $n$ indexes the samples.
Goal: To estimate $s_n$ from $x_n$ by minimizing a suitable distortion metric.

Risk estimation
Conventional method: Obtain an estimate of $s$ by minimizing the distortion function (risk) between the estimate $\hat{s} = h(x)$ and $s$:
$$\hat{s} = \arg\min_{h(x) \in \mathbb{R}} E\{d(h(x), s)\},$$
where $d$ measures the closeness between $h(x)$ and $s$. Direct minimization of this cost requires knowledge of the underlying clean signal.
Risk estimation: Minimize an unbiased estimate of $R$ to obtain $\hat{s}$.

Risk estimation: Basic SURE formulation
Consider the MSE
$$R = E\{d(h(x), s)\} = E\{(h(x) - s)^2\} = E\{s^2\} - 2E\{h(x)\,s\} + E\{h^2(x)\},$$
where $x \sim \mathcal{N}(s, \sigma^2)$. SURE is an unbiased estimate of the MSE obtained using Stein's lemma.
Lemma (Stein, 1981): Let $Y$ be a real random variable distributed as $\mathcal{N}(0, \sigma^2)$ and let $h : \mathbb{R} \to \mathbb{R}$ be an indefinite integral of the Lebesgue measurable function $h'$, essentially the derivative of $h$. Suppose also that $E_Y\{|h'(Y)|\} < \infty$. Then
$$E_Y\{Y h(Y)\} = \sigma^2 E_Y\{h'(Y)\}.$$
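Stein's lemma can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original slides; the test function `tanh`, the noise level, and the sample count are illustrative assumptions.

```python
import numpy as np


def stein_check(h, h_prime, sigma=0.5, num_samples=1_000_000, seed=0):
    """Estimate both sides of E{Y h(Y)} = sigma^2 E{h'(Y)} for Y ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, sigma, size=num_samples)
    lhs = np.mean(y * h(y))                 # sample estimate of E{Y h(Y)}
    rhs = sigma**2 * np.mean(h_prime(y))    # sample estimate of sigma^2 E{h'(Y)}
    return lhs, rhs


# Hypothetical smooth test function: h(y) = tanh(y), h'(y) = 1 - tanh(y)^2.
lhs, rhs = stein_check(np.tanh, lambda y: 1.0 - np.tanh(y) ** 2)
print(f"E[Y h(Y)] ~ {lhs:.4f}, sigma^2 E[h'(Y)] ~ {rhs:.4f}")
```

The two printed values should agree up to Monte Carlo error, which shrinks as `num_samples` grows.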

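Risk estimation: SURE
Using Stein's lemma:
$$E\{h(x)\,s\} = E\{h(x)\,x\} - \sigma^2 E\{h'(x)\}.$$
The unbiased estimate of $R$ becomes
$$\hat{R} = s^2 - 2h(x)\,x + 2\sigma^2 h'(x) + h^2(x),$$
i.e., $R = E\{\hat{R}\}$. Minimize $\hat{R}$ to obtain $h(x)$.
Clean speech DCT coefficient estimate: $h(x_k) = a_k x_k$, where $a_k \in [0, 1]$ and $x_k$ is the noisy DCT coefficient.
Optimum pointwise shrinkage parameter:
$$a_{k,\mathrm{opt}} = \arg\min_{a_k} \hat{R} \;\Rightarrow\; a_{k,\mathrm{opt}} = \left[1 - \frac{\sigma^2}{x_k^2}\right]_+,$$
where $[x]_+ = \max(0, x)$.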
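The pointwise SURE shrinkage rule can be sketched in a few lines. This is an illustrative implementation only: the function name, the example coefficients, and the assumption that the noise standard deviation `sigma` is known are not taken from the slides.

```python
import numpy as np


def sure_shrinkage_gain(x, sigma):
    """Pointwise gain a_k = max(0, 1 - sigma^2 / x_k^2) for noisy coefficients x."""
    x = np.asarray(x, dtype=float)
    x2 = x**2
    # Where x_k = 0 the ratio is set to +inf, so the gain is clipped to 0.
    ratio = np.divide(sigma**2, x2, out=np.full_like(x2, np.inf), where=x2 > 0)
    return np.maximum(0.0, 1.0 - ratio)


# Hypothetical noisy DCT coefficients with unit noise variance.
x = np.array([2.0, 0.5, -3.0, 0.0])
gains = sure_shrinkage_gain(x, sigma=1.0)
s_hat = gains * x  # shrunken coefficients, h(x_k) = a_k x_k
print(gains)
print(s_hat)
```

Coefficients whose energy falls below the noise variance are set to zero, while large coefficients are left nearly untouched.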

Perceptual risk estimation
Perceptual distortion functions: Itakura-Saito distortion, hyperbolic-cosine (cosh) distortion, weighted cosh distortion, etc. [2].
Practical noise types are bounded; hence, one can model the noise using a truncated Gaussian distribution.
Assuming that the observation distribution is truncated Gaussian and that the SNR is high, we propose risk estimates for perceptual distortion functions.
Minimize the perceptual risk estimates to obtain the optimum shrinkage estimators.

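Perceptual risk estimation: Itakura-Saito (IS) distortion
$$R_{IS} := E\{d_{IS}(s_k, \hat{s}_k)\}, \qquad |w_k| < |x_k|,$$
where
$$d_{IS}(s_k, \hat{s}_k) = \frac{\hat{s}_k}{s_k} - \log\frac{\hat{s}_k}{s_k} - 1 = \frac{\hat{s}_k}{x_k}\left(1 - \frac{w_k}{x_k}\right)^{-1} - \log(\hat{s}_k) + \log(s_k) - 1 = \frac{\hat{s}_k}{x_k}\sum_{n=0}^{\infty}\frac{w_k^n}{x_k^n} - \log(\hat{s}_k) + \log(s_k) - 1.$$
Truncating the series beyond $n = 4$ and using $\hat{s}_k = a_k x_k$ yields
$$R_{IS} \approx E\left\{a_k \sum_{n=0}^{4}\frac{w_k^n}{x_k^n}\right\} - E\{\log(a_k x_k)\} + \log(s_k) - 1.$$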
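The series step relies on $x_k / s_k = 1 / (1 - w_k / x_k)$ being well approximated by a few terms of a geometric series when $|w_k| \ll |x_k|$. The sketch below checks this numerically; the helper name, the sample values, and the default truncation order of 4 are assumptions made for illustration.

```python
def truncated_inverse(x, w, order=4):
    """Partial sum of sum_n (w/x)^n, approximating 1 / (1 - w/x) for |w| < |x|."""
    r = w / x
    return sum(r**n for n in range(order + 1))


# Hypothetical high-SNR example: clean value s = 10, noise w = 0.5, so x = 10.5.
x, w = 10.5, 0.5
exact = 1.0 / (1.0 - w / x)       # equals x / (x - w) = x / s
approx = truncated_inverse(x, w)  # series truncated after n = 4
print(exact, approx, abs(exact - approx))
```

The truncation error is of the order of $(w_k/x_k)^5$, which is why the approximation is restricted to the high-SNR regime.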

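Perceptual risk estimation: Lemma
Let $W$ be a real random variable with p.d.f.
$$p(w; c_1, c_2, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma K}\exp\left(-\frac{w^2}{2\sigma^2}\right)\mathbb{1}_{\{-c_1\sigma < w < c_2\sigma\}}, \qquad K = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-c_1\sigma}^{c_2\sigma}\exp\left(-\frac{u^2}{2\sigma^2}\right)du,$$
and let $f : \mathbb{R} \to \mathbb{R}$ be an $n$-fold indefinite integral of the Lebesgue measurable function $f^{(n)}$, which is the $n$th derivative of $f$. Suppose also that $E\{|W^{n-1} f^{(1)}(W)|\} < \infty$, that $c_1\sigma, c_2\sigma \gg \sigma$, and that $f^{(k)}(W)$ belongs to a class of functions such that
$$\sigma^2 f^{(k)}(w)\, p(w; c_1, c_2, \sigma)\Big|_{w=-c_1\sigma}^{c_2\sigma} \approx 0$$
for each derivative order $k$ up to $n$. Then
$$E\{W^n f(W)\} \approx \sigma^2 E\{f^{(1)}(W)\, W^{n-1}\} + \sigma^2 (n-1)\, E\{f(W)\, W^{n-2}\}.$$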
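As a worked illustration of how the lemma is applied (a sketch, assuming $f$ is an inverse power of $x_k = s_k + W$ with $s_k$ fixed, and that the boundary terms are negligible), the first two noise-dependent terms of the truncated series reduce as follows:

```latex
\begin{align*}
% n = 1 with f(W) = (s_k + W)^{-1}; the (n-1) term vanishes
E\left\{\frac{W}{x_k}\right\}
  &\approx \sigma^2 E\left\{f^{(1)}(W)\right\}
   = -\sigma^2 E\left\{\frac{1}{x_k^2}\right\}, \\
% n = 2 with f(W) = (s_k + W)^{-2}
E\left\{\frac{W^2}{x_k^2}\right\}
  &\approx -2\sigma^2 E\left\{\frac{W}{x_k^3}\right\}
   + \sigma^2 E\left\{\frac{1}{x_k^2}\right\} \\
% apply the n = 1 case again with f(W) = (s_k + W)^{-3}
  &\approx 6\sigma^4 E\left\{\frac{1}{x_k^4}\right\}
   + \sigma^2 E\left\{\frac{1}{x_k^2}\right\}, \\
% the sigma^2 terms cancel in the sum
E\left\{\frac{W}{x_k}\right\} + E\left\{\frac{W^2}{x_k^2}\right\}
  &\approx 6\sigma^4 E\left\{\frac{1}{x_k^4}\right\}.
\end{align*}
```

Each application trades one power of $W$ for a factor of $\sigma^2$ and a higher inverse power of $x_k$, so every expectation ends up expressed through the observed coefficient alone.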

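Perceptual risk estimation
Using the lemma, the risk $R_{IS}$ is
$$R_{IS} \approx E\left\{a_k\left(1 + \frac{60\,\sigma^6}{x_k^6} + \frac{840\,\sigma^8}{x_k^8}\right) - \log(a_k x_k) + \log(s_k) - 1\right\}.$$
The unbiased estimate of $R_{IS}$ is
$$\hat{R}_{IS} = a_k\left(1 + \frac{60\,\sigma^6}{x_k^6} + \frac{840\,\sigma^8}{x_k^8}\right) - \log(a_k x_k) + \log(s_k) - 1.$$
Differentiating $\hat{R}_{IS}$ with respect to $a_k$ and equating to zero, we get
$$a_{k,\mathrm{opt}} = \frac{1}{1 + \dfrac{60}{\xi_k^3} + \dfrac{840}{\xi_k^4}}, \qquad \xi_k = \frac{x_k^2}{\sigma^2}.$$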
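A minimal sketch of the resulting Itakura-Saito shrinkage gain follows. It assumes the gain has the form $1 / (1 + 60/\xi_k^3 + 840/\xi_k^4)$ with $\xi_k = x_k^2/\sigma^2$, which is what a fourth-order truncation of the series gives when the lemma is applied term by term. The function name and the example values are hypothetical.

```python
import numpy as np


def is_shrinkage_gain(xi):
    """Gain 1 / (1 + 60/xi^3 + 840/xi^4) for a posteriori SNR xi = x^2 / sigma^2.

    The coefficients 60 and 840 are assumed from a fourth-order truncation;
    the expression is only meaningful in the high-SNR regime (xi >> 1).
    """
    xi = np.asarray(xi, dtype=float)
    with np.errstate(divide="ignore"):
        # xi = 0 gives an infinite denominator, hence a gain of 0.
        return 1.0 / (1.0 + 60.0 / xi**3 + 840.0 / xi**4)


# Hypothetical a posteriori SNR values (linear scale, not dB).
xi = np.array([0.0, 10.0, 100.0, 1000.0])
gains = is_shrinkage_gain(xi)
print(gains)
```

The gain rises monotonically toward 1 as the a posteriori SNR increases, so strong coefficients pass almost unchanged while weak ones are attenuated.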

Optimum shrinkage parameter
Table: Optimal shrinkage parameters for different perceptual risk estimates, where $\xi_k = x_k^2/\sigma^2$. Columns: risk, distortion $d(s_k, \hat{s}_k)$, and $a_{\mathrm{opt}}$. Rows include WE and IS-II.

Performance evaluation
Figure: Performance comparison of different denoising algorithms. (Panels: SSNR gain in dB and PESQ gain versus input SNR in dB, for white noise and train noise; curves: MSE, WE, IS, IS-II, BNMF, LMSE.)

Conclusion
Introduced the notion of risk estimation for single-channel speech enhancement.
Proposed risk estimates for perceptual distortion metrics and minimized them to obtain the optimum denoising function.
For SNR greater than 5 dB, the proposed approach resulted in better denoising performance than the benchmarking techniques.

References
[1] C. M. Stein, "Estimation of the mean of a multivariate normal distribution," Ann. Stat., vol. 9, no. 6, pp. 1135-1151, Nov. 1981.
[2] R. M. Gray, A. Buzo, A. H. Gray, Jr., and Y. Matsuyama, "Distortion measures for speech processing," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, pp. 367-376, Aug. 1980.
[3] P. Scalart and J. V. Filho, "Speech enhancement based on a priori signal to noise estimation," Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 2, pp. 629-632, May 1996.
[4] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-squared error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985.
[5] N. Mohammadiha, P. Smaragdis, and A. Leijon, "Supervised and unsupervised speech enhancement using nonnegative matrix factorization," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2140-2151, Oct. 2013.

THANK YOU
