Bandwidth Extension for Speech Enhancement

Bandwidth Extension for Speech Enhancement
F. Mustiere, M. Bouchard, M. Bolic, University of Ottawa
Tuesday, May 4th, 2010. CCECE 2010: Signal and Multimedia Processing

Context
- Speech enhancement (de-noising) is now found in several applications: speech transmission, recognition, hearing aids, etc.
- Noisy speech has a frequency-dependent SNR.
- The SNR is higher in the lowband (0-5 kHz in this work) and lower in the highband (5-10 kHz in this work).

Context
- Speech enhancement in the highband: because of the lower SNR, there is a higher risk of damaging the speech (i.e. distortion) when attempting to remove noise.
- Moreover, enhancing both lowband and highband can be significantly more costly than enhancing the lowband only.

Objectives
To illustrate that a simple bandwidth extension (BWE) scheme (details in the next slide) can be both:
- a competitive speech enhancement or de-noising tool in the highband (as good as fairly advanced schemes)
- a way to reduce the complexity of advanced enhancement schemes, by computing the enhancement only in the lowband (fewer bins or a lower model order)
Using BWE could also allow a more complex lowband enhancement scheme, funded by the computations that the BWE scheme frees.

Bandwidth extension
Some background on classic bandwidth extension:
- production of missing frequency bands, with or without additional information
- generic audio bandwidth extension versus source-filter model-based speech bandwidth extension

Main techniques in classical BWE
Excitation signal extension:
- using non-linearities on the time sequence
- using spectral shifting or modulation techniques
- using artificial function generators (e.g. harmonic sines)
Spectral envelope extension:
- using codebooks from parameters (LPC, cepstral coefficients)
- using neural network mapping
- using linear mapping (sometimes combined with codebooks)
- using Bayesian estimation methods (GMMs, HMMs)
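As a rough illustration of the first excitation-extension technique listed above (a non-linearity applied to the time sequence), the sketch below full-wave rectifies a single sinusoid and checks that energy appears at twice its frequency. The frame length, the tone position and the helper names (bin_magnitude, rectify_extend) are assumptions made for this example; the slides do not specify which non-linearity is meant.

```python
import cmath
import math


def bin_magnitude(x, m):
    """Magnitude of DFT bin m of the sequence x (direct evaluation)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * m * k / n) for k, v in enumerate(x)))


def rectify_extend(x):
    """Full-wave rectify x, then subtract the mean to drop the DC term."""
    rectified = [abs(v) for v in x]
    mean = sum(rectified) / len(rectified)
    return [v - mean for v in rectified]


N = 64        # frame length (hypothetical)
TONE_BIN = 4  # a single "lowband" sinusoid, 4 cycles per frame (hypothetical)
tone = [math.sin(2 * math.pi * TONE_BIN * k / N) for k in range(N)]
extended = rectify_extend(tone)

# Compare the energy at twice the tone frequency before and after rectification.
before = bin_magnitude(tone, 2 * TONE_BIN)
after = bin_magnitude(extended, 2 * TONE_BIN)
print(f"bin {2 * TONE_BIN}: before={before:.3e}, after={after:.3f}")
```

Rectification creates even harmonics of the input tone, which is why such non-linearities can populate a missing upper band. In practice the generated band would still need filtering and level adjustment before being added to the original signal.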

Bandwidth extension
- BWE and spectral band replication (SBR) techniques are found in several speech codecs (GSM full-rate, AMR-WB, AMR-WB+, G.729EV/G.729.1) and audio codecs (MP3pro, Enhanced AACplus, HE-AAC).
- Different frequency bands present different challenges, e.g. extending 300 Hz-3.4 kHz to 0 Hz-5.5 kHz is a different problem from extending 0 Hz-11 kHz to 0 Hz-22 kHz.
- BWE has received little attention in the literature so far as an approach for speech enhancement or de-noising.

Application of BWE to Speech Enhancement
- In contrast with classical model-based BWE, here we have access to a coarse envelope estimate: the noisy signal envelope.
- In our particular context (0-10 kHz speech enhancement; SNR > 10 dB; non-synthetic recorded noise), it was found that if a good narrowband excitation signal can be obtained and extended, then the spectral envelope plays a fairly minor role in the resulting quality.

Application of BWE to Speech Enhancement
Thus, for simplicity and efficiency, in this work the LPC coefficients of the noisy fullband spectral envelope are used:
- for predicting the enhanced lowband excitation
- for synthesizing the fullband enhanced signal
For the excitation signal extension, simple spectral shifting is used (spectral band replication, spectral folding).
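The LPC analysis/synthesis pair described above can be sketched as follows. This is a minimal illustration, assuming the autocorrelation method with a Levinson-Durbin recursion, a model order of 8, and a synthetic frame standing in for the noisy signal z(k); none of these choices is confirmed by the slides, and all function names are hypothetical.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[k] * x[k - lag] for k in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a = [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a


def analysis_filter(x, a):
    """FIR inverse filter A(z): e(k) = sum_j a[j] * x(k - j)."""
    return [sum(a[j] * x[k - j] for j in range(len(a)) if k - j >= 0) for k in range(len(x))]


def synthesis_filter(e, a):
    """All-pole filter 1/A(z): y(k) = e(k) - sum_{j>=1} a[j] * y(k - j)."""
    y = []
    for k in range(len(e)):
        y.append(e[k] - sum(a[j] * y[k - j] for j in range(1, len(a)) if k - j >= 0))
    return y


rng = random.Random(0)
# Hypothetical noisy frame z(k): two sinusoids plus white noise.
z = [math.sin(0.3 * k) + 0.5 * math.sin(1.1 * k) + 0.1 * rng.gauss(0.0, 1.0) for k in range(240)]

ORDER = 8  # assumed model order
a = levinson_durbin(autocorrelation(z, ORDER), ORDER)
excitation = analysis_filter(z, a)        # envelope removed
rebuilt = synthesis_filter(excitation, a)  # envelope restored

energy_in = sum(v * v for v in z)
energy_exc = sum(v * v for v in excitation)
max_error = max(abs(u - v) for u, v in zip(z, rebuilt))
print(f"prediction gain: {energy_in / energy_exc:.1f}, round-trip error: {max_error:.2e}")
```

In the scheme of the slides, the same coefficients would be applied twice: the analysis filter on the enhanced lowband signal, and the synthesis filter on the bandwidth-extended excitation. Here both filters are applied to the same frame only to show that they are exact inverses of each other.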

Summary of method
1. Obtain the analysis/synthesis filter by LPC analysis of the wideband (wb) noisy signal z(k).
2. Downsample z(k) to the narrowband (nb) signal z_n(k).
3. Enhance the downsampled z_n(k), then upsample to obtain x̂_n(k).
4. Filter x̂_n(k) with the analysis filter to get the nb enhanced excitation ê_n(k).
5. Bandwidth-extend ê_n(k) by modulation to get ê_w(k).
6. Filter ê_w(k) with the synthesis filter to obtain the wb enhanced speech.
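Step 5 can be illustrated with one simple modulator, spectral folding, in which the excitation is multiplied by 1 + (-1)^k so that its lowband is mirrored about a quarter of the sampling rate. This is a sketch under assumptions: the slides mention spectral shifting and folding but do not give the exact modulator, and the frame length, test tone and function names below are hypothetical.

```python
import cmath
import math


def bin_magnitude(x, m):
    """Magnitude of DFT bin m of the sequence x (direct evaluation)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * m * k / n) for k, v in enumerate(x)))


def fold_extend(e_n):
    """Return e_w(k) = e_n(k) * (1 + (-1)^k): the original plus its mirror image."""
    return [v * (2.0 if k % 2 == 0 else 0.0) for k, v in enumerate(e_n)]


N = 64                         # frame length; bin N/2 is half the sampling rate
LOW_BIN = 5                    # test tone inside the lowband (bins 0..N/4)
MIRROR_BIN = N // 2 - LOW_BIN  # where folding should place the copy

e_n = [math.cos(2 * math.pi * LOW_BIN * k / N) for k in range(N)]
e_w = fold_extend(e_n)

high_before = bin_magnitude(e_n, MIRROR_BIN)  # highband is empty before folding
low_after = bin_magnitude(e_w, LOW_BIN)       # lowband component is preserved
high_after = bin_magnitude(e_w, MIRROR_BIN)   # mirrored copy in the highband
print(f"bin {LOW_BIN}: {low_after:.2f}, bin {MIRROR_BIN}: {high_after:.2f}")
```

With a 20 kHz sampling rate, as in the experimental setup, this mirror axis sits at 5 kHz, so a 0-5 kHz excitation is reflected into 5-10 kHz. The folded excitation would then be passed through the LPC synthesis filter of step 6 to impose the spectral envelope.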

Experimental setup
- Speech content from the TIMIT database (several male and female speakers), upsampled to 20 kHz.
- Noise from the NOISEX-92 database (babble, factory, tank, car), at different levels, i.e. different SNRs.
- Assessment using a mixture of objective measures of SNR, speech quality and speech intelligibility (SNR, ASNR, CSII, WPESQ, Csig, Cbak, Covl).

Experimental setup
- Assessment of subjective quality using informal listening tests.
- Three different, fairly advanced speech enhancement algorithms were used, each in fullband and BWE modes:
  - Kalman + EM
  - multi-band spectral subtractive algorithm
  - generalized subspace approach

Results
- In the large majority of cases, the objective measures obtained with the BWE approach were better than those obtained with fullband enhancement, for both low and high input SNR.
- Fully quantifying the perceptual improvement would require more formal listening tests, but this is not the point here.
- Informal listening tests readily confirm that the BWE approach is at least perceptually similar to fullband enhancement, but at lower cost or complexity.

Sound demos (Kalman + EM algorithm, 5 dB input SNR, factory noise): noisy, enhanced wb, enhanced nb, enhanced nb + BWE

- Simple BWE-based speech enhancement can reduce the complexity of fairly advanced enhancement algorithms while giving equivalent quality.
- Further quality improvements could likely be obtained by spending the freed resources on improved narrowband enhancement.
- If reducing complexity is not the main concern, an alternative would be to seek even better enhancement performance by using a more complex BWE scheme than the one used here.

Thank you. Questions? Frederic Mustiere, Martin Bouchard, Miodrag Bolic {mustiere,bouchard,mbolic}@site.uottawa.ca