A Two-step Technique for MRI Audio Enhancement Using Dictionary Learning and Wavelet Packet Analysis

Colin Vaz, Vikram Ramanarayanan, and Shrikanth Narayanan
USC SAIL Lab
INTERSPEECH, Articulatory Data Acquisition and Processing Special Session
Aug. 27, 2013

Motivation
Use of MRI for Speech Research
Non-invasive method for imaging the vocal tract.
View structural details of the vocal tract.

Motivation
Problem: MRI scanners produce high-energy broadband noise.
Goal: suppress MRI noise in audio recordings without distorting the speech.

MRI Background
A pulse sequence produces a time-varying electromagnetic field, which causes the gradient coils to vibrate.
S. Narayanan et al., An approach to real-time magnetic resonance imaging for speech production, J. Acoust. Soc. Am., 115:1771-1776, 2004.

MRI Background
seq1 pulse sequence
Currently used for acquiring real-time MRI of speech.
Periodic sequence with fundamental frequency
$f_0 = \frac{1}{\text{repetition time} \times \text{number of interleaves}}$
Repetition time: how often you sample the image in the Fourier domain.
Number of interleaves: number of samples used to reconstruct an image.
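As a worked example of the relation between f_0, repetition time, and number of interleaves, the sketch below evaluates f_0 for timing values that appear elsewhere in these slides (6.004 ms TR, 13 and 55 interleaves). Pairing the 6.004 ms TR with the 13-interleave case is an assumption made for illustration, and the function name is hypothetical.

```python
def fundamental_frequency_hz(repetition_time_s, num_interleaves):
    """f0 = 1 / (repetition time * number of interleaves)."""
    return 1.0 / (repetition_time_s * num_interleaves)

# Assumed pairing: 6.004 ms TR with 13 interleaves (period of about 78 ms).
f0_13 = fundamental_frequency_hz(6.004e-3, 13)
# 55 interleaves with 6.004 ms TR, as listed on the results slides.
f0_55 = fundamental_frequency_hz(6.004e-3, 55)

print(round(f0_13, 2), round(f0_55, 2))  # prints: 12.81 3.03
```

More interleaves lengthen the period and lower f_0, which packs the noise harmonics closer together in frequency.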

MRI Background
[Figure: acquisition timing used in upper airway imaging.]
S. Narayanan et al., An approach to real-time magnetic resonance imaging for speech production, J. Acoust. Soc. Am., 115:1771-1776, 2004.

MRI Background
Golden ratio pulse sequence (GR)
Retrospectively set the temporal resolution.
Periodic with a very long period.
Y. Kim et al., Flexible retrospective selection of temporal resolution in realtime speech MRI using a golden-ratio spiral view order, Magnetic Resonance in Medicine, 65(5): 1365-1371, 2011.

Comparing seq1 vs. GR
a) Reconstruction with seq1 (13 interleaves) at 78 ms resolution.
b) Reconstruction with GR (34 interleaves) with a selected resolution of 48 ms.
GR gives a clearer view of fast-moving articulators, with fewer artifacts and less aliasing in the image.
Y. Kim et al., Golden-ratio spiral imaging with gradient acoustic noise cancellation: application to realtime MRI of fluent speech, in Proc. Int. Soc. Magnetic Resonance in Medicine, 2012.

Removing MRI Noise
Least-mean-squares filter (LMS-1) for noise removal.
[Block diagram: the MRI noise is filtered by h[n] and subtracted from the noisy signal to give the estimated speech.]
[Figure: spectrogram of MRI noise.]
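The LMS-1 structure on this slide resembles a standard adaptive noise canceller, which can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, filter length, and step size are assumptions, and it presumes a separate noise reference signal is available.

```python
import numpy as np

def lms_cancel(noisy, reference, num_taps=32, step=0.01):
    """Adaptive noise cancellation with a plain LMS filter (illustrative sketch).

    noisy:     speech plus filtered noise
    reference: noise reference that is correlated with the noise in `noisy`
    Returns the error signal, which serves as the speech estimate.
    """
    noisy = np.asarray(noisy, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(num_taps)            # adaptive filter h[n]
    estimate = np.zeros(len(noisy))
    for n in range(num_taps - 1, len(noisy)):
        # Most recent reference samples, newest first.
        x = reference[n - num_taps + 1:n + 1][::-1]
        noise_hat = w @ x             # filtered reference = noise estimate
        e = noisy[n] - noise_hat      # subtract from the noisy signal
        w += step * e * x             # LMS weight update
        estimate[n] = e
    return estimate
```

The step size has to be small relative to the reference power and filter length for the update to stay stable; the values here are placeholders.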

Removing MRI Noise
Use a mathematical model of the MRI noise as the reference signal (LMS-2).
[Block diagram: the mathematical model of the MRI noise is filtered by h[n] and subtracted from the noisy signal to give the estimated speech.]
[Figure: noise components in the frequency domain, amplitude versus frequency f, with components at f_0, 2f_0, 3f_0.]
E. Bresch et al., Synchronized and Noise-Robust Audio Recordings During Realtime Magnetic Resonance Imaging Scans, J. Acoustical Society of America, 120(4): 1791-1794, 2006.
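One way to picture a reference signal with components at f_0, 2f_0, 3f_0 is a sum of harmonics, sketched below. This is only an assumed stand-in for the mathematical noise model; the slide does not give its exact form, and the function name, unit amplitudes, and harmonic count are hypothetical.

```python
import numpy as np

def harmonic_reference(f0_hz, sample_rate_hz, num_samples, num_harmonics=10):
    """Sum of unit-amplitude cosines at f0, 2*f0, ..., num_harmonics*f0.

    Assumed illustration of a periodic noise reference; amplitudes and
    phases of a real scanner's noise would differ.
    """
    if num_harmonics * f0_hz >= sample_rate_hz / 2:
        raise ValueError("highest harmonic must stay below the Nyquist frequency")
    t = np.arange(num_samples) / sample_rate_hz
    k = np.arange(1, num_harmonics + 1)
    # Rows are harmonics, columns are time samples; sum over harmonics.
    return np.cos(2 * np.pi * f0_hz * np.outer(k, t)).sum(axis=0)
```

A reference like this repeats every 1/f_0 seconds, so it can only track noise that is itself periodic.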

Limitations of Current Algorithm
Does not work well for sequences with a large period (small f_0).
Cannot handle aperiodic sequences.
Goal: develop a denoising algorithm that does not rely on the periodicity of the pulse sequence.

Overview of Approach
Noisy signal -> PLCA -> Wavelets -> Denoised signal
PLCA (Probabilistic Latent Component Analysis): source separation technique.
Wavelets: signal denoising technique.

PLCA
Variant of non-negative matrix factorization (NMF).
$V \approx W H$, where $V$ is the spectrogram, $W$ is the dictionary, and $H$ holds the time activation weights.
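Since PLCA is presented here as a variant of NMF, the factorization of a spectrogram V into a dictionary W and activations H can be sketched with NMF multiplicative updates under the Kullback-Leibler divergence. This is a swapped-in technique, not the PLCA code used in the talk; the function names, rank, and iteration count are assumptions.

```python
import numpy as np

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between a spectrogram and its approximation."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)

def nmf_kl(V, rank, num_iters=200, seed=0, eps=1e-12):
    """Factor V (freq x time, non-negative) as W @ H with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    num_freq, num_frames = V.shape
    W = rng.random((num_freq, rank)) + eps      # dictionary
    H = rng.random((rank, num_frames)) + eps    # time activation weights
    for _ in range(num_iters):
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    # Normalize dictionary columns to sum to 1, as in a probabilistic reading;
    # the scale moves into H so that W @ H is unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]
```

Each column of W is a spectral shape and each row of H says how strongly that shape is active in every frame.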

PLCA
Does source separation by learning a dictionary and activation weights for each source.
Noise spectrogram = Noise dictionary × Time activation weights
Speech spectrogram = Speech dictionary × Time activation weights

PLCA
1. Learn the noise dictionary.
2. Look at a spectrogram frame: how much of the spectrum is explained by the noise spectrum?
   A lot: update the noise activation weights.
   Not much: learn/update the speech and noise activation weights, then learn/update the speech dictionary.
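The idea of holding a pre-learned noise dictionary fixed while learning the speech dictionary and all activation weights can be sketched as below. This is a simplified batch version using NMF-style KL updates; it does not reproduce the per-frame decision on this slide, and the function name, component count, and the ratio-mask reconstruction are assumptions.

```python
import numpy as np

def separate_with_noise_dictionary(V, W_noise, num_speech=10,
                                   num_iters=200, seed=0, eps=1e-12):
    """Split spectrogram V into speech and noise parts (illustrative sketch).

    W_noise is a noise dictionary learned beforehand (e.g. from a noise-only
    segment) and is kept fixed; only W_speech and the activations are updated.
    """
    rng = np.random.default_rng(seed)
    num_freq, num_frames = V.shape
    num_noise = W_noise.shape[1]
    W_speech = rng.random((num_freq, num_speech)) + eps
    H = rng.random((num_noise + num_speech, num_frames)) + eps
    for _ in range(num_iters):
        W = np.hstack([W_noise, W_speech])
        # Update speech and noise activation weights.
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        # Update the speech dictionary only; the noise dictionary stays fixed.
        ratio = V / (W @ H + eps)
        H_speech = H[num_noise:]
        W_speech *= (ratio @ H_speech.T) / (H_speech.sum(axis=1) + eps)
    noise_part = W_noise @ H[:num_noise]
    speech_part = W_speech @ H[num_noise:]
    total = noise_part + speech_part + eps
    # Ratio masks distribute the observed spectrogram between the two sources.
    return V * speech_part / total, V * noise_part / total
```

Because the two masks sum to one, the speech and noise estimates add back up to the observed spectrogram.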

PLCA Results
PLCA removes noise in silence regions (as expected).
PLCA reduces noise in speech regions.
Minimal distortion of speech.

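Wavelets
More flexible than Fourier analysis.
Fourier: $F(j\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$ (kernel: complex exponential)
Wavelet: $F(a, b) = \int_{-\infty}^{\infty} f(t)\, \psi_{a,b}(t)\, dt$, with $\psi_{a,b}(t) = \psi\left(\frac{t - b}{a}\right)$ (kernel: wavelet)
Example wavelets: Meyer, Morlet, Mexican hat.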
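To make the wavelet integral concrete, the sketch below evaluates F(a, b) numerically with a Mexican hat wavelet. It follows the slide's definition of the scaled and shifted wavelet without any extra normalization factor; the function names and the Riemann-sum approximation are assumptions.

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat wavelet (unnormalized): (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def wavelet_coefficient(f, t, a, b):
    """Approximate F(a, b) = integral of f(t) * psi((t - b) / a) dt.

    f and t are arrays sampled on a uniform grid; the integral is
    approximated by a Riemann sum.
    """
    dt = t[1] - t[0]
    return np.sum(f * mexican_hat((t - b) / a)) * dt

# Example: a signal that is itself a wavelet at scale 0.5, shifted to t = 1.
t = np.linspace(-10.0, 10.0, 2001)
signal = mexican_hat((t - 1.0) / 0.5)
shifts = np.linspace(-3.0, 3.0, 61)
coeffs = np.array([wavelet_coefficient(signal, t, 0.5, b) for b in shifts])
best_shift = shifts[np.argmax(coeffs)]  # largest response near b = 1
```

The scale a sets how wide the analysis window is and the shift b sets where it sits, which is what gives wavelets an adjustable time-frequency resolution.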

Wavelets
Able to choose the time-frequency resolution; more flexible than the STFT.
[Figure: time-frequency tilings, frequency f versus time t.]

Wavelet Thresholding
Idea: find the coefficients for the noise and set them to zero.
[Figure: illustration of the threshold λ.]

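Wavelet Thresholding
Adaptive threshold $\lambda_j$ for subband $j$, built from the terms $\sigma_{N_j}^2$, $\zeta_j + \zeta_j^2$, and $\ln\left(1 + \frac{1}{\zeta_j}\right)$; the exact closed form is given in Tabibian et al. (cited below).
$\zeta_j = \frac{\sigma_{X_j}^2}{\sigma_{N_j}^2}$, where $\sigma_{X_j}^2$ is the variance of the noisy signal in subband $j$ and $\sigma_{N_j}^2$ is the variance of the noise in subband $j$.
Takes advantage of having a noise estimate.
Threshold is adaptive.
S. Tabibian et al., A New Wavelet Thresholding Method for Speech Enhancement Based on Symmetric Kullback-Leibler Divergence, in 14th Int. Computer Society of Iran Computer Conf.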

Wavelets
1. Compute wavelet coefficients for the noise.
2. Compute wavelet coefficients for the noisy signal.
3. Calculate the wavelet threshold.
4. Soft-threshold the wavelet coefficients.
5. Reconstruct the denoised signal from the thresholded coefficients.
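The wavelet steps on this slide can be sketched end to end with a small Haar wavelet packet transform. Two things are swapped in for illustration and are not taken from the talk: the Haar wavelet, and Donoho's universal threshold (noise standard deviation times sqrt(2 ln n)) in place of the adaptive threshold from the previous slide. All function names are hypothetical, and the signal length is assumed to be a multiple of 2**levels.

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: approximation and detail halves."""
    return ((x[0::2] + x[1::2]) / np.sqrt(2.0),
            (x[0::2] - x[1::2]) / np.sqrt(2.0))

def haar_merge(a, d):
    """Inverse of haar_split."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def wp_decompose(x, levels):
    """Full wavelet packet tree: split every subband at every level."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        bands = [part for band in bands for part in haar_split(band)]
    return bands

def wp_reconstruct(bands):
    """Merge neighbouring subbands pairwise until one signal remains."""
    bands = list(bands)
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]

def soft_threshold(c, lam):
    """Shrink coefficients toward zero by lam; zero those below lam."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def denoise(noisy, noise_only, levels=3):
    """Steps 1-5: coefficients, per-subband threshold, shrink, reconstruct."""
    noise_bands = wp_decompose(noise_only, levels)   # step 1
    noisy_bands = wp_decompose(noisy, levels)        # step 2
    cleaned = []
    for x_band, n_band in zip(noisy_bands, noise_bands):
        # Step 3 (assumed): universal threshold from the noise estimate.
        lam = np.std(n_band) * np.sqrt(2.0 * np.log(len(x_band)))
        cleaned.append(soft_threshold(x_band, lam))  # step 4
    return wp_reconstruct(cleaned)                   # step 5
```

Here `noise_only` stands for a noise estimate of the same length as the noisy signal, so that both decompose into matching subbands.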

Results (55 interleaves, 6.004 ms TR)
Noise suppression (dB)
Sequence   Proposed   LMS-1   LMS-2
seq1       19.27      18.01   18.79
GR         24.1       18.37   9.17
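The slides do not define how noise suppression is measured. A common choice, assumed here, is the ratio of noise power before and after denoising over noise-only segments, expressed in dB. The function name is hypothetical; the two table values are taken from the GR row above.

```python
import numpy as np

def noise_suppression_db(noise_before, noise_after):
    """10 * log10(power before / power after), over noise-only samples (assumed definition)."""
    p_before = np.mean(np.square(noise_before))
    p_after = np.mean(np.square(noise_after))
    return 10.0 * np.log10(p_before / p_after)

# Gap between the proposed method and LMS-2 for GR noise, from the table.
proposed_gr_db = 24.1
lms2_gr_db = 9.17
improvement_db = proposed_gr_db - lms2_gr_db  # about 14.93 dB
```

Under this definition, reducing the noise amplitude by a factor of 10 corresponds to 20 dB of suppression.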

Aurora 5 Digits
Log-likelihood ratio (LLR): models the mismatch between the spectral envelopes of the clean and denoised speech signals.
$d_{LLR}(a_{\hat{s}}, a_s) = \log_{10}\left(\frac{a_{\hat{s}}^T R_s\, a_{\hat{s}}}{a_s^T R_s\, a_s}\right)$
where $a_{\hat{s}}$ are the LPC coefficients of the denoised speech, $a_s$ are the LPC coefficients of the clean speech, and $R_s$ is the autocorrelation matrix of the clean speech.
Distortion variance:
$\sigma_d^2 = \frac{1}{L} \lVert s - \hat{s} \rVert_2^2$
where $s$ is the clean speech, $\hat{s}$ is the denoised speech, and $L$ is the signal length.
V. R. Ramachandran et al., Objective and Subjective Evaluation of Adaptive Speech Enhancement Methods for Functional MRI, J. Magnetic Resonance Imaging.
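Both metrics can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions: LPC coefficients come from the autocorrelation method, the LLR is computed over the whole signal rather than frame by frame, and the LPC order and function names are hypothetical.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of a signal."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])

def toeplitz_matrix(r):
    """Symmetric Toeplitz matrix with entries r[|i - j|]."""
    idx = np.arange(len(r))
    return r[np.abs(np.subtract.outer(idx, idx))]

def lpc_vector(x, order):
    """LPC vector [1, -a_1, ..., -a_p] from the autocorrelation method."""
    r = autocorrelation(x, order)
    predictor = np.linalg.solve(toeplitz_matrix(r[:-1]), r[1:])
    return np.concatenate(([1.0], -predictor))

def log_likelihood_ratio(clean, denoised, order=10):
    """log10 of (a_hat^T R_s a_hat) / (a_s^T R_s a_s)."""
    a_s = lpc_vector(clean, order)
    a_hat = lpc_vector(denoised, order)
    R_s = toeplitz_matrix(autocorrelation(clean, order))
    return np.log10((a_hat @ R_s @ a_hat) / (a_s @ R_s @ a_s))

def distortion_variance(clean, denoised):
    """(1 / L) * squared L2 norm of (clean - denoised)."""
    clean = np.asarray(clean, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return np.sum((clean - denoised) ** 2) / len(clean)
```

Both measures are zero when the denoised signal equals the clean one, and lower values indicate less distortion.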

Results
Metric                         Sequence   Proposed   LMS-1   LMS-2
Noise suppression (dB)         seq1       30.23      32.55   26.53
Noise suppression (dB)         GR         24.14      27.88   10.91
LLR                            seq1       0.17       0.4     0.42
LLR                            GR         0.11       0.41    0.33
Distortion variance (×10^-5)   seq1       7.52       34.8    21.4
Distortion variance (×10^-5)   GR         9.56       35.8    37.7
Proposed method improves noise suppression over LMS-2 for GR sequence noise.
Less distortion than LMS methods.

Listening Test Results
Environment   Sequence   Clean   Proposed   LMS-1   LMS-2   Noisy
TIMIT         seq1       n/a     2          3       1       4
TIMIT         GR         n/a     1          2       3       4
Aurora        seq1       1       3          4       2       5
Aurora        GR         1       2          3       4       5
Presented sets of TIMIT sentences and Aurora digits to listeners.
Each set contained a noisy audio clip, 3 denoised versions, and (for Aurora) a clean version.
Listeners ranked each clip within a set from 1 (best) to 4 or 5 (worst).

Conclusion
Combined PLCA and wavelets.
Achieved 24 dB noise reduction, a 15 dB improvement over LMS-2 (GR sequence).
Low speech distortion: key for analysis/modeling.

Future Work
Improve MRI noise modeling.
Room transfer function.
Real-time implementation.
Applications beyond MRI: cell phone, biometrics, etc.

Thank you!
We would like to acknowledge the support of NIH Grant DC007124.

USC-TIMIT: A MULTIMODAL ARTICULATORY DATA CORPUS FOR SPEECH RESEARCH
10 American English talkers (5M, 5F).
Real-time MRI (5 speakers also with EMA) and synchronized audio.
460 sentences each (>20 minutes).
Freely available for speech research.
Web link (with download info): http://sail.usc.edu/span/usc-timit/
SAIL homepage: http://sail.usc.edu
Narayanan et al. (2011). A Multimodal Real-Time MRI Articulatory Corpus for Speech Research. InterSpeech.