A Two-step Technique for MRI Audio Enhancement Using Dictionary Learning and Wavelet Packet Analysis
Colin Vaz, Vikram Ramanarayanan, and Shrikanth Narayanan
USC SAIL Lab
INTERSPEECH Articulatory Data Acquisition and Processing Special Session, Aug. 27, 2013
Motivation: Use of MRI for Speech Research
MRI is a non-invasive method for imaging the vocal tract.
It shows structural details of the vocal tract.
Motivation
Problem: MRI scanners produce high-energy broadband noise.
Goal: suppress MRI noise in audio recordings without distorting the speech.
MRI Background
A pulse sequence produces a time-varying electromagnetic field, which causes the gradient coils to vibrate and generate acoustic noise.
S. Narayanan et al., An approach to real-time magnetic resonance imaging for speech production, J. Acoust. Soc. Am., 115:1771-1776, 2004.
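MRI Background: seq1 pulse sequence
Currently used for acquiring real-time MRI of speech.
It is a periodic sequence with fundamental frequency
$$f_0 = \frac{1}{T_R \times N_{\text{int}}}$$
where the repetition time $T_R$ sets how often the image is sampled in the Fourier domain (k-space), and the number of interleaves $N_{\text{int}}$ is the number of samples used to reconstruct an image.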
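A quick numerical check of the f_0 formula for a periodic sequence. The function names are hypothetical. The example uses the 55-interleaf, 6.004 ms TR setting quoted on the results slides. With seq1's 13 interleaves at roughly 6 ms each (78 ms frames), f_0 comes out near 12.8 Hz instead.

```python
def fundamental_frequency(tr_s: float, n_interleaves: int) -> float:
    """Fundamental frequency (Hz) of a periodic pulse sequence.

    One period of the sequence is n_interleaves repetitions of length TR,
    so f0 = 1 / (TR * n_interleaves).
    """
    return 1.0 / (tr_s * n_interleaves)


def harmonics(f0: float, n: int) -> list[float]:
    """First n harmonics f0, 2*f0, ..., n*f0, where the noise energy concentrates."""
    return [k * f0 for k in range(1, n + 1)]


# Assumed example: 55 interleaves at TR = 6.004 ms.
f0 = fundamental_frequency(6.004e-3, 55)
print(f"f0 = {f0:.3f} Hz")
print([round(h, 2) for h in harmonics(f0, 3)])
```

A long period (many interleaves or a long TR) pushes f_0 down and packs the harmonics closer together.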
MRI Background
[Figure: acquisition timing used in upper-airway imaging]
S. Narayanan et al., An approach to real-time magnetic resonance imaging for speech production, J. Acoust. Soc. Am., 115:1771-1776, 2004.
MRI Background: golden-ratio (GR) pulse sequence
The temporal resolution can be set retrospectively.
The sequence is periodic, but with a very long period.
Y. Kim et al., Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order, Magnetic Resonance in Medicine, 65(5):1365-1371, 2011.
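This sketch shows why a golden-ratio view order lets the temporal resolution be chosen after the scan. It assumes each successive spiral interleaf is rotated by the golden angle 360°/φ (about 222.49°). That is our reading of "golden-ratio spiral view order" and is not a detail given on the slide. The helper names are hypothetical.

By the three-gap theorem, the sorted angles of any run of consecutive interleaves have at most three distinct gap sizes. Every window therefore covers the angles nearly uniformly, so a frame can be reconstructed from however many interleaves one picks.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2
GOLDEN_ANGLE_DEG = 360.0 / GOLDEN_RATIO  # about 222.49 degrees (assumed increment)


def view_angles(start: int, count: int) -> list[float]:
    """Rotation angles (degrees, in [0, 360)) of interleaves start .. start+count-1."""
    return [(k * GOLDEN_ANGLE_DEG) % 360.0 for k in range(start, start + count)]


def angular_gaps(angles: list[float]) -> list[float]:
    """Gaps between neighbouring sorted angles, including the wrap-around gap."""
    s = sorted(angles)
    gaps = [b - a for a, b in zip(s, s[1:])]
    gaps.append(360.0 - s[-1] + s[0])
    return gaps


def distinct_gap_sizes(angles: list[float], decimals: int = 4) -> list[float]:
    """Distinct gap sizes, rounded to absorb floating-point noise."""
    return sorted(set(round(g, decimals) for g in angular_gaps(angles)))


# Retrospectively group 8 consecutive interleaves per frame, starting anywhere.
for start in (0, 100, 1234):
    print(start, distinct_gap_sizes(view_angles(start, 8)))
```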
Comparing seq1 vs. GR
a) Reconstruction with seq1 (13 interleaves) at 78 ms temporal resolution.
b) Reconstruction with GR (34 interleaves) with a retrospectively selected 48 ms resolution.
GR gives a clearer view of fast-moving articulators, with fewer artifacts and less aliasing in the image.
Y. Kim et al., Golden-ratio spiral imaging with gradient acoustic noise cancellation: application to real-time MRI of fluent speech, in Proc. Int. Soc. Magnetic Resonance in Medicine, 2012.
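Removing MRI Noise
Least-mean-squares filter (LMS-1) for noise removal: the recorded MRI noise is passed through an adaptive filter h[n], and the filter output is subtracted from the noisy signal to give the estimated speech.
[Figure: LMS-1 block diagram; spectrogram of MRI noise]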
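The block diagram above is a classic adaptive noise canceller. This minimal sketch assumes the normalized LMS (NLMS) update, because the slide only says "LMS". The tap count `order` and step size `mu` are illustrative values, and the function name is hypothetical.

```python
import numpy as np


def lms_cancel(noisy, reference, order=32, mu=0.1, eps=1e-8):
    """Adaptive noise canceller in the style of LMS-1 (sketch).

    noisy:     primary input, speech plus MRI noise
    reference: recorded MRI noise, correlated with the noise in `noisy`
    Uses the normalized LMS (NLMS) tap update.
    Returns e[n] = noisy[n] - y[n], the estimated speech.
    """
    noisy = np.asarray(noisy, dtype=float)
    reference = np.asarray(reference, dtype=float)
    h = np.zeros(order)      # adaptive filter taps h[n]
    buf = np.zeros(order)    # latest reference samples, newest first
    out = np.empty_like(noisy)
    for n in range(len(noisy)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        y = h @ buf                              # filtered reference = noise estimate
        e = noisy[n] - y                         # subtract it from the noisy sample
        h += mu * e * buf / (buf @ buf + eps)    # NLMS tap update
        out[n] = e
    return out
```

The speech is uncorrelated with the reference, so the filter can only learn to cancel the noise. What remains in e[n] is the speech estimate.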
Removing MRI Noise
LMS-2 uses a mathematical model of the MRI noise, in place of the recorded noise, as the reference signal. In the frequency domain, the modeled noise consists of components at $f_0, 2f_0, 3f_0, \ldots$
[Figure: LMS-2 block diagram (noisy signal, adaptive filter h[n], estimated speech); amplitude of the noise components versus frequency]
E. Bresch et al., Synchronized and Noise-Robust Audio Recordings During Realtime Magnetic Resonance Imaging Scans, J. Acoust. Soc. Am., 120(4):1791-1794, 2006.
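LMS-2 replaces the recorded noise with a synthetic reference whose spectrum is a set of lines at multiples of f_0. This minimal sketch of such a reference assumes equal-amplitude, zero-phase cosines. The model in Bresch et al. may weight or phase the harmonics differently. The function name, the 8 kHz sampling rate and f_0 = 100 Hz are illustrative assumptions. An FFT confirms that the energy sits only at the harmonics.

```python
import numpy as np


def harmonic_reference(f0, fs, n_samples, n_harmonics, phases=None):
    """Synthetic MRI-noise reference: sum of cosines at k*f0, k = 1..n_harmonics.

    Equal amplitudes and (by default) zero phases are assumptions for illustration.
    Harmonics at or above the Nyquist frequency fs/2 are dropped.
    """
    t = np.arange(n_samples) / fs
    if phases is None:
        phases = np.zeros(n_harmonics)
    ref = np.zeros(n_samples)
    for k in range(1, n_harmonics + 1):
        if k * f0 >= fs / 2:
            break
        ref += np.cos(2 * np.pi * k * f0 * t + phases[k - 1])
    return ref


fs = 8000
f0 = 100.0
ref = harmonic_reference(f0, fs, fs, 5)        # one second of signal
spectrum = np.abs(np.fft.rfft(ref))
freqs = np.fft.rfftfreq(len(ref), d=1 / fs)
peaks = freqs[np.argsort(spectrum)[-5:]]       # five strongest bins
print(sorted(peaks))
```

A real pulse sequence with a very small f_0 needs far more harmonics below Nyquist, which is where this kind of model becomes hard to fit.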
Limitations of Current Algorithm
Does not work well for sequences with a long period (small $f_0$).
Cannot handle aperiodic sequences.
Goal: develop a denoising algorithm that does not rely on the periodicity of the pulse sequence.
Overview of Approach
Noisy signal → PLCA → Wavelets → Denoised signal
PLCA: Probabilistic Latent Component Analysis, a source separation technique.
Wavelets: a signal denoising technique.
PLCA
A variant of non-negative matrix factorization (NMF):
$$V \approx W H$$
where $V$ is the spectrogram, $W$ the dictionary, and $H$ the time activation weights.
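This minimal sketch writes the factorization as asymmetric PLCA. Each frame's normalized spectrum is modeled as a mixture of dictionary spectra P(f|z), weighted by P(z|t), and fitted with the standard EM updates. The function names, random initialization and iteration count are assumptions. This is not the authors' code.

```python
import numpy as np


def plca(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Asymmetric PLCA of a non-negative spectrogram V (freq x time).

    Models P(f | t) = sum_z P(f | z) P(z | t), i.e. V ~ W @ H with the
    columns of W (dictionary) and of H (activation weights) summing to one.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    W /= W.sum(axis=0, keepdims=True)
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        R = V / (W @ H + eps)      # data divided by current model
        W_new = W * (R @ H.T)      # E-step posterior folded into the M-step for P(f|z)
        H_new = H * (W.T @ R)      # ... and for P(z|t), both from the old W, H
        W = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + eps)
    return W, H


def reconstruct(V, W, H):
    """Scale the per-frame distributions back to the frame energies of V."""
    return (W @ H) * V.sum(axis=0, keepdims=True)
```

Up to normalization, these EM updates coincide with the multiplicative updates of KL-divergence NMF. This is why PLCA is described as an NMF variant.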
PLCA
Performs source separation by learning a dictionary and activation weights for each source:
noise dictionary × noise activation weights = noise spectrogram
speech dictionary × speech activation weights = speech spectrogram
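Once a joint factorization is available, each source's spectrogram is its own dictionary times its own activation weights. The sketch below computes both products. It then applies them as soft (Wiener-like) masks, so the two estimates add back up to V. That masking step is an assumption and is not stated on the slide. The function name is hypothetical.

```python
import numpy as np


def source_spectrograms(V, W_noise, H_noise, W_speech, H_speech, eps=1e-12):
    """Split V ~ [W_noise W_speech] @ [H_noise; H_speech] into per-source spectrograms.

    Each source model is its dictionary times its activations; using the models
    as soft masks on V makes the speech and noise estimates sum to V.
    """
    noise_model = W_noise @ H_noise
    speech_model = W_speech @ H_speech
    total = noise_model + speech_model + eps
    speech = V * speech_model / total
    noise = V * noise_model / total
    return speech, noise
```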
PLCA
Learn the noise dictionary.
Look at each spectrogram frame: how much of the spectrum is explained by the noise spectrum?
A lot: update only the noise activation weights.
Not much: learn/update the speech and noise activation weights, and learn/update the speech dictionary.
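This minimal sketch of the frame-wise procedure assumes KL-divergence NMF multiplicative updates, which are closely related to PLCA's EM updates. It also assumes a noise dictionary learned beforehand from noise-only audio. The "how much is explained" test is a hypothetical L1 criterion with a hypothetical cutoff `noise_only_frac`, because the slides specify neither. In frames flagged as noise-only, the speech activations start at zero. Multiplicative updates keep them at zero, so only the noise activations change there.

```python
import numpy as np


def _update_h(V, W, H, eps):
    """KL-NMF multiplicative update for activations (zero entries stay zero)."""
    return H * (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)


def learn_speech(V, W_noise, n_speech=8, n_iter=200, noise_only_frac=0.8,
                 seed=0, eps=1e-12):
    """Semi-supervised factorization with a fixed, pre-learned noise dictionary.

    1. Fit noise activations alone and measure, per frame, the fraction of the
       spectrum the noise model explains (hypothetical L1 criterion).
    2. Frames explained at least `noise_only_frac` are treated as noise-only.
    3. Jointly update speech + noise activations and the speech dictionary;
       the noise dictionary is never changed.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Kn = W_noise.shape[1]

    H_noise = rng.random((Kn, T)) + eps
    for _ in range(n_iter):
        H_noise = _update_h(V, W_noise, H_noise, eps)
    residual = np.abs(V - W_noise @ H_noise).sum(axis=0)
    explained = 1.0 - residual / (V.sum(axis=0) + eps)
    noise_only = explained >= noise_only_frac

    W_speech = rng.random((F, n_speech)) + eps
    H_speech = rng.random((n_speech, T)) + eps
    H_speech[:, noise_only] = 0.0          # no speech in noise-only frames
    for _ in range(n_iter):
        W = np.hstack([W_noise, W_speech])
        H = _update_h(V, W, np.vstack([H_noise, H_speech]), eps)
        H_noise, H_speech = H[:Kn], H[Kn:]
        # Update only the speech dictionary.
        R = V / (W_noise @ H_noise + W_speech @ H_speech + eps)
        W_speech *= (R @ H_speech.T) / (H_speech.sum(axis=1)[None, :] + eps)
    return W_speech, H_speech, H_noise, noise_only
```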
PLCA Results
PLCA removes noise in silence regions (as expected).
PLCA reduces noise in speech regions.
Speech distortion is minimal.
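Wavelets
More flexible than Fourier analysis.
Fourier (complex exponential basis): $F(j\omega) = \int f(t)\, e^{-j\omega t}\, dt$
Wavelet (scaled and shifted wavelet basis): $F(a, b) = \int f(t)\, \psi_{a,b}(t)\, dt$, with $\psi_{a,b}(t) = \psi\left(\frac{t - b}{a}\right)$
Example mother wavelets: Meyer, Morlet, Mexican hat.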
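To make the wavelet formula concrete, this numerical sketch evaluates F(a, b) for one scale and one shift with a Riemann sum. It uses the Mexican-hat wavelet ψ(t) = (1 − t²) e^(−t²/2), without normalization, to match the slide's ψ_{a,b}(t) = ψ((t − b)/a). The function names and the test signal are assumptions. The coefficient is large only when both the scale a and the shift b line up with a feature of the signal, and this is what localizes it in time and frequency.

```python
import numpy as np


def mexican_hat(t):
    """Mexican-hat mother wavelet psi(t) = (1 - t^2) exp(-t^2 / 2), unnormalized."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)


def wavelet_coefficient(f, t, a, b, psi=mexican_hat):
    """Approximate F(a, b) = integral of f(t) psi((t - b) / a) dt by a Riemann sum.

    f: samples of the signal on the uniform grid t; a: scale (> 0); b: shift.
    """
    dt = t[1] - t[0]
    return np.sum(f * psi((t - b) / a)) * dt


t = np.linspace(-10, 10, 4001)
f = mexican_hat((t - 2.0) / 0.5)                  # a bump at time 2 with width 0.5
print(wavelet_coefficient(f, t, a=0.5, b=2.0))    # scale and shift both match
print(wavelet_coefficient(f, t, a=0.5, b=-3.0))   # right scale, wrong position
```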
Wavelets
The time-frequency resolution can be chosen, which makes wavelets more flexible than the STFT.
[Figure: time-frequency tilings of the STFT and the wavelet transform]
Wavelet Thresholding
Idea: find the coefficients that belong to the noise and set them to zero.
[Figure: thresholding of wavelet coefficients at level λ]
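The two common ways to zero the noise coefficients are hard and soft thresholding. The denoising procedure on the following slide uses the soft version. A minimal sketch follows; the function names are illustrative.

```python
import numpy as np


def hard_threshold(c, lam):
    """Zero every coefficient with magnitude at most lam; keep the rest unchanged."""
    c = np.asarray(c, dtype=float)
    return np.where(np.abs(c) > lam, c, 0.0)


def soft_threshold(c, lam):
    """Zero small coefficients and shrink the survivors toward zero by lam."""
    c = np.asarray(c, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)


coeffs = np.array([-3.0, -0.5, 0.2, 1.5])
print(hard_threshold(coeffs, 1.0))
print(soft_threshold(coeffs, 1.0))
```

Soft thresholding avoids the jump at |c| = λ that hard thresholding introduces. This tends to give fewer audible artifacts at the cost of a small bias on large coefficients.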
Wavelet Thresholding
Threshold for subband $j$:
$$\lambda_j = \sqrt{2\,\sigma_{N_j}^2\,(\zeta_j + \zeta_j^2)\,\ln\left(1 + \frac{1}{\zeta_j}\right)}, \qquad \zeta_j = \frac{\sigma_{X_j}^2}{\sigma_{N_j}^2}$$
where $\sigma_{X_j}^2$ is the variance of the noisy signal in subband $j$ and $\sigma_{N_j}^2$ is the variance of the noise in subband $j$.
Takes advantage of having a noise estimate.
The threshold is adaptive.
S. Tabibian et al., A New Wavelet Thresholding Method for Speech Enhancement Based on Symmetric Kullback-Leibler Divergence, in 14th Int. Computer Society of Iran Computer Conf.
Wavelets
Compute the wavelet coefficients of the noise.
Compute the wavelet coefficients of the noisy signal.
Calculate the wavelet threshold for each subband.
Soft-threshold the wavelet coefficients.
Reconstruct the denoised signal from the thresholded coefficients.
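The steps above can be sketched end to end. The sketch makes two substitutions, both assumptions:

- A plain multilevel Haar DWT stands in for the wavelet packet decomposition in the paper's title.
- The universal threshold σ_Nj·sqrt(2 ln n_j) stands in for the SKL-based threshold of Tabibian et al. It is computed from the noise estimate's coefficients in each subband j, where n_j is the subband length.

The approximation subband is left unthresholded. All function names are hypothetical.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multilevel orthonormal Haar DWT. Returns [detail_1, ..., detail_L, approx_L]."""
    a = np.asarray(x, dtype=float)
    if len(a) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    coeffs = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        coeffs.append((even - odd) / np.sqrt(2))   # detail (high-pass) subband
        a = (even + odd) / np.sqrt(2)              # approximation (low-pass)
    coeffs.append(a)
    return coeffs


def haar_idwt(coeffs):
    """Invert haar_dwt."""
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a


def soft(c, lam):
    """Soft thresholding: shrink toward zero by lam, zeroing |c| <= lam."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)


def wavelet_denoise(noisy, noise, levels=4):
    """Subband-adaptive soft thresholding driven by a separate noise estimate."""
    cx = haar_dwt(noisy, levels)   # coefficients of the noisy signal
    cn = haar_dwt(noise, levels)   # coefficients of the noise estimate
    out = []
    for j, (x_j, n_j) in enumerate(zip(cx, cn)):
        if j == levels:            # approximation subband: keep as is
            out.append(x_j)
            continue
        lam = np.std(n_j) * np.sqrt(2 * np.log(len(x_j)))  # stand-in threshold
        out.append(soft(x_j, lam))
    return haar_idwt(out)
```

Because the noise estimate is given separately, each subband gets its own threshold. This is the property the slides highlight for the SKL threshold.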
Results
Noise suppression (dB), 55 interleaves, 6.004 ms TR:
Sequence   Proposed   LMS-1   LMS-2
seq1       19.27      18.01   18.79
GR         24.1       18.37    9.17
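Aurora 5 Digits
Log-likelihood ratio (LLR): models the mismatch between the spectral envelopes of the clean and denoised speech signals.
$$d_{\text{LLR}}(\mathbf{a}_{\hat{s}}, \mathbf{a}_s) = \log_{10} \frac{\mathbf{a}_{\hat{s}}^T R_s\, \mathbf{a}_{\hat{s}}}{\mathbf{a}_s^T R_s\, \mathbf{a}_s}$$
where $\mathbf{a}_{\hat{s}}$ are the LPC coefficients of the denoised speech, $\mathbf{a}_s$ the LPC coefficients of the clean speech, and $R_s$ the autocorrelation matrix of the clean speech.
Distortion variance:
$$\sigma_d^2 = \frac{1}{L} \sum_n \left(s[n] - \hat{s}[n]\right)^2$$
where $s[n]$ is the clean speech, $\hat{s}[n]$ the denoised speech, and $L$ the signal length.
V. R. Ramachandran et al., Objective and Subjective Evaluation of Adaptive Speech Enhancement Methods for Functional MRI, J. Magnetic Resonance Imaging.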
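Both metrics can be computed directly from their definitions. This minimal per-frame sketch assumes the autocorrelation method for LPC with order 10, because the slide gives no order. It also assumes that LLR is computed on one frame at a time. The function names are hypothetical. The clean LPC vector a_s minimizes aᵀR_s a among LPC vectors with leading coefficient 1. The LLR is therefore never negative, and it is zero when the denoised frame matches the clean one.

```python
import numpy as np


def autocorr_matrix(x, order):
    """(order+1) x (order+1) Toeplitz autocorrelation matrix of x."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    idx = np.abs(np.arange(order + 1)[:, None] - np.arange(order + 1)[None, :])
    return r[idx]


def lpc(R):
    """LPC vector a = [1, a_1, ..., a_p] minimizing a^T R a (autocorrelation method)."""
    a_rest = np.linalg.solve(R[1:, 1:], -R[1:, 0])
    return np.concatenate(([1.0], a_rest))


def llr(clean, denoised, order=10):
    """d_LLR = log10(a_d^T R_s a_d / a_s^T R_s a_s) for one frame."""
    R_s = autocorr_matrix(clean, order)
    a_s = lpc(R_s)
    a_d = lpc(autocorr_matrix(denoised, order))
    return np.log10((a_d @ R_s @ a_d) / (a_s @ R_s @ a_s))


def distortion_variance(clean, denoised):
    """sigma_d^2 = (1/L) * sum_n (s[n] - s_hat[n])^2."""
    clean = np.asarray(clean, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return np.mean((clean - denoised) ** 2)
```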
Results (55 interleaves, 6.004 ms TR)
Results
Metric                         Sequence   Proposed   LMS-1   LMS-2
Noise suppression (dB)         seq1       30.23      32.55   26.53
                               GR         24.14      27.88   10.91
LLR                            seq1        0.17       0.4     0.42
                               GR          0.11       0.41    0.33
Distortion variance (×10^-5)   seq1        7.52      34.8    21.4
                               GR          9.56      35.8    37.7
The proposed method improves noise suppression over LMS-2 for GR sequence noise.
It also produces less distortion than the LMS methods (lower LLR and distortion variance).
Listening Test Results (rank; 1 = best)
Environment   Sequence   Clean   Proposed   LMS-1   LMS-2   Noisy
TIMIT         seq1       n/a     2          3       1       4
TIMIT         GR         n/a     1          2       3       4
Aurora        seq1       1       3          4       2       5
Aurora        GR         1       2          3       4       5
Listeners were presented with sets of TIMIT sentences and Aurora digits. Each set contained a noisy audio clip, its 3 denoised versions, and, for Aurora, a clean version. Listeners ranked each clip within a set from 1 (best) to 4 or 5 (worst).
Conclusion
Combined PLCA and wavelets.
Achieved 24 dB noise suppression on GR sequence noise, a 15 dB improvement over LMS-2.
Low speech distortion, which is key for analysis and modeling.
Future Work
Improve MRI noise modeling.
Room transfer function.
Real-time implementation.
Applications beyond MRI: cell phones, biometrics.
Thank you!
We would like to acknowledge the support of NIH Grant DC007124.
USC-TIMIT: A MULTIMODAL ARTICULATORY DATA CORPUS FOR SPEECH RESEARCH
10 American English talkers (5M, 5F).
Real-time MRI (5 speakers also with EMA) and synchronized audio.
460 sentences each (>20 minutes).
Freely available for speech research.
Web link (with download info): http://sail.usc.edu/span/usc-timit/
SAIL homepage: http://sail.usc.edu
Narayanan et al. (2011). A Multimodal Real-Time MRI Articulatory Corpus for Speech Research. InterSpeech.