Artificial Bandwidth Extension Using Deep Neural Networks for Spectral Envelope Estimation




Johannes Abel and Tim Fingscheidt
Institute for Communications Technology, Technische Universität Braunschweig

We Need More Acoustical Bandwidth!
Problem: speech quality and intelligibility suffer from limited acoustical bandwidth.
- Conventional narrowband (NB) telephony call (acoustic bandwidth 0.3 kHz < f < 3.4 kHz): speech quality 3.2/5.0 mean opinion score (MOS) points; intelligibility 90% (consonant-vowel-consonant test)
- Wideband (WB) telephony call (acoustic bandwidth 0.05 kHz < f < 7 kHz): speech quality 4.5/5.0 MOS points; intelligibility 98%
Problem solved?
[Data taken from: Krebber, Sprachübertragungsqualität von Fernsprech-Handapparaten, VDI-Fortschrittsberichte, 1995, and Terhardt, Akustische Kommunikation, Springer, 1998]
J. Abel ABE using DNNs for Spectral Envelope Estimation 2/16

We Need More Acoustical Bandwidth!
Requirements for a WB call:
1. WB-capable mobile handsets (far end and near end)
2. All participants of the call located within WB-capable cells
3. A WB-capable provider backbone network
4. Further requirements for international WB calls and for inter-operator connections
If these requirements are not met at the start of a call, only NB mode is possible. If they stop being met during a call, the call drops to NB mode, and switching back to WB mode once they are met again is typically disabled.
Solution: Artificial Bandwidth Extension (ABE)
Estimation of the frequency components from 4 to 7 kHz, a.k.a. the upper band (UB), at the receiver side for a more consistent, WB-like experience.

Outline
1. Motivation
2. ABE Framework: Overview; Statistical Models (Baseline: HMM/GMM; DNN and HMM/DNN)
3. Simulations
4. Summary

2. ABE Framework
[Figure: ABE framework block diagram. Blocks: upsampling by 2, LP analysis filtering, LP synthesis filtering, NB PSD computation, UB spectral envelope estimation, WB PSD assembly, WB LP analysis, VAD. Signals: narrowband input speech, estimated UB speech, wideband output speech. Notation: NB sample index, WB sample index, frame index, power spectral density, LP filter coefficients, sampling frequencies.]
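The diagram names a WB LP analysis that follows the WB PSD assembly but does not show how it is computed. One standard way to derive LP coefficients from a power spectrum is to take its inverse DFT to obtain the autocorrelation (Wiener-Khinchin theorem) and solve the normal equations with the Levinson-Durbin recursion. The sketch below assumes the assembled WB PSD is available as a real, even-symmetric spectrum on all N DFT bins; the function names are hypothetical and this is not claimed to be the authors' implementation.

```python
import math

def psd_to_autocorrelation(psd, max_lag):
    """Autocorrelation r[0..max_lag] as the inverse DFT of a real, even PSD
    sampled on all N DFT bins (Wiener-Khinchin theorem)."""
    n = len(psd)
    return [sum(psd[i] * math.cos(2.0 * math.pi * i * k / n) for i in range(n)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Levinson-Durbin recursion for the LP normal equations.

    Returns (a, err): a = [1, a_1, ..., a_p] are the coefficients of the
    inverse filter A(z) = 1 + sum_k a_k z^-k; err is the prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Example: PSD of an AR(1) process with pole at rho (unit-variance excitation).
rho, n_bins = 0.9, 512
psd = [1.0 / (1.0 - 2.0 * rho * math.cos(2.0 * math.pi * i / n_bins) + rho ** 2)
       for i in range(n_bins)]
a, err = levinson_durbin(psd_to_autocorrelation(psd, 2), 2)
# a is close to [1, -0.9, 0] and err close to 1.0
```

Working from the PSD rather than from time-domain samples lets the estimated UB part be spliced into the spectrum before a single LP analysis of the full band.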

2. ABE Framework: UB Spectral Envelope Estimation
[Figure: UB spectral envelope classification. Blocks: feature extraction, statistical model, UB envelope codebook, spectral conversion. Signals: feature vector, a posteriori probabilities, codebook entry and codebook entry index, estimated UB cepstral vector, UB energy.]
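The statistical model delivers a posteriori probabilities for the codebook entries, from which the estimated UB cepstral vector is formed. A common choice in codebook-based ABE is the minimum mean square error (MMSE) estimate, i.e., the posterior-weighted average of the entries. The sketch assumes that estimator and shows the maximum a posteriori (MAP) choice of a single entry for comparison; the function names are hypothetical.

```python
def mmse_estimate(posteriors, codebook):
    """MMSE estimate: posterior-weighted sum of the codebook entries."""
    if len(posteriors) != len(codebook):
        raise ValueError("one posterior per codebook entry is required")
    dim = len(codebook[0])
    return [sum(p * entry[d] for p, entry in zip(posteriors, codebook))
            for d in range(dim)]

def map_estimate(posteriors, codebook):
    """Alternative: pick the single most probable codebook entry."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return list(codebook[best])

# Example: two 2-dimensional cepstral codebook entries.
codebook = [[0.0, 0.0], [2.0, 4.0]]
posteriors = [0.25, 0.75]
print(mmse_estimate(posteriors, codebook))  # [1.5, 3.0]
print(map_estimate(posteriors, codebook))   # [2.0, 4.0]
```

The MMSE estimate interpolates smoothly between entries when the classifier is uncertain, whereas the MAP estimate can jump between entries from frame to frame.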

2. ABE Framework
Statistical Model: HMM/GMM (Baseline)
[Figure: LDA transform → GMM → forward algorithm, parameterized by the LDA matrix, the GMM parameters, and the HMM parameters. Quantities: state probabilities, transition probabilities, likelihoods.]
- Linear discriminant analysis (LDA) for dimension reduction of the features
- GMM as acoustic model
- Forward algorithm for HMM evaluation
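The forward algorithm combines the acoustic model's per-frame likelihoods with the HMM transition probabilities into state probabilities. Below is a minimal sketch of the normalized forward recursion, assuming a first-order HMM whose states are the codebook entries and an initial state distribution given as input; the function name and argument layout are hypothetical.

```python
def forward_posteriors(likelihoods, transition, initial):
    """Normalized forward algorithm for a first-order HMM.

    likelihoods[t][i]: acoustic-model likelihood p(x_t | state i) at frame t
    transition[j][i]:  P(state i at frame t | state j at frame t-1)
    initial[i]:        P(state i at frame 0)
    Returns, per frame, the state posteriors P(state i | x_0, ..., x_t).
    """
    n = len(initial)
    alpha = None
    posteriors = []
    for t, lik in enumerate(likelihoods):
        if t == 0:
            predicted = list(initial)
        else:
            # Propagate the previous frame's posteriors through the transitions.
            predicted = [sum(alpha[j] * transition[j][i] for j in range(n))
                         for i in range(n)]
        joint = [predicted[i] * lik[i] for i in range(n)]
        total = sum(joint)
        # Normalizing each frame avoids numerical underflow on long signals.
        alpha = [v / total for v in joint]
        posteriors.append(alpha)
    return posteriors

# Example: two states with "sticky" transitions.
post = forward_posteriors([[2.0, 1.0], [1.0, 1.0]],
                          [[0.9, 0.1], [0.1, 0.9]],
                          [0.5, 0.5])
```

Because only causal (past and current) frames are used, this recursion suits frame-by-frame processing at the receiver without added look-ahead delay.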

2. ABE Framework
Statistical Model: HMM/DNN (new)
[Figure: DNN → prior division → forward algorithm, parameterized by the DNN parameters (network weights and offsets) and the HMM parameters.]
- Deep neural network (DNN) as acoustic model
- The DNN's posterior outputs are converted to likelihoods by prior division
- Forward algorithm for HMM evaluation
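The prior division step follows the usual hybrid HMM/DNN reasoning: by Bayes' rule, p(x|i) = P(i|x) p(x) / P(i), and since p(x) is the same for every state at a given frame, P(i|x) / P(i) is a scaled likelihood that the normalized forward recursion can use directly. The sketch assumes the priors P(i) are relative label frequencies in the training data; both function names are hypothetical.

```python
def scaled_likelihoods(posteriors, priors, floor=1e-10):
    """Divide DNN posteriors P(i|x) by class priors P(i).

    The result is proportional to p(x|i); the common factor p(x) cancels
    when the forward recursion normalizes each frame. The floor guards
    against division by (near-)zero priors of rare classes.
    """
    return [p / max(q, floor) for p, q in zip(posteriors, priors)]

def class_priors(labels, num_classes):
    """Relative frequency of each class label in (hypothetical) training data."""
    counts = [0] * num_classes
    for lab in labels:
        counts[lab] += 1
    return [c / len(labels) for c in counts]

priors = class_priors([0, 0, 0, 1], 2)           # [0.75, 0.25]
lik = scaled_likelihoods([0.6, 0.4], priors)     # approximately [0.8, 1.6]
```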

2. ABE Framework
Statistical Model: DNN (new)
[Figure: DNN with its DNN parameters.]
- DNN as the sole statistical model, without an HMM
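Used on its own, the DNN's frame-wise output layer directly supplies the a posteriori probabilities of the codebook entries. The sketch below shows such a forward pass, assuming sigmoid hidden units (a common pairing with RBM pretraining) and a softmax output over the codebook entries; layer sizes and names are illustrative, not the tested 512-unit topologies.

```python
import math

def sigmoid(v):
    # Numerically stable logistic function for large positive or negative v.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def affine(x, weights, offsets):
    """weights: list of rows (one per output unit); offsets: one per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, offsets)]

def dnn_posteriors(features, layers):
    """layers: list of (weights, offsets); sigmoid hidden layers and a softmax
    output layer, so the result is a probability distribution over entries."""
    h = features
    for weights, offsets in layers[:-1]:
        h = [sigmoid(v) for v in affine(h, weights, offsets)]
    weights, offsets = layers[-1]
    return softmax(affine(h, weights, offsets))

# Example: 2 features -> 3 hidden units -> 4 codebook entries.
hidden = ([[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0])
output = ([[0.0] * 3 for _ in range(4)], [0.0, 0.0, 0.0, math.log(3.0)])
probs = dnn_posteriors([1.0, -1.0], [hidden, output])
```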


3. Simulations
Experimental Setup
DNN experiments:
- Initial weights for DNN training from restricted Boltzmann machine (RBM) pretraining
- DNN topologies under test: 1, 2, 3, 4, 5, or 6 hidden layers with 512 units per layer
Datasets:
Step                                                 Speech database
Codebook, RBM pretraining, HMM/DNN/GMM training      TIMIT train set
DNN validation checks                                TIMIT test set
Result reporting                                     NTT-AT database (EN+DE)
Metrics: cepstral distances for the estimated UB envelope and for the estimated UB energy ratio.
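As an illustration of the metric, a widely used cepstral distance in dB is d = (10 / ln 10) · sqrt((c_0 − ĉ_0)² + 2 Σ_{k≥1} (c_k − ĉ_k)²), which approximates the RMS log-spectral distance between two envelopes. Whether the authors use exactly this normalization, and how the UB energy ratio enters, is an assumption; the sketch lets the c_0 (gain) term be included or excluded, and the function name is hypothetical.

```python
import math

def cepstral_distance_db(c_ref, c_est, include_c0=False):
    """Cepstral distance in dB between two cepstral vectors [c_0, c_1, ...].

    d = (10 / ln 10) * sqrt((c_0 - c0_hat)^2 + 2 * sum_{k>=1} (c_k - ck_hat)^2);
    the c_0 term (log gain) is only added when include_c0 is True.
    """
    if len(c_ref) != len(c_est):
        raise ValueError("cepstral vectors must have equal length")
    acc = 2.0 * sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_est[1:]))
    if include_c0:
        acc += (c_ref[0] - c_est[0]) ** 2
    return 10.0 / math.log(10.0) * math.sqrt(acc)

d_shape = cepstral_distance_db([0.0, 0.1], [0.0, 0.0])
d_gain = cepstral_distance_db([1.0, 0.0], [0.0, 0.0], include_c0=True)
```

Separating the c_0 term from the higher coefficients mirrors the slides' split between envelope shape and UB energy.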

3. Simulations
Results: Cepstral Distances
[Table: cepstral distances (dB) of the UB envelope and UB energy estimates for DNN and HMM/DNN with 1 to 6 hidden layers of 512 units each, compared with HMM/GMM and an oracle.]
- DNN topology has only a small influence on the evaluation metrics
- UB energy cepstral distance decreased by more than 2 dB (improvement!); still big potential for further improvement
- UB envelope reconstruction is very similar in all cases; small potential for further improvement

3. Simulations
Results: Speech Quality (WB-PESQ)
Statistical model       MOS LQO
HMM/GMM (baseline)      2.73
DNN                     [3.05, 3.08]
HMM/DNN                 [2.99, 3.02]
Oracle
- Improvement of up to 0.35 MOS LQO points over the baseline!
- Gap to the oracle is less than 0.2 MOS LQO points

3. Simulations
Latest ABE Approach and CCR Test
[Figure: UB spectral envelope estimation in the latest approach: feature extraction → DNN++ → spectral conversion.]
CCR: comparison category rating; CMOS: comparison mean opinion score.
CCR condition           CMOS
AMR vs. AMR-WB          2.15
HMM/GMM vs. AMR-WB      1.48
DNN++ vs. AMR-WB        1.31
HMM/GMM vs. DNN
AMR vs. HMM/GMM         0.81
AMR vs. DNN
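In a CCR test (ITU-T P.800), listeners rate the second sample of each pair relative to the first on a seven-point scale from -3 (much worse) to +3 (much better), and the CMOS is the mean of these votes. The sketch assumes pairs are presented in both orders and that votes for reversed pairs are sign-flipped so that all votes refer to "A vs. B"; it further assumes positive values favour the second-named condition, which agrees with AMR-WB being preferred over AMR, but the sign convention of the table is an assumption. The function name is hypothetical.

```python
def cmos(ratings):
    """Comparison mean opinion score from CCR votes.

    ratings: list of (score, swapped) pairs. score is on the CCR scale
    -3 ... +3 for the second-presented sample relative to the first;
    swapped is True when the pair was presented as "B then A". Swapped
    votes are sign-flipped so all votes refer to the order "A vs. B".
    """
    if not ratings:
        raise ValueError("no ratings")
    aligned = []
    for score, swapped in ratings:
        if not -3 <= score <= 3:
            raise ValueError("CCR scores range from -3 to +3")
        aligned.append(-score if swapped else score)
    return sum(aligned) / len(aligned)

# Example: three votes for one condition pair, one of them in reversed order.
score = cmos([(2, False), (-1, True), (3, False)])  # (2 + 1 + 3) / 3 = 2.0
```

Presenting both orders and aligning the signs cancels any systematic bias listeners have toward the first or second sample.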


4. Summary
- DNNs outperform GMMs as acoustic model for artificial bandwidth extension
- Using DNNs led to an improvement of up to 0.35 MOS LQO points when ABE-processed speech is evaluated with WB-PESQ
- The speech quality gain stems from a superior UB energy estimate rather than from the UB envelope estimate
- The UB spectral envelope estimation performance of DNNs is similar to that of GMMs
- Huge potential remains for further improvement of the UB energy estimate
- The superiority of DNNs in ABE was demonstrated by a clear 1.37 CMOS-point advantage over AMR-coded narrowband speech

Thank you for your attention
Johannes Abel


2. ABE Framework: UB Envelope Codebook
[Figure: UB envelope codebook training. Labels: speech data; condition "if frame contains an /s/ or /z/ sound ... else"; UB SLP analysis; relative energy ratio; NB and UB prediction gains; LBG clustering; UB envelope codebook with 16 entries and 8 entries, each calculated from a selected subset of frames.]
P. Bauer and T. Fingscheidt, "A Statistical Framework for Artificial Bandwidth Extension Exploiting Speech Waveform and Phonetic Transcription," in Proc. of EUSIPCO, Glasgow, Scotland, Aug. 2009, pp
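The codebook is trained by LBG (Linde-Buzo-Gray) clustering. Below is a minimal sketch of the classic splitting variant: start from the global mean, split every centroid into two perturbed copies, and refine with k-means iterations until the target size is reached. It assumes Euclidean distortion, an additive ±eps perturbation, and a power-of-two codebook size; since 16 and 8 are both powers of two, each part of the codebook could be trained this way on its own frame subset, but that is an assumption. The function names are hypothetical, and the paper's actual features and distortion measure are not modeled.

```python
def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(data, size, eps=0.01, iters=20):
    """LBG codebook training by binary splitting plus k-means refinement."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    codebook = [_mean(data)]
    while len(codebook) < size:
        # Split each centroid into two slightly perturbed copies.
        codebook = [[c + s * eps for c in cw] for cw in codebook for s in (1, -1)]
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for v in data:
                best = min(range(len(codebook)), key=lambda i: _sqdist(v, codebook[i]))
                clusters[best].append(v)
            # Keep the old centroid if a cluster ends up empty.
            new = [_mean(cl) if cl else cw for cl, cw in zip(clusters, codebook)]
            if new == codebook:
                break
            codebook = new
    return codebook

data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
cb = lbg(data, 2)
```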

3. Simulations
Results: Phoneme Accuracy
Relative classification accuracy of HMM/DNN vs. HMM/GMM for phonemes (measured on the validation set)
[Figure: per-phoneme relative accuracy; phonemes shown include /f/, /th/, /dh/, /t/, /zh/, /s/.]
- Fricative sounds dominate among the 5 phonemes that profit most
- All phonemes benefit from the DNN as acoustic model
