BANDWIDTH EXTENSION OF NARROWBAND SPEECH BASED ON BLIND MODEL ADAPTATION
15th European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, September 3-7, 2007, copyright by EURASIP

BANDWIDTH EXTENSION OF NARROWBAND SPEECH BASED ON BLIND MODEL ADAPTATION

Sheng Yao and Cheung-Fat Chan
Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong

ABSTRACT

The traditional telephone transmission network limits the speech band to frequencies below about 3.4 kHz. Narrowband telephone speech (300-3400 Hz) sounds muffled compared with the original wideband speech (0-8 kHz). Artificial bandwidth extension is an economical way of enhancing the quality of narrowband speech without modifying the infrastructure of the network. Existing bandwidth extension methods usually comprise an off-line learning phase and an on-line enhancing phase, and the performance of these systems depends largely on the consistency between the wideband training data and the actual narrowband input data. In real situations the input speech usually mismatches the off-line training speech, leading to serious model errors. To avoid this data mismatch, we propose a method based on blind adaptation of a linear dynamic model. The benefit of our method is that it needs no off-line training phase, and experimental results show that our system is comparable with data-oriented systems in terms of high-band spectral distortion. When data mismatch occurs, our system outperforms those systems.

1. INTRODUCTION

With the gradual deployment of wideband voice terminals such as the adaptive multi-rate wideband codec (AMR-WB) and the variable-rate multimode wideband codec (VMR-WB), the current speech transmission network is a mixture of traditional narrowband terminals and new wideband terminals. During this transition period, bandwidth extension (BWE) systems help to enhance the perceived quality of narrowband speech without the cost of replacing the old narrowband infrastructure.
The authors of [5] indicate that existing BWE systems perform reasonably well not because they accurately retrieve the original missing high-band information, but because they extend the high band in such a way that the signal sounds perceptually pleasant. Reported BWE systems can basically be classified into two categories: memoryless systems and memory systems. Memoryless methods represent the earlier development of BWE, with members such as VQ codebook mapping [1], linear mapping [3] and Gaussian mixture model (GMM) conversion [2]. These methods are usually criticized for disregarding inter-frame correlation, which causes relatively strong hissing artefacts. Recently, more attention has been paid to memory systems; candidates are the hidden Markov model (HMM) method [5], HMM with state mapping [7] and the linear dynamic model [8]. These systems are characterized by their capability of estimating the missing high-band information based on previous estimates: they focus more on retrieving the trajectory of the spectral evolution, so hissing artefacts are greatly reduced. However, all the systems mentioned above are data-oriented. They perform well when the input narrowband speech is consistent with the training database, i.e. the same speaker or a similar recording environment, which is not the case in real applications, where data mismatch often occurs. In this paper we propose a memory system based on a linear dynamic model whose parameters adapt to the input narrowband speech in a blind manner. Off-line model training is not required except for the initial model. Experimental results show that the proposed method is superior to memoryless systems and comparable with memory systems in terms of high-band spectral distortion. (This research is supported by a Strategic Research Grant of City University of Hong Kong.)

The rest of the paper is organized as follows. Section 2 presents the use of the linear dynamic model in bandwidth extension systems.
Section 3 explains how the proposed system makes itself adaptive to the input narrowband speech in a blind sense. In Section 4 the objective performance is compared, and the last section concludes the paper.

2. LINEAR DYNAMIC MODEL

The model is also termed the state space model. In the linear state space model, the hidden speech state vector $x(k) \in \mathbb{R}^p$ is assumed to evolve linearly according to equation (1):

$$x(k+1) = A\,x(k) + u + w(k) \qquad (1)$$
Here $k$ is the time index (speech frame index). The transformation matrix $A$ and the deterministic control vector $u$ are pre-trained model parameters, and $w(k)$ is an uncorrelated zero-mean Gaussian noise vector with covariance $E[w(k)\,w(l)^T] = Q\,\delta_{kl}$.

[Figure 1(a): original wideband female speech from speaker clh]

The observation vector $o(k) \in \mathbb{R}^m$ is a noisy linear transformation of the state vector $x(k)$ according to equation (2):

$$o(k) = C\,x(k) + v(k) \qquad (2)$$

This equation is static in nature, since the vectors $o$ and $x$ share the same time index. $v(k)$ is also uncorrelated zero-mean Gaussian noise, with covariance $E[v(k)\,v(l)^T] = R\,\delta_{kl}$.

By taking $o(k)$ as the input narrowband speech feature vector and $x(k)$ as the unknown wideband feature vector, the linear state space model can be employed in a speech bandwidth extension system. Such an assumption is reasonable: the human speech process is a non-stationary random process whose hidden state is never static, and state equation (1) is one possible representation of the state evolution that is convenient for mathematical treatment. The linear relationship between $o(k)$ and $x(k)$ assumed in observation equation (2) has already been applied in memoryless linear mapping systems, whose performance is satisfactory apart from the limitation of their memoryless nature.

Due to the presence of noise and the possible non-invertibility of the matrix $C$ in equation (2), the state vector $x(k)$ cannot be uniquely estimated given the observation vector $o(k)$; this reflects the one-to-many mapping between narrowband and wideband speech features [6]. We extract 10th-order line spectral frequencies (LSF) as the narrowband feature $o$. The target wideband feature $x$ is defined as 18th-order LSF.
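As a small illustrative sketch (not the authors' implementation; all dimensions and parameter values here are arbitrary assumptions rather than trained speech models), the generative model of equations (1) and (2) can be simulated as:

```python
import numpy as np

# Sketch of the linear state space model of equations (1)-(2).
# Dimensions and parameter values are arbitrary illustrative assumptions.
rng = np.random.default_rng(0)
p, m, T = 4, 2, 200              # state dim, observation dim, number of frames
A = 0.9 * np.eye(p)              # transformation matrix (stable dynamics)
u = 0.1 * np.ones(p)             # deterministic control vector
C = rng.standard_normal((m, p))  # observation matrix; m < p, so C is not invertible
Q = 0.01 * np.eye(p)             # process-noise covariance
R = 0.01 * np.eye(m)             # observation-noise covariance

x = np.zeros(p)
states, observations = [], []
for k in range(T):
    x = A @ x + u + rng.multivariate_normal(np.zeros(p), Q)  # equation (1)
    o = C @ x + rng.multivariate_normal(np.zeros(m), R)      # equation (2)
    states.append(x)
    observations.append(o)
states, observations = np.array(states), np.array(observations)
```

Because $m < p$ here, many different states produce the same observation, which mirrors the one-to-many narrowband-to-wideband ambiguity discussed above.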
3. BLIND MODEL ADAPTATION

Given a sequence of narrowband feature vectors $o$ and a trained linear state space model $\theta = \{A, u, C, Q, R\}$, if we assume that $\theta$ is stationary and well trained for the sequence, a good estimate of the $x$ sequence can be obtained via the Kalman filter algorithm. For $k = 1, 2, \dots, L$:

Kalman prediction:
$$\hat{x}^{-}(k) = A\,\hat{x}(k-1) + u, \qquad P^{-}(k) = A\,P(k-1)\,A^{T} + Q$$

Kalman gain:
$$K(k) = P^{-}(k)\,C^{T}\big(C\,P^{-}(k)\,C^{T} + R\big)^{-1}$$

Kalman correction:
$$\hat{x}(k) = \hat{x}^{-}(k) + K(k)\big(o(k) - C\,\hat{x}^{-}(k)\big), \qquad P(k) = \big(I - K(k)\,C\big)\,P^{-}(k)$$

usually initialized by
$$\hat{x}(0) = E[x(0)] = \mu(0), \qquad P(0) = E\big[(x(0)-\mu(0))(x(0)-\mu(0))^{T}\big].$$

[Figure 1(b): estimation of wideband speech with the mismatched model of male speaker bjm]
[Figure 1: illustration of model mismatch]

The formulation of the Kalman gain matrix $K$ aims at minimizing the trace of the state error covariance matrix; the Kalman filter is therefore an MMSE estimator of the hidden wideband speech vector $x(k)$. Note that the Kalman filter algorithm is sequential. However, $\theta = \{A, u, C, Q, R\}$ should not be stationary. The method described in [8] provides the state space model with several different modes: block by block, the system chooses the best-fitting mode for the input narrowband sequence via clustering techniques. Although such a treatment gives the state space model a certain degree of dynamics, and the subjective and objective performance of the system is satisfactory, like other data-oriented methods it requires a relatively large amount of training data. Moreover, when the input narrowband speech differs considerably from the training database (e.g. a different speaker or recording environment), severe model errors occur, leading to an unacceptable level of hissing. For example, a false formant trajectory may appear in the high-band spectrogram when a male model is applied to a female input (see Figure 1), and vice versa. We propose a model-updating mechanism that does not require off-line training. The basic assumption is that the system is confident about previously estimated wideband features and, by utilizing those results, allows the model parameters to be updated.
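A minimal generic sketch of the Kalman recursion above (my own variable names; not the paper's code):

```python
import numpy as np

def kalman_filter(obs, A, u, C, Q, R, x0, P0):
    """Sequential MMSE estimation of the hidden state x(k) for the model
    x(k+1) = A x(k) + u + w(k),  o(k) = C x(k) + v(k)."""
    x, P = x0, P0
    estimates = []
    for o in obs:
        # Kalman prediction
        x_pred = A @ x + u
        P_pred = A @ P @ A.T + Q
        # Kalman gain
        K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
        # Kalman correction
        x = x_pred + K @ (o - C @ x_pred)
        P = (np.eye(len(x)) - K @ C) @ P_pred
        estimates.append(x)
    return np.array(estimates)
```

With a near-noiseless observation model (R much smaller than Q), the corrected estimates essentially track the observations; with a larger R, the filter leans more on the state dynamics.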
The concept is illustrated in Figure 2. For an arbitrary input narrowband vector sequence of length N, $o(k), o(k+1), \dots, o(k+N)$, the linear state space model is assumed stationary: $\theta(k) = \theta$.

[Figure 2: the sequence length is fixed to N frames and the updating block moves forward frame by frame]

The corresponding wideband estimates $\hat{x}(k), \hat{x}(k+1), \dots, \hat{x}(k+N)$ are obtained from the previous estimation. Now consider the next narrowband input vector $o(k+N+1)$. First, the sequential Kalman filter continues to estimate $\hat{x}(k+N+1)$ with model $\theta(k)$. Then, given the narrowband observation sequence $o(k+1), \dots, o(k+N+1)$ and the wideband state estimates $\hat{x}(k+1), \dots, \hat{x}(k+N+1)$, the linear state space model $\theta(k+1)$ for the frames $k+1$ to $k+N+1$ is updated in the maximum-likelihood sense. With all sums taken over the frames of the current window, the updated parameters are the standard least-squares solutions:

$$[\hat{A}\ \ \hat{u}] = \left[\sum_i \hat{x}(i{+}1)\hat{x}(i)^T \quad \sum_i \hat{x}(i{+}1)\right] \begin{bmatrix} \sum_i \hat{x}(i)\hat{x}(i)^T & \sum_i \hat{x}(i) \\ \sum_i \hat{x}(i)^T & N \end{bmatrix}^{-1}$$

$$\hat{Q} = \frac{1}{N}\sum_i \big(\hat{x}(i{+}1)-\hat{A}\hat{x}(i)-\hat{u}\big)\big(\hat{x}(i{+}1)-\hat{A}\hat{x}(i)-\hat{u}\big)^T$$

$$\hat{C} = \Big(\sum_i o(i)\,\hat{x}(i)^T\Big)\Big(\sum_i \hat{x}(i)\,\hat{x}(i)^T\Big)^{-1}, \qquad \hat{R} = \frac{1}{N{+}1}\sum_i \big(o(i)-\hat{C}\hat{x}(i)\big)\big(o(i)-\hat{C}\hat{x}(i)\big)^T$$

The eight accumulated sums involved, denoted $\Phi_1(k), \dots, \Phi_8(k)$, are the only quantities that must be maintained.

[Figure 3: system capability of tracking wideband features. (a) with model parameter updating; (b) without model parameter updating; (c) original wideband LSF trajectory]

With $\theta(k+1)$ and the next narrowband input $o(k+N+2)$, $\hat{x}(k+N+2)$ can be estimated via the Kalman filter algorithm, and $\theta(k+2)$ is then updated from $o(k+2), \dots, o(k+N+2)$ and $\hat{x}(k+2), \dots, \hat{x}(k+N+2)$. The procedure continues until the end of the input is reached, which in effect conducts a timely on-line training of the linear state space model. The computation is not as burdensome as the model training in [8], since $\Phi_1$ to $\Phi_8$ require full calculation only once, for the very first update; in each subsequent update, one addition and one subtraction per sum suffice. The value of N is set to 60 frames for our codec configuration. If N is too small (say, less than 50), the matrices to be inverted may become singular. The initial linear state space model is trained off-line with environmental signals collected when speakers are not talking. The method is named blind because the updating of the
state space model is localized within N consecutive speech frames; the model parameters are therefore optimized merely for these N frames. Besides, the wideband training data are the previously estimated data rather than the true data. One may wonder whether the updating is correct, given that there are estimation errors. We performed the following experiment and found that such blind adaptation is trustworthy. In the first part of the experiment, the narrowband speech is enhanced with the initial model and no adaptation; in this case the model error is quite large. The other part is normal operation, with blind adaptation allowed. As depicted in Figure 3, under normal operation the proposed system retrieves the general shape of the high-band feature trajectories. The average distortion of the line spectral frequencies is listed in Table 1; for reference, the third column gives the result of the well-trained, source-matched memory system presented in [8]. Note that perceived quality is more closely related to small distortions of the high-order LSFs.

[Figure 4: spectral envelope comparison. Dotted curve: proposed method (case (a)); dashed curve: memoryless VQ method (case (b)); solid curve: original]

[Table 1: LSF distortions along orders; columns: with adaptation, without adaptation, LDS reference [8]]

Conceptually, in the beginning the proposed system enhances the silent narrowband input, producing spectrally flat, noise-like high-band signals, just as an ordinary linear state space system does. When speech content comes in, the high-band spectral distortion would be large if the initial model did not change accordingly. With adaptation, the model parameters are promptly and locally optimized for the current voiced narrowband input, driving the underlying model towards a voiced model, frame by frame.
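The sliding-window sums used for the on-line model update of Section 3 can be maintained with one addition and one subtraction per frame. A minimal sketch (feature shapes and function names are my own illustrative assumptions):

```python
import numpy as np

def init_stats(X):
    """Full accumulation over the first window of frames (done only once)."""
    return {"sum_x": X.sum(axis=0),  # sum of feature vectors
            "sum_xx": X.T @ X}       # sum of outer products x x^T

def slide(stats, x_new, x_old):
    """Advance the window one frame: add the newest frame, drop the oldest."""
    stats["sum_x"] += x_new - x_old
    stats["sum_xx"] += np.outer(x_new, x_new) - np.outer(x_old, x_old)
    return stats
```

The maximum-likelihood parameter estimates are then recomputed from the updated sums, so each frame costs one rank-one addition and one rank-one subtraction per statistic instead of a full re-accumulation over the window.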
Since the required wideband features are previous estimations, which are spectrally flat, the new model parameters are actually optimized for wideband output speech that has a narrowband part similar to the input and a spectrally flat high band. Recall that the speech sounds that suffer most under bandwidth limitation are fricatives and plosives; these sounds have a relatively flat high band and little voicing content in the high-band portion.

[Figure 5: performance illustration of blind model adaptation. (a) original wideband female speech; (b) estimated wideband speech by the linear state space model with the correct speaker model; (c) estimated wideband speech by blind model adaptation]

The objective of bandwidth extension is to artificially extend the bandwidth so that the speech becomes perceptually better; see the illustration in Figure 4. In case (a), the speech quality is still enhanced even though the high-band formant structure is not recovered. But if case (b) occurs (due to model error or the limitation of a memoryless design), the human ear is quite sensitive to the resulting noise. Finally, Figure 5 shows the performance difference between the proposed method and the conventional LDS system [8].

4. PERFORMANCE EVALUATION

The objective measurement is the high-band spectral distortion, defined as follows:
$$D = \sqrt{\frac{2}{\pi}\int_{\pi/2}^{\pi}\big(10\log_{10}S_{org}(\omega) - 10\log_{10}S_{ext}(\omega)\big)^{2}\,d\omega}\ \ \text{(dB)} \qquad (3)$$

where $S_{org}$ and $S_{ext}$ are the power spectra of the original and the extended wideband speech, respectively.

The BWE systems of [1][2][3][5][7][8] were implemented and trained in a speaker-dependent manner. The training data come from the phonetically balanced IViE corpus: 8 minutes of speaker-dependent paragraph-reading speech (about 100,000 frames according to our speech analyzer) were picked out to train all six systems. The silence segments were collected for training the initial model of the proposed system, which is the only off-line training the system requires. The performance is listed in Tables 2 and 3; the mismatched test data are collected from another speaker of a different gender.

[Table 2: Test A (test data matches training data). High-band spectral distortion D (dB) and outlier rates (>5 dB and >7.5 dB) for VQ [1], linear mapping [3], GMM [2], HMM [5], HMM state mapping [7], the linear state space model [8], and the proposed method]

[Table 3: Test B (test data mismatches training data). Same systems and measures as Table 2]

As Tables 2 and 3 show, the proposed method performs similarly under the two circumstances. When the test data are consistent with the training database, its performance is better than that of the memoryless systems and comparable with that of the memory systems. When model mismatch occurs, it outperforms all the data-oriented methods.

5. CONCLUSION

In this paper we have presented a bandwidth extension system based on blind adaptation of a linear state space model. Measured by high-band spectral distortion, the proposed system is comparable with data-oriented memory systems and better than memoryless systems. When data mismatch occurs, its performance is better than that of all the data-oriented systems, provided that the background environment does not change dramatically.
Moreover, off-line training is not required, and the efficient computation of the on-line model adaptation keeps the system delay small.

REFERENCES

[1] N. Enbom and W.B. Kleijn, "Bandwidth Expansion of Speech Based on Vector Quantization of the Mel Frequency Cepstral Coefficients," Proc. IEEE Workshop on Speech Coding, pp. 171-173, 1999.
[2] K.Y. Park and H.S. Kim, "Narrowband to Wideband Conversion of Speech Using GMM Based Transformation," Proc. ICASSP, 2000.
[3] Y. Nakatoh, M. Tsushima and T. Norimatsu, "Generation of Broadband Speech from Narrowband Speech Based on Linear Mapping," Electronics and Communications in Japan, Part 2, Vol. 85, No. 8, pp. 44-53, 2002.
[4] M. Nilsson, H. Gustafsson, S.V. Andersen and W.B. Kleijn, "Gaussian Mixture Model Based Mutual Information Estimation between Frequency Bands in Speech," Proc. ICASSP, 2002.
[5] P. Jax and P. Vary, "On Artificial Bandwidth Extension of Telephone Speech," Signal Processing, Vol. 83, pp. 1707-1719, 2003.
[6] Y. Agiomyrgiannakis and Y. Stylianou, "Combined Estimation/Coding of Highband Spectral Envelopes for Speech Spectrum Expansion," Proc. ICASSP, pp. 469-472, 2004.
[7] S. Yao and C.F. Chan, "Block-based Bandwidth Extension of Narrowband Speech Signal by using CDHMM," Proc. ICASSP, pp. I-793 - I-796, 2005.
[8] S. Yao and C.F. Chan, "Speech Bandwidth Enhancement using State Space Speech Dynamics," Proc. ICASSP, pp. I-489 - I-492, 2006.
More informationADAPTIVE NOISE LEVEL ESTIMATION
Proc. of the 9 th Int. Conference on Digital Audio Effects (DAFx-6), Montreal, Canada, September 18-2, 26 ADAPTIVE NOISE LEVEL ESTIMATION Chunghsin Yeh Analysis/Synthesis team IRCAM/CNRS-STMS, Paris, France
More informationSGN Audio and Speech Processing
Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations
More informationON THE POTENTIAL FOR ARTIFICIAL BANDWIDTH EXTENSION OF BONE AND TISSUE CONDUCTED SPEECH: A MUTUAL INFORMATION STUDY
Authors' accepted manuscript of the article published in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) http://dx.doi.org/10.1109/icassp.2015.7178944 ON THE POTENTIAL
More informationVoice Activity Detection
Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class
More informationMikko Myllymäki and Tuomas Virtanen
NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/
More informationJoint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events
INTERSPEECH 2013 Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events Rupayan Chakraborty and Climent Nadeu TALP Research Centre, Department of Signal Theory
More informationSpeech and Music Discrimination based on Signal Modulation Spectrum.
Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we
More informationFlexible and Scalable Transform-Domain Codebook for High Bit Rate CELP Coders
Flexible and Scalable Transform-Domain Codebook for High Bit Rate CELP Coders Václav Eksler, Bruno Bessette, Milan Jelínek, Tommy Vaillancourt University of Sherbrooke, VoiceAge Corporation Montreal, QC,
More informationStudents: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa
Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Spring 2008 Introduction Problem Formulation Possible Solutions Proposed Algorithm Experimental Results Conclusions
More informationANALOGUE TRANSMISSION OVER FADING CHANNELS
J.P. Linnartz EECS 290i handouts Spring 1993 ANALOGUE TRANSMISSION OVER FADING CHANNELS Amplitude modulation Various methods exist to transmit a baseband message m(t) using an RF carrier signal c(t) =
More informationEnhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients
ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds
More informationON THE PERFORMANCE OF WTIMIT FOR WIDE BAND TELEPHONY
ON THE PERFORMANCE OF WTIMIT FOR WIDE BAND TELEPHONY D. Nagajyothi 1 and P. Siddaiah 2 1 Department of Electronics and Communication Engineering, Vardhaman College of Engineering, Shamshabad, Telangana,
More informationHUMAN speech is frequently encountered in several
1948 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 7, SEPTEMBER 2012 Enhancement of Single-Channel Periodic Signals in the Time-Domain Jesper Rindom Jensen, Student Member,
More informationNOISE ESTIMATION IN A SINGLE CHANNEL
SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina
More informationA Closed-loop Multimode Variable Bit Rate Characteristic Waveform Interpolation Coder
A Closed-loop Multimode Variable Bit Rate Characteristic Waveform Interpolation Coder Jing Wang, Jingg Kuang, and Shenghui Zhao Research Center of Digital Communication Technology,Department of Electronic
More informationTHERE is a constant need for speech codecs with decreased
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 2, FEBRUARY 2007 377 Conditional Vector Quantization for Speech Coding Yannis Agiomyrgiannakis and Yannis Stylianou Abstract In
More informationAccurate Delay Measurement of Coded Speech Signals with Subsample Resolution
PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,
More informationA Comparative Study of Formant Frequencies Estimation Techniques
A Comparative Study of Formant Frequencies Estimation Techniques DORRA GARGOURI, Med ALI KAMMOUN and AHMED BEN HAMIDA Unité de traitement de l information et électronique médicale, ENIS University of Sfax
More informationBEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor
BEAT DETECTION BY DYNAMIC PROGRAMMING Racquel Ivy Awuor University of Rochester Department of Electrical and Computer Engineering Rochester, NY 14627 rawuor@ur.rochester.edu ABSTRACT A beat is a salient
More informationA CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION
17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 A CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION
More informationCan binary masks improve intelligibility?
Can binary masks improve intelligibility? Mike Brookes (Imperial College London) & Mark Huckvale (University College London) Apparently so... 2 How does it work? 3 Time-frequency grid of local SNR + +
More informationtechniques are means of reducing the bandwidth needed to represent the human voice. In mobile
8 2. LITERATURE SURVEY The available radio spectrum for the wireless radio communication is very limited hence to accommodate maximum number of users the speech is compressed. The speech compression techniques
More informationVocoder (LPC) Analysis by Variation of Input Parameters and Signals
ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of
More informationSpeech Signal Enhancement Techniques
Speech Signal Enhancement Techniques Chouki Zegar 1, Abdelhakim Dahimene 2 1,2 Institute of Electrical and Electronic Engineering, University of Boumerdes, Algeria inelectr@yahoo.fr, dahimenehakim@yahoo.fr
More informationRASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991
RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response
More informationSignal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2
Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter
More informationChapter 2 Direct-Sequence Systems
Chapter 2 Direct-Sequence Systems A spread-spectrum signal is one with an extra modulation that expands the signal bandwidth greatly beyond what is required by the underlying coded-data modulation. Spread-spectrum
More informationMultimedia Signal Processing: Theory and Applications in Speech, Music and Communications
Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal
More informationThe Partly Preserved Natural Phases in the Concatenative Speech Synthesis Based on the Harmonic/Noise Approach
The Partly Preserved Natural Phases in the Concatenative Speech Synthesis Based on the Harmonic/Noise Approach ZBYNĚ K TYCHTL Department of Cybernetics University of West Bohemia Univerzitní 8, 306 14
More informationEnhancement of Speech in Noisy Conditions
Enhancement of Speech in Noisy Conditions Anuprita P Pawar 1, Asst.Prof.Kirtimalini.B.Choudhari 2 PG Student, Dept. of Electronics and Telecommunication, AISSMS C.O.E., Pune University, India 1 Assistant
More informationPDF hosted at the Radboud Repository of the Radboud University Nijmegen
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this
More informationChapter 2 Channel Equalization
Chapter 2 Channel Equalization 2.1 Introduction In wireless communication systems signal experiences distortion due to fading [17]. As signal propagates, it follows multiple paths between transmitter and
More informationSPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT
SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com
More informationSeparating Voiced Segments from Music File using MFCC, ZCR and GMM
Separating Voiced Segments from Music File using MFCC, ZCR and GMM Mr. Prashant P. Zirmite 1, Mr. Mahesh K. Patil 2, Mr. Santosh P. Salgar 3,Mr. Veeresh M. Metigoudar 4 1,2,3,4Assistant Professor, Dept.
More informationCOM 12 C 288 E October 2011 English only Original: English
Question(s): 9/12 Source: Title: INTERNATIONAL TELECOMMUNICATION UNION TELECOMMUNICATION STANDARDIZATION SECTOR STUDY PERIOD 2009-2012 Audience STUDY GROUP 12 CONTRIBUTION 288 P.ONRA Contribution Additional
More informationAn Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation
An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation Aisvarya V 1, Suganthy M 2 PG Student [Comm. Systems], Dept. of ECE, Sree Sastha Institute of Engg. & Tech., Chennai,
More informationMachine recognition of speech trained on data from New Jersey Labs
Machine recognition of speech trained on data from New Jersey Labs Frequency response (peak around 5 Hz) Impulse response (effective length around 200 ms) 41 RASTA filter 10 attenuation [db] 40 1 10 modulation
More informationSubjective Voice Quality Evaluation of Artificial Bandwidth Extension: Comparing Different Audio Bandwidths and Speech Codecs
INTERSPEECH 01 Subjective Voice Quality Evaluation of Artificial Bandwidth Extension: Comparing Different Audio Bandwidths and Speech Codecs Hannu Pulakka 1, Anssi Rämö, Ville Myllylä 1, Henri Toukomaa,
More informationAutomatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs
Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems
More informationHigh-speed Noise Cancellation with Microphone Array
Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent
More informationA Hybrid Synchronization Technique for the Frequency Offset Correction in OFDM
A Hybrid Synchronization Technique for the Frequency Offset Correction in OFDM Sameer S. M Department of Electronics and Electrical Communication Engineering Indian Institute of Technology Kharagpur West
More information