An Improved Voice Activity Detection Based on Deep Belief Networks

Size: px
Start display at page:

Download "An Improved Voice Activity Detection Based on Deep Belief Networks"

Transcription

1 e-issn Volume 2 Issue 4, April 2016 pp Scientific Journal Impact Factor : An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K. 1, Anand Pavithran 2 1,2 Department of Computer Science and Engineering MES College of Engineering, Kuttippuram Kerala, , India Abstract Multiple acoustic features are important for robust- ness of Voice Activity Detection(VAD). Statistical and Machine learning methods have been used for VAD recently. Machine learning methods concentrate more on detecting multiple acoustic features. Deep Belief Network(DBN) is a powerful hierarchical model for feature extraction. This is a nonlinear combination method of multiple features by a deep model. Here, consider the multiple serially-concatenated features as the input layer of DBN, and then extract a new feature by transferring these features through multiple nonlinear hidden layers. Finally, a linear classifier predict the class of the new feature. It is able to incorporate the deep regularity of the acoustic features, so that the overall advantage of the features can be fully mined. As the number of feature increases, the complexity also increases. An improved method is proposed with less time complexity. Keywords Voice Activity Detection, Deep Belief Network, Feature Fusion. I. INTRODUCTION A voice activity detection (VAD) algorithm is able to distinguish speech signals from noise. An effective VAD algorithm can differentiate between speech, which contains background noise, and signals with background noises only. The result of a VAD decision is a binary value, which indicates the presence of speech in the input signal (for example the output value is 1) or the presence of noise only (for example the output value is 0). A VAD algorithm is an integral part from amongst a variety of speech communication systems, such as speech recognition and speech coding in mobile phones, and IP telephony. In telecommunication systems an effective VAD algorithm plays an important role, especially in automatic speech recognition (ASR) systems. VAD is used to reduce the computation by eliminating unnecessary transmission and processing of non- speech segments and to reduce potential mis-recognition errors in non-speech segments. The typical design of a VAD algorithm is as follows: 1) There may be a noise reduction stage. 2) Some features or quantities are calculated from a section of the input signal. 3) A classification rule is applied to classify the section as speech or nonspeech. The classification of VAD are: VADs in standard speech processing systems. Statistical signal processing based VADs. Supervised machine learning based VADs. Unsupervised machine learning based All rights Reserved 676

2 The research on the multiple feature fusion topic is rather important due to the following two reasons[1]. First, the discriminability of a single acoustic feature based VAD is limited. Traditional VADs pay much attention on exploring new complicated acoustic features that are more discriminative. However, seldom features perform overwhelmingly better than the others. Second, the topic of feature fusion is not fully mined. Although most machine learning based VADs do some efforts to the feature fusion task, the main advantage of these VADs still lies in the superiority of the machine learning based approaches to the non-machine learning based approaches, while the feature fusion methods seem still lack of thorough study. The desirable aspects of VAD algorithms are listed below: A Good Decision Rule: A physical property of speech that can be exploited to give consistent and accurate judgment in classifying segments of the signal into silence or otherwise. Adaptability to Background Noise: Adapting to non-stationary background noise improves robustness. Low Computational Complexity: The complexity of VAD algorithm must be low to suit real-time applications. This paper deals with the design of an improved VAD based on DBN. The time complexity of proposed method is less than that of the existing system. II. RELATED WORKS There are many methods for VAD such as, statistical signal processing based, supervised machine learning and unsupervised machine learning based etc. Some of them are discussed in this section. Tao Yu et al., proposed a supervised machine learning based VAD [2] is a discriminative training method that uses a linear weighted sum instead of the simple sum algorithm in the multiple observation technique. In this method, the optimal combination weights from two discriminative training methods are studied to directly improve VAD performance, in terms of reduced misclassification errors and improved receiver operating characteristics (ROC) curves. The weights are optimized by the gradient descent algorithm and Minimum Classification Error (MCE) is used as the optimization objective. J. W. Shin et al., proposed a Support Vector Machine (SVM) [3] provides an effective generalization performance on classification problems based on machine learning. SVM based VAD employing effective feature vectors. Three feature vectors are considered: a posteriori SNR, a priori SNR and predicted SNR. A posteriori SNR is estimated as the ratio of input signal and the variance of noise signal and predicted SNR is estimated by the power spectra of the noise and speech. SVM makes an hyperplane that is separated without errors. Maximum margin clustering (MMC) [4] is an unsupervised learning approach for statistical voice activity detection. MMC can improve the robustness of support vector machine(svm) based VAD while requiring no data labeling for model training. In the MMC framework, the multiple observation compound feature(mo-cf) is proposed to improve accuracy. MO-CF is composed of two sub-features, they are, multiple observation signal-to- noise ratio (MO-SNR) and multiple observation maximum probability (MO-MP). Dongwen Ying et al., propose an unsupervised learning framework [5] to construct statistical models for VAD. This framework is realized by a sequential Gaussian mixture model (GMM). It comprises an initialization process and an updating process. Select the smoothed subband logarithmic energy as the acoustic feature. The input signal is grouped into several Mel subbands in the frequency domain. Then, the logarithmic energy is calculated by using the logarithmic value of the absolute magnitude sum of each subband. Eventually, it is smoothed to form an envelope for classification. Two Gaussian models are employed as the classifier to describe the logarithmic energy All rights Reserved 677

3 of speech and nonspeech, respectively. These two models are incorporated into a two component GMM. Its parameters are estimated in an unsupervised way. Speech/nonspeech classification is firstly conducted at each subband. Then, all subband s decisions are summarized by a voting procedure. The proposed VAD does not rely on an assumption that the first several frames of an utterance are nonspeech, which is widely used in most VADs. Various VAD methods are discussed, in order to further improve the performance, proposed to introduce DBN to VAD. Fundamentally, the advantage of the DBN-based VAD is that DBN has a much stronger ability of describing the variations of the features. The DBN based Vad is discussed in the following section and an improvement is also done. III. DBN BASED VAD The DBN-based VAD connects multiple acoustic features of an observation in serial to a long feature vector which is used as the visible layer [i.e., input] of DBN. Then, by transferring the long feature vector through multiple nonlinear hidden layers a new feature is extracted. Finally, the new feature is given as the input of linear classifier [i.e., softmax output layer] of DBN to predict the class of the observation. The prediction function of DBN is formulated as[1]: VAD only contains two classes(speech and non-speech), the prediction function of the DBN-based VAD is as follows: where H1 / H0 denotes the speech/noise hypothesis, and η is a tunable decision threshold, usually setting to 0, g (L) (.) is the activation function of the Lth hidden layer and is defined as, is the weights between the adjacent two layers with i as the i th unit of the L th layer and j as the j th unit of the (L-1) th layer and { x r } r is the input feature vector. The training process of DBN consists of two phases. First, it takes a greedy layer-wise unsupervised pre-training phase, of the stacked RBMs to find initial parameters that are close to a good solution of the deep neural network. Then, it takes a supervised back-propagation training phase to fine-tune the initial parameters. The key point that contributes to the success of DBN is the greedy layer-wise unsupervised pre-training of the RBM models. It performs like a regularizer of the supervised training phase that prevents DBN from over-fitting to the training All rights Reserved 678

4 The layer-wise unsupervised pre-training of the RBM models contributes to the success of DBN. RBM is an energy model-based two layer, bipartite, undirected stochastic graphical model. Specifically, one layer of RBM is composed of visible units v, and the other layer is composed of hidden units h. There are symmetric connections between the two layers and no connection within each layer. The connection weights can be represented by a weight matrix. In this paper, we only consider the Bernoulli (visible)-bernoulli (hidden) RBM, which means v i {0,1} and h j {0,1}. RBM tries to find a model that maximize the likelihood of, which is equivalent to the optimization problem given below, Where And Energy (v,h;w)=-b T v- c T h-h T Wv where b and c are bias terms of visible layer and hidden layer. DBN is a powerful hierarchical generative model for feature extraction and it can perfectly fuse the advantages of multiple features. In this work, eight features are considered for feature extraction. As the number of features increases, the complexity of the DBN also increases and it will take more time for voice activity detection. So, the limitation of this work is its increased time complexity corresponding to the number of features. In order to overcome this problem, another method in which the same accuracy can be achieved by using more significant features is designed. IV. IMPROVED VAD BASED ON DBN An improved method is introduced and which is based on the more significant features considered for feature extraction. The modification is done by considering the following features: pitch, discrete Fourier transform (DFT), mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC), energy and zero crossing rate(zcr). In this method, taken the energy and zero crossing rate instead of relative-spectral perceptual linear predictive analysis (RASTA-PLP) and amplitude modulation spectrograms (AMS)[8] in the former case. A. Feature Extraction 1) Pitch: The pitch is estimated using Cepstral Method. The steps involved are: The original signal is transformed using a Fast Fourier Transform (FFT) algorithm. The resulting spectrum is converted to a logarithmic scale. It then transformed using the same FFT algorithm to obtain the power cepstrum. The power cepstrum reverts to the time domain and exhibits peaks corresponding to the period of the frequency spacings. The cepstral coefficients are given All rights Reserved 679

5 where τ - frequency, F - fourier transform, x[n] signal in the time domain and F{x[n] } 2 - power spectrum estimate of the signal. 2) MFCC: Sounds are filtered by the shape of the vocal tract including tongue, teeth etc. This shape determines what sound comes out. The shape of the vocal tract manifests itself in the envelope of the short time power spectrum. the job of MFCCs is to accurately represent this envelope[9]. The steps involved are; Preemphasis: passing of signal through a filter which emphasizes higher frequencies. Framing: the speech signal is divided into short time frames. Hamming windowing: Y(n)= X(n) W(n) Where Hamming window is defined as W(n), 0 n N-1, N = number of samples in each frame, Y[n] = Output signal, X (n) = input signal. Fast Fourier Transform: To convert each frame of N samples from time domain into frequency domain. Where X(w), H(w) and Y(w) are the Fourier Transform of X(t), h(t) and Y(t) respectively. Mel Filter Bank Processing: The frequencies range in FFT spectrum is very wide and voice signal does not follow the linear scale. - A weighted sum of filter spectral components is used. Discrete Cosine Transform : to convert the log Mel spectrum into time domain. The result of the conversion is called Mel Frequency Cepstrum Coefficient. 3) LPC: LPC is a tool used in audio signal processing which represents the spectral envelope of digital signal in a compressed form[7]. The function [a,e]=lpc(x,n), finds coefficients, A=[ 1A(2)... A(N+1) ], of an Nth order forward linear predictor. Xp(n)= -A(2)*X(n-1)-A(3)*X(n-2)-...- A(N+1)*X(n-N) such that the sum of the squares of the errors err(n) = X(n) - Xp(n) is minimized. [A, E ] = LPC(X,N) returns the variance (power) of the prediction error. Where X can be a vector or a matrix. If X is a matrix containing a separate signal in each column, LPC returns a model estimate for each column in the rows of A. And N specifies the order of the polynomial A(z) which must be a positive integer. N must be less or equal to the length of X. If X is a matrix, N must be less or equal to the length of each column of X. 4) RASTA-PLP: PLP speech analysis is based on short term spectrum of speech. RASTA applies a band-pass filter to the energy in each frequency subband to smooth over short-term noise variations to remove any constant offset resulting from static spectral coloration in the speech channel. RASTA-PLP makes PLP more robust to linear spectral distortions. The steps involved are: Compute the critical band spectrum and take its All rights Reserved 680

6 Estimate the temporal derivative of the log critical band spectrum. Re-integrate the log critical band temporal derivative. Take inverse logarithm of this relative log spectrum, yielding a relative auditory spectrum. Compute an all pole model of this spectrum. 5) Energy: The energy of the speech signal provides a representation that reflects these amplitude variations. Shorttime energy can define as: Where w(m) is the hamming window. 6) Zero Crossing Rate(ZCR): For discrete-time signals, a zero crossing is said to occur if successive samples have different algebraic signs. The rate at which zero crossings occur is a simple measure of the frequency content of a signal. Zero-crossing rate is a measure of number of times in a given time interval/frame that the amplitude of the speech signals passes through a value of zero. The definition for zero crossings rate is: Where And V. EXPERIMENTS AND RESULTS All experiments are conducted with MATLAB 2013A in windows operating system. A. Dataset The training set consists of twenty signals and the test set also contains the same twenty signals. In this paper, the sampling rate is 8 khz. Because speech can be approximated as a stationary process in short-time scales, divide speech signals into a sequence of short-time frames. The frame is used as the basic detection unit. Given a frame, if the samples labeled as speech are more than a half, the frame is labeled as speech, otherwise, the frame is labeled as noise. B. Acoustic Features for VAD To better show the advantages of the feature fusion tech- niques, extract eight acoustic features from each observa- tion. They are pitch, discrete Fourier transform (DFT), mel- frequency cepstral coefficients (MFCC), linear predictive cod- ing (LPC)[7], energy and zero crossing rate(zcr). C. Parameter All rights Reserved 681

7 For the proposed DBN based VAD, the depth of DBN [i.e., the number of the hidden layers, or the number of the RBM models] is taken as 2. Denote the n-layer s DBN as DBNn. That is, denote the DBN with only one hidden layer as DBN1. D. Results Twenty speech signals are used for training the network samples(from sample number to 50000) of each signal is taken for training. The testing is conducted for the same samples of the selected signal. Voice activity present in a signal is detected. Figures 1 and 2 show the output of VAD for testing two example signals. Voice frames are represented by 1 and silence by 0. The comparison between the basic method and improved method is given in the graph in figure 3. From the graph it is clear that the proposed method is better than the existing method. Fig. 1. The original signal and result of VAD for example 1 Fig. 2. The original signal and result of VAD for example All rights Reserved 682

8 Fig. 3. Comparison between the basic method and improved method VI. CONCLUSION Voice activity detector (VAD) is an important front-end of modern speech signal processing systems. The DBN-based VAD aims to extract a new feature that can fully express the advantages of all acoustic features by transferring the acoustic features through multiple nonlinear hidden layers. The complexity of DBN is more as the number of features increases. A less complex VAD is designed by considering more significant features. The improved method outperforms the existing method. The scope for the future work relies on the wide variety of application areas of speech processing and to improve the recognition and transmission of speech. And also a modified VAD with less number of features will result in a less complex DBN. REFERENCES [1] Xiao-Lei Zhang and Ji Wu, Deep Belief Networks Based Voice Ac- tivity Detection, IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 4, April [2] Tao Yu and John H. L. Hansen, Discriminative Training for Multiple Observation Likelihood Ratio Based Voice Activity Detection, IEEE Signal Processing Letters, Vol. 17, No. 11, November [3] Ji Wu and Xiao-Lei Zhang, VAD based on statistical models and machine learning approaches, ELSEVIER, Computer Speech and Lang., [4] Ji Wu and Xiao-Lei Zhang, Maximum Margin Clustering Based Statis- tical VAD With Multiple Observation Compound Feature, IEEE Signal Processing Letters, Vol. 18, No. 5, May [5] Dongwen Ying, Yonghong Yan, Jianwu Dang, and Frank K. Soong, Voice Activity Detection Based on an Unsupervised Learning Frame- work, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 8, November [6] D. Yu and L. Deng, Deep learning and its applications to signal and information processing, IEEE Signal Processing Magazine, vol. 28, no.1, pp , Jan [7] Lawrence R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Pearson Education, Jan [8] Jurgen Tchorz and Birger Kollmeier, Automatic classification of the acoustical situation using amplitude modulation spectrograms. [9] Lindasalwa Muda, Mumtaj Begam and I. Elamvazuthi, Voice Recogni- tion Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques, Journal of Computing, vol.2, Issue 3, Mar 2010, ISSN All rights Reserved 683

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015

International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015 RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Speech Signal Analysis

Speech Signal Analysis Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for

More information

Cepstrum alanysis of speech signals

Cepstrum alanysis of speech signals Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP

More information

Isolated Digit Recognition Using MFCC AND DTW

Isolated Digit Recognition Using MFCC AND DTW MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

Speech Recognition using FIR Wiener Filter

Speech Recognition using FIR Wiener Filter Speech Recognition using FIR Wiener Filter Deepak 1, Vikas Mittal 2 1 Department of Electronics & Communication Engineering, Maharishi Markandeshwar University, Mullana (Ambala), INDIA 2 Department of

More information

Applications of Music Processing

Applications of Music Processing Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

Using RASTA in task independent TANDEM feature extraction

Using RASTA in task independent TANDEM feature extraction R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Mikko Myllymäki and Tuomas Virtanen

Mikko Myllymäki and Tuomas Virtanen NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

Auditory Based Feature Vectors for Speech Recognition Systems

Auditory Based Feature Vectors for Speech Recognition Systems Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

Speaker and Noise Independent Voice Activity Detection

Speaker and Noise Independent Voice Activity Detection Speaker and Noise Independent Voice Activity Detection François G. Germain, Dennis L. Sun,2, Gautham J. Mysore 3 Center for Computer Research in Music and Acoustics, Stanford University, CA 9435 2 Department

More information

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

VOICE COMMAND RECOGNITION SYSTEM BASED ON MFCC AND DTW

VOICE COMMAND RECOGNITION SYSTEM BASED ON MFCC AND DTW VOICE COMMAND RECOGNITION SYSTEM BASED ON MFCC AND DTW ANJALI BALA * Kurukshetra University, Department of Instrumentation & Control Engineering., H.E.C* Jagadhri, Haryana, 135003, India sachdevaanjali26@gmail.com

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

Introduction of Audio and Music

Introduction of Audio and Music 1 Introduction of Audio and Music Wei-Ta Chu 2009/12/3 Outline 2 Introduction of Audio Signals Introduction of Music 3 Introduction of Audio Signals Wei-Ta Chu 2009/12/3 Li and Drew, Fundamentals of Multimedia,

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

I D I A P. On Factorizing Spectral Dynamics for Robust Speech Recognition R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b

I D I A P. On Factorizing Spectral Dynamics for Robust Speech Recognition R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b R E S E A R C H R E P O R T I D I A P On Factorizing Spectral Dynamics for Robust Speech Recognition a Vivek Tyagi Hervé Bourlard a,b IDIAP RR 3-33 June 23 Iain McCowan a Hemant Misra a,b to appear in

More information

SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE

SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE Zhizheng Wu 1,2, Xiong Xiao 2, Eng Siong Chng 1,2, Haizhou Li 1,2,3 1 School of Computer Engineering, Nanyang Technological University (NTU),

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate

More information

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Rhythmic Similarity -- a quick paper review Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Contents Introduction Three examples J. Foote 2001, 2002 J. Paulus 2002 S. Dixon 2004

More information

MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM

MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM www.advancejournals.org Open Access Scientific Publisher MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM ABSTRACT- P. Santhiya 1, T. Jayasankar 1 1 AUT (BIT campus), Tiruchirappalli, India

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

Speech and Music Discrimination based on Signal Modulation Spectrum.

Speech and Music Discrimination based on Signal Modulation Spectrum. Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we

More information

Audio Restoration Based on DSP Tools

Audio Restoration Based on DSP Tools Audio Restoration Based on DSP Tools EECS 451 Final Project Report Nan Wu School of Electrical Engineering and Computer Science University of Michigan Ann Arbor, MI, United States wunan@umich.edu Abstract

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems

More information

High-speed Noise Cancellation with Microphone Array

High-speed Noise Cancellation with Microphone Array Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent

More information

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012 Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?

More information

Overview of Code Excited Linear Predictive Coder

Overview of Code Excited Linear Predictive Coder Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

Voice Activity Detection for Speech Enhancement Applications

Voice Activity Detection for Speech Enhancement Applications Voice Activity Detection for Speech Enhancement Applications E. Verteletskaya, K. Sakhnov Abstract This paper describes a study of noise-robust voice activity detection (VAD) utilizing the periodicity

More information

A CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION

A CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION 17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 A CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland tkinnu@cs.joensuu.fi

More information

Gammatone Cepstral Coefficient for Speaker Identification

Gammatone Cepstral Coefficient for Speaker Identification Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia

More information

EVALUATION OF MFCC ESTIMATION TECHNIQUES FOR MUSIC SIMILARITY

EVALUATION OF MFCC ESTIMATION TECHNIQUES FOR MUSIC SIMILARITY EVALUATION OF MFCC ESTIMATION TECHNIQUES FOR MUSIC SIMILARITY Jesper Højvang Jensen 1, Mads Græsbøll Christensen 1, Manohar N. Murthi, and Søren Holdt Jensen 1 1 Department of Communication Technology,

More information

CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS

CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS 66 CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS 4.1 INTRODUCTION New frontiers of speech technology are demanding increased levels of performance in many areas. In the advent of Wireless Communications

More information

HIGH RESOLUTION SIGNAL RECONSTRUCTION

HIGH RESOLUTION SIGNAL RECONSTRUCTION HIGH RESOLUTION SIGNAL RECONSTRUCTION Trausti Kristjansson Machine Learning and Applied Statistics Microsoft Research traustik@microsoft.com John Hershey University of California, San Diego Machine Perception

More information

Topic. Spectrogram Chromagram Cesptrogram. Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Topic. Spectrogram Chromagram Cesptrogram. Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio Topic Spectrogram Chromagram Cesptrogram Short time Fourier Transform Break signal into windows Calculate DFT of each window The Spectrogram spectrogram(y,1024,512,1024,fs,'yaxis'); A series of short term

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Perceptive Speech Filters for Speech Signal Noise Reduction

Perceptive Speech Filters for Speech Signal Noise Reduction International Journal of Computer Applications (975 8887) Volume 55 - No. *, October 22 Perceptive Speech Filters for Speech Signal Noise Reduction E.S. Kasthuri and A.P. James School of Computer Science

More information

A Survey and Evaluation of Voice Activity Detection Algorithms

A Survey and Evaluation of Voice Activity Detection Algorithms A Survey and Evaluation of Voice Activity Detection Algorithms Seshashyama Sameeraj Meduri (ssme09@student.bth.se, 861003-7577) Rufus Ananth (anru09@student.bth.se, 861129-5018) Examiner: Dr. Sven Johansson

More information

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991 RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response

More information

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

A Comparative Study of Formant Frequencies Estimation Techniques

A Comparative Study of Formant Frequencies Estimation Techniques A Comparative Study of Formant Frequencies Estimation Techniques DORRA GARGOURI, Med ALI KAMMOUN and AHMED BEN HAMIDA Unité de traitement de l information et électronique médicale, ENIS University of Sfax

More information

Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System

Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System C.GANESH BABU 1, Dr.P..T.VANATHI 2 R.RAMACHANDRAN 3, M.SENTHIL RAJAA 3, R.VENGATESH 3 1 Research Scholar (PSGCT)

More information

Audio Similarity. Mark Zadel MUMT 611 March 8, Audio Similarity p.1/23

Audio Similarity. Mark Zadel MUMT 611 March 8, Audio Similarity p.1/23 Audio Similarity Mark Zadel MUMT 611 March 8, 2004 Audio Similarity p.1/23 Overview MFCCs Foote Content-Based Retrieval of Music and Audio (1997) Logan, Salomon A Music Similarity Function Based On Signal

More information

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 1 Electronics and Communication Department, Parul institute of engineering and technology, Vadodara,

More information

Speech/Non-speech detection Rule-based method using log energy and zero crossing rate

Speech/Non-speech detection Rule-based method using log energy and zero crossing rate Digital Speech Processing- Lecture 14A Algorithms for Speech Processing Speech Processing Algorithms Speech/Non-speech detection Rule-based method using log energy and zero crossing rate Single speech

More information

Relative phase information for detecting human speech and spoofed speech

Relative phase information for detecting human speech and spoofed speech Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

I D I A P. Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b

I D I A P. Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b R E S E A R C H R E P O R T I D I A P Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR a Vivek Tyagi Hervé Bourlard a,b IDIAP RR 3-47 September 23 Iain McCowan a Hemant Misra a,b to appear

More information

DERIVATION OF TRAPS IN AUDITORY DOMAIN

DERIVATION OF TRAPS IN AUDITORY DOMAIN DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.

More information

Audio Fingerprinting using Fractional Fourier Transform

Audio Fingerprinting using Fractional Fourier Transform Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,

More information

Evaluation of MFCC Estimation Techniques for Music Similarity Jensen, Jesper Højvang; Christensen, Mads Græsbøll; Murthi, Manohar; Jensen, Søren Holdt

Evaluation of MFCC Estimation Techniques for Music Similarity Jensen, Jesper Højvang; Christensen, Mads Græsbøll; Murthi, Manohar; Jensen, Søren Holdt Aalborg Universitet Evaluation of MFCC Estimation Techniques for Music Similarity Jensen, Jesper Højvang; Christensen, Mads Græsbøll; Murthi, Manohar; Jensen, Søren Holdt Published in: Proceedings of the

More information

CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS

CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 46 CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 3.1 INTRODUCTION Personal communication of today is impaired by nearly ubiquitous noise. Speech communication becomes difficult under these conditions; speech

More information

An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation

An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation Aisvarya V 1, Suganthy M 2 PG Student [Comm. Systems], Dept. of ECE, Sree Sastha Institute of Engg. & Tech., Chennai,

More information

Speech Synthesis; Pitch Detection and Vocoders

Speech Synthesis; Pitch Detection and Vocoders Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech

More information

CLASSIFICATION OF CLOSED AND OPEN-SHELL (TURKISH) PISTACHIO NUTS USING DOUBLE TREE UN-DECIMATED WAVELET TRANSFORM

CLASSIFICATION OF CLOSED AND OPEN-SHELL (TURKISH) PISTACHIO NUTS USING DOUBLE TREE UN-DECIMATED WAVELET TRANSFORM CLASSIFICATION OF CLOSED AND OPEN-SHELL (TURKISH) PISTACHIO NUTS USING DOUBLE TREE UN-DECIMATED WAVELET TRANSFORM Nuri F. Ince 1, Fikri Goksu 1, Ahmed H. Tewfik 1, Ibrahim Onaran 2, A. Enis Cetin 2, Tom

More information

Linguistic Phonetics. Spectral Analysis

Linguistic Phonetics. Spectral Analysis 24.963 Linguistic Phonetics Spectral Analysis 4 4 Frequency (Hz) 1 Reading for next week: Liljencrants & Lindblom 1972. Assignment: Lip-rounding assignment, due 1/15. 2 Spectral analysis techniques There

More information

An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet

An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet Journal of Information & Computational Science 8: 14 (2011) 3027 3034 Available at http://www.joics.com An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet Jianguo JIANG

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image

A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image Science Journal of Circuits, Systems and Signal Processing 2017; 6(2): 11-17 http://www.sciencepublishinggroup.com/j/cssp doi: 10.11648/j.cssp.20170602.12 ISSN: 2326-9065 (Print); ISSN: 2326-9073 (Online)

More information

SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction

SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction by Xi Li A thesis submitted to the Faculty of Graduate School, Marquette University, in Partial Fulfillment of the Requirements

More information

Separating Voiced Segments from Music File using MFCC, ZCR and GMM

Separating Voiced Segments from Music File using MFCC, ZCR and GMM Separating Voiced Segments from Music File using MFCC, ZCR and GMM Mr. Prashant P. Zirmite 1, Mr. Mahesh K. Patil 2, Mr. Santosh P. Salgar 3,Mr. Veeresh M. Metigoudar 4 1,2,3,4Assistant Professor, Dept.

More information

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,

More information

Automatic Morse Code Recognition Under Low SNR

Automatic Morse Code Recognition Under Low SNR 2nd International Conference on Mechanical, Electronic, Control and Automation Engineering (MECAE 2018) Automatic Morse Code Recognition Under Low SNR Xianyu Wanga, Qi Zhaob, Cheng Mac, * and Jianping

More information

Dominant Voiced Speech Segregation Using Onset Offset Detection and IBM Based Segmentation

Dominant Voiced Speech Segregation Using Onset Offset Detection and IBM Based Segmentation Dominant Voiced Speech Segregation Using Onset Offset Detection and IBM Based Segmentation Shibani.H 1, Lekshmi M S 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala,

More information

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri

More information

Discriminative Training for Automatic Speech Recognition

Discriminative Training for Automatic Speech Recognition Discriminative Training for Automatic Speech Recognition 22 nd April 2013 Advanced Signal Processing Seminar Article Heigold, G.; Ney, H.; Schluter, R.; Wiesler, S. Signal Processing Magazine, IEEE, vol.29,

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

International Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 7, February 2013)

International Journal of Digital Application & Contemporary research Website:   (Volume 1, Issue 7, February 2013) Performance Analysis of OFDM under DWT, DCT based Image Processing Anshul Soni soni.anshulec14@gmail.com Ashok Chandra Tiwari Abstract In this paper, the performance of conventional discrete cosine transform

More information

Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications

Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal

More information