An Improved Voice Activity Detection Based on Deep Belief Networks
e-ISSN, Volume 2, Issue 4, April 2016

An Improved Voice Activity Detection Based on Deep Belief Networks

Shabeeba T. K. (1), Anand Pavithran (2)
1,2 Department of Computer Science and Engineering, MES College of Engineering, Kuttippuram, Kerala, India

Abstract: Multiple acoustic features are important for the robustness of Voice Activity Detection (VAD). Statistical and machine learning methods have recently been used for VAD, and machine learning methods concentrate more on exploiting multiple acoustic features. The Deep Belief Network (DBN) is a powerful hierarchical model for feature extraction, providing a nonlinear method for combining multiple features in a deep model. Here, the multiple serially concatenated features are used as the input layer of the DBN, and a new feature is extracted by transferring these features through multiple nonlinear hidden layers. Finally, a linear classifier predicts the class of the new feature. The model is able to incorporate the deep regularity of the acoustic features, so that the overall advantage of the features can be fully mined. However, as the number of features increases, the complexity also increases. An improved method with lower time complexity is proposed.

Keywords: Voice Activity Detection, Deep Belief Network, Feature Fusion.

I. INTRODUCTION

A voice activity detection (VAD) algorithm is able to distinguish speech signals from noise. An effective VAD algorithm can differentiate between speech that contains background noise and signals with background noise only. The result of a VAD decision is a binary value indicating the presence of speech in the input signal (for example, the output value is 1) or the presence of noise only (for example, the output value is 0). A VAD algorithm is an integral part of a variety of speech communication systems, such as speech recognition, speech coding in mobile phones, and IP telephony.
In telecommunication systems an effective VAD algorithm plays an important role, especially in automatic speech recognition (ASR) systems. VAD is used to reduce computation by eliminating unnecessary transmission and processing of non-speech segments, and to reduce potential mis-recognition errors in non-speech segments. The typical design of a VAD algorithm is as follows: 1) There may be a noise reduction stage. 2) Some features or quantities are calculated from a section of the input signal. 3) A classification rule is applied to classify the section as speech or non-speech. VADs can be classified as: VADs in standard speech processing systems; statistical signal processing based VADs; supervised machine learning based VADs; and unsupervised machine learning based VADs.
All rights Reserved 676
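The three-step design above can be sketched as a minimal energy-threshold VAD. This is only an illustrative sketch, not the method proposed in this paper; the function name, frame length, and threshold value are assumptions.

```python
import numpy as np

def simple_vad(signal, fs, frame_ms=25, threshold=0.01):
    """Minimal VAD following the three-step design:
    (1) the optional noise-reduction stage is skipped,
    (2) a feature (short-time energy) is computed per frame,
    (3) a threshold rule labels each frame speech (1) or non-speech (0)."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    labels = np.zeros(n_frames, dtype=int)
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame ** 2)                # step 2: feature
        labels[i] = 1 if energy > threshold else 0  # step 3: decision rule
    return labels
```

A real system replaces the energy feature and fixed threshold with the fused features and learned classifiers discussed below.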
The research on the multiple feature fusion topic is important for two reasons [1]. First, the discriminability of a VAD based on a single acoustic feature is limited. Traditional VADs pay much attention to exploring new, more discriminative acoustic features; however, few features perform overwhelmingly better than the others. Second, the topic of feature fusion is not fully mined. Although most machine learning based VADs make some effort at the feature fusion task, the main advantage of these VADs still lies in the superiority of machine learning based approaches over non-machine-learning based approaches, while the feature fusion methods themselves still lack thorough study. The desirable aspects of VAD algorithms are listed below: A Good Decision Rule: a physical property of speech that can be exploited to give consistent and accurate judgment in classifying segments of the signal as silence or otherwise. Adaptability to Background Noise: adapting to non-stationary background noise improves robustness. Low Computational Complexity: the complexity of the VAD algorithm must be low to suit real-time applications. This paper deals with the design of an improved VAD based on DBN. The time complexity of the proposed method is less than that of the existing system.

II. RELATED WORKS

There are many methods for VAD, such as statistical signal processing based, supervised machine learning based, and unsupervised machine learning based methods. Some of them are discussed in this section. Tao Yu et al. proposed a supervised machine learning based VAD [2], a discriminative training method that uses a linear weighted sum instead of the simple sum in the multiple observation technique. In this method, the optimal combination weights from two discriminative training methods are studied to directly improve VAD performance, in terms of reduced misclassification errors and improved receiver operating characteristic (ROC) curves.
The weights are optimized by the gradient descent algorithm, with Minimum Classification Error (MCE) as the optimization objective. J. W. Shin et al. proposed a VAD based on the Support Vector Machine (SVM) [3], which provides effective generalization performance on machine-learning classification problems. The SVM-based VAD employs effective feature vectors; three are considered: a posteriori SNR, a priori SNR, and predicted SNR. The a posteriori SNR is estimated as the ratio of the input signal power to the variance of the noise signal, and the predicted SNR is estimated from the power spectra of the noise and speech. The SVM constructs a hyperplane that separates the classes without error. Maximum margin clustering (MMC) [4] is an unsupervised learning approach for statistical voice activity detection. MMC can improve the robustness of SVM-based VAD while requiring no data labeling for model training. In the MMC framework, the multiple observation compound feature (MO-CF) is proposed to improve accuracy. MO-CF is composed of two sub-features: the multiple observation signal-to-noise ratio (MO-SNR) and the multiple observation maximum probability (MO-MP). Dongwen Ying et al. proposed an unsupervised learning framework [5] to construct statistical models for VAD. This framework is realized by a sequential Gaussian mixture model (GMM) and comprises an initialization process and an updating process. The smoothed subband logarithmic energy is selected as the acoustic feature. The input signal is grouped into several Mel subbands in the frequency domain. Then, the logarithmic energy is calculated as the logarithmic value of the absolute magnitude sum of each subband. Eventually, it is smoothed to form an envelope for classification. Two Gaussian models are employed as the classifier to describe the logarithmic energy
of speech and nonspeech, respectively. These two models are incorporated into a two-component GMM whose parameters are estimated in an unsupervised way. Speech/nonspeech classification is first conducted at each subband; then, all subband decisions are summarized by a voting procedure. The proposed VAD does not rely on the assumption, widely used in most VADs, that the first several frames of an utterance are nonspeech. Various VAD methods have been discussed; in order to further improve performance, DBN is introduced to VAD. Fundamentally, the advantage of the DBN-based VAD is that DBN has a much stronger ability to describe the variations of the features. The DBN-based VAD is discussed in the following section, and an improvement is also presented.

III. DBN BASED VAD

The DBN-based VAD connects multiple acoustic features of an observation in serial to form a long feature vector, which is used as the visible layer [i.e., input] of the DBN. Then, by transferring the long feature vector through multiple nonlinear hidden layers, a new feature is extracted. Finally, the new feature is given as the input of the linear classifier [i.e., softmax output layer] of the DBN to predict the class of the observation. The prediction function of the DBN is formulated as [1]:

h_i^(L) = g^(L)( Σ_j w_ij^(L) h_j^(L-1) ),  with h^(0) = {x_r}_r

Because VAD contains only two classes (speech and non-speech), the prediction function of the DBN-based VAD reduces to the decision rule:

log P(H1 | x) - log P(H0 | x) > η  =>  H1;  otherwise H0

where H1/H0 denotes the speech/noise hypothesis, η is a tunable decision threshold (usually set to 0), and g^(L)(.) is the activation function of the Lth hidden layer, defined as the sigmoid function

g^(L)(a) = 1 / (1 + exp(-a)).

Here w_ij^(L) is the weight between the two adjacent layers, with i as the ith unit of the Lth layer and j as the jth unit of the (L-1)th layer, and {x_r}_r is the input feature vector. The training process of the DBN consists of two phases. First, it takes a greedy layer-wise unsupervised pre-training phase of the stacked RBMs to find initial parameters that are close to a good solution of the deep neural network.
Then, it takes a supervised back-propagation training phase to fine-tune the initial parameters. The key point that contributes to the success of DBN is the greedy layer-wise unsupervised pre-training of the RBM models. It performs like a regularizer of the supervised training phase that prevents the DBN from over-fitting to the training data.
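The prediction function described above can be sketched as a plain forward pass through the fine-tuned network. This is a hedged illustration consistent with the text (sigmoid hidden layers, softmax output, threshold η); the function and parameter names are assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dbn_predict(x, weights, biases, softmax_W, softmax_b, eta=0.0):
    """Forward pass of a trained DBN-based VAD (sketch).
    x: serially concatenated acoustic features (visible layer).
    weights/biases: one pair per hidden layer, obtained from RBM
    pre-training followed by back-propagation fine-tuning.
    Returns 1 (speech, H1) if the log-posterior ratio exceeds eta."""
    h = x
    for W, b in zip(weights, biases):      # nonlinear hidden layers
        h = sigmoid(W @ h + b)
    logits = softmax_W @ h + softmax_b     # linear softmax output layer
    p = np.exp(logits - logits.max())
    p /= p.sum()                           # p[1] = P(H1|x), p[0] = P(H0|x)
    return 1 if np.log(p[1] / p[0]) > eta else 0
```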
The layer-wise unsupervised pre-training of the RBM models contributes to the success of DBN. The RBM is an energy-based, two-layer, bipartite, undirected stochastic graphical model. Specifically, one layer of the RBM is composed of visible units v, and the other layer is composed of hidden units h. There are symmetric connections between the two layers and no connections within each layer. The connection weights can be represented by a weight matrix. In this paper, we only consider the Bernoulli (visible)-Bernoulli (hidden) RBM, which means v_i ∈ {0,1} and h_j ∈ {0,1}. RBM tries to find a model that maximizes the likelihood of the training data, which is equivalent to the optimization problem

max_W P(v; W)

where

P(v; W) = ( Σ_h exp(-Energy(v, h; W)) ) / ( Σ_{v,h} exp(-Energy(v, h; W)) )

and

Energy(v, h; W) = -b^T v - c^T h - h^T W v

where b and c are the bias terms of the visible layer and the hidden layer. DBN is a powerful hierarchical generative model for feature extraction, and it can fuse the advantages of multiple features. In this work, eight features are considered for feature extraction. As the number of features increases, the complexity of the DBN also increases and voice activity detection takes more time. So, the limitation of this work is its increased time complexity corresponding to the number of features. In order to overcome this problem, another method is designed in which the same accuracy can be achieved by using more significant features.

IV. IMPROVED VAD BASED ON DBN

An improved method is introduced, based on more significant features considered for feature extraction. The modification is done by considering the following features: pitch, discrete Fourier transform (DFT), mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC), energy, and zero-crossing rate (ZCR). In this method, energy and zero-crossing rate are taken instead of the relative-spectral perceptual linear predictive analysis (RASTA-PLP) and amplitude modulation spectrogram (AMS) [8] features of the former case.
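The RBM likelihood objective given in Section III is, in practice, maximized approximately with contrastive divergence during the layer-wise pre-training. A minimal CD-1 sketch for the Bernoulli-Bernoulli case follows; the function and variable names are assumptions, and a production system would batch the updates.

```python
import numpy as np

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) step for a Bernoulli-Bernoulli
    RBM with Energy(v,h;W) = -b^T v - c^T h - h^T W v.  W has shape
    (hidden, visible); b and c are the visible/hidden bias vectors."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # positive phase: hidden activations given the data vector
    ph0 = sigmoid(W @ v0 + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: one Gibbs step (reconstruction of v, then h)
    pv1 = sigmoid(W.T @ h0 + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(W @ v1 + c)
    # approximate log-likelihood gradient ascent
    W += lr * (np.outer(ph0, v0) - np.outer(ph1, v1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return W, b, c
```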
A. Feature Extraction

1) Pitch: The pitch is estimated using the cepstral method. The steps involved are: the original signal is transformed using a Fast Fourier Transform (FFT) algorithm; the resulting spectrum is converted to a logarithmic scale; it is then transformed using the same FFT algorithm to obtain the power cepstrum. The power cepstrum reverts to the time domain and exhibits peaks corresponding to the period of the frequency spacings. The cepstral coefficients are given by

c[τ] = | F^{-1}{ log |F{x[n]}|^2 } |^2
where τ is the quefrency (a time-domain index), F is the Fourier transform, x[n] is the signal in the time domain, and |F{x[n]}|^2 is the power spectrum estimate of the signal.

2) MFCC: Sounds are filtered by the shape of the vocal tract, including the tongue, teeth, etc. This shape determines what sound comes out, and it manifests itself in the envelope of the short-time power spectrum; the job of MFCCs is to accurately represent this envelope [9]. The steps involved are: Pre-emphasis: the signal is passed through a filter which emphasizes higher frequencies. Framing: the speech signal is divided into short time frames. Hamming windowing: Y(n) = X(n) W(n), where the Hamming window W(n) is defined as

W(n) = 0.54 - 0.46 cos( 2πn / (N-1) ),  0 ≤ n ≤ N-1

with N the number of samples in each frame, Y(n) the output signal, and X(n) the input signal. Fast Fourier Transform: to convert each frame of N samples from the time domain into the frequency domain,

Y(w) = FFT[ h(t) * X(t) ] = H(w) X(w)

where X(w), H(w), and Y(w) are the Fourier transforms of X(t), h(t), and Y(t), respectively. Mel Filter Bank Processing: the frequency range of the FFT spectrum is very wide, and the voice signal does not follow a linear scale, so a weighted sum of filter spectral components is used. Discrete Cosine Transform: converts the log Mel spectrum back into the time domain; the result of the conversion is called the Mel Frequency Cepstral Coefficient.

3) LPC: LPC is a tool used in audio signal processing which represents the spectral envelope of a digital signal in compressed form [7]. The function [A,E] = LPC(X,N) finds the coefficients A = [1 A(2) ... A(N+1)] of an Nth-order forward linear predictor

Xp(n) = -A(2)*X(n-1) - A(3)*X(n-2) - ... - A(N+1)*X(n-N)

such that the sum of the squares of the errors err(n) = X(n) - Xp(n) is minimized; E is the variance (power) of the prediction error. X can be a vector or a matrix. If X is a matrix containing a separate signal in each column, LPC returns a model estimate for each column in the rows of A. N specifies the order of the polynomial A(z), which must be a positive integer.
N must be less than or equal to the length of X; if X is a matrix, N must be less than or equal to the length of each column of X.

4) RASTA-PLP: PLP speech analysis is based on the short-term spectrum of speech. RASTA applies a band-pass filter to the energy in each frequency subband, smoothing over short-term noise variations and removing any constant offset resulting from static spectral coloration in the speech channel. RASTA-PLP thus makes PLP more robust to linear spectral distortions. The steps involved are: Compute the critical band spectrum and take its logarithm.
Estimate the temporal derivative of the log critical band spectrum. Re-integrate the log critical band temporal derivative. Take the inverse logarithm of this relative log spectrum, yielding a relative auditory spectrum. Compute an all-pole model of this spectrum.

5) Energy: The energy of the speech signal provides a representation that reflects its amplitude variations. Short-time energy can be defined as:

E_n = Σ_m [ x(m) w(n - m) ]^2

where w(m) is the Hamming window.

6) Zero Crossing Rate (ZCR): For discrete-time signals, a zero crossing is said to occur if successive samples have different algebraic signs. The rate at which zero crossings occur is a simple measure of the frequency content of a signal. The zero-crossing rate is a measure of the number of times, in a given time interval or frame, that the amplitude of the speech signal passes through a value of zero. The definition of the zero-crossing rate is:

Z_n = Σ_m | sgn(x(m)) - sgn(x(m-1)) | w(n - m)

where

sgn(x(m)) = 1 if x(m) ≥ 0, and -1 otherwise

and

w(n) = 1/(2N) for 0 ≤ n ≤ N-1, and 0 otherwise.

V. EXPERIMENTS AND RESULTS

All experiments are conducted with MATLAB 2013a on the Windows operating system.

A. Dataset

The training set consists of twenty signals, and the test set contains the same twenty signals. The sampling rate is 8 kHz. Because speech can be approximated as a stationary process on short time scales, the speech signals are divided into a sequence of short-time frames, and the frame is used as the basic detection unit. Given a frame, if more than half of its samples are labeled as speech, the frame is labeled as speech; otherwise, the frame is labeled as noise.

B. Acoustic Features for VAD

To better show the advantages of the feature fusion techniques, the acoustic features extracted from each observation are pitch, discrete Fourier transform (DFT), mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) [7], energy, and zero-crossing rate (ZCR).

C. Parameters
For the proposed DBN-based VAD, the depth of the DBN [i.e., the number of hidden layers, or the number of RBM models] is taken as 2. Denote the n-layer DBN as DBNn; that is, the DBN with only one hidden layer is denoted DBN1.

D. Results

Twenty speech signals are used for training the network; samples (up to sample number 50000) of each signal are taken for training. Testing is conducted on the same samples of the selected signal, and the voice activity present in the signal is detected. Figures 1 and 2 show the output of the VAD for two example test signals: voice frames are represented by 1 and silence by 0. The comparison between the basic method and the improved method is given in the graph in Figure 3; from the graph it is clear that the proposed method performs better than the existing method.

Fig. 1. The original signal and result of VAD for example 1
Fig. 2. The original signal and result of VAD for example 2
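The two features added in the improved method, short-time energy and zero-crossing rate, can be computed per frame as follows. This is a sketch under stated assumptions: the function name, the 200-sample (25 ms at 8 kHz) frame length, and the convention of treating zero samples as positive are all choices of this illustration, not the paper's code.

```python
import numpy as np

def frame_features(x, frame_len=200):
    """Short-time energy (Hamming-windowed, per Section IV) and
    zero-crossing rate for each non-overlapping frame of x."""
    w = np.hamming(frame_len)
    n_frames = len(x) // frame_len
    energy = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * frame_len:(i + 1) * frame_len]
        energy[i] = np.sum((frame * w) ** 2)   # E_n = sum [x(m) w(n-m)]^2
        signs = np.sign(frame)
        signs[signs == 0] = 1                  # sgn(0) treated as +1
        # average |sgn(x(m)) - sgn(x(m-1))| scaled by 1/2 per crossing
        zcr[i] = 0.5 * np.mean(np.abs(np.diff(signs)))
    return energy, zcr
```

These per-frame values are what get concatenated with the other acoustic features before being fed to the DBN.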
Fig. 3. Comparison between the basic method and the improved method

VI. CONCLUSION

The voice activity detector (VAD) is an important front-end of modern speech signal processing systems. The DBN-based VAD aims to extract a new feature that fully expresses the advantages of all acoustic features by transferring the acoustic features through multiple nonlinear hidden layers. The complexity of the DBN grows as the number of features increases. A less complex VAD is designed by considering more significant features, and the improved method outperforms the existing method. The scope for future work lies in the wide variety of application areas of speech processing and in improving the recognition and transmission of speech; a modified VAD with a smaller number of features will result in a less complex DBN.

REFERENCES
[1] Xiao-Lei Zhang and Ji Wu, "Deep Belief Networks Based Voice Activity Detection," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 4, April.
[2] Tao Yu and John H. L. Hansen, "Discriminative Training for Multiple Observation Likelihood Ratio Based Voice Activity Detection," IEEE Signal Processing Letters, vol. 17, no. 11, November.
[3] Ji Wu and Xiao-Lei Zhang, "VAD based on statistical models and machine learning approaches," Computer Speech and Language, Elsevier.
[4] Ji Wu and Xiao-Lei Zhang, "Maximum Margin Clustering Based Statistical VAD With Multiple Observation Compound Feature," IEEE Signal Processing Letters, vol. 18, no. 5, May.
[5] Dongwen Ying, Yonghong Yan, Jianwu Dang, and Frank K. Soong, "Voice Activity Detection Based on an Unsupervised Learning Framework," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, November.
[6] D. Yu and L. Deng, "Deep learning and its applications to signal and information processing," IEEE Signal Processing Magazine, vol. 28, no. 1, January.
[7] Lawrence R. Rabiner and R. W.
Schafer, Digital Processing of Speech Signals, Pearson Education.
[8] Jurgen Tchorz and Birger Kollmeier, "Automatic classification of the acoustical situation using amplitude modulation spectrograms."
[9] Lindasalwa Muda, Mumtaj Begam and I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques," Journal of Computing, vol. 2, issue 3, March 2010.
Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?
More informationOverview of Code Excited Linear Predictive Coder
Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances
More informationSpeech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,
More informationComparison of Spectral Analysis Methods for Automatic Speech Recognition
INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering
More informationVoice Activity Detection for Speech Enhancement Applications
Voice Activity Detection for Speech Enhancement Applications E. Verteletskaya, K. Sakhnov Abstract This paper describes a study of noise-robust voice activity detection (VAD) utilizing the periodicity
More informationA CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION
17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 A CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION
More informationDimension Reduction of the Modulation Spectrogram for Speaker Verification
Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland tkinnu@cs.joensuu.fi
More informationGammatone Cepstral Coefficient for Speaker Identification
Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia
More informationEVALUATION OF MFCC ESTIMATION TECHNIQUES FOR MUSIC SIMILARITY
EVALUATION OF MFCC ESTIMATION TECHNIQUES FOR MUSIC SIMILARITY Jesper Højvang Jensen 1, Mads Græsbøll Christensen 1, Manohar N. Murthi, and Søren Holdt Jensen 1 1 Department of Communication Technology,
More informationCHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS
66 CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS 4.1 INTRODUCTION New frontiers of speech technology are demanding increased levels of performance in many areas. In the advent of Wireless Communications
More informationHIGH RESOLUTION SIGNAL RECONSTRUCTION
HIGH RESOLUTION SIGNAL RECONSTRUCTION Trausti Kristjansson Machine Learning and Applied Statistics Microsoft Research traustik@microsoft.com John Hershey University of California, San Diego Machine Perception
More informationTopic. Spectrogram Chromagram Cesptrogram. Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Topic Spectrogram Chromagram Cesptrogram Short time Fourier Transform Break signal into windows Calculate DFT of each window The Spectrogram spectrogram(y,1024,512,1024,fs,'yaxis'); A series of short term
More informationDifferent Approaches of Spectral Subtraction Method for Speech Enhancement
ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches
More informationPerceptive Speech Filters for Speech Signal Noise Reduction
International Journal of Computer Applications (975 8887) Volume 55 - No. *, October 22 Perceptive Speech Filters for Speech Signal Noise Reduction E.S. Kasthuri and A.P. James School of Computer Science
More informationA Survey and Evaluation of Voice Activity Detection Algorithms
A Survey and Evaluation of Voice Activity Detection Algorithms Seshashyama Sameeraj Meduri (ssme09@student.bth.se, 861003-7577) Rufus Ananth (anru09@student.bth.se, 861129-5018) Examiner: Dr. Sven Johansson
More informationRASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991
RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response
More informationSpeech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure
More informationChapter 4 SPEECH ENHANCEMENT
44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or
More informationA Comparative Study of Formant Frequencies Estimation Techniques
A Comparative Study of Formant Frequencies Estimation Techniques DORRA GARGOURI, Med ALI KAMMOUN and AHMED BEN HAMIDA Unité de traitement de l information et électronique médicale, ENIS University of Sfax
More informationPerformance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System
Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System C.GANESH BABU 1, Dr.P..T.VANATHI 2 R.RAMACHANDRAN 3, M.SENTHIL RAJAA 3, R.VENGATESH 3 1 Research Scholar (PSGCT)
More informationAudio Similarity. Mark Zadel MUMT 611 March 8, Audio Similarity p.1/23
Audio Similarity Mark Zadel MUMT 611 March 8, 2004 Audio Similarity p.1/23 Overview MFCCs Foote Content-Based Retrieval of Music and Audio (1997) Logan, Salomon A Music Similarity Function Based On Signal
More informationMMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2
MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 1 Electronics and Communication Department, Parul institute of engineering and technology, Vadodara,
More informationSpeech/Non-speech detection Rule-based method using log energy and zero crossing rate
Digital Speech Processing- Lecture 14A Algorithms for Speech Processing Speech Processing Algorithms Speech/Non-speech detection Rule-based method using log energy and zero crossing rate Single speech
More informationRelative phase information for detecting human speech and spoofed speech
Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University
More informationReduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter
Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
More informationI D I A P. Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b
R E S E A R C H R E P O R T I D I A P Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR a Vivek Tyagi Hervé Bourlard a,b IDIAP RR 3-47 September 23 Iain McCowan a Hemant Misra a,b to appear
More informationDERIVATION OF TRAPS IN AUDITORY DOMAIN
DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.
More informationAudio Fingerprinting using Fractional Fourier Transform
Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,
More informationEvaluation of MFCC Estimation Techniques for Music Similarity Jensen, Jesper Højvang; Christensen, Mads Græsbøll; Murthi, Manohar; Jensen, Søren Holdt
Aalborg Universitet Evaluation of MFCC Estimation Techniques for Music Similarity Jensen, Jesper Højvang; Christensen, Mads Græsbøll; Murthi, Manohar; Jensen, Søren Holdt Published in: Proceedings of the
More informationCHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS
46 CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 3.1 INTRODUCTION Personal communication of today is impaired by nearly ubiquitous noise. Speech communication becomes difficult under these conditions; speech
More informationAn Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation
An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation Aisvarya V 1, Suganthy M 2 PG Student [Comm. Systems], Dept. of ECE, Sree Sastha Institute of Engg. & Tech., Chennai,
More informationSpeech Synthesis; Pitch Detection and Vocoders
Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech
More informationCLASSIFICATION OF CLOSED AND OPEN-SHELL (TURKISH) PISTACHIO NUTS USING DOUBLE TREE UN-DECIMATED WAVELET TRANSFORM
CLASSIFICATION OF CLOSED AND OPEN-SHELL (TURKISH) PISTACHIO NUTS USING DOUBLE TREE UN-DECIMATED WAVELET TRANSFORM Nuri F. Ince 1, Fikri Goksu 1, Ahmed H. Tewfik 1, Ibrahim Onaran 2, A. Enis Cetin 2, Tom
More informationLinguistic Phonetics. Spectral Analysis
24.963 Linguistic Phonetics Spectral Analysis 4 4 Frequency (Hz) 1 Reading for next week: Liljencrants & Lindblom 1972. Assignment: Lip-rounding assignment, due 1/15. 2 Spectral analysis techniques There
More informationAn Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet
Journal of Information & Computational Science 8: 14 (2011) 3027 3034 Available at http://www.joics.com An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet Jianguo JIANG
More informationSONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS
SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R
More informationA Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image
Science Journal of Circuits, Systems and Signal Processing 2017; 6(2): 11-17 http://www.sciencepublishinggroup.com/j/cssp doi: 10.11648/j.cssp.20170602.12 ISSN: 2326-9065 (Print); ISSN: 2326-9073 (Online)
More informationSPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction
SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction by Xi Li A thesis submitted to the Faculty of Graduate School, Marquette University, in Partial Fulfillment of the Requirements
More informationSeparating Voiced Segments from Music File using MFCC, ZCR and GMM
Separating Voiced Segments from Music File using MFCC, ZCR and GMM Mr. Prashant P. Zirmite 1, Mr. Mahesh K. Patil 2, Mr. Santosh P. Salgar 3,Mr. Veeresh M. Metigoudar 4 1,2,3,4Assistant Professor, Dept.
More informationEffective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a
R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,
More informationAutomatic Morse Code Recognition Under Low SNR
2nd International Conference on Mechanical, Electronic, Control and Automation Engineering (MECAE 2018) Automatic Morse Code Recognition Under Low SNR Xianyu Wanga, Qi Zhaob, Cheng Mac, * and Jianping
More informationDominant Voiced Speech Segregation Using Onset Offset Detection and IBM Based Segmentation
Dominant Voiced Speech Segregation Using Onset Offset Detection and IBM Based Segmentation Shibani.H 1, Lekshmi M S 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala,
More information(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods
Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods
More informationSPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes
SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,
More informationLearning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives
Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri
More informationDiscriminative Training for Automatic Speech Recognition
Discriminative Training for Automatic Speech Recognition 22 nd April 2013 Advanced Signal Processing Seminar Article Heigold, G.; Ney, H.; Schluter, R.; Wiesler, S. Signal Processing Magazine, IEEE, vol.29,
More informationCommunications Theory and Engineering
Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation
More informationInternational Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 7, February 2013)
Performance Analysis of OFDM under DWT, DCT based Image Processing Anshul Soni soni.anshulec14@gmail.com Ashok Chandra Tiwari Abstract In this paper, the performance of conventional discrete cosine transform
More informationMultimedia Signal Processing: Theory and Applications in Speech, Music and Communications
Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal
More information