Mel Spectrum Analysis of Speech Recognition using Single Microphone
|
|
- Sheryl Hines
- 6 years ago
- Views:
Transcription
1 International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree Sastha Institute of Engineering and Technology, Chennai. [2] Assistant Professor, Dept of Electronics and Communication Engineering. SSIET, Chennai. [1] salakshmi.ece@gmail.com, [2] vendanchola@gmail.com Abstract - Digital processing of speech signal is very important for high speed and precise automatic speech recognition technology. Speech recognition is the process in which certain words of a particular speaker will automatically recognized that are based on the information included in individual speech waves. Nowadays it is being used for health care, telephony military and people with disabilities therefore the digital signal processes such as feature extraction and feature matching are used for speech recognition. Inorder to extract the valuable information from speech signal, the data needs to be manipulated and analyzed. Basic method used for extracting the features of the speech signal is to find the Mel Frequency Cepstral Coefficients () and Mel Energy Dynamic Coefficients ().In this project, speech is recognised using single microphone based on the Mel spectrum analysis. This technique basically exploits the certain features of speech. and are used for extracting the features of speech and Dynamic Time Warping (DTW) is used for feature matching. The experimental results were analysed with the help of MATLAB and the results are simulated. Keywords: Mel frequency Cepstral Coefficients (), Mel Energy Dynamic Coefficients (), Dynamic Time Warping (DTW). I. INTRODUCTION Speech signal processing is one of the biggest developing areas of research in signal processing. The aim of digital speech signal processing is to take advantage of digital computing techniques to process the speech signal for increased understanding, improved communication and increased efficiency and productivity associated with speech activities. The major fields of speech signal processing consists of speech synthesis, speech recognition and speech/speaker transformation. Inorder to extract features from speech signal it has to be divided into fixed frames called windows. extraction approaches mostly depend upon modelling human voice production and modelling perception system. They provide a compressed representation of the speech signal by extracting a sequence of feature vectors. Speech recognition systems have a wide range of applications from the relatively simple isolated-word recognition systems for name-dialling, automated customer service and voice-control of cars and machines to continuous speech recognition as in auto-dictation or broadcast-news transcription. Potamitis described the robust estimation of speech presence probability of every spectral component of a speech signal impinging on a linear microphone array [9]. The performance of several objective measures was evaluated in terms of predicting the quality of noisy speech enhanced by noise suppression algorithms. [4]. Speech recognition is the process in which certain words of a particular speaker will automatically recognized that are based on the information included in individual speech waves. The proposed technique offers two phases namely training phase and testing phase. This project describes an approach of speech recognition by using and. Several features are extracted from speech signal of spoken words. and are extracted from speech signal of spoken words. DTW is used to cope with different speaking speeds in speech recognition. DTW is an algorithm, which is used for measuring similarity between two sequences, which may vary in time or speed. The experimental analysis is done with MATLAB and results are analysed. II. MEL SPECTRUM ANALYSIS Speech recognition system described in this paper comprises two phases namely training and testing. In training phase, the speech signal is acquired and presence and absence of speech signal is detected using Voice Activity Detector (VAD). The All Rights Reserved 2015 IJERECE 15
2 and features are then extracted and the data are stored as shown in Fig.1. Signal acquisition Fig. 1 Training Phase In testing phase, when a MATLAB program named speech test is executed it postulates to the user to choose any speech sample from the test group that are pre-recorded in the database. at the back end extracts the features of the chosen speech sample.here, the stored features are classified using DTW. DTW is an algorithm used for feature matching. The status of speech recognition is described. The block diagram for testing phase is shown in Fig. 2. Signal acquisition VAD VAD Classifier set Fig. 2 Testing Phase Store Store Recognition Status The s are frequently used as a speech parameterization in speech recognizers. Practical applications of speech recognition and dialogue systems bring sometimes a requirement to synthesize or reconstruct the speech from the saved or transmitted s. The Mel scale is a perceptual scale of pitches judged by listeners to be equal in distance from one another. The name Mel comes from the word melody to indicate that the scale is based on pitch comparisons. Thus for each tone with an actual frequency measured in Hz, a subjective pitch is measured on a scale called the Mel scale. is based on human hearing perceptions which cannot perceive frequencies over 1 KHz. In other words, in is based on known variation of the human ear s critical bandwidth with frequency. has two types of filter which are spaced linearly at low frequency below 1000 Hz and logarithmic spacing above 1000 Hz. A subjective pitch is present on Mel frequency scale to capture important characteristic of phonetic in speech. The VAD refers to a class of signal processing methods that detects if short segments of a speech signal contain voiced or unvoiced signal data. A VAD is normally using decision rules based on selected estimated signal features. VADs play a major role as a pre-processing block in a variety of speech processing applications such as speech enhancement, speech coding and speech recognition where it is desirable to classify voiced signal parts from unvoiced. The classifier uses the spectral feature set (feature vector) of the input (unknown) word and attempts to find the best match from the library of known words.dtw technique is used for feature matching. DTW is an efficient method to solve the time alignment problem. DTW algorithm aims at aligning two sequences of feature vectors by warping the time axis repetitively until an optimal match between the two sequences is found. This algorithm performs a piece wise linear mapping of the time axis to align both the signals. DTW first locally matches the features of the selected sampled speech signal by measuring the local distance. DTW then measures the global distance and the part that matches with the chosen sampled speech is the result of the test program that shows the correct spoken word in the command window. A. FEATURE EXTRACTION The extraction of the best parametric representation of acoustic signals is an important task to produce a better recognition performance. have been used for feature extraction which is mainly used for speech recognition system. The purpose for using is to enhance the effectiveness of in the field of speech processing. All Rights Reserved 2015 IJERECE 16
3 The Mel spectrum is used in recognition of information in conventional speech recognition system. The spectrum does not exploit the variations in fundamental resolution and hence is lower in accuracy to improve the accuracy of operation a spectral decomposition approach. consists of seven computational steps. Speech s Fig. 3 The feature extraction is shown in fig.3. Each step has its function and mathematical approaches as discussed briefly in the following: Pre-processing To enhance the accuracy and efficiency of the extraction processes, speech signals are normally pre-processed before features are extracted. The signal passed through a filter is processed which emphasizes higher frequencies. This process will increase the energy of signal at higher frequency. Framing Pre- Proces sing DCT The process of segmenting the speech samples obtained from Analog-to-Digital Conversion (ADC) into a small frame with the length within the range of 20 to 40 msec. The voice signal is divided into frames of N samples. Windowing Framing Log Windowing Mel Filter Ban FFT Hamming window is used as window shape by considering the next block in feature extraction processing chain and integrates all the closest frequency lines. The Hamming window equation is given as: If the window is defined as W (n), 0 n N-1 where W(n) = Hamming window N = number of samples in each frame Y(n) = Output signal X(n) = Input signal The result of windowing signal is shown below: Y n = X n. W n (1) W n = cos 2πn 0 n N 1 N 1 (2) Fast Fourier Transform To convert each frame of N samples from time domain into frequency domain. The Fourier transform is to convert the convolution of the glottal pulse and the vocal tract impulse response in the time domain. Mel Filter Bank Processing The frequencies range in FFT spectrum is very wide and voice signal does not follow the linear scale. Each filter s magnitude frequency response is triangular in shape and equal to unity at the centre frequency and decrease linearly to zero at centre frequency of two adjacent filters. Then, each filter output is the sum of its filtered spectral components. After that the equation (3) is used to compute the Mel for given frequency f in hertz. There two reason for using triangular filter bank are: Logarithm To smooth the magnitude spectrum and To reduce the size of the features involved The logarithm has the effect of changing multiplication into addition. It is a part of computation of Cepstrum. Therefore, this step simply converts the multiplication of the magnitude in the Fourier transform into addition. Discrete Cosine Transform This is the process to convert the log Mel spectrum into time domain using DCT. The result of the conversion is called. The set of coefficient is called acoustic vectors. Therefore, each input utterance is transformed into a sequence of acoustic vector. DCT is applied on the log energy obtained from the triangular filter banks to get Mel scale cepstral coefficients. B. FEATURE EXTRACTION Another feature is also extracted. It is extracted as follows: the magnitude spectrum of each speech utterance is estimated using FFT, then input to a bank of 12 filters equally spaced on the All Rights Reserved 2015 IJERECE 17
4 Mel frequency scale. The logarithm mean energies of the filter outputs are calculated En(i), i= 1...N. Then, the first and second differences of En (i) are calculated. The feature extraction process contains following steps shown in fig. 5. Speech Pre- Processing Framing Windowing FFT Fig. 6 Test speech signal VAD detects the voiced or unvoiced signal data in the short segment of speech signal. VAD is based on zero crossing rate. Zero crossing rate can be defined as the number of times the successive samples in a speech signal have different algebraic signs or the amplitude of signal crosses the value of zero. The voice activity detected speech signal is shown in fig.7. s Compute 1 st and 2 nd Energies Mean Log Energie s Mel Filter Bank Fig.5 Pre-processing, framing, windowing, FFT and Mel filter bank and frequency wrapping processes of feature extraction are same as feature extraction. Take logarithmic mean of energies: In this process a mean log of every filter energies is calculated. This mean value represent energy of individual filter in a filter bank. Fig.7 VAD Speech Signal Compute 1 st and 2 nd difference: The final Mel energy spectrum dynamics coefficients are then obtained by combining the first and second differences of filter energies. I. EXPERIMENTAL RESULTS A. FEATURE EXTRACTION Fig. 8 Framed Speech Signal.Speech features such as and are extracted from speech. Speech signal is converted to.wav format and simulated using MATLAB. The test speech signal is shown in fig. 6. Fig. 9 FFT Log Mel spectrum is converted back to time. DCT can be used for calculating Coefficients from All Rights Reserved 2015 IJERECE 18
5 the given log Mel spectrum as they divide a given sequence of finite length data into discrete vector. The result is called the. The DCT is done for transforming the Mel coefficients back to time domain. The DCT applied signal is shown in fig. 11. A mean log of every filter energies is calculated. This mean value represent energy of individual filter in a filter bank. After taking logarithm, mean log energies are evaluated as shown in fig. 13. Fig. 10 Logarithm DCT is used to orthogonalise the filter energy vectors. The information is compacted into the first number of components and shortens the vector to number of components. Fig.13 Mean log energies Fig. 11 DCT B. FEATURE EXTRACTION The speech signal is divided into sequence of frames, where each frame can be analyzed independently and represented by a single feature vector. Framed speech signal for feature extraction is shown in fig.12. Fig.14 - First mean difference Fig. 15 Second Mean difference Fig Framed Speech Signal The final Mel energy spectrum dynamics coefficients are then obtained by combining the first and second differences of filter CONCLUSION All Rights Reserved 2015 IJERECE 19
6 The main aim of this project was to recognize speech using Mel spectrum analysis. The feature extraction was done using and. The feature matching was done with the help of DTW technique. The extracted features were stored in a.mat file using algorithm. A distortion measure based on minimizing the Euclidean distance was used when matching the unknown speech signal with the speech signal database. The experimental results were analysed with the help of MATLAB and it is proved that the results are efficient. This process can be extended for n number of speakers. The project shows that the Mel spectrum is the best nonlinear feature extracting technique in speech recognition, with minimal error rates and fast computing speed. This project may be extended by hardware implementation using digital signal processor. REFERENCES [1] Chang J. H., Jo Q. H., Kim D. K. and Kim N. K (2009) Global soft decision employing support vector machine for speech enhancement, IEEE Signal Processing, Lett., Vol. 6, No. 1, pp [2] Chang J. H., Kim N. S. and Mitra.S. K (2006) Voice activity detection based on multiple statistical models, IEEE Trans. Signal Process., Vol. 54, No. 6, pp [3] Cohen I. and Berdugo B. (2002) Noise estimation by minima controlled recursive averaging for robust speech enhancement, IEEE Signal process. Lett., Vol. 9, No. 1, pp [4] Hu Y. and Loizou P. (2008) Evaluation of objective quality measures of speech enhancement, IEEE Trans.Audio, Speech and Language Processing, Vol. 16, No. 1, pp [5] Krishnamoorthy N. and Hansen J. H. L. (2009) Babble noise : Modeling, analysis and applications, IEEE Audio, Speech and Language Processing, Vol. 17, No. 7, pp [6] Kum J. M. and Chang J. H. (2009) Speech enhancement based on minima controlled recursive averaging incorporating secondorder conditional MAP criterion, IEEE Signal Processing, Lett., Vol. 16, No. 7, pp [7] Martin R. (2001) Noise Power Spectral Density Estimation based on Optimal Smoothing and Minimum Statistics, IEEE transactions on speech and audio process., Vol. 9, No. 5, pp [8] Pirinen T. W. and Visa A. (2006) Signal independent wideband activity detection features for microphone arrays, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., pp [9] Potamitis I. (2004) Estimation of speech presence probability in the field of microphone array, IEEE Signal Process. Lett., Vol. 11, No. 12, pp [10] Sohn J., Kim N. S. and Sung W. (1999) A Statistical model based voice activity detection, IEEE Signal Process. Lett., Vol. 6, No. 1, pp. 1. All Rights Reserved 2015 IJERECE 20
speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals
16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract
More informationInternational Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015
RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,
More informationSpeech Synthesis using Mel-Cepstral Coefficient Feature
Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract
More informationIsolated Digit Recognition Using MFCC AND DTW
MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics
More informationDifferent Approaches of Spectral Subtraction Method for Speech Enhancement
ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches
More informationSpeech Signal Analysis
Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for
More informationPerformance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic
More informationVOICE COMMAND RECOGNITION SYSTEM BASED ON MFCC AND DTW
VOICE COMMAND RECOGNITION SYSTEM BASED ON MFCC AND DTW ANJALI BALA * Kurukshetra University, Department of Instrumentation & Control Engineering., H.E.C* Jagadhri, Haryana, 135003, India sachdevaanjali26@gmail.com
More informationRECENTLY, there has been an increasing interest in noisy
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In
More informationSpeech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,
More informationCepstrum alanysis of speech signals
Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP
More informationSPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT
SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/
More informationReduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter
Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
More informationPerformance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches
Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art
More informationMODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS
MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,
More informationRobust Voice Activity Detection Based on Discrete Wavelet. Transform
Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper
More informationElectronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis
International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate
More informationEffective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a
R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,
More informationSpeech Synthesis; Pitch Detection and Vocoders
Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech
More informationApplications of Music Processing
Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite
More informationUniversity of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis
More informationOverview of Code Excited Linear Predictive Coder
Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances
More informationAdaptive Filters Application of Linear Prediction
Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing
More informationAN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute
More informationMFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM
www.advancejournals.org Open Access Scientific Publisher MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM ABSTRACT- P. Santhiya 1, T. Jayasankar 1 1 AUT (BIT campus), Tiruchirappalli, India
More informationClassification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise
Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to
More informationSignal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2
Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter
More informationGammatone Cepstral Coefficient for Speaker Identification
Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia
More informationInternational Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha
More informationKONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM
KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,
More informationMMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2
MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 1 Electronics and Communication Department, Parul institute of engineering and technology, Vadodara,
More informationEnhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis
Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins
More informationSpeech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure
More informationSpeech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter
Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,
More informationSpeech Compression Using Voice Excited Linear Predictive Coding
Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality
More informationAudio Fingerprinting using Fractional Fourier Transform
Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,
More informationTopic. Spectrogram Chromagram Cesptrogram. Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Topic Spectrogram Chromagram Cesptrogram Short time Fourier Transform Break signal into windows Calculate DFT of each window The Spectrogram spectrogram(y,1024,512,1024,fs,'yaxis'); A series of short term
More informationQuantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation
Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University
More informationSpeech Enhancement using Wiener filtering
Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationA DEVICE FOR AUTOMATIC SPEECH RECOGNITION*
EVICE FOR UTOTIC SPEECH RECOGNITION* ats Blomberg and Kjell Elenius INTROUCTION In the following a device for automatic recognition of isolated words will be described. It was developed at The department
More informationSGN Audio and Speech Processing
Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations
More informationSpeech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech
Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Project Proposal Avner Halevy Department of Mathematics University of Maryland, College Park ahalevy at math.umd.edu
More informationAn Improved Voice Activity Detection Based on Deep Belief Networks
e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.
More informationSinging Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection
Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation
More informationCHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS
46 CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 3.1 INTRODUCTION Personal communication of today is impaired by nearly ubiquitous noise. Speech communication becomes difficult under these conditions; speech
More informationVoice Excited Lpc for Speech Compression by V/Uv Classification
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 3, Ver. II (May. -Jun. 2016), PP 65-69 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Voice Excited Lpc for Speech
More informationSpeech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065
Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);
More informationSpeech Enhancement Based On Noise Reduction
Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion
More informationPerceptive Speech Filters for Speech Signal Noise Reduction
International Journal of Computer Applications (975 8887) Volume 55 - No. *, October 22 Perceptive Speech Filters for Speech Signal Noise Reduction E.S. Kasthuri and A.P. James School of Computer Science
More informationProject 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing
Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You
More informationAPPLICATIONS OF DSP OBJECTIVES
APPLICATIONS OF DSP OBJECTIVES This lecture will discuss the following: Introduce analog and digital waveform coding Introduce Pulse Coded Modulation Consider speech-coding principles Introduce the channel
More informationSpeech and Music Discrimination based on Signal Modulation Spectrum.
Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we
More informationEnhancement of Speech in Noisy Conditions
Enhancement of Speech in Noisy Conditions Anuprita P Pawar 1, Asst.Prof.Kirtimalini.B.Choudhari 2 PG Student, Dept. of Electronics and Telecommunication, AISSMS C.O.E., Pune University, India 1 Assistant
More informationNOISE ESTIMATION IN A SINGLE CHANNEL
SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina
More informationFundamental frequency estimation of speech signals using MUSIC algorithm
Acoust. Sci. & Tech. 22, 4 (2) TECHNICAL REPORT Fundamental frequency estimation of speech signals using MUSIC algorithm Takahiro Murakami and Yoshihisa Ishida School of Science and Technology, Meiji University,,
More informationAudio Restoration Based on DSP Tools
Audio Restoration Based on DSP Tools EECS 451 Final Project Report Nan Wu School of Electrical Engineering and Computer Science University of Michigan Ann Arbor, MI, United States wunan@umich.edu Abstract
More informationPerformance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment
BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity
More informationStudents: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa
Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Spring 2008 Introduction Problem Formulation Possible Solutions Proposed Algorithm Experimental Results Conclusions
More informationPerformance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System
Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System C.GANESH BABU 1, Dr.P..T.VANATHI 2 R.RAMACHANDRAN 3, M.SENTHIL RAJAA 3, R.VENGATESH 3 1 Research Scholar (PSGCT)
More informationIntroducing COVAREP: A collaborative voice analysis repository for speech technologies
Introducing COVAREP: A collaborative voice analysis repository for speech technologies John Kane Wednesday November 27th, 2013 SIGMEDIA-group TCD COVAREP - Open-source speech processing repository 1 Introduction
More informationSpectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition
Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Author Shannon, Ben, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium
More informationSingle Channel Speaker Segregation using Sinusoidal Residual Modeling
NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology
More informationL19: Prosodic modification of speech
L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture
More informationCommunications Theory and Engineering
Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation
More informationMikko Myllymäki and Tuomas Virtanen
NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,
More informationPattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt
Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory
More informationSpeech Recognition using FIR Wiener Filter
Speech Recognition using FIR Wiener Filter Deepak 1, Vikas Mittal 2 1 Department of Electronics & Communication Engineering, Maharishi Markandeshwar University, Mullana (Ambala), INDIA 2 Department of
More informationAUTOMATIC SPEECH RECOGNITION FOR NUMERIC DIGITS USING TIME NORMALIZATION AND ENERGY ENVELOPES
AUTOMATIC SPEECH RECOGNITION FOR NUMERIC DIGITS USING TIME NORMALIZATION AND ENERGY ENVELOPES N. Sunil 1, K. Sahithya Reddy 2, U.N.D.L.mounika 3 1 ECE, Gurunanak Institute of Technology, (India) 2 ECE,
More informationFrequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement
Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement 1 Zeeshan Hashmi Khateeb, 2 Gopalaiah 1,2 Department of Instrumentation
More informationFPGA implementation of DWT for Audio Watermarking Application
FPGA implementation of DWT for Audio Watermarking Application Naveen.S.Hampannavar 1, Sajeevan Joseph 2, C.B.Bidhul 3, Arunachalam V 4 1, 2, 3 M.Tech VLSI Students, 4 Assistant Professor Selection Grade
More informationSGN Audio and Speech Processing
SGN 14006 Audio and Speech Processing Introduction 1 Course goals Introduction 2! Learn basics of audio signal processing Basic operations and their underlying ideas and principles Give basic skills although
More informationSpeech Recognition on Robot Controller
Speech Recognition on Robot Controller Implemented on FPGA Phan Dinh Duy, Vu Duc Lung, Nguyen Quang Duy Trang, and Nguyen Cong Toan University of Information Technology, National University Ho Chi Minh
More informationUniversity of Colorado at Boulder ECEN 4/5532. Lab 1 Lab report due on February 2, 2015
University of Colorado at Boulder ECEN 4/5532 Lab 1 Lab report due on February 2, 2015 This is a MATLAB only lab, and therefore each student needs to turn in her/his own lab report and own programs. 1
More informationIntroduction of Audio and Music
1 Introduction of Audio and Music Wei-Ta Chu 2009/12/3 Outline 2 Introduction of Audio Signals Introduction of Music 3 Introduction of Audio Signals Wei-Ta Chu 2009/12/3 Li and Drew, Fundamentals of Multimedia,
More informationElectrical & Computer Engineering Technology
Electrical & Computer Engineering Technology EET 419C Digital Signal Processing Laboratory Experiments by Masood Ejaz Experiment # 1 Quantization of Analog Signals and Calculation of Quantized noise Objective:
More informationIdentification of disguised voices using feature extraction and classification
Identification of disguised voices using feature extraction and classification Lini T Lal, Avani Nath N.J, Dept. of Electronics and Communication, TKMIT, Kollam, Kerala, India linithyvila23@gmail.com,
More informationBasic Characteristics of Speech Signal Analysis
www.ijird.com March, 2016 Vol 5 Issue 4 ISSN 2278 0211 (Online) Basic Characteristics of Speech Signal Analysis S. Poornima Assistant Professor, VlbJanakiammal College of Arts and Science, Coimbatore,
More informationDigital Signal Processing. VO Embedded Systems Engineering Armin Wasicek WS 2009/10
Digital Signal Processing VO Embedded Systems Engineering Armin Wasicek WS 2009/10 Overview Signals and Systems Processing of Signals Display of Signals Digital Signal Processors Common Signal Processing
More informationChapter IV THEORY OF CELP CODING
Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 14 Quiz 04 Review 14/04/07 http://www.ee.unlv.edu/~b1morris/ee482/
More informationCalibration of Microphone Arrays for Improved Speech Recognition
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Calibration of Microphone Arrays for Improved Speech Recognition Michael L. Seltzer, Bhiksha Raj TR-2001-43 December 2001 Abstract We present
More informationSPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction
SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction by Xi Li A thesis submitted to the Faculty of Graduate School, Marquette University, in Partial Fulfillment of the Requirements
More informationIMPROVED SPEECH QUALITY FOR VMR - WB SPEECH CODING USING EFFICIENT NOISE ESTIMATION ALGORITHM
IMPROVED SPEECH QUALITY FOR VMR - WB SPEECH CODING USING EFFICIENT NOISE ESTIMATION ALGORITHM Mr. M. Mathivanan Associate Professor/ECE Selvam College of Technology Namakkal, Tamilnadu, India Dr. S.Chenthur
More informationAudio Signal Compression using DCT and LPC Techniques
Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,
More informationLinguistic Phonetics. Spectral Analysis
24.963 Linguistic Phonetics Spectral Analysis 4 4 Frequency (Hz) 1 Reading for next week: Liljencrants & Lindblom 1972. Assignment: Lip-rounding assignment, due 1/15. 2 Spectral analysis techniques There
More informationEnhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients
ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds
More informationRASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991
RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response
More informationVocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA
Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA ECE-492/3 Senior Design Project Spring 2015 Electrical and Computer Engineering Department Volgenau
More informationMel- frequency cepstral coefficients (MFCCs) and gammatone filter banks
SGN- 14006 Audio and Speech Processing Pasi PerQlä SGN- 14006 2015 Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks Slides for this lecture are based on those created by Katariina
More information(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods
Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods
More informationPROBLEM SET 6. Note: This version is preliminary in that it does not yet have instructions for uploading the MATLAB problems.
PROBLEM SET 6 Issued: 2/32/19 Due: 3/1/19 Reading: During the past week we discussed change of discrete-time sampling rate, introducing the techniques of decimation and interpolation, which is covered
More informationCHAPTER 2 FIR ARCHITECTURE FOR THE FILTER BANK OF SPEECH PROCESSOR
22 CHAPTER 2 FIR ARCHITECTURE FOR THE FILTER BANK OF SPEECH PROCESSOR 2.1 INTRODUCTION A CI is a device that can provide a sense of sound to people who are deaf or profoundly hearing-impaired. Filters
More informationSpeech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya 2, B. Yamuna 2, H. Divya 2, B. Shiva Kumar 2, B.
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 4 April 2015, Page No. 11143-11147 Speech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya
More informationIMPLEMENTATION OF SPEECH RECOGNITION SYSTEM USING DSP PROCESSOR ADSP2181
IMPLEMENTATION OF SPEECH RECOGNITION SYSTEM USING DSP PROCESSOR ADSP2181 1 KALPANA JOSHI, 2 NILIMA KOLHARE & 3 V.M.PANDHARIPANDE 1&2 Dept.of Electronics and Telecommunication Engg, Government College of
More informationNCCF ACF. cepstrum coef. error signal > samples
ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based
More informationDetermination of Variation Ranges of the Psola Transformation Parameters by Using Their Influence on the Acoustic Parameters of Speech
Determination of Variation Ranges of the Psola Transformation Parameters by Using Their Influence on the Acoustic Parameters of Speech L. Demri1, L. Falek2, H. Teffahi3, and A.Djeradi4 Speech Communication
More informationDilpreet Singh 1, Parminder Singh 2 1 M.Tech. Student, 2 Associate Professor
A Novel Approach for Waveform Compression Dilpreet Singh 1, Parminder Singh 2 1 M.Tech. Student, 2 Associate Professor CSE Department, Guru Nanak Dev Engineering College, Ludhiana Abstract Waveform Compression
More informationEC 6501 DIGITAL COMMUNICATION UNIT - II PART A
EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing
More information