KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4
1,2,3 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW, Mysuru, India
4 Professor & Head, Department of Telecommunication Engineering, GSSSIETW, Mysuru, India

ABSTRACT
Speech is the conventional means of expressing information. Technological applications of digital audio signal processing, such as audio data compression, synthesis of audio effects, and audio classification, require speech signal processing. The objective of human speech is not merely to transfer words from one person to another, but to communicate ideas. This paper deals with the processing of Konkani speech signals using the open-source software Octave. After feature extraction, feature matching is performed for word recognition. A complete word is decomposed into Intrinsic Mode Functions (IMFs) using Ensemble Empirical Mode Decomposition (EEMD), and the first instantaneous amplitude, extracted using the Hilbert-Huang Transform, indicates the presence of speech. Statistical parameters such as the standard deviation and mean are extracted to classify Hindi and Konkani speech signals. Real-time data obtained online are also tested with the presented approach.

Keywords: Ensemble Empirical Mode Decomposition, Hilbert-Huang Transform, Octave.

I. INTRODUCTION
As many as 880 languages are spoken across India, and 31 of them have been adopted by different states and union territories as official languages. Speech is one of the oldest ways in which we express ourselves. Today, speech signals are also used in biometric recognition technologies and in communication with machines.
The fundamental difficulty of speech recognition is that the speech signal is highly variable owing to differences in speakers, speaking rates, content, and acoustic conditions. Konkani is an Indo-Aryan language belonging to the Indo-European family of languages and is spoken along the south-western coast of India. It is one of the 22 scheduled languages listed in the Eighth Schedule of the Indian Constitution and is the official language of the Indian state of Goa. The first Konkani inscription is dated 1187 A.D. It is a minority language in Karnataka, Maharashtra, Kerala, Dadra and Nagar Haveli, and Daman and Diu.

Linear predictive coding (LPC), power spectral analysis (FFT), relative spectral filtering of log-domain coefficients (RASTA), and mel-frequency cepstral coefficients (MFCC) are a few of the widely used feature extraction techniques. In theory, it should be possible to recognize speech directly from the digitized waveform. However, because of the large variability of the speech signal, it is better to perform feature extraction that reduces this variability. In particular, feature extraction removes information that does not help identify the spoken word, such as whether the sound is voiced or unvoiced and, for voiced sounds, the effects of periodicity (pitch or fundamental frequency) and of the amplitude of the excitation signal. Linear predictive coding is a tool used mostly in audio and speech processing to represent the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides very accurate estimates of speech parameters.

II. LITERATURE SURVEY

a. Statistical Decision Approach
Atal and Rabiner proposed a statistical decision approach to voiced-unvoiced-silence classification in which a set of measured features was combined using a non-Euclidean distance metric to give a reliable decision. Rabiner et al. optimized this method for telephone-line inputs. Their results showed that reliable discrimination between voiced and non-voiced speech could be obtained over telephone lines using the statistical approach; however, the overall error rate for the three-class decision over telephone lines was fairly high (11.7 percent). Based on these results, an alternative approach was considered necessary to lower the error rate for telephone-line inputs. The problem with combining a set of features is that they can only partially represent the information present in the signal. A complete representation of the signal properties requires a classification procedure based on the signal waveform or its spectrum [1].

b. Digital Wiener Filter Approach
McAuley suggested a novel approach in which a matched digital Wiener filter was designed for each signal class and the signal was processed by each of these filters. From the output of each filter, a distance was computed representing how closely the input signal matched that filter, and the minimum distance was used to make the final classification. Although this approach shows promise, it requires a large amount of signal processing and has not yet been extensively tested [1].

c. Pattern Recognition Approach
In this approach, the speech patterns are used directly, without explicit feature determination and segmentation. The method has two steps: training of speech patterns, and recognition of patterns by pattern comparison. In the parameter measurement phase, a sequence of measurements is made on the input signal to define the test pattern. The unknown test pattern is then compared with each sound reference pattern, and a measure of similarity between the test pattern and each reference pattern is computed [1].

d. Proposed Approach
Firstly, the application of the Empirical Mode Decomposition (EMD) algorithm to the analysis of power quality disturbances is presented. The EMD algorithm decomposes univariate signals. Signal decomposition is the process of breaking a given signal down into its fundamental components.
The representation of a signal in terms of its fundamental components plays a vital role in many applications, such as de-noising, compression, and the separation of mixtures of dependent signals. Empirical Mode Decomposition is the core of the Hilbert-Huang transform. It is a promising method for analysing non-stationary signals such as power quality disturbances, and was proposed by Norden E. Huang. The aim of EMD is to decompose a non-linear, non-stationary input signal into monocomponent functions, called Intrinsic Mode Functions (IMFs). Monocomponent functions are those for which a meaningful, non-negative instantaneous frequency can be defined. The algorithm adaptively breaks down non-stationary power quality (PQ) disturbances into IMFs and a residue, which represent the frequency- and amplitude-modulated content according to the type of time series being analysed [2-7]. The EMD algorithm consists of the following steps:

Step 1: Consider a univariate signal, $y(t)$.
Step 2: Locate the local maxima and local minima of the input signal.
Step 3: Generate the upper envelope $e_{\max}(t)$ through the maxima and the lower envelope $e_{\min}(t)$ through the minima using cubic spline interpolation.
Step 4: Compute the mean of the envelopes:
$$m(t) = \frac{e_{\max}(t) + e_{\min}(t)}{2} \qquad (1)$$
Step 5: Compute the difference between the signal and the mean:
$$d(t) = y(t) - m(t) \qquad (2)$$
Step 6: $d(t)$ is accepted as the first Intrinsic Mode Function, $C_1(t)$, if the number of extrema and the number of zero crossings are equal or differ by at most one, and if it is a zero-mean process (the mean of its envelopes is zero). Otherwise, Steps 2-5 are repeated with $d(t)$ as the input; this iteration is called sifting.
Step 7: Compute the residue:
$$r(t) = y(t) - C_1(t) \qquad (3)$$
Step 8: If the residue $r(t)$ is a monotonic function, stop; otherwise, replace the input $y(t)$ by $r(t)$ and go to Step 2 to extract the next IMF and residue [3-7].
The above process is repeated until the residue becomes a monotonic function.
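The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' Octave implementation: the natural cubic spline, pinning both envelopes to the first and last samples (a crude end-effect treatment), the sifting stop rule with its `tol` and `max_iter` defaults, the "fewer than two maxima or minima" stand-in for a monotonic residue, and the names `find_extrema`, `natural_cubic_spline`, `spline_envelope`, `sift`, and `emd` are all hypothetical choices made for this example.

```python
import numpy as np


def find_extrema(h):
    """Step 2: indices of interior local maxima and minima."""
    mid = h[1:-1]
    maxima = np.where((mid > h[:-2]) & (mid >= h[2:]))[0] + 1
    minima = np.where((mid < h[:-2]) & (mid <= h[2:]))[0] + 1
    return maxima, minima


def natural_cubic_spline(xk, yk, t):
    """Evaluate the natural cubic spline through (xk, yk) at the points t."""
    n = len(xk)
    step = np.diff(xk)
    # Linear system for the second derivatives; natural ends force them to zero.
    mat = np.zeros((n, n))
    rhs = np.zeros(n)
    mat[0, 0] = mat[-1, -1] = 1.0
    for i in range(1, n - 1):
        mat[i, i - 1] = step[i - 1]
        mat[i, i] = 2.0 * (step[i - 1] + step[i])
        mat[i, i + 1] = step[i]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / step[i] - (yk[i] - yk[i - 1]) / step[i - 1])
    curv = np.linalg.solve(mat, rhs)
    k = np.clip(np.searchsorted(xk, t) - 1, 0, n - 2)  # interval holding each t
    w = step[k]
    u = (xk[k + 1] - t) / w
    v = (t - xk[k]) / w
    return u * yk[k] + v * yk[k + 1] + ((u**3 - u) * curv[k] + (v**3 - v) * curv[k + 1]) * w**2 / 6.0


def spline_envelope(h, idx):
    """Step 3: spline envelope through h[idx]; the end samples are added as
    knots (an assumed, simple end-effect treatment)."""
    knots = np.unique(np.concatenate(([0], idx, [len(h) - 1])))
    return natural_cubic_spline(knots.astype(float), h[knots], np.arange(len(h), dtype=float))


def sift(y, max_iter=50, tol=1e-3):
    """Steps 2-6: sift until d(t) meets the IMF conditions (or max_iter)."""
    h = y.copy()
    for _ in range(max_iter):
        maxima, minima = find_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        mean_env = 0.5 * (spline_envelope(h, maxima) + spline_envelope(h, minima))  # Eq. (1)
        h = h - mean_env                                                              # Eq. (2)
        maxima, minima = find_extrema(h)
        n_extrema = len(maxima) + len(minima)
        n_zero_cross = np.count_nonzero(np.diff(np.signbit(h)))
        # Step 6: extrema and zero crossings differ by at most one, and the
        # envelope mean just removed is negligible (assumed energy ratio `tol`).
        if abs(n_extrema - n_zero_cross) <= 1 and np.sum(mean_env**2) < tol * np.sum(h**2):
            break
    return h


def emd(y, max_imfs=10):
    """Steps 1-8: returns (imfs, residue) with y == imfs.sum(axis=0) + residue."""
    r = np.asarray(y, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = find_extrema(r)
        if len(maxima) < 2 or len(minima) < 2:  # Step 8 (simplified): residue ~ monotonic
            break
        c = sift(r)
        imfs.append(c)
        r = r - c                               # Eq. (3), then repeat on r(t)
    return np.array(imfs), r
```

Because each residue is formed by subtraction, the IMFs plus the final residue always add back to the input. This mirrors the "IMF sum" and "Sum of input signals" rows reported later in Table I.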

Figure 1. (a) An example of a univariate input signal, y(t). (b) Locating the local maxima and minima.
Figure 2. (c) Locating the upper and lower envelopes. (d) Mean of the envelopes.
Figure 3. EEMD flowchart.

The major drawbacks of the EMD algorithm are mode mixing and envelope end effects. Mode mixing, in which components of widely different scales appear in the same IMF (or a component of similar scale is spread over several IMFs), distorts the envelopes and causes errors. End effects arise because there are no data points before the first sample or after the last sample of the signal, so the spline envelopes can spread (diverge) near the boundaries. Both limitations lead to erroneous feature extraction. To avoid them, a noise-assisted algorithm that exploits the statistical properties of white noise, Ensemble Empirical Mode Decomposition (EEMD), is used [3-7]. Sifting is carried out on the input signal with added white noise; this is repeated for many independent noise realisations, and the corresponding IMFs are averaged so that the added noise cancels out. The EEMD algorithm consists of the steps shown in Figure 3.
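The ensemble averaging that distinguishes EEMD from EMD can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation. The `decompose` argument is a hypothetical parameter standing for any EMD routine that returns an array of IMFs. The noise level of 0.2 times the signal's standard deviation, the ensemble size of 100, and the zero-padding of trials that return fewer IMFs are assumed defaults.

```python
import numpy as np


def eemd(y, decompose, n_imfs, ensemble_size=100, noise_ratio=0.2, seed=0):
    """Ensemble EMD: average the IMFs of many white-noise-perturbed copies of y.

    decompose: hypothetical EMD routine mapping a 1-D array to a (k, len(y))
    array of IMFs; supply your own implementation.
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    noise_std = noise_ratio * np.std(y)        # assumed noise level
    acc = np.zeros((n_imfs, y.size))
    for _ in range(ensemble_size):
        noisy = y + rng.normal(0.0, noise_std, y.size)  # fresh white-noise realisation
        imfs = np.atleast_2d(decompose(noisy))          # sift the noisy copy
        k = min(n_imfs, imfs.shape[0])
        acc[:k] += imfs[:k]   # trials yielding fewer IMFs contribute zeros to the rest
    # Averaging cancels the added noise; the leftover error shrinks roughly
    # like noise_std / sqrt(ensemble_size).
    return acc / ensemble_size
```

Fixing `n_imfs` lets trials that produce different numbers of IMFs be averaged element by element. Zero-padding is one simple convention, assumed here rather than taken from Figure 3.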

III. EXPERIMENTAL RESULTS
The figures show the feature extraction results for the Konkani words eka, dhoni, and theeni. Figure 4(a) shows the input speech signal, Figure 4(b) the peaks of the instantaneous amplitude, Figure 4(c) the sum of the IMFs, and Figure 5 the IMFs of the speech. Table I lists the statistical parameters obtained from the input signal using the proposed approach.

Figure 4. (a) Input speech signal. (b) First instantaneous amplitude. (c) Sum of IMFs.
Figure 5. Decomposed IMFs.
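The instantaneous amplitude and instantaneous frequency (IF) used in these results come from the Hilbert transform of an IMF. The following is a minimal NumPy sketch. It assumes an FFT-based analytic signal (the same construction `scipy.signal.hilbert` uses) and a sampling rate `fs`, and the function names are hypothetical. The `summary_stats` helper computes the kinds of quantities listed in Table I. For a single vector, the only singular value equals its Euclidean norm. The standard deviation uses N-1 normalisation, as Octave's `std` does by default. The paper does not state whether it used exactly these definitions.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal z(t) = x(t) + j*H{x}(t)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                    # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0           # keep the Nyquist bin
        weights[1:n // 2] = 2.0         # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)  # negative frequencies zeroed


def hilbert_ia_if(imf, fs=1.0):
    """Instantaneous amplitude |z(t)| and frequency (1/2pi) dtheta/dt.
    With fs=1 the frequency is in cycles per sample; IF is one sample shorter."""
    z = analytic_signal(imf)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency


def summary_stats(v):
    """Table I-style statistics of a 1-D feature vector."""
    v = np.asarray(v, dtype=float)
    return {
        "max": v.max(),
        "min": v.min(),
        "std": v.std(ddof=1),                 # N-1 normalisation, like Octave's std()
        "mean": v.mean(),
        "singular_value": np.linalg.norm(v),  # sole singular value of a 1 x N matrix
    }
```

Applied to the first IMF of a word, large values of `amplitude` mark speech activity and near-zero stretches mark silence. This is the role the text assigns to the first instantaneous amplitude.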

Table I. Statistical Parameters Extracted (IF: instantaneous frequency)

Sl. No.  Parameter                                        Value obtained
1        Correlation                                      0.34399
2        Maximum value of IF                              0.49758
3        Minimum value of IF                              0
4        Standard deviation of IF                         0.080651
5        Mean of IF                                       0.23436
6        Singular value of IF                             105.56
7        Maximum value of instantaneous amplitude         0.11810
8        Minimum value of instantaneous amplitude         1.6240e-06
9        Standard deviation of instantaneous amplitude    0.0042224
10       Mean of instantaneous amplitude                  0.0021410
11       Singular value of instantaneous amplitude        2.0162
12       IMF sum                                          3.5209
13       Sum of input signals                             3.5209

IV. CONCLUSION
This paper proposed the processing of Konkani speech signals using the open-source software Octave. Statistical feature extraction is performed for word recognition. A word is decomposed into Intrinsic Mode Functions (IMFs) using Ensemble Empirical Mode Decomposition (EEMD), and the first instantaneous amplitude, extracted using the Hilbert-Huang Transform, indicates the presence of speech; hence the silent segments in the speech can be identified. Statistical parameters such as the standard deviation and mean are extracted to classify Hindi and Konkani speech signals. The algorithms were tested in Octave.

REFERENCES
[1] Lawrence R. Rabiner and Marvin R. Sambur, "Application of an LPC Distance Measure to the Voiced-Unvoiced-Silence Detection Problem," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, no. 4, August 1977.
[2] N. E. Huang, Z. Shen, S. R. Long, M. L. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C. Tung, and H. H. Liu, "The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis," Proc. Roy. Soc. London, vol. 454, pp. 903-995, 1998.
[3] Shilpa R, Shruthi S Prabhu, and P. S. Puttaswamy, "Analysis of Power Quality Disturbances using Empirical Mode Decomposition and SVM Classifier," International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE), vol. 4, no. 5, May 2015.
[4] Shilpa R, Shruthi S Prabhu, and P. S. Puttaswamy, "Three-Phase Analysis of Power Quality Disturbances and Classification by SVM," International Journal of Computer Applications (0975-8887), NCESCO-2015, 2015.
[5] Shilpa R, Shruthi S Prabhu, and P. S. Puttaswamy, "Power Quality Disturbances Monitoring by Hilbert-Huang Transform with SVM Classifier," International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), Mandya, 2015, pp. 6-10.
[6] Shilpa R, S. S. Prabhu, and P. S. Puttaswamy, "Power quality disturbances monitoring using Hilbert-Huang transform and SVM classifier," International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), Mandya, 2015, pp. 6-10.
[7] Shruthi S Prabhu, Nayana C G, and Parameshachari B D, "Application of Adaptive Filter - Hilbert Transform to Detect FECG," International Journal of VLSI Design, Microelectronics and Embedded System, vol. 1, no. 2.