KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4
1,2,3 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW, Mysuru, India
4 Professor & Head, Department of Telecommunication Engineering, GSSSIETW, Mysuru, India

ABSTRACT
Speech is the conventional way of expressing information. Technological applications of digital audio signal processing, such as audio data compression, synthesis of audio effects and audio classification, require speech signal processing. The objective of human speech is not merely to transfer words from one person to another, but rather to communicate ideas. This paper deals with the processing of Konkani speech signals using the open source software Octave. After feature extraction, feature matching is performed for word recognition. A complete word is decomposed into Intrinsic Mode Functions (IMFs) using Ensemble Empirical Mode Decomposition (EEMD), and the first Instantaneous Amplitude () extracted using the Hilbert-Huang Transform represents the presence of the speech signal. Statistical parameters such as the standard deviation and mean are extracted for classification between Hindi and Konkani speech signals. The presented approach is also tested on real-time data available online.

Keywords: Ensemble Empirical Mode Decomposition, Hilbert-Huang Transform, Octave.

I. INTRODUCTION
There are as many as 880 languages spoken across India. 31 languages have been adopted by different states and union territories, giving them the status of official languages. Konkani is an Indo-Aryan language belonging to the Indo-European family of languages and is spoken along the south-western coast of India. Speech is one of the most ancient ways of expressing ourselves. Today, speech signals are also used in biometric recognition technologies and for communicating with machines.
The fundamental difficulty of speech recognition is that the speech signal is highly variable due to different speakers, speaking rates, contents and acoustic conditions.

Konkani is one of the 22 scheduled languages listed in the 8th Schedule of the Indian Constitution and is the official language of the Indian state of Goa. The first Konkani inscription is dated 1187 A.D. It is a minority language in Karnataka, Maharashtra, Kerala, Dadra and Nagar Haveli, and Daman and Diu.

Linear predictive analysis (LPC), power spectral analysis (FFT), relative spectra filtering of log domain coefficients (RASTA) and Mel-frequency cepstral coefficients (MFCC) are a few of the widely used feature extraction techniques. Theoretically, it should be possible to recognize speech directly from the digitized waveform. However, because of the large variability of the speech signal, it is better to perform some feature extraction that reduces that variability. In particular, feature extraction removes various sources of variability, such as whether the sound is voiced or unvoiced and, if voiced, the effects of periodicity or pitch, the amplitude of the excitation signal and the fundamental frequency. Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing to represent the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides extremely accurate estimates of speech parameters.

II. LITERATURE SURVEY

a. Statistical Decision Approach
Atal and Rabiner proposed a statistical decision approach to voiced-unvoiced-silence classification in which a set of measured features was combined using a non-Euclidean distance metric to give a reliable decision. This method was optimized for telephone line inputs by Rabiner et al. Their results showed that reliable discrimination between voiced and non-voiced speech could be obtained over telephone lines using the statistical approach; however, the overall error rate for the three-class decision was fairly high (11.7 percent). Based on these results, it was felt that an alternative approach was required to lower the error rate for telephone line inputs. The problem with combining a set of features is that they can only partially represent the information present in the signal. Obtaining a complete representation of the signal properties requires a classification procedure based on the signal waveform or its spectrum [1].

b. Digital Wiener Filter Approach
A novel approach was suggested by McAulay in which a matched digital Wiener filter was designed for each of the signal classes and the signal was processed by each of these filters. Based on the output of each filter, a distance was computed representing how closely the input signal matched that filter, and the minimum distance was used to make the final classification. Although this approach shows promise, it requires a large amount of signal processing and has not yet been extensively tested [1].

c. Pattern Recognition Approach
In this approach, the speech patterns are used directly, without explicit feature determination and segmentation. The method has two steps: training of speech patterns, and recognition of patterns by way of pattern comparison. In the parameter measurement phase, a sequence of measurements is made on the input signal to define the test pattern. The unknown test pattern is then compared with each sound reference pattern, and a measure of similarity between the test pattern and the reference pattern is computed [1].

d. Proposed Approach
First, the Empirical Mode Decomposition (EMD) algorithm, previously applied to the analysis of power quality disturbances [3-6], is presented. The EMD algorithm decomposes univariate signals. Signal decomposition is the process of breaking a given signal down into its fundamental components.
The representation of a signal in its fundamental form plays a vital role in many applications, such as de-noising, compression and the separation of mixtures of dependent signals. The basic part of the Hilbert-Huang transform is Empirical Mode Decomposition. EMD, proposed by Norden E. Huang, is a promising signal processing method for analysing non-stationary signals such as power quality disturbances. The main aim of EMD is to decompose a non-stationary input signal into its mono-components; the resulting mono-component functions are called Intrinsic Mode Functions (IMFs). Mono-components are functions for which a non-negative instantaneous frequency can be determined. The algorithm adaptively breaks the non-stationary signal down into IMFs and a residue, which represent the frequency and amplitude modulation of the time series being tested [2-7].

Breaking a non-linear, non-stationary input signal into its mono-components involves the following steps of the EMD algorithm:
Step 1: Consider a univariate signal, $y(t)$.
Step 2: Locate the local maxima and local minima of the input signal.
Step 3: Generate the upper and lower envelopes using cubic spline interpolation.
Step 4: Find the mean, $m(t)$, of the envelopes.
$$m(t) = \frac{\text{Upper Envelope} + \text{Lower Envelope}}{2} \tag{1}$$
Step 5: Find the difference between the signal and the mean.
$$d(t) = y(t) - m(t) \tag{2}$$
Step 6: The difference $d(t)$ is accepted as an Intrinsic Mode Function, $C_1(t)$, if its number of extrema and its number of zero crossings are equal or differ by at most one, and if it is a zero-mean process. Otherwise, $d(t)$ is treated as the input and Steps 2 to 5 are repeated.
Step 7: Calculate the residue.
$$r(t) = y(t) - C_1(t) \tag{3}$$
Step 8: If the residue $r(t)$ is a monotonic function, stop the process; otherwise replace the input $y(t)$ by $r(t)$ and go to Step 2 to extract the next IMF and residue [3-7].

The above process is repeated until the residue is a monotonic function.
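The sifting steps above can be sketched in Python as follows. This is a minimal illustration under stated simplifications, not the authors' Octave implementation: the envelopes use piecewise-linear interpolation instead of the cubic splines of Step 3, a fixed number of sifting passes stands in for the IMF test of Step 6, and the loop stops when the residue has too few extrema to build envelopes, as a practical proxy for the monotonic-residue check of Step 8. All names and parameters (`local_extrema`, `envelope`, `sift_once`, `emd`, `max_imfs`, `sifts`) are hypothetical.

```python
import math


def local_extrema(y):
    """Return (maxima, minima) as index lists of interior local extrema (Step 2)."""
    maxima, minima = [], []
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] >= y[i + 1]:
            maxima.append(i)
        elif y[i - 1] > y[i] <= y[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(idx, y):
    """Piecewise-linear envelope through the points (i, y[i]) for i in idx.

    Simplification: Step 3 calls for cubic splines. Values before the first
    and after the last extremum are held constant.
    """
    env, k = [], 0
    for i in range(len(y)):
        if i <= idx[0]:
            env.append(y[idx[0]])
        elif i >= idx[-1]:
            env.append(y[idx[-1]])
        else:
            while idx[k + 1] < i:
                k += 1
            i0, i1 = idx[k], idx[k + 1]
            w = (i - i0) / (i1 - i0)
            env.append((1 - w) * y[i0] + w * y[i1])
    return env


def sift_once(y):
    """One pass of Steps 2 to 5. Returns d(t), or None if envelopes cannot be built."""
    maxima, minima = local_extrema(y)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    upper = envelope(maxima, y)
    lower = envelope(minima, y)
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]  # Eq. (1)
    return [a - m for a, m in zip(y, mean)]             # Eq. (2)


def emd(y, max_imfs=8, sifts=10):
    """Return (imfs, residue) such that sum(imfs) + residue reproduces y."""
    imfs, residue = [], list(y)
    for _ in range(max_imfs):
        candidate, current = None, residue
        for _ in range(sifts):  # fixed pass count replaces the Step 6 test
            nxt = sift_once(current)
            if nxt is None:
                break
            candidate = current = nxt
        if candidate is None:  # too few extrema left: treat residue as final
            break
        imfs.append(candidate)
        residue = [r - c for r, c in zip(residue, candidate)]  # Eq. (3)
    return imfs, residue


if __name__ == "__main__":
    n = 400
    y = [math.sin(2 * math.pi * 5 * i / n) + 0.5 * math.sin(2 * math.pi * 40 * i / n)
         for i in range(n)]
    imfs, residue = emd(y)
    print(len(imfs), "IMFs extracted")
```

Because each IMF is subtracted from the running residue, the IMFs and the final residue always add back to the input, which gives a simple sanity check on any implementation.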

Figure 1. (a) An example of a univariate input signal, y(t). (b) Locating the local maxima and minima.
Figure 2. (c) Locating the upper and lower envelopes. (d) Mean of the envelopes.
Figure 3. EEMD flowchart.

The major drawbacks of the EMD algorithm are mode mixing and envelope end effects. The presence of different modes within an expected mode may distort the envelopes and cause errors. The lack of points before the first point of the signal and after its last point may also cause the envelopes to spread. These two limitations of EMD result in erroneous feature extraction. Therefore, to avoid them, a noise-assisted algorithm based on the statistical properties of white noise is defined [3-7]. The sifting process is carried out on the input signal combined with a white noise signal. The EEMD algorithm consists of the steps shown in Figure 3.
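Since the flowchart itself is not reproduced here, the following Python sketch assumes the commonly described form of EEMD: white noise is added to the signal, each noisy copy is decomposed, and corresponding modes are averaged over many trials so that the added noise tends to cancel. The function `eemd` and its parameters (`n_modes`, `trials`, `noise_std`, `seed`) are hypothetical, and `split_fast_slow` is only a stand-in decomposer (a moving-average split) used to keep the example self-contained; in practice an EMD routine would be passed in its place.

```python
import math
import random


def eemd(signal, decompose, n_modes, trials=50, noise_std=0.1, seed=0):
    """Average the modes returned by `decompose` over noisy copies of `signal`.

    `decompose` must map a list of samples to a list of modes (lists of the
    same length). Trials that return fewer than `n_modes` modes contribute
    zeros to the missing ones.
    """
    rng = random.Random(seed)
    n = len(signal)
    acc = [[0.0] * n for _ in range(n_modes)]
    for _ in range(trials):
        # Add an independent white Gaussian noise realisation for each trial.
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        modes = decompose(noisy)
        for k in range(min(n_modes, len(modes))):
            for i in range(n):
                acc[k][i] += modes[k][i]
    # Ensemble mean of each mode.
    return [[v / trials for v in row] for row in acc]


def split_fast_slow(x, half_width=5):
    """Stand-in decomposer (NOT EMD): moving-average trend plus remainder."""
    n = len(x)
    slow = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        slow.append(sum(x[lo:hi]) / (hi - lo))
    fast = [a - s for a, s in zip(x, slow)]
    return [fast, slow]


if __name__ == "__main__":
    sig = [math.sin(2 * math.pi * i / 50) for i in range(100)]
    modes = eemd(sig, split_fast_slow, n_modes=2, trials=100, noise_std=0.2)
    print(len(modes), "averaged modes of length", len(modes[0]))
```

The noise level and the number of trials trade off against each other: more trials are needed to average out stronger noise.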

III. EXPERIMENTAL RESULTS
Figures 4 and 5 show the results of feature recognition for the Konkani utterances eka, dhoni and theeni. Figure 4(a) is the input speech signal, Figure 4(b) shows the peaks of the instantaneous amplitude (), Figure 4(c) is the sum of the IMFs, and Figure 5 shows the IMFs of the speech. Table I shows the statistical parameters obtained from the input signal using the proposed approach.

Figure 4. (a) Input speech signal, (b) first instantaneous amplitude (), (c) sum of IMFs.
Figure 5. Decomposed IMFs.
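As a rough illustration of how an instantaneous amplitude and summary statistics like those in Table I could be computed, the Python sketch below builds the analytic signal through the discrete Fourier transform (the usual discrete Hilbert transform construction) and takes its magnitude. It is an assumption-laden sketch, not the authors' Octave code: the input would be the first IMF in the paper's pipeline but is a synthetic cosine here, the function names (`dft`, `instantaneous_amplitude`, `summary`) are hypothetical, and the sample standard deviation (N - 1 denominator) is assumed because the paper does not state which form was used.

```python
import cmath
import math
import statistics


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        total = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                    for m in range(n))
        out.append(total / n if inverse else total)
    return out


def instantaneous_amplitude(x):
    """Magnitude of the analytic signal x(t) + j * H[x](t)."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0   # keep DC (and Nyquist for even n) unchanged
        elif k < (n + 1) // 2:
            gain = 2.0   # double the positive frequencies
        else:
            gain = 0.0   # discard the negative frequencies
        spectrum[k] *= gain
    return [abs(z) for z in dft(spectrum, inverse=True)]


def summary(values):
    """Maximum, minimum, mean and sample standard deviation of a sequence."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # assumption: N - 1 denominator
    }


if __name__ == "__main__":
    n = 64
    x = [0.5 * math.cos(2 * math.pi * 4 * m / n) for m in range(n)]
    print(summary(instantaneous_amplitude(x)))
```

For real speech frames a fast Fourier transform would replace the naive `dft`; the quadratic version is used only to avoid external dependencies.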

Table I. Statistical Parameters Extracted.

Sl. No.   Parameter                   Value Obtained
1.        Correlation                 0.34399
2.        Maximum value of IF         0.49758
3.        Minimum value of IF         0
4.        Standard deviation of IF    0.080651
5.        Mean of IF                  0.23436
6.        Singular value of IF        105.56
7.        Maximum value of            0.11810
8.        Minimum value of            1.6240e-06
9.        Standard deviation of       0.0042224
10.       Mean of                     0.0021410
11.       Singular value of           2.0162
12.       IMF sum                     3.5209
13.       Sum of input signals        3.5209

IV. CONCLUSION
This paper proposes the processing of Konkani speech signals using the open source software Octave. Statistical feature extraction is performed for word recognition. A word is decomposed into Intrinsic Mode Functions (IMFs) using Ensemble Empirical Mode Decomposition (EEMD), and the first Instantaneous Amplitude () extracted using the Hilbert-Huang Transform represents the presence of the speech signal, so that silence in the speech can be identified. Statistical parameters such as the standard deviation and mean are extracted for classification between Hindi and Konkani speech signals. The algorithms are tested using the open source software Octave.

REFERENCES
[1] Lawrence R. Rabiner and Marvin R. Sambur, Application of an LPC Distance Measure to the Voiced-Unvoiced-Silence Detection Problem, IEEE Transactions on Acoustics, Speech, and Signal Processing, Volume ASSP-25, No. 4, August 1977.
[2] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung and H.H. Liu, The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. Roy. Soc. London, Volume 454, pp. 903-995, 1998.
[3] Shilpa R, Shruthi S Prabhu, Dr P S Puttaswamy, Analysis of Power Quality Disturbances using Empirical Mode Decomposition and SVM Classifier, International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE), Volume 4, Issue 5, May 2015.
[4] Shilpa R, Shruthi S Prabhu, Dr P S Puttaswamy, Three-Phase Analysis of Power Quality Disturbances and Classification by SVM, International Journal of Computer Applications (0975-8887), (NCESCO-2015), 2015.
[5] Shilpa R, Shruthi S Prabhu, Dr P S Puttaswamy, Power Quality Disturbances Monitoring by Hilbert-Huang Transform with SVM Classifier, International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), Mandya, 2015, pp. 6-10.
[6] Shilpa R, S. S. Prabhu and P. S. Puttaswamy, Power quality disturbances monitoring using Hilbert-Huang transform and SVM classifier, International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), Mandya, 2015, pp. 6-10.
[7] Shruthi S Prabhu, Nayana C G, Dr. Parameshachari B D, Application of Adaptive Filter - Hilbert Transform to Detect FECG, International Journal of VLSI Design, Microelectronics and Embedded System, Volume 1, Issue 2.