Optimization of Speech Recognition using LPC Technic


IOSR Journal of Engineering (IOSRJEN), ISSN: 2250-3021, Volume 2, Issue 8 (August 2012), PP 09-13

Vipulsangram K. Kadam (1), Dr. Ravindra C. Thool (2)
(1) Associate Professor & Research Student, Department of Electronics, P.E.S. College of Engineering, Nagsenvan, Aurangabad-431002 (M.S.), Dr. Babasaheb Ambedkar Marathwada University, Aurangabad-431002
(2) Professor & Head, Department of Information Technology, SGGS Institute of Engineering & Technology, Nanded (M.S.) (an autonomous institute set up and 100% funded by the Government of Maharashtra)

ABSTRACT: This paper implements a speech compression technique known as Linear Prediction Coding (LPC) using the DSP System Toolbox functionality available at the MATLAB command line. The system uses the Levinson-Durbin and Time-Varying Lattice Filter blocks for low-bandwidth transmission of speech using linear predictive coding. At present, VBEC automatic determination considers only the optimum topology for the training data and does not consider the effects of the decoding condition on optimality (e.g., the mixture and state pruning process and the search parameter dependence in decoding). Further study of its superiority from a practical point of view is needed. In future work, we would like to extend the model determination to take the decoding process into consideration, for example by reflecting the effect of the decoding strategy of the mixture or state pruning process in the objective function of our optimized tool for speech recognition. We further focus on how to optimize the optimum area shown in Fig. 1 while considering the decoding process. This can be achieved with the LPC technique described in this paper. [10]

Keywords: DSP, LPC, MATLAB, VBEC, FIR, TIMIT, SIMULINK

I. INTRODUCTION

A number of noisy speech enhancement algorithms are experimentally compared in terms of linear predictive coding (LPC) perturbations. The enhancement algorithms considered are simple spectral subtraction, spectral oversubtraction with use of a spectral floor, spectral subtraction with residual noise removal, and time-domain and frequency-domain adaptive minimum mean-square-error filtering. The LPC perturbations considered are cepstral distance, log likelihood ratios, and weighted likelihood ratio. [1]

Linear Predictive Coding (LPC) has been used to compress and encode speech signals for digital transmission at a low bit rate. LPC determines an FIR system that predicts a speech sample from past samples by minimizing the squared error between the actual sample and its estimate. The coefficients of the FIR system are encoded and sent. At the receiving end, the inverse system, called the AR model, is excited by a random signal to reproduce the encoded speech. The use of LPC can be extended to speech recognition, since the FIR coefficients are a condensed representation of a speech segment of typically 10 ms to 30 ms. The PARCOR parameters associated with LPC, which represent a vocal tract model based on a lattice filter structure, are considered for speech recognition. The use of FIR coefficients and the frequency response of the AR model was previously investigated. Here the LPC method is used to detect a limited number of phonemes from a continuous stream of speech. The system being developed slides a time window of 16 ms and calculates the PARCOR parameters continuously, feeding them to a classifier. The classifier is supervised and therefore requires training; it uses the Maximum Likelihood Decision Rule. Training uses the TIMIT speech database, which contains the recordings of 20 speakers of 8 major dialects of American English. The classification results for some typical vowel and consonant phonemes segmented from continuous speech are listed. The correct classification rates for vowels and consonants are 65.22% and 93.51%, respectively. Overall, they indicate that the PARCOR parameters have the potential to characterize the phonemes. [2]

A communication system was built and tested to operate in the land mobile VHF band (150-174 MHz) at a channel separation of only 6 kHz. The audio source was digitally encoded at 2.4 kbit/s using linear predictive coding (LPC). The speech data stream was transmitted by frequency shift keying (FSK), which allowed the use of class-C transmitters and discriminator detection in the receiver. Baseband filtering of the NRZ data resulted in a narrow transmitter spectrum. The receiver had a 3 dB bandwidth of 2.4 kHz, which allowed data transmission with minimal intersymbol interference and frequency offset degradation. A 58 percent eye opening was found. Bit error rate (BER) performance was measured with simulated Rayleigh fading at typical 150 MHz rates. Additional tests included capture, ignition noise susceptibility, adjacent channel protection, degradation from frequency offset, and bit error effects upon speech quality. A field test was conducted to compare the speech quality of the digital radio to that of a conventional 5 kHz deviation FM mobile radio. [3]

In this work we draw in part on a speech-model-based method that uses the linear predictive (LP) residual of the speech signal and the maximum-likelihood (ML) estimator proposed in "Blind estimation of reverberation time" (R. Ratnam, J. Acoust. Soc. Amer., 2004) to blindly estimate the reverberation time (RT60). The input speech is passed through a low-order linear predictive coding (LPC) filter to obtain the LP residual signal. It is proven that the unbiased autocorrelation function of this LP residual has the required properties to be used as an input to the ML estimator. It is shown that this method can successfully estimate the reverberation time with less data than existing blind methods. Experiments show that the proposed method can produce better estimates of RT60, even in highly reverberant rooms, because the entire input speech data is used in the estimation process. The proposed method is not sensitive to the type of input data (voiced, unvoiced), the number of gaps, or the window length. In addition, evaluation using white Gaussian noise and recorded babble noise shows that it can estimate RT60 in the presence of moderate background noise. [4]

Fig. 1. Optimum model search for an acoustic model.

II. SYSTEM IMPLEMENTATION

2. Introduction

In this paper, a comparative study between two speech coders has been reported, considering their performance in simulation and in real time. The speech coders taken for study are the Linear Predictive Coder (LPC) and the cepstral coder. The simulation models are made in SIMULINK, and the real-time models can be implemented on the TMS320C6713 DSK. For simulation, a comparison between the synthesized speech signals from both speech coders is given. For real-time implementation, important parameters such as memory consumption and execution time have been calculated for these coders. [5]

In this system we implement LPC analysis and synthesis (LPC coding) of a speech signal. This process consists of two steps: analysis and synthesis. In the analysis section, we extract the reflection coefficients from the signal and use them to compute the residual signal. In the synthesis section, we reconstruct the signal using the residual signal and the reflection coefficients. The residual signal and reflection coefficients require fewer bits to code than the original speech signal. The block diagram in Fig. 2 shows the system we implement.

In this simulation, the speech signal is divided into frames of 320 samples, with an overlap of 160 samples. Each frame is windowed using a Hamming window. Twelfth-order autocorrelation coefficients are found, and the reflection coefficients are then calculated from the autocorrelation coefficients using the Levinson-Durbin algorithm. The original speech signal is passed through an analysis filter, which is an all-zero lattice filter whose coefficients are the reflection coefficients obtained above. The output of this filter is the residual signal. The residual signal is passed through a synthesis filter, which is the inverse of the analysis filter. The output of the synthesis filter is the reconstructed (optimized) signal. [6], [7], [8], [9], [10]

The Optimum Reflection Coefficients for the Lattice Forward and Backward Predictors

The set of linear equations whose solution gives the predictor coefficients that minimize the mean-square value of the prediction error is derived in [11]. In this section we consider the problem of optimizing the reflection coefficients in the lattice predictor and expressing the reflection coefficients in terms of the forward and backward prediction errors. The forward prediction error in the lattice filter is expressed as

$$f_m(n) = f_{m-1}(n) + K_m\, g_{m-1}(n-1),$$

and minimizing $E[|f_m(n)|^2]$ with respect to the reflection coefficient $K_m$ yields the result

$$K_m = -\frac{E[f_{m-1}(n)\, g_{m-1}^{*}(n-1)]}{E[|g_{m-1}(n-1)|^2]} \qquad (1)$$

We observe that the optimum choice of the reflection coefficients in the lattice predictor is the negative of the (normalized) cross-correlation coefficients between the forward and backward errors in the lattice. Since it is apparent from (1) that $|K_m| \le 1$, it follows that the minimum mean-square value of the prediction error, which can be expressed recursively as $E_m^f = (1 - |K_m|^2)\, E_{m-1}^f$, is a monotonically decreasing sequence. [11]

Here we initialize some of the variables, such as the frame size, and instantiate the System objects used in our processing. These objects also pre-compute any necessary variables or tables, resulting in efficient processing calls later inside a loop. We create a buffer System object and set its properties so that its output is twice the frame size in length, with an overlap equal to the frame size. We also create a window System object, using the default window, which is Hamming. We create an autocorrelator System object and set its properties to compute the lags in the range [0:12], scaled by the length of the input. We create a System object that computes the reflection coefficients from the autocorrelation function using the Levinson-Durbin recursion, and configure it to output both polynomial coefficients and reflection coefficients. The polynomial coefficients are used to compute and plot the LPC spectrum. We create an FIR digital filter System object for analysis, and two all-pole digital filter System objects for synthesis and de-emphasis.

2.1 Stream Processing Loop

Here we call our processing loop, in which the LPC analysis and synthesis of the input audio signal are performed using the System objects we have instantiated. The loop stops when we reach the end of the input file, which is detected by the AudioFileReader System object. The figures in Section IV show the signal and its LPC output.
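The analysis and synthesis lattice filters described in this section can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB System objects used in the paper. The function names `lattice_analysis` and `lattice_synthesis` are hypothetical, the reflection coefficients are assumed fixed (the paper's filter is time-varying and is updated every frame), the signal is assumed real-valued, and quantization and de-emphasis are left out.

```python
def lattice_analysis(x, k):
    """All-zero (FIR) lattice: samples in, prediction residual out.

    Implements f_m(n) = f_{m-1}(n) + k_m * g_{m-1}(n-1) and
               g_m(n) = k_m * f_{m-1}(n) + g_{m-1}(n-1)
    for a real signal; the residual is f_p(n).
    """
    p = len(k)
    delayed = [0.0] * p              # g_0 .. g_{p-1} at time n-1
    residual = []
    for sample in x:
        f = sample                   # f_0(n)
        g = sample                   # g_0(n)
        for m in range(p):
            g_prev = delayed[m]      # g_m(n-1)
            delayed[m] = g           # keep g_m(n) for the next sample
            f_next = f + k[m] * g_prev
            g = k[m] * f + g_prev    # g_{m+1}(n)
            f = f_next               # f_{m+1}(n)
        residual.append(f)
    return residual


def lattice_synthesis(residual, k):
    """All-pole lattice: the inverse of lattice_analysis for the same k."""
    p = len(k)
    delayed = [0.0] * p              # g_0 .. g_{p-1} at time n-1
    out = []
    for e in residual:
        f = e                        # f_p(n)
        new_g = [0.0] * (p + 1)      # g_0(n) .. g_p(n)
        for m in range(p - 1, -1, -1):
            f = f - k[m] * delayed[m]              # f_m(n) from f_{m+1}(n)
            new_g[m + 1] = k[m] * f + delayed[m]   # g_{m+1}(n)
        new_g[0] = f                 # g_0(n) equals the output sample
        delayed = new_g[:p]
        out.append(f)
    return out


if __name__ == "__main__":
    # Illustrative values only; real coefficients would come from
    # the Levinson-Durbin recursion applied to each frame.
    k = [0.6, -0.4, 0.25]
    x = [0.0, 1.0, 0.5, -0.3, 0.8, -1.0, 0.2]
    e = lattice_analysis(x, k)
    y = lattice_synthesis(e, k)
    print(max(abs(u - v) for u, v in zip(x, y)))  # reconstruction error
```

Because the synthesis lattice inverts the analysis lattice stage by stage, feeding the unquantized residual back in reproduces the input up to rounding error. Any loss in a real coder would come from quantizing the residual and the reflection coefficients.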
Fig. 2. Block representation of the system implementation using Simulink. Blocks: From Multimedia File (aamer.wav, 22050 Hz, 16 bit, mono), LPC Analysis, Bit Stream Quantization, LPC Synthesis, To Audio Device. Signals between blocks: residual and reflection coefficients.

III. PERFORMANCE ANALYSIS

lpc determines the coefficients of a forward linear predictor by minimizing the prediction error in the least-squares sense. It has applications in filter design and speech coding. [a,g] = lpc(x,p) finds the coefficients of a pth-order linear predictor (FIR filter) that predicts the current value of the real-valued time series x based on past samples. p is the order of the prediction filter polynomial, a = [1 a(2) ... a(p+1)]. If p is unspecified, lpc uses p = length(x)-1 as the default. If x is a matrix containing a separate signal in each column, lpc returns a model estimate for each column in the rows of matrix a and a column vector of prediction error variances g. The order p must be less than or equal to the length of x.

Algorithm: lpc uses the autocorrelation method of autoregressive (AR) modeling to find the filter coefficients. The generated filter might not model the process exactly, even if the data sequence is truly an AR process of the correct order. This is because the autocorrelation method implicitly windows the data; that is, it assumes that signal samples beyond the length of x are 0. [12]
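The autocorrelation method described above, together with the framing and windowing used in Section II, can be sketched as below. This is a plain Python illustration under stated assumptions, not MATLAB's lpc and not the authors' code. All function names are hypothetical, the autocorrelation is the biased estimate scaled by the input length, the frame is assumed non-silent so that the lag-0 value is positive, and the test signal is synthetic noise shaped by an arbitrary second-order AR filter rather than speech.

```python
import math
import random


def split_frames(signal, frame_len=320, hop=160):
    """Cut a signal into frames of frame_len samples that overlap by
    frame_len - hop samples (320 and 160 match the text)."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]


def autocorrelation(x, max_lag):
    """Biased autocorrelation for lags 0..max_lag, scaled by len(x).
    Samples outside x are treated as zero (the implicit window)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, k, err): predictor polynomial a = [1, a1, ..., ap],
    reflection coefficients k = [k1, ..., kp], and the final prediction
    error power. Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        km = -acc / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + km * prev[m - i]
        a[m] = km
        k.append(km)
        err *= 1.0 - km * km         # error power shrinks at every order
    return a, k, err


def lpc_frame(frame, order=12):
    """Window one frame, then return (a, k, err) for it."""
    window = hamming(len(frame))
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)


if __name__ == "__main__":
    # Synthetic stand-in for speech: white noise through an AR(2) filter.
    rng = random.Random(0)
    signal = [0.0, 0.0]
    for _ in range(800):
        signal.append(0.9 * signal[-1] - 0.5 * signal[-2] + rng.gauss(0.0, 1.0))
    for frame in split_frames(signal[2:]):
        a, k, err = lpc_frame(frame)
        print(["%.3f" % c for c in k[:3]], "error power %.4f" % err)
```

With this sign convention the prediction error is e(n) = x(n) + a1 x(n-1) + ... + ap x(n-p), which is the form the lattice recursion in Section II produces. Because the frame is treated as zero outside its own samples, the energy of e(n) summed over the zero-extended frame equals the frame length times the returned error power, which is one concrete way to see the implicit windowing mentioned in the text.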

IV. FIGURES AND GRAPHS

Fig. 3. The signal and the LPC signal.
Fig. 4. LPC of the signal.
Fig. 5. Reflection coefficients of the signal.
Fig. 6. Residual of the signal.

V. CONCLUSION

We have seen here the implementation of a speech compression technique using Linear Prediction Coding. The implementation used the DSP System Toolbox functionality available at the MATLAB command line. The code involves only calling the successive System objects with appropriate input arguments. This avoids the error-prone manual state tracking that a plain MATLAB implementation of a buffer, for instance, would require. From the performance analysis it is observed that optimized speech recognition can be achieved. The proposed work can also be implemented on TMS320C6713 hardware.

Acknowledgements

I take this opportunity to acknowledge my guide, Dr. R.C. Thool, Professor & Head, Department of Information Technology, SGGS Institute of Engineering & Technology, Vishnupuri, Nanded 431606 (M.S.), for the valuable guidance given to complete this paper and for making available resources such as the IEEE Digital Library. I also acknowledge my friends who directly or indirectly helped to complete this paper.
Kadam Vipulsangram K.

REFERENCES

[1] Ahmed, M.S., Comparison of noisy speech enhancement algorithms in terms of LPC perturbation, IEEE Transactions on Acoustics, Speech and Signal Processing, Volume 37, Issue 1, Jan 1989, pp. 121-125.
[2] Ying Cui; Takaya, K., Recognition of Phonemes in a Continuous Speech Stream by Means of PARCOR Parameter in LPC Vocoder, Canadian Conference on Electrical and Computer Engineering (CCECE 2007), 2007, pp. 1606-1609. Digital Object Identifier: 10.1109/CCECE.2007.402
[3] McLaughlin, M.; Linder, D.; Carney, S., Design and Test of a Spectrally Efficient Land Mobile Communications System Using LPC Speech, IEEE Journal on Selected Areas in Communications, Volume 2, Issue 4, 1984, pp. 611-620. Digital Object Identifier: 10.1109/JSAC.1984.1146086
[4] Keshavarz, A.; Mosayyebpour, S.; Biguesh, M.; Gulliver, T.A.; Esmaeili, M., Speech-Model Based Accurate Blind Reverberation Time Estimation Using an LPC Filter, IEEE Transactions on Audio, Speech, and Language Processing, Volume 20, Issue 6, 2012, pp. 1884-1893. Digital Object Identifier: 10.1109/TASL.2012.2191283
[5] Bhattacharya, S.; Singh, S.K.; Abhinav, T., Performance evaluation of LPC and cepstral speech coder in simulation and in real-time, 1st International Conference on Recent Advances in Information Technology (RAIT), 2012, pp. 826-831. Digital Object Identifier: 10.1109/RAIT.2012.6194531
[6] Fliege, N.J., Multirate Digital Signal Processing (John Wiley and Sons, 1994).
[7] Mitra, S.K., Digital Signal Processing (McGraw-Hill, 1998).
[8] Orfanidis, S.J., Introduction to Signal Processing (Prentice-Hall, Inc., 1996).
[9] Vaidyanathan, P.P., Multirate Systems and Filter Banks (Prentice-Hall, Inc., 1993).
[10] www.mathworks.com
[11] Proakis, Digital Signal Processing (third edition, pp. 863-64).
[12] Jackson, L.B., Digital Filters and Signal Processing (Second Edition, Kluwer Academic Publishers, 1989, pp. 255-257).

Kadam V.K. (1), Associate Professor & Research Student, Department of Electronics, P.E.S. College of Engineering, Nagsenvan, Aurangabad-431002 (M.S.), Dr. Babasaheb Ambedkar Marathwada University, Aurangabad-431002 (M.S.). Email: vipulsangram@pescoe.ac.in, vipulsangram@gmail.com
Dr. R.C. Thool (2), Department of Information Technology, SGGS Institute of Engineering & Technology, Vishnupuri, Nanded 431606 (M.S.) (an autonomous institute set up and 100% funded by the Government of Maharashtra). Email: rcthool@ssgs.ac.in