Epoch Extraction From Emotional Speech

Size: px
Start display at page:

Download "Epoch Extraction From Emotional Speech"

Transcription

1 Epoch Extraction From al Speech D Govind and S R M Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati {dgovind,prasanna}@iitg.ernet.in Abstract This work proposes a modified zero frequency filtering (ZFF) method for epoch extraction from emotional speech. Epochs refers the instants of maximum excitation of the vocal tract. In the conventional ZFF method, the epochs are estimated by trend removing the output of the zero frequency resonator (ZFR) using the window length equal to the average pitch period of the utterance. Use of this fixed window length for the epoch estimation causes spurious or missed estimation from the speech signals having rapid pitch variations like in emotional speech. This work therefore proposes a refined ZFF method for epoch estimation by trend removing the output of ZFR using the variable windows obtained by finding the average pitch periods for every fixed blocks of speech and low pass filtering the resulting trend removed signal segments using the estimated pitch as the cutoff frequency. The epoch estimation performance is evaluated for five different emotions in the German emotional speech corpus having simultaneous electroglotto graph (EGG) recordings. The improved epoch estimation performance indicates the robustness of the proposed method against rapid pitch variations in emotional speech signals. The effectiveness of the proposed method is also confirmed by the improved epoch estimation performance on the Hindi emotional speech database. I. INTRODUCTION Epochs are events like glottal closure in voiced speech and onset of frication or burst in unvoiced speech []. Due to the interaction of the vocal tract resonances, estimation of the epochs location from speech signal is a challenging task [], [2]. The knowledge of epochs can be used in many speech processing applications [3] [7]. Because of its importance, there are different methods proposed for estimating the epochs location from speech signals [], [8], [9]. Among all these methods zero frequency filtering (ZFF) method for extracting the epochs locations provides a simple and accurate way of estimation from speech signals []. In ZFF method, the speech signal is passed though the cascade of two resonators located at zero frequency where these zero frequency resonator (ZFR) removes the effect due to vocal tract resonances which are predominantly present in the higher frequency region of the speech signals. The trend in the ZFR output is removed by local mean substraction using window length equal to the average pitch period of the speech utterance. This trend removed signal is known as zero frequency filtered signal (ZFFS). The positive zero crossings of the ZFFS give the epochs location and the difference between successive epochs location provide instantaneous pitch periods. The accurate pitch estimation is important for emotional analysis stage of an emotional conversion system [6]. However, the rapid pitch variations in the emotional speech causes spurious or missed zero crossings in the ZFFS when trend removed using a single window length. Hence the motivation for the present work. There are some existing works done to improve ZFF epoch estimation from the signals having rapid pitch variations like laughter signals []. Here the window length used for the trend removal are estimated for each glottal activity region and trend in the corresponding segment of the ZFR output is removed using these varied window lengths. As described later, in the present work the trend in the ZFR output is removed using the window lengths estimated from the average pitch period for every fixed nonoverlapping frames of speech. The rest of the paper is described as follows: Section II describes the conventional ZFF algorithm. Section III explains the modified ZFF method for estimating epochs from emotional speech. A comparison of instantaneous F estimated from conventional and modified ZFF is given in Section IV and Section V summarizes with scope of some future work. II. CONVENTIONAL ZFF METHOD FOR EPOCH ESTIMATION The algorithmic steps to estimate the epochs in speech by ZFF, which hereafter will be termed as conventional ZFF, are as follows []: Difference input speech signal z(n) x(n) = z(n) z(n ) () Compute the output of cascade of two ideal digital resonators at Hz 4 y(n) = a k y(n k) + x(n) (2) k= where a = 4, a 2 = -6, a 3 = 4, a 4 = - Remove the trend i.e., ŷ(n) = y(n) ȳ(n) (3) where ȳ(n) = N 2N+ n= N y(n) and 2N + corresponds to the average pitch period computed over a longer segment of speech The trend removed signal ŷ(n) is termed as zero frequency filtered signal (ZFFS). The positive zero crossings of the filtered signal will give the location of the epochs. These epochs are periodically located in case of voiced speech representing the glottal closure instants and are ran-

2 domly located in case of unvoiced speech representing the onset of bursts or frication []. Figure shows the epochs estimated from a voiced segment and an unvoiced segment of speech. It can be noted that ZFFS gives periodic zero crossings in case of voiced segment and random zero crossings in the unvoiced segment Time (samples) Time (Samples) Fig. : Epochs from voiced and unvoiced segments of speech. A voiced speech segment, its ZFFS and epochs. (d) An unvoiced speech segment, its ZFFS and epochs. The epoch estimation performance is evaluated for five different emotions (Neutral, Angry, Happy, Boredom and Fear) of German emotional speech corpus having simultaneous EGG recordings []. Approximately speech files of speakers and texts per emotion were used for the performance evaluation. For evaluating the estimated epochs from the speech, following measures are used [8]. Larynx cycle: The range of sample (/2) (l r + l r )< n <(/2)(l r+ +l r ) where l r, l r and l r+ are the current, preceding and succeeding reference epoch locations, respectively Identification Rate (IDR): The percentage of larynx cycles for which exactly one epoch is detected. Miss Rate (MR): The percentage of larynx cycles for which no epoch is detected. False Alarm Rate (FAR): The percentage of larynx cycles for which more than one epoch is detected. Identification Error (ζ): The timing error between the reference and detected instants of significant excitation in larynx cycles for which exactly one epoch was detected. Identification Accuracy (σ) (IDA): The standard deviation of the identification error ζ. Small values of σ indicate high accuracy of identification. (d) TABLE I: Epoch estimation performance of conventional ZFF and DYPSA algorithms for different emotional speech signals taken from the German database []. ZFF Neutral Angry Happy Boredom Fear DYPSA Neutral Angry Happy Boredom Fear The performance of conventional ZFF is tabulated in the Table I. As expected the conventional ZFF method gives better epoch estimation for neutral emotion speech and the performance degrades in the case of other emotions, except boredom. The reason for this is due to the rapid pitch variations in the emotional speech compared to the neutral speech. In the conventional ZFF method of epoch estimation, due to fixed window size used for local mean subtraction, some of the epochs are either missed or spuriously hypothesized. For this reason an appropriate window size should be selected for the optimum epoch estimation from speech signals with larger pitch variations. As the pitch variation of boredom emotion is similar to that of the neutral, the epoch detection performance of both the emotions is nearly same. This epoch estimation performance from various emotions are compared with the DYPSA algorithm, another popular method for estimating epochs [8]. The DYPSA algorithm used in this work is implemented using the programs given in the Voicebox speech processing toolbox. The epochs estimated using DYPSA also shows the same trend and thus confirming the degradation is due to the large pitch variations in the emotional speech. A similar trend in the epoch estimation performance can be observed in the Hindi emotional speech database collected for four speakers (2 males and 2 females) in 4 different emotions (Neutral, Angry, Happy and Boredom) having simultaneous EGG recordings. Ten randomly selected sentences from the Hindi broadcast news database, are used for the emotional speech recording. As the speech is recorded in three sessions, there are 2 files (3x4x) available for each emotion. Table II shows epoch estimation performance obtained for Hindi emotional speech database. The degradation in the epoch estimation performance can be observed here also for angry and happy emotions. Even though the level of degradation performance is different, it follows a similar trend as in German emotional speech corpus.

3 TABLE II: Epoch estimation performance of conventional ZFF method on Hindi emotional speech database Neutral Angry Happy Boredom TABLE III: Epoch estimation performance of modified ZFF method by updating the window length for 25 ms segment of speech from different emotions. Neutral Angry Happy Boredom Fear TABLE IV: Epoch estimation performance of refined ZFF method on various emotions Neutral Angry Happy Boredom Fear TABLE V: Epoch estimation performance of refined ZFF method on various emotions from Hindi emotional speech database Neutral Angry Happy Boredom III. MODIFIED ZFF METHOD FOR EPOCH ESTIMATION IN EMOTIONAL SPEECH For reliably estimating the epochs from emotional speech, a refinement to the conventional ZFF algorithm is developed here. Instead of using fixed average pitch period as the window length for the trend removal of the ZFR output, the window length is updated for every short time segment of length 2-3 ms. A robust method to find the F values from the ZFFS for acoustically degraded speech is described in [2]. In this method, F is computed as the frequency value corresponding to the highest magnitude in the short time fourier transform (STFT) of the ZFFS segment obtained from the conventional ZFF method and the window length is computed as the reciprocal of the F value. This window length is used for the trend removal of the ZFR output to get the ZFFS for that particular speech segment. In the present work we use this approach for the emotional speech case which can be treated as a degraded speech due to the change in the psychological state of the speaker. The performance of this method for the estimation of epochs in emotional speech is given in Table III. The performance improves significantly compared to the fixed window case demonstrating the significance of variable window length for trend removal in case of emotional speech. Figure 2 compares the epoch estimation using conventional ZFF method and the proposed modified ZFF method. Figure 2 shows the ZFFS of a segment of angry speech. The corresponding zero crossings termed as epochs are plotted in Figure 2. The fixed window length results in spurious zero crossings. Figure 2(d) shows the ZFFS for the same segment obtained from the method given in [2]. Even though, the number of spurious zero crossings are reduced, it still leaves some spurious zero crossings unaltered. To analyze this, the corresponding magnitude of STFT of ZFFS for both the methods are plotted in Figures 2 and. The magnitude of the harmonics beyond the fundamental frequency are comparatively stronger. To alleviate this problem the trend removed ZFFS segment is passed through a low pass filter having cut off frequency equal to.5 F. This is to suppress the strength of the pitch harmonics that come beyond F as shown in Figure 2(i). This resulted in further reduction of spurious zero crossings as given in Figure 2(h). Similarly, all the ZFFS segments obtained are concatenated together to obtain the modified ZFFS. Modified epochs are estimated by finding the positive zero crossings in the modified ZFFS. The performance after the low pass filtering of ZFFS is given in Table IV. As given, it further improves with low pass filtering. Thus the proposed modified ZFF method employs both variable window length and low pass filtering for the epoch estimation. The steps in the modified ZFF method can be summarized as follows: Compute the ZFFS using conventional ZFF method. Compute F as the highest magnitude frequency value in the STFT of each 25 ms non-overlapping ZFFS frames. Derive window length as the reciprocal of F for each frame. Remove the trend in the corresponding segment of the resonator output signal y(n) (Equation (2)) using a moving average filter of length equal to the window length of that segment, as given in Equation(3). Low pass filter each of the trend removed ZFFS segment with a cut off frequency equal to.5 F. Concatenate all the modified ZFFS segments to obtain the modified ZFFS signal. Hypothesize the negative to positive zero crossings of the modified ZFFS as the estimated epoch locations. Table V shows the improved epoch estimation performance for the Hindi emotional speech database. Even though there is a significant improvement in the epoch estimation due to the modified ZFF method for the emotional

4 (d) (g).5 (h) 5 (i) Fig. 2: Comparison of conventional ZFF and the modified ZFF approaches. The ZFFS obtained from a voiced segment of angry speech showing the spurious zero crossings, its epochs and STFT magnitude spectrum. (d) The modified ZFFS obtained by updating the window length, its epochs and STFT magnitude spectrum. (g) The ZFFS obtained by low pass filtering the modified ZFFS segments, (h) epochs estimated and its (i) STFT magnitude spectrum showing no frequency components beyond F. speech case, still the performance in case of angry, happy and fear are not comparable with that of the neutral or boredom. To study the reason for this, the glottal waves from these five emotions are analyzed. Figure 3 shows the speech waveform, glottal wave and difference of glottal wave of five emotions under the condition of same speaker, text and syllable. The prominent features in the difference glottal wave are the impulse like discontinuities. The amplitude of these impulse like discontinuities give an indication about the intensity with which the closing of vocal folds occur. Hence the difference glottal wave may be treated as the representative of strength of excitation. The strength of excitation of the emotions like angry, happy and fear are low as compared to neutral and boredom emotions. This is due to the rapid pitch variations and/or the difference in the nature of vocal folds activity for these emotions. The rapid pitch variations cause the vocal folds to vibrate with the lower suction pressure and hence the reduced strength of excitation. Depending on the psychological state of particular emotion, the tension associated with the vocal folds and associated muscle structure may be different. Because of these factors, the impulse strength may not as prominent in the case of neutral and boredom emotions. These can be observed by comparing the difference glottal waves of different emotions. This in turn may result in the reduction of energy around the zero frequency region, leading to the spurious epoch detections. This may be the reason for the reduced epoch estimation performance in case of angry, happy and fear emotions. Further exploration and understanding is required in this direction. IV. F ESTIMATION USING CONVENTIONAL AND MODIFIED ZFF METHODS The instantaneous pitch period or the epoch interval can be found by the successive difference between the estimated epoch locations. Taking the reciprocal of each epoch interval, multiplied by the sampling frequency gives the fundamental frequency (F ) [2]. Figure 4 shows the F contours derived from the estimated epochs using conventional and modified ZFF methods for neutral and angry emotional speech signals. It is to be observed from the Figures 4 and that the F contours obtained using conventional and modified ZFF methods remain nearly same for the neutral emotion. For angry emotion, the Figures 4 and indicate that the F values obtained using the modified ZFF method are more continuous than that obtained using conventional ZFF method. Hence the merit of the modified ZFF method. Therefore the estimated epochs and ZFFS obtained using the modified ZFF method are used for the prosody modification as explained in the following sections. V. CONCLUSION AND FUTURE WORK The present work identifies the unreliable epoch estimations from the emotional speech by the popular epoch extraction methods like ZFF and DYPSA. As the ZFF method provides reliable and accurate estimate of the epochs from neutral speech signals compared to other methods, the present work proposed a refinement for the conventional ZFF method for

5 (d) (g) (h) (i) (j) (k) (l) (m) (n) (o) Fig. 3: speech waveforms, glottal waveforms and difference of glottal waves of neutral (-), angry((d)-), happy((g)-(i)), boredom((j)- (l)) and fear((m)-(o)), respectively. reliable epoch estimation from the emotional speech signals. The modified ZFF method for epoch estimation uses both variable window length for trend removing the ZFR output and low pass filtering of higher harmonics in the trend removed ZFFS. The performance evaluation indicates the robustness of the modified ZFF method in estimating reliably from emotional speech compared to conventional ZFF method of epoch extraction. The modified ZFF method of epoch estimation can be used as a tool in the emotional analysis stage of neutral to emotion conversion systems for comparing the estimated source features of various emotions. Further exploration is needed to bring the epoch extraction performance from emotional speech signals to that of the neutral speech signals. VI. ACKNOWLEDGEMENT This work is a part of ongoing UK-India Education Research Initiative (UKIERI) project (27-2) on Study of Source Features for Speech Synthesis and Speaker Recognition between IIT Guwahati, IIIT Hyderabad and CSTR, University of Edinburgh, UK. We would also like to thank Prof. Felix Burkhardt for providing the EGG recordings of the German emotional speech database Time (s) Time (s) Fig. 4: Comparing the F contour obtained using conventional and refined ZFF method. The F contour obtained from, a neutral (- ) and angry ((d)-) speech signals using conventional and refined ZFF methods. REFERENCES [] K. S. R. Murty and B. Yegnanarayana, Epoch extraction from speech signals, IEEE Trans. Audio, Speech and Language Process., vol. 6, no. 8, pp , Nov. 28. [2] B. Yegnanarayana and K. S. R. Murty, Event-based instantaneous fundamental frequency estimation from speech signals, IEEE Trans. Audio, Speech and Language Process., vol. 7, no. 4, pp , May 29. [3] B. Yegnanarayana and R. N. J. Veldhuis, Extraction of vocal-tract system characteristics from speech signals, IEEE Trans. Speech and Audio Process., vol. 6, no. 4, pp , July 998. [4] K. S. Rao and B. Yegnanarayana, Prosody modification using instants of significant excitation, IEEE Trans. Audio, Speech and Language Processing, vol. 4, pp , May 26. [5] E. A. P. Habets, N. D. Gaubitch, and P. A. Naylor, Temporal selective dereverberation of noisy speech using one microphone, in in Proc. ICASSP, Jan. 28, pp [6] D. Govind, S. R. M. Prasanna, and B. Yegnanarayana, Neutral to target emotion conversion using source and suprasegmental information, in proc. INTERSPEECH 2, Aug. 2. [7] S. R. M. Prasanna and D. Govind, Analysis of excitation source information in emotional speech, in Proc. INTERSPEECH, Sep. 2, pp [8] P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, Estimation of glottal closure instants in voiced speech using DYPSA algorithm, IEEE Trans. Audio, Speech Lang. Process., vol. 5, no., pp , 27. [9] R. Smits and B. Yegnanarayana, Determination of instants of significant excitation in speech using group delay function, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 4, pp , Sep.995. [] K. S. Kumar, M. S. H. Reddy, K. S. R. Murty, and B. Yegnanarayana, Analysis of laugh signals for detecting in continuous speech, in proc INTERSPEECH, 29. [] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlemeier, and B. Weiss, A database of German emotional speech, in Proc. INTERSPEECH, 25, pp [2] B. Yegnanarayana, S. R. M. Prasanna, and G. Seshadri, Study of robustness of zero frequency resonator method for extraction of fundamental frequency, in ICASSP, May 2. (d)

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,

More information

Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE

Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE 1602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 8, NOVEMBER 2008 Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE Abstract

More information

Cumulative Impulse Strength for Epoch Extraction

Cumulative Impulse Strength for Epoch Extraction Cumulative Impulse Strength for Epoch Extraction Journal: IEEE Signal Processing Letters Manuscript ID SPL--.R Manuscript Type: Letter Date Submitted by the Author: n/a Complete List of Authors: Prathosh,

More information

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Vikram Ramesh Lakkavalli, K V Vijay Girish, A G Ramakrishnan Medical Intelligence and Language Engineering (MILE) Laboratory

More information

Voiced/nonvoiced detection based on robustness of voiced epochs

Voiced/nonvoiced detection based on robustness of voiced epochs Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies

More information

Automatic Evaluation of Hindustani Learner s SARGAM Practice

Automatic Evaluation of Hindustani Learner s SARGAM Practice Automatic Evaluation of Hindustani Learner s SARGAM Practice Gurunath Reddy M and K. Sreenivasa Rao Indian Institute of Technology, Kharagpur, India {mgurunathreddy, ksrao}@sit.iitkgp.ernet.in Abstract

More information

/$ IEEE

/$ IEEE 614 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 4, MAY 2009 Event-Based Instantaneous Fundamental Frequency Estimation From Speech Signals B. Yegnanarayana, Senior Member,

More information

IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey

IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey Workshop on Spoken Language Processing - 2003, TIFR, Mumbai, India, January 9-11, 2003 149 IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES P. K. Lehana and P. C. Pandey Department of Electrical

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information

ENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS

ENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS ENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS Hania Maqsood 1, Jon Gudnason 2, Patrick A. Naylor 2 1 Bahria Institue of Management

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) Proceedings of the 2 nd International Conference on Current Trends in Engineering and Management ICCTEM -214 ISSN

More information

A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech

A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech 456 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 2, MARCH 2006 A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech Mike Brookes,

More information

Detecting Speech Polarity with High-Order Statistics

Detecting Speech Polarity with High-Order Statistics Detecting Speech Polarity with High-Order Statistics Thomas Drugman, Thierry Dutoit TCTS Lab, University of Mons, Belgium Abstract. Inverting the speech polarity, which is dependent upon the recording

More information

EVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT

EVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT EVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT Dushyant Sharma, Patrick. A. Naylor Department of Electrical and Electronic Engineering, Imperial

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

GLOTTAL-synchronous speech processing is a field of. Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review

GLOTTAL-synchronous speech processing is a field of. Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1 Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review Thomas Drugman, Mark Thomas, Jon Gudnason, Patrick Naylor,

More information

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You

More information

SIGNIFICANCE OF EXCITATION SOURCE INFORMATION FOR SPEECH ANALYSIS

SIGNIFICANCE OF EXCITATION SOURCE INFORMATION FOR SPEECH ANALYSIS SIGNIFICANCE OF EXCITATION SOURCE INFORMATION FOR SPEECH ANALYSIS A THESIS submitted by SRI RAMA MURTY KODUKULA for the award of the degree of DOCTOR OF PHILOSOPHY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

More information

A New Method for Instantaneous F 0 Speech Extraction Based on Modified Teager Energy Algorithm

A New Method for Instantaneous F 0 Speech Extraction Based on Modified Teager Energy Algorithm International Journal of Computer Science and Electronics Engineering (IJCSEE) Volume 4, Issue (016) ISSN 30 408 (Online) A New Method for Instantaneous F 0 Speech Extraction Based on Modified Teager Energy

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals

Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals INTERSPEECH 016 September 8 1, 016, San Francisco, USA Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals Gurunath Reddy M, K. Sreenivasa Rao

More information

Prosody Modification using Allpass Residual of Speech Signals

Prosody Modification using Allpass Residual of Speech Signals INTERSPEECH 216 September 8 12, 216, San Francisco, USA Prosody Modification using Allpass Residual of Speech Signals Karthika Vijayan and K. Sri Rama Murty Department of Electrical Engineering Indian

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

L19: Prosodic modification of speech

L19: Prosodic modification of speech L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Introducing COVAREP: A collaborative voice analysis repository for speech technologies

Introducing COVAREP: A collaborative voice analysis repository for speech technologies Introducing COVAREP: A collaborative voice analysis repository for speech technologies John Kane Wednesday November 27th, 2013 SIGMEDIA-group TCD COVAREP - Open-source speech processing repository 1 Introduction

More information

Basic Characteristics of Speech Signal Analysis

Basic Characteristics of Speech Signal Analysis www.ijird.com March, 2016 Vol 5 Issue 4 ISSN 2278 0211 (Online) Basic Characteristics of Speech Signal Analysis S. Poornima Assistant Professor, VlbJanakiammal College of Arts and Science, Coimbatore,

More information

Pitch Period of Speech Signals Preface, Determination and Transformation

Pitch Period of Speech Signals Preface, Determination and Transformation Pitch Period of Speech Signals Preface, Determination and Transformation Mohammad Hossein Saeidinezhad 1, Bahareh Karamsichani 2, Ehsan Movahedi 3 1 Islamic Azad university, Najafabad Branch, Saidinezhad@yahoo.com

More information

SPEECH AND SPECTRAL ANALYSIS

SPEECH AND SPECTRAL ANALYSIS SPEECH AND SPECTRAL ANALYSIS 1 Sound waves: production in general: acoustic interference vibration (carried by some propagation medium) variations in air pressure speech: actions of the articulatory organs

More information

NOVEL APPROACH FOR FINDING PITCH MARKERS IN SPEECH SIGNAL USING ENSEMBLE EMPIRICAL MODE DECOMPOSITION

NOVEL APPROACH FOR FINDING PITCH MARKERS IN SPEECH SIGNAL USING ENSEMBLE EMPIRICAL MODE DECOMPOSITION International Journal of Advance Research In Science And Engineering http://www.ijarse.com NOVEL APPROACH FOR FINDING PITCH MARKERS IN SPEECH SIGNAL USING ENSEMBLE EMPIRICAL MODE DECOMPOSITION ABSTRACT

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri

More information

Digital Signal Processing

Digital Signal Processing COMP ENG 4TL4: Digital Signal Processing Notes for Lecture #27 Tuesday, November 11, 23 6. SPECTRAL ANALYSIS AND ESTIMATION 6.1 Introduction to Spectral Analysis and Estimation The discrete-time Fourier

More information

SPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester

SPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester SPEECH TO SINGING SYNTHESIS SYSTEM Mingqing Yun, Yoon mo Yang, Yufei Zhang Department of Electrical and Computer Engineering University of Rochester ABSTRACT This paper describes a speech-to-singing synthesis

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006

INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006 1. Resonators and Filters INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006 Different vibrating objects are tuned to specific frequencies; these frequencies at which a particular

More information

VOICED speech is produced when the vocal tract is excited

VOICED speech is produced when the vocal tract is excited 82 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 1, JANUARY 2012 Estimation of Glottal Closing and Opening Instants in Voiced Speech Using the YAGA Algorithm Mark R. P. Thomas,

More information

Relative phase information for detecting human speech and spoofed speech

Relative phase information for detecting human speech and spoofed speech Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University

More information

Speech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065

Speech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

Digital Speech Processing and Coding

Digital Speech Processing and Coding ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/

More information

Subtractive Synthesis & Formant Synthesis

Subtractive Synthesis & Formant Synthesis Subtractive Synthesis & Formant Synthesis Prof Eduardo R Miranda Varèse-Gastprofessor eduardo.miranda@btinternet.com Electronic Music Studio TU Berlin Institute of Communications Research http://www.kgw.tu-berlin.de/

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

Analysis of the SNR Estimator for Speech Enhancement Using a Cascaded Linear Model

Analysis of the SNR Estimator for Speech Enhancement Using a Cascaded Linear Model Analysis of the SNR Estimator for Speech Enhancement Using a Cascaded Linear Model Harjeet Kaur Ph.D Research Scholar I.K.Gujral Punjab Technical University Jalandhar, Punjab, India Rajneesh Talwar Principal,Professor

More information

Correspondence. Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm

Correspondence. Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 3, MAY 1999 333 Correspondence Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm Sassan Ahmadi and Andreas

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals

Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals Sunil Rudresh, Aditya Vasisht, Karthika Vijayan, and Chandra Sekhar Seelamantula, Senior Member, IEEE arxiv:8.9v

More information

RECENTLY, there has been an increasing interest in noisy

RECENTLY, there has been an increasing interest in noisy IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In

More information

A Two-step Technique for MRI Audio Enhancement Using Dictionary Learning and Wavelet Packet Analysis

A Two-step Technique for MRI Audio Enhancement Using Dictionary Learning and Wavelet Packet Analysis A Two-step Technique for MRI Audio Enhancement Using Dictionary Learning and Wavelet Packet Analysis Colin Vaz, Vikram Ramanarayanan, and Shrikanth Narayanan USC SAIL Lab INTERSPEECH Articulatory Data

More information

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o

More information

Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment

Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase Reassignment Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou, Analysis/Synthesis Team, 1, pl. Igor Stravinsky,

More information

Text Emotion Detection using Neural Network

Text Emotion Detection using Neural Network International Journal of Engineering Research and Technology. ISSN 0974-3154 Volume 7, Number 2 (2014), pp. 153-159 International Research Publication House http://www.irphouse.com Text Emotion Detection

More information

ROBUST F0 ESTIMATION IN NOISY SPEECH SIGNALS USING SHIFT AUTOCORRELATION. Frank Kurth, Alessia Cornaggia-Urrigshardt and Sebastian Urrigshardt

ROBUST F0 ESTIMATION IN NOISY SPEECH SIGNALS USING SHIFT AUTOCORRELATION. Frank Kurth, Alessia Cornaggia-Urrigshardt and Sebastian Urrigshardt 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) ROBUST F0 ESTIMATION IN NOISY SPEECH SIGNALS USING SHIFT AUTOCORRELATION Frank Kurth, Alessia Cornaggia-Urrigshardt

More information

Speech Recognition using FIR Wiener Filter

Speech Recognition using FIR Wiener Filter Speech Recognition using FIR Wiener Filter Deepak 1, Vikas Mittal 2 1 Department of Electronics & Communication Engineering, Maharishi Markandeshwar University, Mullana (Ambala), INDIA 2 Department of

More information

CO-CHANNEL SPEECH DETECTION APPROACHES USING CYCLOSTATIONARITY OR WAVELET TRANSFORM

CO-CHANNEL SPEECH DETECTION APPROACHES USING CYCLOSTATIONARITY OR WAVELET TRANSFORM CO-CHANNEL SPEECH DETECTION APPROACHES USING CYCLOSTATIONARITY OR WAVELET TRANSFORM Arvind Raman Kizhanatham, Nishant Chandra, Robert E. Yantorno Temple University/ECE Dept. 2 th & Norris Streets, Philadelphia,

More information

VIBRATO DETECTING ALGORITHM IN REAL TIME. Minhao Zhang, Xinzhao Liu. University of Rochester Department of Electrical and Computer Engineering

VIBRATO DETECTING ALGORITHM IN REAL TIME. Minhao Zhang, Xinzhao Liu. University of Rochester Department of Electrical and Computer Engineering VIBRATO DETECTING ALGORITHM IN REAL TIME Minhao Zhang, Xinzhao Liu University of Rochester Department of Electrical and Computer Engineering ABSTRACT Vibrato is a fundamental expressive attribute in music,

More information

SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum

SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase Reassigned Spectrum Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou Analysis/Synthesis Team, 1, pl. Igor

More information

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Project Proposal Avner Halevy Department of Mathematics University of Maryland, College Park ahalevy at math.umd.edu

More information

A LANDMARK-BASED APPROACH TO AUTOMATIC VOICE ONSET TIME ESTIMATION IN STOP-VOWEL SEQUENCES. Stephan R. Kuberski, Stephen J. Tobin, Adamantios I.

A LANDMARK-BASED APPROACH TO AUTOMATIC VOICE ONSET TIME ESTIMATION IN STOP-VOWEL SEQUENCES. Stephan R. Kuberski, Stephen J. Tobin, Adamantios I. A LANDMARK-BASED APPROACH TO AUTOMATIC VOICE ONSET TIME ESTIMATION IN STOP-VOWEL SEQUENCES Stephan R. Kuberski, Stephen J. Tobin, Adamantios I. Gafos University of Potsdam Linguistics Department Potsdam,

More information

Synthesis Algorithms and Validation

Synthesis Algorithms and Validation Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided

More information

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com

More information

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement 1 Zeeshan Hashmi Khateeb, 2 Gopalaiah 1,2 Department of Instrumentation

More information

SPEECH ANALYSIS* Prof. M. Halle G. W. Hughes A. R. Adolph

SPEECH ANALYSIS* Prof. M. Halle G. W. Hughes A. R. Adolph XII. SPEECH ANALYSIS* Prof. M. Halle G. W. Hughes A. R. Adolph A. STUDIES OF PITCH PERIODICITY In the past a number of devices have been built to extract pitch-period information from speech. These efforts

More information

Robust speech recognition using temporal masking and thresholding algorithm

Robust speech recognition using temporal masking and thresholding algorithm Robust speech recognition using temporal masking and thresholding algorithm Chanwoo Kim 1, Kean K. Chin 1, Michiel Bacchiani 1, Richard M. Stern 2 Google, Mountain View CA 9443 USA 1 Carnegie Mellon University,

More information

On the glottal flow derivative waveform and its properties

On the glottal flow derivative waveform and its properties COMPUTER SCIENCE DEPARTMENT UNIVERSITY OF CRETE On the glottal flow derivative waveform and its properties A time/frequency study George P. Kafentzis Bachelor s Dissertation 29/2/2008 Supervisor: Yannis

More information

Speech Synthesis; Pitch Detection and Vocoders

Speech Synthesis; Pitch Detection and Vocoders Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech

More information

Improving Sound Quality by Bandwidth Extension

Improving Sound Quality by Bandwidth Extension International Journal of Scientific & Engineering Research, Volume 3, Issue 9, September-212 Improving Sound Quality by Bandwidth Extension M. Pradeepa, M.Tech, Assistant Professor Abstract - In recent

More information

Speech Enhancement Based On Noise Reduction

Speech Enhancement Based On Noise Reduction Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion

More information

The Effects of Noise on Acoustic Parameters

The Effects of Noise on Acoustic Parameters The Effects of Noise on Acoustic Parameters * 1 Turgut Özseven and 2 Muharrem Düğenci 1 Turhal Vocational School, Gaziosmanpaşa University, Turkey * 2 Faculty of Engineering, Department of Industrial Engineering

More information

Modulation Domain Spectral Subtraction for Speech Enhancement

Modulation Domain Spectral Subtraction for Speech Enhancement Modulation Domain Spectral Subtraction for Speech Enhancement Author Paliwal, Kuldip, Schwerin, Belinda, Wojcicki, Kamil Published 9 Conference Title Proceedings of Interspeech 9 Copyright Statement 9

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

Perceptive Speech Filters for Speech Signal Noise Reduction

Perceptive Speech Filters for Speech Signal Noise Reduction International Journal of Computer Applications (975 8887) Volume 55 - No. *, October 22 Perceptive Speech Filters for Speech Signal Noise Reduction E.S. Kasthuri and A.P. James School of Computer Science

More information

Determination of Pitch Range Based on Onset and Offset Analysis in Modulation Frequency Domain

Determination of Pitch Range Based on Onset and Offset Analysis in Modulation Frequency Domain Determination o Pitch Range Based on Onset and Oset Analysis in Modulation Frequency Domain A. Mahmoodzadeh Speech Proc. Research Lab ECE Dept. Yazd University Yazd, Iran H. R. Abutalebi Speech Proc. Research

More information

COMP 546, Winter 2017 lecture 20 - sound 2

COMP 546, Winter 2017 lecture 20 - sound 2 Today we will examine two types of sounds that are of great interest: music and speech. We will see how a frequency domain analysis is fundamental to both. Musical sounds Let s begin by briefly considering

More information

Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa

Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Spring 2008 Introduction Problem Formulation Possible Solutions Proposed Algorithm Experimental Results Conclusions

More information

Harmonics Enhancement for Determined Blind Sources Separation using Source s Excitation Characteristics

Harmonics Enhancement for Determined Blind Sources Separation using Source s Excitation Characteristics Harmonics Enhancement for Determined Blind Sources Separation using Source s Excitation Characteristics Mariem Bouafif LSTS-SIFI Laboratory National Engineering School of Tunis Tunis, Tunisia mariem.bouafif@gmail.com

More information

Speech/Data discrimination in Communication systems

Speech/Data discrimination in Communication systems IOSR Journal of Electronics and Communication Engineering (IOSRJECE) ISSN: 2278-2834 Volume 2, Issue 6 (Sep-Oct 2012), PP 45-49 Speech/Data discrimination in Communication systems Ashok Kumar Ginni 1,

More information

On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition

On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition International Conference on Advanced Computer Science and Electronics Information (ICACSEI 03) On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition Jongkuk Kim, Hernsoo Hahn Department

More information

REAL-TIME BROADBAND NOISE REDUCTION

REAL-TIME BROADBAND NOISE REDUCTION REAL-TIME BROADBAND NOISE REDUCTION Robert Hoeldrich and Markus Lorber Institute of Electronic Music Graz Jakoministrasse 3-5, A-8010 Graz, Austria email: robert.hoeldrich@mhsg.ac.at Abstract A real-time

More information

HMM-based Speech Synthesis Using an Acoustic Glottal Source Model

HMM-based Speech Synthesis Using an Acoustic Glottal Source Model HMM-based Speech Synthesis Using an Acoustic Glottal Source Model João Paulo Serrasqueiro Robalo Cabral E H U N I V E R S I T Y T O H F R G E D I N B U Doctor of Philosophy The Centre for Speech Technology

More information

A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image

A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image Science Journal of Circuits, Systems and Signal Processing 2017; 6(2): 11-17 http://www.sciencepublishinggroup.com/j/cssp doi: 10.11648/j.cssp.20170602.12 ISSN: 2326-9065 (Print); ISSN: 2326-9073 (Online)

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION DARYUSH MEHTA

ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION DARYUSH MEHTA ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION by DARYUSH MEHTA B.S., Electrical Engineering (23) University of Florida SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING

More information

Keywords: spectral centroid, MPEG-7, sum of sine waves, band limited impulse train, STFT, peak detection.

Keywords: spectral centroid, MPEG-7, sum of sine waves, band limited impulse train, STFT, peak detection. Global Journal of Researches in Engineering: J General Engineering Volume 15 Issue 4 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

Between physics and perception signal models for high level audio processing. Axel Röbel. Analysis / synthesis team, IRCAM. DAFx 2010 iem Graz

Between physics and perception signal models for high level audio processing. Axel Röbel. Analysis / synthesis team, IRCAM. DAFx 2010 iem Graz Between physics and perception signal models for high level audio processing Axel Röbel Analysis / synthesis team, IRCAM DAFx 2010 iem Graz Overview Introduction High level control of signal transformation

More information

Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice

Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice Yanmeng Guo, Qiang Fu, and Yonghong Yan ThinkIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences Beijing

More information

Modulator Domain Adaptive Gain Equalizer for Speech Enhancement

Modulator Domain Adaptive Gain Equalizer for Speech Enhancement Modulator Domain Adaptive Gain Equalizer for Speech Enhancement Ravindra d. Dhage, Prof. Pravinkumar R.Badadapure Abstract M.E Scholar, Professor. This paper presents a speech enhancement method for personal

More information