Determination of Variation Ranges of the PSOLA Transformation Parameters by Using Their Influence on the Acoustic Parameters of Speech


L. Demri, L. Falek, H. Teffahi, and A. Djeradi
Speech Communication and Signal Processing Laboratory, Electronics and Computer Science Faculty, USTHB, BP 32, El Alia, Algiers
{ldemri19871,

Abstract

One of the major issues when transforming a voice with the PSOLA algorithm is to accurately determine the values of the signal modification parameters (α, β and γ) that transform the source signal into the target signal. In this paper, we propose a way to determine these parameters from a study of their influence on several acoustic descriptors of speech. We then deduce the relationship between each transformation parameter and the feature parameters of the voice. The results obtained are acceptable and allowed us to sort the vocal features in ascending order of transformation complexity.

Index Terms: voice transformation, PSOLA algorithm, vocal features

I. INTRODUCTION

Voice transformation, or voice conversion, is a method by which the speech signal of a reference speaker, also called the source speaker, is modified in such a way that it seems to have been pronounced by a desired speaker, also called the target speaker. To achieve this, a learning phase is performed on a restricted set of recordings of both the source and the target speakers in order to determine the transformation function, which is then applied to the source speaker's speech to carry out the conversion. Voice conversion technology is used in many fields, such as movie dubbing, automatic translation of telephone conversations between speakers of different languages (interpreted telephony), and very low bit-rate voice coding. Voice transformation can also be used to evaluate the reliability of speaker recognition systems. The objective of this work is a contribution to the establishment of a voice transformation methodology. The speech analysis and synthesis system is based on the PSOLA algorithm [1].

II. SPEAKER IDENTITY CHARACTERIZATION

The speaker's identity is characterized by many different acoustic parameters (also called "features"), including prosody, tone color, and spectral parameters such as the LPC and MFCC coefficients, which were also used in this study. The tone of the voice depends on the physical properties of the speaker's vocal organs and is represented by the spectral characteristics of the signal (such as the formants). Prosody corresponds to the expression and style components, i.e. intonation and accent, and depends on the social and psychological conditions of the speaker. At the signal level, prosody corresponds to the signal's pitch, energy and duration [1].

III. VOICE TRANSFORMATION PRINCIPLE

Voice transformation becomes possible with the availability of high-quality speech synthesis and analysis systems, since it involves delicate modifications of the spectral and prosodic characteristics of the speech signal; it must also be possible to rebuild a high-quality signal from the modified parameters. Three interdependent issues must be dealt with when designing a voice conversion system. First, acoustic parameters that characterize the speaker's identity must be determined. Second, a speech model is necessary to evaluate these parameters and to generate a speech signal from the transformed parameters. Third, the type of conversion function, the learning algorithm, and the conversion function's mode of application must be chosen [1].

IV. PSOLA ANALYSIS/SYNTHESIS METHOD (PITCH-SYNCHRONOUS OVERLAP-ADD)

The voice transformation method used in this study is based on the PSOLA algorithm [2], which consists of an analysis step and a synthesis step, as shown in Fig. 1.

A. Analysis

In this step the signal is decomposed into elementary waveforms. The procedure is the following (as in Fig. 1):

Manuscript received June 23, 2013; revised August 2, 2013. doi: /lnit

Figure 1. The different steps of the PSOLA algorithm

Singularity detection, which consists in localizing the important concentrations of energy in the signal.

Pitch estimation (here obtained via the autocorrelation method).

Voicing detection (obtained from the zero-crossing rate).

Placement of the pitch markers. In the voiced zones, the markers are centered on the energy maxima, to avoid deteriorating the signal in the windowing step, and a distance of T0 = 1/F0 is kept between two markers. In the unvoiced portions, the markers are equidistant from one another.

Windowing is the last analysis step. It breaks the signal into elementary waveforms (as in Fig. 2). Two conditions must be verified: a 50% overlap must be kept between adjacent windows, and each window must be centered on a local energy maximum of the signal. This preserves the maximum of the signal's local energy. The signal is thus decomposed into a series of elementary waveforms.

Figure 2. Windowing: cutting the signal into elementary waveforms [3]

B. Synthesis

This step consists of two modification stages: one in the time domain (TD-PSOLA) and one in the frequency domain (FD-PSOLA).

TD-PSOLA stage (time domain). We first modify the signal's pitch and duration, as shown in Fig. 3. To do so, we define a pitch modification parameter named β (compression/dilatation factor) and a duration modification parameter named α (temporal stretching factor). Pitch modification is performed by altering the distance between the waveforms (pushing them farther apart to lower the pitch, and closer together to raise it). Duration modification is performed by suppressing or duplicating waveforms.

Figure 3. TD-PSOLA [4]

FD-PSOLA stage (frequency domain). In this stage we modify the spectral envelope by compressing or dilating the spectrum (see Fig. 4). When using the wide-band PSOLA algorithm, each elementary waveform is an approximation of the spectral envelope [5], so compressing/stretching the waveform is equivalent to compressing/stretching the spectral envelope. The dilatation/compression factor of the spectrum is called gamma (γ).

Figure 4. Dilatation/compression of the reconstituted spectrum

OLA signal reconstitution (overlap-add). In order to glue the previously modified waveforms back together, the OLA method multiplies each synthesized waveform by a triangular temporal window so as to smooth the transitions between waveforms (Fig. 5). This allows the amplitude of the synthesized signal to be maintained.

Figure 5. OLA signal reconstitution (overlap-add) [6]

Figure 6. Global analysis/synthesis diagram
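The analysis procedure above can be sketched as follows. This is an illustrative reconstruction rather than the authors' implementation: the constant-F0 toy signal, the simple per-period energy peak picking, and the Hanning window are simplifications assumed here (the paper estimates F0 per frame and detects voicing before placing markers).

```python
import numpy as np

def pitch_marks(x, f0, fs):
    """Place one analysis marker per pitch period, snapped to the local
    energy maximum so that windowing does not degrade the signal."""
    period = int(fs / f0)                   # T0 = 1/F0 in samples
    marks, pos = [], 0
    while pos + period < len(x):
        seg = x[pos:pos + period] ** 2      # energy inside the current period
        marks.append(pos + int(np.argmax(seg)))
        pos += period
    return marks

def elementary_waveforms(x, marks):
    """Cut the signal into two-period, 50%-overlapping windowed grains,
    each centered on a pitch marker."""
    grains = []
    for i in range(1, len(marks) - 1):
        left, right = marks[i - 1], marks[i + 1]
        seg = x[left:right]
        grains.append(seg * np.hanning(len(seg)))
    return grains

# toy example: a 100 Hz "voiced" signal sampled at 8 kHz
fs, f0 = 8000, 100.0
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t)
marks = pitch_marks(x, f0, fs)
grains = elementary_waveforms(x, marks)
```

Each grain spans roughly two pitch periods, which is what makes the 50% overlap condition hold at resynthesis time.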

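The synthesis side (TD-PSOLA re-spacing and duplication of the grains, followed by overlap-add) can be sketched in the same spirit. The `grains` and `period` inputs are assumed to come from an analysis step like the one above; this is a simplified sketch, not the paper's implementation.

```python
import numpy as np

def td_psola(grains, period, alpha=1.0, beta=1.0):
    """Resynthesize a signal from elementary waveforms.
    alpha > 1 lengthens the signal (grains are repeated),
    beta  > 1 raises the pitch (grains are placed closer together)."""
    out_period = int(period / beta)              # new spacing between grains
    n_out = max(1, int(round(alpha * len(grains))))
    # duration change: read the analysis grains at rate 1/alpha,
    # which duplicates (alpha > 1) or skips (alpha < 1) grains
    idx = np.minimum((np.arange(n_out) / alpha).astype(int), len(grains) - 1)
    length = n_out * out_period + len(max(grains, key=len))
    y = np.zeros(length)
    for k, i in enumerate(idx):
        g = grains[i]
        start = k * out_period
        y[start:start + len(g)] += g             # overlap-add
    return y

# toy grains: 160-sample Hanning-windowed sinusoid periods
period = 80
g = np.sin(2 * np.pi * np.arange(160) / 80) * np.hanning(160)
grains = [g.copy() for _ in range(50)]
slow = td_psola(grains, period, alpha=2.0)       # twice as long
high = td_psola(grains, period, beta=2.0)        # pitch one octave up
```

Note that re-spacing the grains to change the pitch also shortens or lengthens the signal; in practice α and β are applied jointly so that a pitch change does not alter the duration.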
V. VOICE TRANSFORMATION USING THE PSOLA METHOD

The goal is to imitate a precise voice (the target signal: here a female voice pronouncing the vowel /a/) by transforming a recording of the source voice, a male voice. We want to create the perceptive illusion that the target speaker pronounced whatever the source speaker recorded. It is therefore necessary to determine the acoustic feature parameters of each signal (source and target) and then to find the transformation function, in our case the values of α, β and γ required to pass from one to the other. This step is summarized by the global analysis/synthesis diagram shown in Fig. 6. The transformation first implies determining the feature parameters of both signals (durations, pitch, first three formants (F1, F2 and F3), and the LPC and MFCC parameters), then making a quantitative comparison of these parameters. However, since the signals do not have the same lengths, the parameters must first be aligned; the alignment method used is the DTW algorithm.

C. Acoustic Parameter Extraction Methods and Alignment with DTW

The acoustic parameters are first extracted from the source and target signals in order to determine the required transformation parameters of the PSOLA algorithm. The parameters of the source and the target are then aligned using the DTW algorithm [4] so that they can be compared, allowing us to build the transformation function. In this study we have focused on three types of vocal parameters: the formants, the fundamental frequency, and the MFCCs [4].

Formants. The formants were obtained using the source/filter model and the linear prediction coefficients (LPC). The formant frequencies are the maxima of the speech spectrum obtained from the LPC model; they correspond to the voice timbre.

Fundamental frequency (F0). F0 carries the expression and style components (intonation and stress), which depend on the social and psychological conditions of the speaker; at the signal level, prosody corresponds to the pitch, energy and duration. The method used here is based on the autocorrelation function of each frame of the analyzed signal. For a non-stationary speech signal, the short-term autocorrelation, estimated over sections during which the signal is quasi-stationary, is used [6]:

    R_l(m) = (1/N) Σ_{n=0}^{N'-1} [x(n+l) w(n)] [x(n+l+m) w(n+m)],   0 ≤ m ≤ M0   (3)

where w(n) is an appropriate analysis window (here a Hamming window), N is the length of the analyzed section (corresponding to 20 ms of signal), N' is the number of signal samples used in the computation of R_l(m), M0 is the number of autocorrelation points to be computed, and l is the index of the starting sample of the frame. For pitch detection applications, N' is generally set to N' = N - m. The pitch is then revealed by a maximum of the autocorrelation, whose lag is identified thereafter (see Fig. 7).

Figure 7. Example of an autocorrelation plot of a speech signal section, with the maximum corresponding to the fundamental period highlighted

Mel-frequency cepstral coefficients (MFCC). The MFCCs are acoustic parameters of the speech signal that take human perception into account. The speech spectrum is mapped onto a non-linear Bark (or Mel) scale, given by the following formulas (f: frequency in Hertz):

    Bark(f) = 6 arcsinh(f/600)   (1)

    Mel(f) = (1000 / log 2) log(1 + f/1000)   (2)

The MFCC coefficients are computed as follows:

Pre-emphasis of the signal, which highlights the high frequencies with a high-pass filter of the form H(z) = 1 - 0.9 z^-1.

Cutting the signal into 30 ms windows every 10 ms.

Application of a Hamming window to each 30 ms portion.

Application of a Fourier transform to each portion, yielding its spectrum.

Creation of the filter bank: a set of triangular filters, each covering a band of frequencies, which better simulates the functioning of the human ear.

Conversion to the Mel scale, applying the filters to each portion.

Application of a DCT (discrete cosine transform) to the portions, yielding the cepstral coefficients (MFCC).

The different steps of the MFCC calculation are shown in Fig. 8.

Figure 8. MFCC block diagram

Dynamic time warping (DTW) [6]. DTW is a time-series alignment algorithm originally developed for speech recognition, based on dynamic programming techniques. It measures the similarity between two time series which may vary in time or speed, and finds the optimal alignment between two time series when one of them is warped non-linearly by stretching or shrinking it along its time axis.
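The pitch estimation of formula (3) translates directly into code. In this sketch the 20 ms section and the Hamming window follow the text, while the 50-400 Hz search band for the autocorrelation maximum is an assumed implementation detail.

```python
import numpy as np

def f0_autocorr(x, fs, l=0, fmin=50.0, fmax=400.0):
    """Pitch estimate of the frame starting at sample l, via the
    short-term autocorrelation of Eq. (3):
    R_l(m) = (1/N) * sum_n [x(n+l) w(n)] [x(n+l+m) w(n+m)], N' = N - m."""
    N = int(0.020 * fs)                  # 20 ms analysis section
    w = np.hamming(N)                    # analysis window w(n)
    xw = x[l:l + N] * w                  # windowed frame x(n+l) w(n)
    M0 = min(int(fs / fmin), N - 1)      # number of autocorrelation points
    R = np.empty(M0 + 1)
    for m in range(M0 + 1):
        Np = N - m                       # N' = N - m
        R[m] = np.dot(xw[:Np], xw[m:m + Np]) / N
    # the fundamental period T0 is the lag of the autocorrelation
    # maximum, searched away from the dominant zero-lag peak
    mmin = int(fs / fmax)
    T0 = mmin + int(np.argmax(R[mmin:]))
    return fs / T0                       # F0 = fs / T0 in Hz

fs = 16000
t = np.arange(int(0.05 * fs)) / fs
x = np.sin(2 * np.pi * 200.0 * t)        # synthetic 200 Hz tone
f0 = f0_autocorr(x, fs)
```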

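The MFCC computation steps listed above can be sketched end to end as follows. The FFT size, the 26-filter bank and the 13 retained coefficients are common defaults assumed here, not values given in the paper, and the Mel warping uses the 1000 Hz / log 2 form of the Mel formula.

```python
import numpy as np

def mel(f):                       # Mel(f) = (1000 / log 2) * log(1 + f/1000)
    return 1000.0 / np.log(2.0) * np.log(1.0 + f / 1000.0)

def mel_inv(m):                   # inverse mapping, Mel -> Hz
    return 1000.0 * (np.exp(m * np.log(2.0) / 1000.0) - 1.0)

def mfcc(x, fs, n_filt=26, n_ceps=13, nfft=512):
    """Sketch of the MFCC pipeline described in the text."""
    x = np.append(x[0], x[1:] - 0.9 * x[:-1])        # pre-emphasis H(z)=1-0.9z^-1
    flen, fstep = int(0.030 * fs), int(0.010 * fs)   # 30 ms frames every 10 ms
    frames = [x[i:i + flen] * np.hamming(flen)       # Hamming-windowed portions
              for i in range(0, len(x) - flen + 1, fstep)]
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2   # spectrum of each portion
    # triangular filters equally spaced on the Mel scale
    pts = mel_inv(np.linspace(0.0, mel(fs / 2.0), n_filt + 2))
    bins = np.floor((nfft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filt, nfft // 2 + 1))
    for j in range(1, n_filt + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        fbank[j - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[j - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    fb_energy = np.log(power @ fbank.T + 1e-10)      # log filter-bank energies
    # DCT-II of the log energies yields the cepstral coefficients
    n = np.arange(n_filt)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filt)))
    return fb_energy @ basis.T

fs = 16000
x = np.sin(2 * np.pi * 440.0 * np.arange(fs) / fs)   # 1 s test tone
C = mfcc(x, fs)                                      # one MFCC vector per frame
```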
The warping between the two time series can then be used to find corresponding regions between them, or to determine their similarity. DTW aims at aligning two sequences of feature vectors by warping the time axis iteratively until an optimal match (according to a suitable metric) between the two sequences is found. Consider two sequences of feature vectors:

    A = a1, a2, ..., ai, ..., an
    B = b1, b2, ..., bj, ..., bm

The two sequences can be arranged on the sides of a grid, with one along the top and the other up the left-hand side (Fig. 9). Both sequences start at the bottom left of the grid.

Figure 9. Two time series A and B arranged on the sides of a grid [6]

Inside each cell a distance measure can be placed, comparing the corresponding elements of the two sequences. To find the best match or alignment between the two sequences, one needs to find a path through the grid which minimizes the total distance between them. The procedure for computing this overall distance involves finding all possible routes through the grid and computing the overall distance for each one. The overall distance is the minimum of the sum of the distances between the individual elements on the path, divided by the sum of the weighting coefficients; the weighting coefficients normalize for the path length. It is apparent that for any considerably long sequences the number of possible paths through the grid is very large; the major optimizations or constraints of the DTW algorithm arise from observations on the nature of acceptable paths through the grid. To find the best alignment between A and B, one needs to find the path

    P = p1, ..., ps, ..., pk,   ps = (is, js)

called a warping function, which minimizes the total distance between them. The time-normalized distance between A and B is

    D(A, B) = min_P [ Σ_{s=1}^{k} d(ps) ws / Σ_{s=1}^{k} ws ]   (4)

where d(ps) is the distance between a_is and b_js, and ws > 0 is a weighting coefficient. A weighting coefficient function is sought which guarantees that

    Σ_{s=1}^{k} ws = C   (5)

independently of the path, so that the normalization term is constant. The best alignment path P0 between A and B is the path that achieves the minimum in (4).

D. Determination of the Values of the Transformation Parameters (α, β and γ)

We have thus calculated the feature parameters of both the source and the target signals. The next step is to determine the values of the transformation parameters (α, β and γ) of the PSOLA method that will transform the source signal into the target signal. The proposed method is to study the influence of α, β and γ on all of the features mentioned above, in order to deduce their variation ranges and to find a relation between each acoustic feature and the considered PSOLA parameter. This allows the necessary values of α, β and γ to be determined accurately.

Influence of α, β and γ on the considered features. We started by assigning multiple values to each of the three parameters α, β and γ. We then studied the effect of varying each parameter (while keeping the other two constant), and plotted the variation of the features against these three parameters. The results for the behavior of the duration, the pitch, and the energy are shown in Fig. 10: α influences the duration, β influences F0, and γ has no influence on the duration, F0 or energy. Examples of the results obtained for the behavior of the first formant (frequencies, bandwidths, and amplitudes) are shown in Fig. 11 for α, Fig. 12 for β, and Fig. 13 for γ.

Figure 10. Variations of the duration (in red), the pitch (in blue) and the energy (in green), as functions of α, β, and γ
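The minimum over all paths in Eq. (4) is computed in practice with a dynamic-programming recursion rather than by enumerating routes. The sketch below assumes unit weighting coefficients (ws = 1) and the basic symmetric step pattern.

```python
import numpy as np

def dtw(A, B):
    """Time-normalized DTW distance D(A, B) of Eq. (4) with unit weights,
    plus the optimal warping path P, computed by dynamic programming."""
    n, m = len(A), len(B)
    d = np.abs(np.subtract.outer(A, B))     # local distances d(ps)
    g = np.full((n, m), np.inf)             # accumulated cost over the grid
    g[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []                 # predecessors of cell (i, j)
            if i > 0:
                candidates.append(g[i - 1, j])
            if j > 0:
                candidates.append(g[i, j - 1])
            if i > 0 and j > 0:
                candidates.append(g[i - 1, j - 1])
            g[i, j] = d[i, j] + min(candidates)
    # backtracking recovers the warping function P = p1, ..., pk
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((s for s in steps if s[0] >= 0 and s[1] >= 0),
                   key=lambda s: g[s])
        path.append((i, j))
    path.reverse()
    return g[-1, -1] / len(path), path      # normalize by the path length

# B is a non-linearly stretched copy of A: a perfect alignment exists
A = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
B = np.array([0.0, 0.0, 1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0, 0.0])
D, P = dtw(A, B)
```

Because B only stretches A along its time axis, the optimal path matches identical values everywhere and the normalized distance is zero.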

Figure 11. Variations of the frequency, the bandwidth, and the amplitude of the first formant as functions of α

Figure 12. Variations of the frequencies, the bandwidths, and the amplitudes of the first three formants as functions of β

Figure 13. Variations of the frequency, the bandwidth, and the amplitude of the first formant as functions of γ

Figure 14. Variation of the MFCC with α, β, γ

Figure 15. Variation of the LPC coefficients with α, β, γ

The results in these figures show that:

The α parameter influences the duration of the signal and does not affect the other features.

The β parameter influences the pitch and alters the frequencies of the formants.

The γ parameter influences the LPC and MFCC parameters, and therefore the spectral envelope.

This study allowed us to define the variation range of each parameter, based on the intelligibility of the synthesized signal. These ranges are:

    0.6 ≤ α ≤ …,   … ≤ β ≤ …,   … ≤ γ ≤ …   (6)

E. Relation between the Features and the PSOLA Parameters

Our goal here is to find a method for choosing the PSOLA parameters that match the values of the three main acoustic parameters (duration, pitch and formants) of the source and target (or modified) signals. To do this, we used multiple values for each of the parameters α, β and γ (on the source voice) and measured the resulting duration, pitch, and first three formants. The results are shown in Fig. 16 and Fig. 17.

Figure 16. Variation of the duration as a function of α, d(α), and of the pitch as a function of β, F0(β)

Figure 17. Variations of F1, F2, and F3 as functions of γ

From Fig. 16 and 17, we established the following relationships:

    α(d) = (1/slope)(d_modified - d_source) + 1   (7)
    β(F0) = (1/slope)(F0_modified - F0_source) + 1   (8)
    γ(Fi) = (1/slope)(Fi_modified - Fi_source) + 1   (9)

The slope is that of the right-hand side of the obtained graph. From the values of the source and target parameters, and the slope of the corresponding graph, we determine the correct α, β and γ parameters for the transformation. The values obtained in our case using formulas (7), (8) and (9) are given in Table I.

TABLE I. VALUES OBTAINED FOR THE TRANSFORMATION PARAMETERS

    Transformation parameter        Value
    Alpha                           …
    Beta                            …
    K (amplification parameter)     0.88

Fig. 18 shows that the acoustic parameters of the target and the transformed signals are overall close.
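Relationships (7)-(9) are straightforward to apply once the slopes have been read off the graphs. In the sketch below, the slope values and the source/target measurements are hypothetical placeholders, not the paper's data.

```python
def alpha_from_duration(d_target, d_source, slope):
    """Eq. (7): alpha = (1/slope)(d_modified - d_source) + 1."""
    return (d_target - d_source) / slope + 1.0

def beta_from_pitch(f0_target, f0_source, slope):
    """Eq. (8): beta = (1/slope)(F0_modified - F0_source) + 1."""
    return (f0_target - f0_source) / slope + 1.0

def gamma_from_formant(fi_target, fi_source, slope):
    """Eq. (9): gamma = (1/slope)(Fi_modified - Fi_source) + 1."""
    return (fi_target - fi_source) / slope + 1.0

# hypothetical male /a/ -> female /a/ measurements (ms, Hz) and slopes
alpha = alpha_from_duration(d_target=320.0, d_source=300.0, slope=250.0)
beta = beta_from_pitch(f0_target=220.0, f0_source=120.0, slope=140.0)
gamma = gamma_from_formant(fi_target=850.0, fi_source=700.0, slope=1200.0)
```

A matching source and target measurement gives a parameter of exactly 1, i.e. no modification, which is the sanity check behind the "+ 1" term.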

However, to make a good estimation of the speech transformation, we must make a quantitative comparison of all these acoustic parameters (because these parameters do not have the same measurement units); this amounts to comparing the relative distances between the target and transformed parameters. This allows us to identify the acoustic parameters which are the most difficult to transform with this method, i.e. those for which the relative distance between the target and the transformed signal is the highest, as shown in Fig. 19.

Figure 18. Comparison between the features of the source, target, and transformed signals (black: source, red: target, green: transformed signal)

Figure 19. Relative distance between the target and transformed acoustic parameters

Fig. 19 shows that the highest relative distances correspond to the durations and the bandwidths. This means that an exaggerated variation of duration or bandwidth can very quickly make the transformed signal unintelligible, as we observed when establishing the ranges of α, β and γ.

VI. CONCLUSION

In this study, we have developed an approach for voice transformation based on the PSOLA algorithm. This approach required a learning stage in which we calculated and compared the speech signal's acoustic parameters (after aligning them using the DTW algorithm), namely: duration, energy, pitch, formants (frequencies, bandwidths and amplitudes) and the MFCC and LPC parameters, for the source and target signals. The transformation function was obtained after a study of the influence of the signal transformation parameters α (temporal variation factor), β (pitch variation factor) and γ (spectral envelope variation factor) of the PSOLA algorithm. We then performed a transformation attempt of a masculine vowel /a/ into a feminine vowel /a/. The problem was then to determine the triple of values α, β and γ that transforms the masculine vowel into the feminine one (regardless of the masculine speaker's language). To do this, we carried out a study that allowed us to deduce the variation ranges as well as the laws of variation of these three parameters, so as to easily determine the correct values for the transformation. The results obtained showed a strong similarity between the target and the transformed signals, proving the effectiveness of our methodology; they are very acceptable, as shown by Fig. 18 and Fig. 19. The study also showed that the parameters most easily transformable with the PSOLA method are the pitch, the formants, and the MFCCs, whereas the LPC parameters, the durations and the formant bandwidths are more delicate. As an outlook to this work, it would be interesting to attempt a voice transformation on a longer signal; this would require a dynamic adjustment of the acoustic parameters.

REFERENCES

[1] M. Lindasalwa, M. Begam, and I. Elamvazuthi, "Voice recognition algorithms using Mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques," Journal of Computing, vol. 2, no. 3, 2010.
[2] H. Valbret, E. Moulines, and J. P. Tubach, "Voice transformation using PSOLA technique," Speech Communication, vol. 11, no. 2-3, 1992.
[3] A. Bala et al., "Voice command recognition system based on MFCC and DTW," International Journal of Engineering Science and Technology, vol. 2, no. 12, 2010.
[4] G. Peeters, "Models and modification of sound signal adapted to its local characteristics," Ph.D. thesis, University Paris 6, specialty: acoustics, signal processing and computing applied to music, 2001.
[5] T. En-najjary, "Voice conversion for speech synthesis," presentation, Eurocom, September 2006.
[6] P. Martin, "Pitch detection of the speech signal by autocorrelation," XXIVèmes Journées d'Étude sur la Parole, Nancy, June 2002.
[7] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978.

Lyes Demri is a PhD student in telecommunications and information processing at the Speech Communication and Signal Processing Laboratory, Electronics and Computer Science Faculty, University of Science and Technology Houari Boumediene (USTHB), Algiers.

Leila Falek is a doctor in electronics and a director of research at the Speech Communication and Signal Processing Laboratory, Electronics and Computer Science Faculty, Telecommunications Department, USTHB, Algiers.

Hocine Teffahi is a professor of electronics and a director of research at the Speech Communication and Signal Processing Laboratory, Electronics and Computer Science Faculty, Telecommunications Department, USTHB, Algiers.

Amar Djeradi is a professor of electronics and a director of research at the Speech Communication and Signal Processing Laboratory, Electronics and Computer Science Faculty, Telecommunications Department, USTHB, Algiers.


Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

Voice Excited Lpc for Speech Compression by V/Uv Classification

Voice Excited Lpc for Speech Compression by V/Uv Classification IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 3, Ver. II (May. -Jun. 2016), PP 65-69 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Voice Excited Lpc for Speech

More information

Perceptive Speech Filters for Speech Signal Noise Reduction

Perceptive Speech Filters for Speech Signal Noise Reduction International Journal of Computer Applications (975 8887) Volume 55 - No. *, October 22 Perceptive Speech Filters for Speech Signal Noise Reduction E.S. Kasthuri and A.P. James School of Computer Science

More information

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) Proceedings of the 2 nd International Conference on Current Trends in Engineering and Management ICCTEM -214 ISSN

More information

CS 188: Artificial Intelligence Spring Speech in an Hour

CS 188: Artificial Intelligence Spring Speech in an Hour CS 188: Artificial Intelligence Spring 2006 Lecture 19: Speech Recognition 3/23/2006 Dan Klein UC Berkeley Many slides from Dan Jurafsky Speech in an Hour Speech input is an acoustic wave form s p ee ch

More information

Digital Speech Processing and Coding

Digital Speech Processing and Coding ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/

More information

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012 Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?

More information

Introducing COVAREP: A collaborative voice analysis repository for speech technologies

Introducing COVAREP: A collaborative voice analysis repository for speech technologies Introducing COVAREP: A collaborative voice analysis repository for speech technologies John Kane Wednesday November 27th, 2013 SIGMEDIA-group TCD COVAREP - Open-source speech processing repository 1 Introduction

More information

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of

More information

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

Gammatone Cepstral Coefficient for Speaker Identification

Gammatone Cepstral Coefficient for Speaker Identification Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

Overview of Code Excited Linear Predictive Coder

Overview of Code Excited Linear Predictive Coder Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Between physics and perception signal models for high level audio processing. Axel Röbel. Analysis / synthesis team, IRCAM. DAFx 2010 iem Graz

Between physics and perception signal models for high level audio processing. Axel Röbel. Analysis / synthesis team, IRCAM. DAFx 2010 iem Graz Between physics and perception signal models for high level audio processing Axel Röbel Analysis / synthesis team, IRCAM DAFx 2010 iem Graz Overview Introduction High level control of signal transformation

More information

Basic Characteristics of Speech Signal Analysis

Basic Characteristics of Speech Signal Analysis www.ijird.com March, 2016 Vol 5 Issue 4 ISSN 2278 0211 (Online) Basic Characteristics of Speech Signal Analysis S. Poornima Assistant Professor, VlbJanakiammal College of Arts and Science, Coimbatore,

More information

Audio Fingerprinting using Fractional Fourier Transform

Audio Fingerprinting using Fractional Fourier Transform Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,

More information

Speech Recognition using FIR Wiener Filter

Speech Recognition using FIR Wiener Filter Speech Recognition using FIR Wiener Filter Deepak 1, Vikas Mittal 2 1 Department of Electronics & Communication Engineering, Maharishi Markandeshwar University, Mullana (Ambala), INDIA 2 Department of

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

ADDITIVE SYNTHESIS BASED ON THE CONTINUOUS WAVELET TRANSFORM: A SINUSOIDAL PLUS TRANSIENT MODEL

ADDITIVE SYNTHESIS BASED ON THE CONTINUOUS WAVELET TRANSFORM: A SINUSOIDAL PLUS TRANSIENT MODEL ADDITIVE SYNTHESIS BASED ON THE CONTINUOUS WAVELET TRANSFORM: A SINUSOIDAL PLUS TRANSIENT MODEL José R. Beltrán and Fernando Beltrán Department of Electronic Engineering and Communications University of

More information

ECMA TR/105. A Shaped Noise File Representative of Speech. 1 st Edition / December Reference number ECMA TR/12:2009

ECMA TR/105. A Shaped Noise File Representative of Speech. 1 st Edition / December Reference number ECMA TR/12:2009 ECMA TR/105 1 st Edition / December 2012 A Shaped Noise File Representative of Speech Reference number ECMA TR/12:2009 Ecma International 2009 COPYRIGHT PROTECTED DOCUMENT Ecma International 2012 Contents

More information

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands Audio Engineering Society Convention Paper Presented at the th Convention May 5 Amsterdam, The Netherlands This convention paper has been reproduced from the author's advance manuscript, without editing,

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Adaptive time scale modification of speech for graceful degrading voice quality in congested networks

Adaptive time scale modification of speech for graceful degrading voice quality in congested networks Adaptive time scale modification of speech for graceful degrading voice quality in congested networks Prof. H. Gokhan ILK Ankara University, Faculty of Engineering, Electrical&Electronics Eng. Dept 1 Contact

More information

Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA

Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA ECE-492/3 Senior Design Project Spring 2015 Electrical and Computer Engineering Department Volgenau

More information

COM325 Computer Speech and Hearing

COM325 Computer Speech and Hearing COM325 Computer Speech and Hearing Part III : Theories and Models of Pitch Perception Dr. Guy Brown Room 145 Regent Court Department of Computer Science University of Sheffield Email: g.brown@dcs.shef.ac.uk

More information

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

A DEVICE FOR AUTOMATIC SPEECH RECOGNITION*

A DEVICE FOR AUTOMATIC SPEECH RECOGNITION* EVICE FOR UTOTIC SPEECH RECOGNITION* ats Blomberg and Kjell Elenius INTROUCTION In the following a device for automatic recognition of isolated words will be described. It was developed at The department

More information

Applications of Music Processing

Applications of Music Processing Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure

More information

E : Lecture 8 Source-Filter Processing. E : Lecture 8 Source-Filter Processing / 21

E : Lecture 8 Source-Filter Processing. E : Lecture 8 Source-Filter Processing / 21 E85.267: Lecture 8 Source-Filter Processing E85.267: Lecture 8 Source-Filter Processing 21-4-1 1 / 21 Source-filter analysis/synthesis n f Spectral envelope Spectral envelope Analysis Source signal n 1

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems

More information

MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM

MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM www.advancejournals.org Open Access Scientific Publisher MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM ABSTRACT- P. Santhiya 1, T. Jayasankar 1 1 AUT (BIT campus), Tiruchirappalli, India

More information

Adaptive Filters Application of Linear Prediction

Adaptive Filters Application of Linear Prediction Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing

More information

Block diagram of proposed general approach to automatic reduction of speech wave to lowinformation-rate signals.

Block diagram of proposed general approach to automatic reduction of speech wave to lowinformation-rate signals. XIV. SPEECH COMMUNICATION Prof. M. Halle G. W. Hughes J. M. Heinz Prof. K. N. Stevens Jane B. Arnold C. I. Malme Dr. T. T. Sandel P. T. Brady F. Poza C. G. Bell O. Fujimura G. Rosen A. AUTOMATIC RESOLUTION

More information

Converting Speaking Voice into Singing Voice

Converting Speaking Voice into Singing Voice Converting Speaking Voice into Singing Voice 1 st place of the Synthesis of Singing Challenge 2007: Vocal Conversion from Speaking to Singing Voice using STRAIGHT by Takeshi Saitou et al. 1 STRAIGHT Speech

More information

University of Colorado at Boulder ECEN 4/5532. Lab 1 Lab report due on February 2, 2015

University of Colorado at Boulder ECEN 4/5532. Lab 1 Lab report due on February 2, 2015 University of Colorado at Boulder ECEN 4/5532 Lab 1 Lab report due on February 2, 2015 This is a MATLAB only lab, and therefore each student needs to turn in her/his own lab report and own programs. 1

More information

Cepstrum alanysis of speech signals

Cepstrum alanysis of speech signals Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP

More information

FREQUENCY-DOMAIN TECHNIQUES FOR HIGH-QUALITY VOICE MODIFICATION. Jean Laroche

FREQUENCY-DOMAIN TECHNIQUES FOR HIGH-QUALITY VOICE MODIFICATION. Jean Laroche Proc. of the 6 th Int. Conference on Digital Audio Effects (DAFx-3), London, UK, September 8-11, 23 FREQUENCY-DOMAIN TECHNIQUES FOR HIGH-QUALITY VOICE MODIFICATION Jean Laroche Creative Advanced Technology

More information

Can binary masks improve intelligibility?

Can binary masks improve intelligibility? Can binary masks improve intelligibility? Mike Brookes (Imperial College London) & Mark Huckvale (University College London) Apparently so... 2 How does it work? 3 Time-frequency grid of local SNR + +

More information

Signal Characterization in terms of Sinusoidal and Non-Sinusoidal Components

Signal Characterization in terms of Sinusoidal and Non-Sinusoidal Components Signal Characterization in terms of Sinusoidal and Non-Sinusoidal Components Geoffroy Peeters, avier Rodet To cite this version: Geoffroy Peeters, avier Rodet. Signal Characterization in terms of Sinusoidal

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

An Improved Voice Activity Detection Based on Deep Belief Networks

An Improved Voice Activity Detection Based on Deep Belief Networks e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.

More information

Signal Analysis. Peak Detection. Envelope Follower (Amplitude detection) Music 270a: Signal Analysis

Signal Analysis. Peak Detection. Envelope Follower (Amplitude detection) Music 270a: Signal Analysis Signal Analysis Music 27a: Signal Analysis Tamara Smyth, trsmyth@ucsd.edu Department of Music, University of California, San Diego (UCSD November 23, 215 Some tools we may want to use to automate analysis

More information

SPEECH ANALYSIS* Prof. M. Halle G. W. Hughes A. R. Adolph

SPEECH ANALYSIS* Prof. M. Halle G. W. Hughes A. R. Adolph XII. SPEECH ANALYSIS* Prof. M. Halle G. W. Hughes A. R. Adolph A. STUDIES OF PITCH PERIODICITY In the past a number of devices have been built to extract pitch-period information from speech. These efforts

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

Application of The Wavelet Transform In The Processing of Musical Signals

Application of The Wavelet Transform In The Processing of Musical Signals EE678 WAVELETS APPLICATION ASSIGNMENT 1 Application of The Wavelet Transform In The Processing of Musical Signals Group Members: Anshul Saxena anshuls@ee.iitb.ac.in 01d07027 Sanjay Kumar skumar@ee.iitb.ac.in

More information

COMP 546, Winter 2017 lecture 20 - sound 2

COMP 546, Winter 2017 lecture 20 - sound 2 Today we will examine two types of sounds that are of great interest: music and speech. We will see how a frequency domain analysis is fundamental to both. Musical sounds Let s begin by briefly considering

More information

NCCF ACF. cepstrum coef. error signal > samples

NCCF ACF. cepstrum coef. error signal > samples ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based

More information

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner.

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner. Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb 2008. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum,

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information

A system for automatic detection and correction of detuned singing

A system for automatic detection and correction of detuned singing A system for automatic detection and correction of detuned singing M. Lech and B. Kostek Gdansk University of Technology, Multimedia Systems Department, /2 Gabriela Narutowicza Street, 80-952 Gdansk, Poland

More information

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying

More information

Speech Processing. Simon King University of Edinburgh. additional lecture slides for

Speech Processing. Simon King University of Edinburgh. additional lecture slides for Speech Processing Simon King University of Edinburgh additional lecture slides for 2018-19 assignment Q&A writing exercise Roadmap Modules 1-2: The basics Modules 3-5: Speech synthesis Modules 6-9: Speech

More information

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation

More information

Chapter IV THEORY OF CELP CODING

Chapter IV THEORY OF CELP CODING Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,

More information

Complex Sounds. Reading: Yost Ch. 4

Complex Sounds. Reading: Yost Ch. 4 Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency

More information

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,

More information

SGN Audio and Speech Processing

SGN Audio and Speech Processing Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations

More information

SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION

SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION M.Tech. Credit Seminar Report, Electronic Systems Group, EE Dept, IIT Bombay, submitted November 04 SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION G. Gidda Reddy (Roll no. 04307046)

More information