
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
Proceedings of the 2nd International Conference on Current Trends in Engineering and Management, ICCTEM-2014
Volume 5, Issue 8, August (2014)

MODIFIED SYNTHESIS STRATEGY FOR VOWELS AND SEMI-VOWELS (KLATT SYNTHESIZER)

Alfred Vivek D'Souza 1, Dr. D. J. Ravi 2
1 M.Tech, Signal Processing, Vidyavardhaka College of Engineering, Mysore, India
2 Professor and HOD, ECE, Vidyavardhaka College of Engineering, Mysore, India

ABSTRACT

Klatt synthesizers are among the most widely used formant synthesizers. They are usually implemented with either a fixed or a variable parameter update rate. This paper proposes a new method of storing control parameters, together with a new parameter update strategy, to improve the naturalness of synthesized vowel and semi-vowel sounds.

Keywords: Klatt Synthesizer, Kannada Vowels and Semi-vowels Synthesis.

1. INTRODUCTION

Speech synthesis is one of the most researched domains in speech processing. Many speech synthesis strategies are in use, of which the most important are concatenative synthesis, articulatory synthesis and formant synthesis. Formant synthesis is often preferred over the other two for its simplicity and the ease with which it can be implemented on general-purpose computers. The cascade/parallel synthesizer was first proposed by Dennis H. Klatt [1] in 1980, and the synthesis strategy was slightly modified in 1990 [2]; to this day it remains one of the most popular formant synthesizer configurations. The next major revision of this class of synthesizers was the KlattGrid synthesizer by David Weenink [3], which is incorporated in the Praat software tool.

1.1 Klatt Class Synthesizers

Klatt class synthesizers are based on the source-filter model of speech production, as shown in Fig. 1.

Fig. 1: Block diagram of Klatt synthesizer

The Klatt synthesizer can be divided into five parts.

1) Excitation Sources: There are two excitation sources in this model: the voicing source for voiced sounds and the frication source for unvoiced sounds.

2) Coupling: This part consists of nasal and tracheal pole-zero filters that model the state of the vocal tract [6] during nasal sound production.

3) Cascade Vocal Tract: This part consists of a series of band-pass filters, called resonators, labelled R1 to R5. F1 to F5 denote the resonance frequencies of these resonators and B1 to B5 denote their resonance bandwidths.

4) Parallel Vocal Tract: This is an alternative vocal tract model in which the resonators are arranged in parallel; in Fig. 1 they are denoted by Rp1 to Rp5. Each of these resonators has its own amplitude control parameter, denoted A2 to A6. AB is the bypass amplitude control.

5) Radiation Characteristics: This block models the lip radiation characteristics. Usually a first difference of the output samples suffices for this model.

The basic principle of speech production is that the waveform generated by the excitation source is modified by the vocal tract resonators so as to mimic the human vocal tract system. The Klatt synthesizer provides two options for the excitation source and two for the vocal tract model; the appropriate combination depends on the type of sound to be produced. For sound units that involve vibration of the vocal cords, the voicing source and the cascade vocal tract model are selected; examples of this class are vowels, semi-vowels and diphthongs. For sound units that do not involve vocal cord vibration, the frication source and the parallel vocal tract model are selected.

1.2 Excitation Sources

The voicing source generates the periodic pulses that represent the glottal pulses produced by the vibrating vocal cords, using the Rosenberg model [4]. The glottal pulses are generated at the fundamental frequency (F0), also known as the pitch. The amplitude of the pulse train is specified by the voicing amplitude (Av). The amplitude of aspiration (Ah) and the amplitude of breathiness (Ab) are used to simulate breathy sounds. The open quotient (OQ) is the ratio of the open phase of the vocal cords to the pitch period. Fig. 2 shows a sample glottal pulse waveform. The noise-like appearance during the open phase of the pulse produces the breathiness effect; if the breathiness amplitude is set to 0, the open phase of the pulse traces the red line shown in the figure.

Fig. 2: Sample glottal pulse
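The paper gives no code for the voicing source; as an illustration only, the sketch below generates one glottal cycle with the polynomial pulse shape of the Klatt and Klatt (1990) synthesizer (a variant of the Rosenberg model). The function name, the sampling rate default and the additive-noise breathiness model are our own assumptions, not the authors' implementation.

    import numpy as np

    def glottal_cycle(F0, Av, OQ, Ab=0.0, fs=10000):
        """One pitch period of a polynomial (KLGLOTT88-style) glottal pulse.

        F0 : fundamental frequency in Hz
        Av : voicing amplitude (peak flow)
        OQ : open quotient, open phase as a fraction of the pitch period
        Ab : breathiness amplitude; noise is added during the open phase only
        fs : sampling rate in Hz
        """
        T0 = 1.0 / F0                        # pitch period in seconds
        t = np.arange(int(round(fs * T0))) / fs
        # u(t) = a*t^2 - b*t^3 rises from 0, peaks at Av when
        # t = (2/3)*OQ*T0, and returns to 0 at t = OQ*T0.
        a = 27.0 * Av / (4.0 * OQ**2 * T0**2)
        b = 27.0 * Av / (4.0 * OQ**3 * T0**3)
        open_phase = t < OQ * T0
        u = np.where(open_phase, a * t**2 - b * t**3, 0.0)
        # Breathiness: noise superimposed on the open phase (cf. Fig. 2).
        u += np.where(open_phase, Ab * np.random.randn(len(t)), 0.0)
        return u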

Unvoiced sounds use the frication source as the excitation source. The parameter amplitude of frication (Af) controls the amplitude of the frication source.

1.3 Digital Resonators and Anti-Resonators

Resonators are the building blocks of the synthesizer. A resonance frequency F and a resonance bandwidth B characterize each resonator. The resonators used in the Klatt and KlattGrid synthesizers are all-pole filters. The output sample y(n) for a given input sample x(n) is calculated with the difference equation

    y(n) = A·x(n) + B·y(n-1) + C·y(n-2)    (1)

where y(n-1) and y(n-2) are the two previous output samples. If the sampling period is T, the coefficients A, B and C are calculated as

    C = -exp(-2·pi·B·T)
    B = 2·exp(-pi·B·T)·cos(2·pi·F·T)    (2)
    A = 1 - B - C

where the F and B on the right-hand sides denote the resonance frequency and bandwidth.

Anti-resonators are used for coupling and for the generation of nasal sounds. They are implemented as FIR filters with the difference equation

    y(n) = A'·x(n) + B'·x(n-1) + C'·x(n-2)    (3)

The coefficients A', B' and C' are derived from the resonator coefficients A, B and C by the transformations

    A' = 1/A,   B' = -B/A,   C' = -C/A    (4)
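To make equations (1) to (4) concrete, the following sketch implements one resonator and its matching anti-resonator. The function names and the use of scipy.signal.lfilter are our own illustrative choices; the coefficient formulas are those of equations (2) and (4).

    import numpy as np
    from scipy.signal import lfilter

    def resonator_coeffs(F, B, fs):
        """Coefficients of equation (2) for resonance frequency F (Hz),
        bandwidth B (Hz) and sampling rate fs (Hz); T = 1/fs."""
        T = 1.0 / fs
        C = -np.exp(-2.0 * np.pi * B * T)
        Bc = 2.0 * np.exp(-np.pi * B * T) * np.cos(2.0 * np.pi * F * T)
        A = 1.0 - Bc - C
        return A, Bc, C

    def resonate(x, F, B, fs):
        """All-pole resonator, equation (1):
        y(n) = A x(n) + B y(n-1) + C y(n-2)."""
        A, Bc, C = resonator_coeffs(F, B, fs)
        # lfilter implements y(n) = b0 x(n) - a1 y(n-1) - a2 y(n-2)
        # for a = [1, a1, a2], so the feedback signs are negated.
        return lfilter([A], [1.0, -Bc, -C], x)

    def antiresonate(x, F, B, fs):
        """FIR anti-resonator of equation (3), coefficients per equation (4)."""
        A, Bc, C = resonator_coeffs(F, B, fs)
        Ap, Bp, Cp = 1.0 / A, -Bc / A, -C / A
        return lfilter([Ap, Bp, Cp], [1.0], x)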

1.4 Database for Vowels and Semi-vowels

For synthesizing vowels and semi-vowels, the voicing source and the cascade vocal tract model are used. The parameters used for vowel and semi-vowel synthesis are the pitch (F0), the formant frequencies (F1 to F5) and their bandwidths (B1 to B5), the open quotient (OQ), the voicing amplitude (Av), the breathiness amplitude (Ab) and the aspiration amplitude (Ah). The other parameters mentioned in [1] and [5] can be kept constant.

All of the above parameters vary with time. As a result, for any given sound, each parameter is not a single value but a set of values at different times, known as a contour. The database should capture how these parameters change with time. Vowels and semi-vowels are continuants, which means that parameters such as the pitch, the formant frequencies and their bandwidths vary slowly with time. This property is exploited in creating the database. For the Klatt synthesizer, a sample speech utterance is recorded and partitioned into equal frames, usually of 5 ms duration each, and for each frame the representative values of the parameters under consideration are stored. Table 1 shows a sample five-frame database involving the pitch and the first formant frequency; the same procedure is applied to all other contours as well.

Table 1: Sample parameters for Klatt synthesizer for Kannada vowel /a/

    Frame #     F0 in Hz    F1 in Hz    B1 in Hz
    1 (5 ms)
    2 (5 ms)
    3 (5 ms)
    4 (5 ms)
    5 (5 ms)

Database generation for the KlattGrid synthesizer is similar, except that the KlattGrid synthesizer uses a variable frame size to capture the variation of the parameters more precisely. Table 2 shows a sample three-frame database involving the pitch and the first formant frequency; a sketch of how such tables are consumed during synthesis follows the table.

Table 2: Sample parameters for KlattGrid synthesizer for Kannada vowel /a/

    Row     Time in s   F0 in Hz    F1 in Hz    B1 in Hz
    1
    2
    3
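As a sketch of how such a frame database is consumed, the helper below returns the parameter row in force at a given time. It covers both the fixed-frame layout of Table 1 (update times at 5 ms multiples) and the variable-frame layout of Table 2 (explicit update times). The function name, the list-of-dicts layout and the numeric values in the example are illustrative assumptions, not values from the paper.

    import bisect

    def params_at(update_times, rows, t):
        """Piecewise-constant lookup: the parameter row in force at time t.

        update_times : sorted times (s) at which parameters change; for a
                       Table 1 layout this is [0.000, 0.005, 0.010, ...],
                       for a Table 2 layout the stored update times.
        rows         : parameter rows (dicts of F0, F1, B1, ...) matching
                       update_times one for one.
        """
        i = bisect.bisect_right(update_times, t) - 1
        return rows[max(i, 0)]

    # Example with a Table-2-style database (all values illustrative):
    times = [0.00, 0.02, 0.03]
    rows = [{"F0": 150, "F1": 720, "B1": 150},
            {"F0": 148, "F1": 760, "B1": 160},
            {"F0": 145, "F1": 780, "B1": 170}]
    print(params_at(times, rows, 0.025))   # row that took effect at t = 0.02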

1.5 Synthesis of Vowels and Semi-vowels

In the Klatt synthesizer, the synthesis strategy is quite straightforward [2]. The excitation waveform is first generated frame by frame by providing the respective pitch (F0), Av, Ah, Ab and OQ values to the voicing source block: for the example parameters of Table 1, the first 5 ms of the voicing waveform are generated with the pitch value of frame 1, the next 5 ms with the pitch value of frame 2, and so on. The excitation waveform is then filtered with resonator R1; for the first 5 ms, R1 takes the resonance frequency and bandwidth of frame 1, for the next 5 ms those of frame 2, and so on. The same principle is applied to the remaining resonators. In other words, the parameters of the voicing source block and of each resonator block are updated once every 5 ms.

The KlattGrid synthesizer works with a similar synthesis strategy [3], but the parameter update rate varies. For the example parameters shown in Table 2, the initial parameters are those of row 1 of the table; the first parameter update happens after 0.2 s with the parameters of row 2, and the second parameter update happens 0.1 s after the previous one. Fig. 3 shows the spectrograms of the recorded and the synthesized Kannada vowel /a/, the latter generated with the KlattGrid synthesizer available in the Praat software with a fixed frame size.

Fig. 3: Spectrogram of recorded sound (top) and synthesized sound (bottom)

2. PROPOSED METHOD

The spectrogram of the synthesized vowel shown in Fig. 3 preserves the overall properties of the sound, and the generated sound is intelligible, but it lacks the naturalness of the original recording. This happens because of improper frame duration selection. If the frame duration is excessively long compared with the pitch period, the parameters remain the same for more than one pitch period, then jump to new values at the next frame and are again held constant throughout that frame, giving the spectrogram a striped appearance. On the other hand, if the frame is very short and the frame duration is not an integral multiple of the pitch period of the frame, serious distortions can occur; this type of distortion in the time domain is shown in Fig. 4.

Fig. 4: Termination of frame before completion of pitch period

In Fig. 4, the vertical line indicates the termination of frame n-1 and the commencement of frame n. The pitch pattern, however, is not complete and is terminated abruptly.

2.1 Pitch Synchronous Parameter Update Method

To avoid these distortions and to make the synthesized sound more natural, the parameters should be updated once every pitch period. This ensures that the pitch pattern is completed before new parameters are applied to the voicing source and the resonators. It also implies sampling the parameters in synchrony with the pitch, storing those samples in the database, and using the KlattGrid strategy for synthesis. However, pitch-synchronous sampling of the parameters is a tedious job, and the number of samples to be stored is high compared with the fixed-time parameter update case. Hence the new method of database creation described below can be used.

2.2 Database Creation and Synthesis Strategy for the Pitch Synchronous Parameter Update Method

The pitch synchronous parameter update method requires the parameters to be sampled once every pitch period and stored in the database. However, if the pitch contours, the formant frequency contours and the corresponding bandwidth contours of Kannada vowels and semi-vowels are observed carefully, it can be noticed that all of these contours vary smoothly with time. This makes it possible to avoid sampling every parameter once per pitch period: instead, each contour can be fitted with a polynomial curve and the polynomial coefficients stored. The nth-degree polynomial curve is

    P_n(t) = a_0 + a_1·t + a_2·t^2 + ... + a_n·t^n    (5)

where a_0, a_1, ..., a_n are the coefficients and t is the time index. Before curve fitting is performed on any contour, the time axis is normalized to the range 0 to 1. Fig. 5 shows an example of curve fitting of the pitch contour for the Kannada vowel /aa/.

Fig. 5: Pitch contour curve fitting for Kannada vowel /aa/

The top panel of Fig. 5 shows the recorded vowel sound /aa/, which is 0.3363 s long. The middle panel shows the identified pitch values plotted against time. The bottom panel shows a fourth-order polynomial curve fitted to the pitch points after normalizing the time axis; the fit yields the five coefficients a_0 to a_4 (for this contour, a_1 = -24.9).
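A minimal sketch of the curve-fitting step just described: the time axis of a measured pitch contour is normalized to the range 0 to 1, and a fourth-degree polynomial is fitted. numpy.polyfit and the synthetic contour values stand in for the MATLAB curve-fitting tool and the measured data the authors used.

    import numpy as np

    # Measured pitch contour: times (s) and F0 values (Hz); values are
    # illustrative, not the paper's measurements.
    t = np.linspace(0.0, 0.3363, 25)             # utterance of 0.3363 s
    f0 = 140 + 30 * np.sin(np.pi * t / 0.3363)   # smooth, slowly varying

    # Normalize the time axis to [0, 1] before fitting (Section 2.2).
    t_norm = t / t.max()

    # Fit P(t) = a0 + a1*t + ... + a4*t^4, equation (5). np.polyfit
    # returns the highest degree first, so reverse to a0..a4 order.
    coeffs = np.polyfit(t_norm, f0, deg=4)[::-1]

    # The database stores the coefficients plus the actual duration.
    database_entry = {"duration": 0.3363, "F0_poly": coeffs.tolist()}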

In the database, for the vowel /aa/, the respective polynomial coefficients are stored instead of the actual parameter contours, together with one extra parameter, the actual duration; for the vowel /aa/ the actual duration is 0.3363 s. To recover the value of any parameter at a required time instant t_1, the polynomial P(t) representing its contour is evaluated at t = t_1/(actual duration).

2.3 Synthesis Strategy

Vowel and semi-vowel synthesis is carried out in two phases: phase 1 generates the voicing waveform, and phase 2 filters the generated voicing waveform through the series of resonators. The first phase is given by the following algorithm (a code sketch follows the two algorithms).

1) Create an empty buffer to hold the voicing waveform.
2) Fix the sampling rate Fs and specify the synthesis duration.
3) Initialize the next parameter update time: τ = 0.
4) Evaluate F0, Av, Ah and OQ from their respective polynomials at t = τ/(synthesis duration).
5) Generate the voicing waveform using the Rosenberg model for a duration of 1/F0 with the evaluated F0, Av, Ah and OQ.
6) Concatenate the generated voicing waveform with the buffer.
7) Set τ = τ + 1/F0.
8) If τ >= synthesis duration, stop; otherwise go to step 4.

The second phase filters the voicing waveform with the resonators one by one. The pitch-synchronous filtering approach is given below for one resonator.

1) From the database, read the frequency contour and the bandwidth contour corresponding to the resonator and the sound, and also read the F0 contour.
2) Initialize the next parameter update time: τ = 0.
3) Evaluate the resonance frequency F, the resonance bandwidth B and F0 at t = τ/(synthesis duration).
4) Design the filter with the calculated F and B using equation (2).
5) Filter the portion of the voicing waveform from τ to τ + 1/F0 with the designed filter.
6) Set τ = τ + 1/F0.
7) If τ >= synthesis duration, go to step 8; otherwise go to step 3.
8) Repeat all of the above steps for each resonator.
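A minimal sketch of the phase-1 algorithm, under the same illustrative names used earlier: glottal_cycle is the pulse generator sketched in Section 1.2, and poly_eval is the Horner evaluator sketched in Section 3 below. Aspiration and breathiness are omitted for brevity, and the layout of the db dictionary is our own assumption.

    import numpy as np

    def synthesize_voicing(db, synth_dur, fs=10000):
        """Phase 1: pitch-synchronous generation of the voicing waveform.
        db holds one coefficient list per contour plus the duration."""
        buf = []                                 # step 1: empty buffer
        tau = 0.0                                # step 3: next update time
        while tau < synth_dur:                   # step 8: stop condition
            tn = tau / synth_dur                 # step 4: normalized time
            F0 = poly_eval(db["F0_poly"], tn)    # evaluate stored contours
            Av = poly_eval(db["Av_poly"], tn)
            OQ = poly_eval(db["OQ_poly"], tn)
            # step 5: one full pitch period with the current parameters
            buf.append(glottal_cycle(F0, Av, OQ, fs=fs))
            tau += 1.0 / F0                      # step 7: advance one period
        return np.concatenate(buf)               # step 6: concatenated output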

3. IMPLEMENTATION

The database for Kannada vowels and semi-vowels was created by first obtaining the various contours using the Praat tool. Praat provides an implementation of the KlattGrid synthesizer with two parts, an analysis system and a synthesis system. Fig. 6 shows the pitch and formant frequency contours extracted with Praat's KlattGrid tool for the Kannada vowel /a/. The contours were fitted to polynomial curves using Matlab's curve fitting tool. It was observed that most of the contours required at most a 6th-degree polynomial, with the exception of a few contours that change rapidly.

Fig. 6: Pitch and formant contours of Kannada vowel /a/

The polynomial coefficients of all the required contours were stored in an XML file. Horner's method was employed to evaluate a polynomial at any desired time instant. Fig. 7 shows the pitch and formant contours calculated from the polynomials for the vowel /a/.

Fig. 7: Pitch and formant contours calculated from polynomials
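A minimal sketch of the Horner evaluation step, assuming the coefficient lists are stored in ascending order a_0 to a_n as in equation (5); the function name and the database_entry example from Section 2.2 are illustrative.

    def poly_eval(coeffs, t):
        """Horner's method: evaluate a0 + a1*t + ... + an*t^n at t."""
        acc = 0.0
        for a in reversed(coeffs):   # highest-degree coefficient first
            acc = acc * t + a
        return acc

    # Reconstructing a contour value at time t1 (in seconds):
    # f0 = poly_eval(database_entry["F0_poly"],
    #                t1 / database_entry["duration"])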

The Klatt synthesizer with the proposed parameter update method was implemented in Matlab. The algorithms of Section 2.3 were used to synthesize the vowel /a/. The synthesized vowel waveform, together with its spectrogram and the various contours, is shown in Fig. 8.

Fig. 8: Vowel /a/ generated with proposed changes

Any contour can be shifted to a new level simply by setting the a_0 coefficient of that contour to the required value. This property finds application in pitch level shifting and in pitch matching with adjacent sound units. In addition, because the time axis is normalized during curve fitting, the sound can easily be synthesized for any desired duration just by changing the synthesis duration parameter. Fig. 9 shows the waveform of the vowel /a/ synthesized for a duration of 0.2 s.

Fig. 9: Vowel /a/ generated for 0.2 s
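Under the illustrative names used in the earlier sketches, both manipulations reduce to one-line changes (the shift amount here is hypothetical):

    # Pitch level shift: raise the whole F0 contour by 20 Hz, touching
    # only the a_0 coefficient of the stored polynomial.
    db["F0_poly"][0] += 20.0

    # Duration change: synthesize the same vowel for 0.2 s; the
    # normalized time axis stretches the stored contours automatically.
    waveform = synthesize_voicing(db, synth_dur=0.2)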

4. EVALUATION OF PROPOSED METHOD

Kannada vowels and semi-vowels were recorded and the model parameters were extracted. One set of vowels and semi-vowels was re-synthesized with the existing KlattGrid synthesizer, and another set with the proposed changes. A group of Kannada speakers was asked to identify the sounds, which were played to them in random order from both sets. This survey was conducted to check whether the re-synthesized vowels and semi-vowels were intelligible; all of the generated sounds were correctly identified. A Mean Opinion Score (MOS) was also collected by playing the two sets one after the other: 95% of the survey participants said that the set generated with the proposed changes sounded more natural than the other set, and the remaining 5% said that there was no difference between the two sets.

5. CONCLUSION

The proposed changes significantly increase the quality of the synthesized vowels and semi-vowels. The proposed method of storing the parameters also reduces the size of the database. The price paid for the increased quality is the increased number of computations needed to obtain the parameters from the polynomials; with the speed of modern processors, however, this extra computational load does not pose a significant hindrance.

6. REFERENCES

[1] Klatt, Dennis H., "Software for a cascade/parallel formant synthesizer", The Journal of the Acoustical Society of America, 67(3), 1980.
[2] Klatt, Dennis H., and Laura C. Klatt, "Analysis, synthesis, and perception of voice quality variations among female and male talkers", The Journal of the Acoustical Society of America, 87(2), 1990.
[3] Weenink, David, "The KlattGrid speech synthesizer", in INTERSPEECH, 2009.
[4] Rosenberg, Aaron E., "Effect of glottal pulse shape on the quality of natural vowels", The Journal of the Acoustical Society of America, 49(2B), 1971.
[5] Jesus, Luis Miguel Teixeira de, Francisco Vaz, and José Carlos Principe, "An Implementation of the Klatt Speech Synthesiser", Electrónica e Telecomunicações (Revista do DETUA), 2(1), 1997.
[6] Mermelstein, Paul, "Articulatory model for the study of speech production", The Journal of the Acoustical Society of America, 53(4), 1973.
