STRUCTURE-BASED SPEECH CLASSIFICATION USING NON-LINEAR EMBEDDING TECHNIQUES. A Thesis Proposal Submitted to the Temple University Graduate Board


STRUCTURE-BASED SPEECH CLASSIFICATION USING NON-LINEAR EMBEDDING TECHNIQUES

A Thesis Proposal Submitted to the Temple University Graduate Board in Partial Fulfillment of the Requirements for the Degree Master of Science in Engineering

By Uchechukwu Ofoegbu
May, 2004

Dr. Robert Yantorno, Thesis Advisor, Electrical & Computer Engineering
Dr. Saroj K. Biswas, Committee Member; Director of Graduate Studies, College of Engineering
Dr. Musoke H. Sendaula, Committee Member; Graduate Director, Electrical & Computer Engineering

ABSTRACT

Usable speech refers to those portions of corrupted speech which can be used to determine a reasonable number of distinguishing features of the speaker. It has previously been shown that using only the voiced segments of speech improves usable speech detection, and also that unvoiced speech does not contribute significant information about the speaker(s) for speaker identification. Therefore, using a voiced/unvoiced speech detection system, voiced portions of co-channel speech are usually detected and extracted for use in usable speech extraction systems. The process of human speech production is complex, nonlinear and nonstationary. Its most precise description can only be realized in terms of nonlinear fluid dynamics. Traditionally, though, it has been described using linear techniques such as the source-filter model and spectral analysis. These techniques work very well for many aspects of speech analysis, but they are inherently limited in their ability to describe the true dynamics of speech production. In this research, a non-linear speech classification approach is proposed, which classifies speech based on features extracted after processing the input signal via an embedding technique known as Takens' method of delays. Unvoiced speech and unusable speech are similar in structure, as the former is noise-like in nature, while the latter contains a significant amount of interference. Likewise, the structure of voiced speech is comparable to that of usable speech. Based on this, the proposed technique attempts to classify speech as both voiced or unvoiced and usable or unusable, using different features extracted from the embedded signals. Preliminary experiments have shown that this technique is capable of correctly detecting 96% of voiced speech (with 10% false alarms) and 90% of unvoiced speech (with 4% false alarms) in a noise-free environment.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF EQUATIONS
LIST OF FIGURES
CHAPTERS
1. INTRODUCTION
   1.1. Motivation
   1.2. Non-Linear Embedding
   1.3. Problem Statement and Research Goals
   1.4. Scope of Research
   1.5. Disclaimer
   1.6. Organization of Thesis Proposal
2. BACKGROUND
   2.1. Literature Review
        2.1.1. Voiced and Unvoiced Speech
        2.1.2. Usable and Unusable Speech
   2.2. Traditional Voiced/Unvoiced Detection Measures
        2.2.1. Energy and Zero-Crossings
        2.2.2. 1st Order Reflection Coefficients and Residual Energy
   2.3. Usable/Unusable Detection Measures
        2.3.1. SAPVR
        2.3.2. APPC
3. CURVATURE MEASURE
   3.1. Introduction to Curvature
   3.2. Preliminary Research: Voiced/Unvoiced Classification
        3.2.1. Noise and Filtering
        3.2.2. Experiments and Results
        3.2.3. End-Point Detection
        3.2.4. Result Comparisons
   3.3. Proposed Research: Voiced/Unvoiced/Silence Classification
4. NODAL DENSITY MEASURE
   4.1. Introduction to Nodal Density
   4.2. Preliminary Research: Usable/Unusable Classification
        4.2.1. Preliminary Experiments and Results
        4.2.2. Discussion
   4.3. Proposed Research: Voiced/Unvoiced Classification
5. DIFFERENCE-MEAN COMPARISON MEASURE
   5.1. Introduction to Difference-Mean Comparison
   5.2. Experiments and Results
6. SUMMARY
BIBLIOGRAPHY

LIST OF EQUATIONS

1.1 Vector-Valued Trajectory Formed by Takens' Method of Delays
2.1 Target-to-Interferer Ratio (TIR)
2.2 Energy
2.3 First-Order Reflection Coefficient
2.4 Denominator of the First-Order Reflection Coefficient
2.5 Numerator of the First-Order Reflection Coefficient
3.1 TNB Matrix (Serret-Frenet Theorem)
3.2 Curvature
3.3 Curvature Estimation
3.4 Elemental Arc Length of the Discrete Embedding Curve
3.5 Moving Average Filter
N x N 1st-Order Difference

LIST OF FIGURES

2.1 Illustration of the Periodic Nature of Voiced Speech Versus the Aperiodic Nature of Unvoiced Speech
2.2 Voiced/Unvoiced Detection Using Energy and Zero-Crossings
2.3 Signal Generated by a Speech Utterance, Its Zero-Crossing Rate and Its Energy
2.4 1st Order Reflection Coefficient and Residual Energy Plotted Against the Corresponding Speech Utterance
2.5 SAPVR-Based Usable/Unusable Speech Separation Process
2.6 Sampled Signal, FFT Magnitude and Spectral Autocorrelation of Single-Speaker and Co-Channel Speech
2.7 Single-Speaker Voiced Speech and Its Adjacent Pitch Period Amplitude Comparison
2.8 Co-Channel Voiced Speech and Its Adjacent Pitch Period Amplitude Comparison
3.1 TNB Frame Classification
3.2 Embedded Voiced and Unvoiced Speech Frames
3.3 Curvature and Energy Plotted Against the Corresponding Speech Utterance
3.4 Curvature Distribution for Clean Speech
3.5 Curvature Distribution for Clean Speech and Speech + Pink Noise at 15dB SNR
3.6 Curvature Distribution for Clean Speech and Speech + White Noise at 15dB SNR
3.7 Curvature Distribution for Clean Speech and Speech + Pink Noise at 15dB SNR After Filtering
3.8 Curvature Distribution for Clean Speech and Speech + White Noise at 15dB SNR After Filtering
3.9 Curvature-Based Decision Process
3.10 ROC for Different Noise States, Voiced Speech
3.11 ROC for Different Noise States, Unvoiced Speech
3.12 Curvature-Based Decisions for Clean Speech
3.13 Curvature-Based Decisions for Corresponding Speech + Added Pink Noise at 15dB SNR
3.14 Curvature-Based Decisions for Corresponding Speech + Added White Noise at 15dB SNR
3.15 Speech Data, Actual Classification, Curvature-Based Classification, and Difference Between Actual and Curvature-Based Classifications
3.16 Comparisons of Hits Minus False Alarms for Voiced Speech
3.17 Comparisons of Hits Minus False Alarms for Unvoiced Speech
3.18 Embedded Voiced, Unvoiced and Background Speech Frames with Added Pink Noise at 15dB SNR
3.19 Embedded Voiced, Unvoiced and Background Speech Frames with Added White Noise at 15dB SNR
3.20 Embedded Voiced, Unvoiced and Background Speech Frames with Added Pink Noise at 15dB SNR After Filtering
3.21 Embedded Voiced, Unvoiced and Background Speech Frames with Added White Noise at 15dB SNR After Filtering
4.1 Embedded Frame for Co-Channel Speech of 30dB TIR
4.2 Embedded Frame for Co-Channel Speech of 10dB TIR
4.3 Embedded Frame for Co-Channel (Usable) Speech of 30dB TIR, Gridded to Show Nodes Spanned
4.4 Embedded Frame for Co-Channel (Unusable) Speech of 10dB TIR, Gridded to Show Nodes Spanned
4.5 Nodes Spanned by Embedded Frame for Co-Channel (Usable) Speech of 30dB TIR
4.6 Nodes Spanned by Embedded Frame for Co-Channel (Unusable) Speech of 10dB TIR
4.7 ROC Curve for Usable Speech Detection Using the Nodal Density Approach
4.8 Embedded Voiced Speech, Gridded to Show Nodes Spanned
4.9 Embedded Unvoiced Speech, Gridded to Show Nodes Spanned
4.10 Nodes Spanned by Embedded Voiced Speech
4.11 Nodes Spanned by Embedded Unvoiced Speech
5.1 Difference-Mean Comparison Distribution for Clean Speech
5.2 Difference-Mean Comparison Distribution for Clean Speech and Speech Plus Pink Noise at 15dB SNR
5.3 Difference-Mean Comparison Distribution for Clean Speech and Speech Plus White Noise at 15dB SNR
5.4 Classifier Characteristic Curves for Varying Difference-Mean Comparison Values for Clean Voiced and Unvoiced Speech
5.5 Classifier Characteristic Curves for Varying Difference-Mean Comparison Values for Voiced and Unvoiced Speech Plus Pink Noise at 15dB SNR
5.6 Classifier Characteristic Curves for Varying Difference-Mean Comparison Values for Voiced and Unvoiced Speech Plus White Noise at 15dB SNR
5.7 Hits Minus False Alarms for Voiced Speech
5.8 Hits Minus False Alarms for Unvoiced Speech

CHAPTER 1: INTRODUCTION

1.1. Motivation

Speech signals can be corrupted by two types of interference: background noise or another speaker's speech. The performance of speaker identification systems is known to be adversely affected by the presence of such interference. Various techniques exist for the reduction or elimination of noise distortion in signals (including speech); however, due to the non-stationary properties of speech, complete removal of interfering speech has remained a challenge for the speech processing community. Speech interference occurs when two or more speakers are speaking simultaneously over the same channel without a significant difference in their overall energy. This research focuses on two speakers speaking through the same channel at the same time; the resulting speech is commonly termed co-channel speech. Even when the energies of the target and interferer speech are approximately equal, certain portions still exist in co-channel speech in which the energy of one speaker is greater than that of the other. These portions are termed usable, while the other portions are termed unusable. The use of only the usable portions of speech has been shown to improve the performance of speaker identification systems (Lovekin et al., 2001a), (Iyer et al., 2004). A Target (energy) to Interferer (energy) Ratio (TIR) magnitude of 20dB is considered a suitable threshold for usable/unusable speech classification.

Previous research (Lovekin et al., 2001a) has shown the inability of unvoiced speech to contribute the necessary information about the speaker for speaker identification, due to its noise-like structure; therefore, voiced portions of speech are extracted, using voiced/unvoiced classifiers, for use in usable speech extraction systems. Much research has been performed on the categorization of speech segments as voiced or unvoiced, which has led to the development of traditional voiced/unvoiced classifiers such as Energy and Zero-Crossings (Atal and Rabiner, 1976) and First-Order Reflection Coefficients and Residual Energy (Childers, 2000). These techniques are restricted in their capability to take into consideration the non-linear characteristics of the signals in question, and will therefore omit vital acoustic features, leading to a reduction in the accuracy of the speech classifier.

Usable speech classification techniques have also been introduced which use linear-based approaches, such as the Spectral Autocorrelation Peak-to-Valley Ratio (Krishnamachari et al., 2001) and Adjacent Pitch Period Comparison (Lovekin et al., 2001b), along with others (Iyer et al., 2004), (Krishnamachari et al., 2000), (Kizhanatham et al., 2002), (Smolenski et al., 2002), (Sundaram et al., 2003), (Yantorno, 1998). As mentioned, these methods do not take into account the nonlinear features of the signal, thereby ignoring valuable characteristics which could lead to more precise distinctions between heavily and slightly distorted speech signals.

Due to the inability of linear-based speech classification systems to account for nonlinear features in speech production, the necessity arises to develop a non-linear-based

method, hence the non-linear embedding technique, which is discussed in the next section.

Unvoiced speech and unusable speech are similar in structure, as the former is noise-like in nature, while the latter contains a significant amount of interference. Likewise, the structure of voiced speech is comparable to that of usable speech. Based on this, the proposed technique attempts to classify speech as both voiced or unvoiced and usable or unusable.

1.2. Non-Linear Embedding

In this section, Takens' method of delays (Takens, 1981), a technique widely used in the analysis of chaotic signals, especially in bio-engineering, is discussed, as well as its application to speech classification.

Voiced speech is generated by a comparatively low-dimensional nonlinear dynamical system (Kubin, 1995). It is not viable to directly observe the degrees of freedom of the state variables of this system. Consequently, the problem arises as to how to obtain and depict the underlying low-dimensional dynamics from the one-dimensional observable speech signal. In other words, how can the apparently one-dimensional signal obtained from speech be reconstructed to illustrate the actual dynamics of the speech production system? One of the most popular representations of the chaotic nature of signals can be attained via Takens' embedding theorem, which states that it is possible to reconstruct a state space representation topologically equivalent to the original state space of a system from a single observable dimension. The nonlinear dynamic progression of speech can be observed as a vector which travels along a phase (or state) space trajectory, where the coordinates of the point are the degrees of freedom of the system.

The procedure for implementing Takens' theorem is as follows: First, the time series x(t), which is the speech signal in our case, is accumulated in an array, {x(t)} (usually, the speech signal is given as a vector, and, therefore, the accumulation is already performed). A lag or time delay, d, and an embedding dimension, m, are then used to form the vector-valued trajectory

    V(t) = [v_1(t), v_2(t), v_3(t), \ldots, v_m(t)]    (1.1)

where

    v_1(t) = x(t)
    v_2(t) = x(t - d)
    v_3(t) = x(t - 2d)
    ...
    v_m(t) = x(t - (m-1)d)

Takens has shown that, provided the embedding dimension, m, is greater than twice the original dimension of the time series, x(t), {V(t)} will be an embedding of {x(t)}, and, in theory, the dynamics of V(t) possess the same qualitative characteristics as those of x(t), regardless of the lag, d.

Due to the non-stationary process associated with speech production, the embedding procedure is applied to short consecutive segments. Based on the knowledge that the generation of voiced speech constitutes a low-dimensional system as compared to the higher-dimensional nature of the unvoiced speech generation system (Kubin, 1995), an embedding dimension, m, can be chosen that is greater than twice the original dimension of the speech signal, and yet small enough to clearly distinguish between voiced and unvoiced speech. A dimension of 3, for instance, meets the requirement m > 2n (where n = 1, the original dimension of speech), and is also sufficient to construct well-structured trajectories of the voiced and usable speech signals. However, since unvoiced speech is of a much higher dimension, the structure generated will be chaotic and highly random in nature. Therefore, choosing m = 3 will result in an unambiguous distinction between voiced and unvoiced speech. The delay constant, d, should be large enough for a reconstructed trajectory to be maximally open in state space on average, but relatively small in order to preserve the time resolution of the signal. A constant value of d = 12 was found to provide good discrimination between structured (voiced) and unstructured (unvoiced) speech (Terez, 2002).

The presence of a significant amount of interfering speech in a voiced speech signal will adversely affect the structure of the signal, giving it a more unvoiced-like structure, hence the use of the embedding technique as a viable candidate for usable speech classification.
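To make Eq. (1.1) concrete, the following is a minimal Python/NumPy sketch of the delay embedding; the function name takens_embed is hypothetical, and the defaults m = 3 and d = 12 follow the values discussed above, so this is an illustration rather than the author's original code.

    import numpy as np

    def takens_embed(x, m=3, d=12):
        """Delay-embed a 1-D signal x into m dimensions with lag d (Eq. 1.1).

        Row t of the result is the vector [x(t), x(t-d), ..., x(t-(m-1)d)],
        so each row is one point V(t) on the reconstructed trajectory.
        """
        x = np.asarray(x, dtype=float)
        n = len(x) - (m - 1) * d          # number of complete delay vectors
        if n <= 0:
            raise ValueError("signal too short for the chosen m and d")
        # column k holds the k-th delayed copy x(t - k*d)
        return np.column_stack([x[(m - 1 - k) * d:(m - 1 - k) * d + n]
                                for k in range(m)])

For a 128-sample frame with m = 3 and d = 12, this yields 104 three-dimensional trajectory points.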

1.3. Problem Statement and Research Goals

Performing speaker identification on speech that has been corrupted by interfering speech at a low (less than 15dB) Target-to-Interferer Ratio degrades system performance. However, since there exist portions of co-channel speech with relatively high (above 20dB) TIR (i.e., usable speech), the low-TIR portions can be removed in order to minimize the effect of the interfering speech. The idea of usable speech is novel; therefore, a new technique is presented here, which can analyze co-channel speech in ways that currently existing methods cannot.

Based on the low information content of unvoiced speech in speaker identification, the separation of voiced and unvoiced speech is necessary in order to process only speech segments that are appropriate for the speaker identification system. A novel voiced/unvoiced classification technique, based on non-linear modeling of speech signals, is presented here.

1.4. Scope of Research

In various multiple-way communication systems, co-channel speech is usually encountered, leading to significant distortion in the output of the system, hence the need for an effective usable speech classification system. One possible application of a usable speech extraction system is the identification of the target pilot's speech among various aircraft pilots speaking over the same channel, at the same time and with about the same overall signal energy. In usable speech detection systems, unvoiced speech segments are usually detected and removed (based on their unimportance to the system) using voiced/unvoiced classifiers.

Other than for use in usable speech extraction systems, voiced/unvoiced classifiers are also applied in various acoustic speech processing techniques such as speech recognition and speaker recognition.

1.5. Disclaimer

It must be noted that all speech data used in this research were obtained from the TIMIT database, which is widely used by researchers in the speech processing field. Recordings were performed in a very controlled environment, using professional recording equipment, resulting in high-quality speech. Therefore, although the addition of various types and levels of noise to the input speech signal has been investigated, the performance of the voiced/unvoiced classifier and usable speech extraction system presented in this research may be degraded with the use of signals of lower quality.

1.6. Organization of Thesis Proposal

In this thesis proposal, the classification of speech signals into structured (voiced, usable) and unstructured (unvoiced, unusable) is investigated. Fundamental descriptions of co-channel speech, voiced and unvoiced speech, and non-linear embedding are presented in the current chapter. Chapter 2 covers reviews of voiced/unvoiced and usable/unusable speech classification. In Chapter 3, the curvature measure is introduced, as well as its application to voiced/unvoiced classification; some preliminary experiments and obtained results are presented, and the application of this measure to voiced/unvoiced/background classification is also introduced. The density measure is introduced in Chapter 4; its application to usable speech detection is discussed, and its implementation for voiced/unvoiced classification is proposed. In Chapter 5, the difference-mean comparison measure is introduced, along with its application to voiced/unvoiced classification. Chapter 6, the summary, concludes this proposal and discusses possible future work, which includes fusing the introduced features to obtain one optimal voiced/unvoiced classifier.

CHAPTER 2: BACKGROUND

2.1. Literature Review

In this section, the concepts of voiced/unvoiced and usable/unusable speech are discussed in detail.

2.1.1. Voiced and Unvoiced Speech

Voiced speech is produced by an air flow of pulses caused by the vibration of the vocal cords. The resulting signal can be described as a quasi-periodic waveform with high energy and high adjacent-sample correlation. On the other hand, unvoiced speech, which is produced by turbulent air flow resulting from constrictions in the vocal tract, is characterized by a random, aperiodic waveform with low energy and low correlation. Figure 2.1 below illustrates the difference between voiced and unvoiced speech signals.

[Figure 2.1 panels: voiced speech and unvoiced speech waveforms, amplitude versus sample number]

Figure 2.1: Illustration of the periodic nature of voiced speech (left panel) versus the aperiodic nature of unvoiced speech (right panel).

Note in Figure 2.1 the periodic structure of the voiced frame, as opposed to the random structure of the unvoiced frame. Observe, also, the difference in the maximum amplitude of each of the frames: the maximum amplitude of the voiced frame is far greater than that of the unvoiced frame, indicating that unvoiced speech is much lower in energy than voiced speech.

Accurately classifying speech signals as voiced or unvoiced is essential in speech analysis techniques such as speaker recognition/identification, speech recognition, speech synthesis and speaker count. As discussed earlier, many features exist in speech signals for distinguishing between voiced and unvoiced portions; some of these features have been previously investigated and will be discussed in subsequent sections of this proposal.

2.1.2. Usable and Unusable Speech

The concept of usable speech is derived from the fact that not all portions of speech corrupted by co-channel interference are unusable for speech processing. In this research, the usability of speech is defined with respect to the Target-to-Interferer Ratio (Yantorno, 1999), (Smolenski, 2004).

The ratio of target energy to interferer energy in decibels (dB) is referred to as the Target-to-Interferer Ratio (TIR), which is expressed as

    TIR = 10 \log_{10}\left( \frac{E_t}{E_i} \right) \; dB    (2.1)

where E_t is the energy of the target speech and E_i is the energy of the interfering speech.

Experiments have shown that co-channel speech segments with TIR values of 20dB or greater are only minimally corrupted, and can therefore be effectively used in speaker identification (Yantorno, 1999). Attempts have been made to develop usable speech measures having high correlation with TIR, such that, even without knowledge of its TIR, an input speech frame can be classified as usable or unusable. The portions identified as usable can then be extracted for use in speaker identification and other speech processing systems. Some of the prior usable/unusable speech classification methods are discussed in subsequent sections, and a novel approach to usable/unusable speech detection is introduced in this research.

2.2. Traditional Voiced/Unvoiced Detection Measures

2.2.1. Energy and Zero-Crossings (E/ZC)

The energy and zero-crossings approach (Atal and Rabiner, 1976) is one of the traditional voiced/unvoiced speech classification techniques. The energy technique is based on the difference in amplitude (and therefore energy) between voiced and unvoiced speech; earlier in this chapter, it was demonstrated that voiced speech consists of signals of much higher energy than unvoiced speech. The zero-crossings approach, which involves counting the number of times the signal crosses the x-axis, is based on the knowledge that unvoiced speech signals, being more noise-like in nature, oscillate much faster than voiced speech signals. Therefore, the zero-crossing rates of voiced signals should be lower than those of unvoiced signals.

The procedure for the energy and zero-crossings method, given in Figure 2.2 below, is as follows: First, the input speech signal is passed through a highpass filter to remove any DC component that might be present. The output of the highpass filter is then separated into frames of about 128 samples. The number of zero-crossings is then computed for each frame, as well as the energy of the speech frame, which is obtained from the equation

    E = \sum_n x(n)^2    (2.2)

where x(n) is the speech signal. Voiced/unvoiced speech classification is then performed based on these two parameters.
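A minimal sketch of the two per-frame measurements (Eq. 2.2 and the zero-crossing count); the mean-removal line is a crude stand-in for the highpass filter in the block diagram, and the function names are hypothetical:

    import numpy as np

    def frame_energy(frame):
        """Eq. 2.2: E = sum over the frame of x(n)^2."""
        x = np.asarray(frame, dtype=float)
        return float(np.sum(x ** 2))

    def zero_crossing_count(frame):
        """Number of sign changes between adjacent samples."""
        x = np.asarray(frame, dtype=float)
        x = x - x.mean()                  # crude DC removal (stand-in for the highpass filter)
        s = np.signbit(x)
        return int(np.count_nonzero(s[1:] != s[:-1]))

    # high energy with a low zero-crossing count suggests a voiced frame;
    # low energy with a high zero-crossing count suggests an unvoiced frame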

[Figure 2.2 block diagram: speech signal s(n), highpass filter, sampling block, zero-crossings and energy measurements, minimum-distance computation and selection, voiced/unvoiced decision]

Figure 2.2: Voiced/unvoiced detection using energy and zero-crossings.

Figure 2.3 below shows the energy and zero-crossings for a speech segment consisting of both voiced and unvoiced speech, computed on a sample-by-sample basis. The file is a recording of the phrase "will serve", and the samples between 4000 and 8000 represent the unvoiced sound /s/ in the word "serve". In the figure, the high zero-crossing rate of unvoiced speech is readily observed, along with the high energy of the voiced speech signals.

Figure 2.3: Speech utterance "will serve" (top panel), its zero-crossing rate (middle panel) and its energy (bottom panel).

2.2.2. First-Order Reflection Coefficient / Residual Energy (FR/RE)

Voiced/unvoiced classification has also been developed using the first-order reflection coefficient and the residual energy of the speech signal (Childers, 2000). The reflection coefficient, obtained by modeling the vocal tract as a concatenation of tubes, determines the amount of volume-velocity reflection found at the intersection of two tubes. Due to its high energy, voiced speech possesses a high amount of volume-velocity as compared to unvoiced speech. Significant information in speech is usually contained in the first coefficient, hence the use of the first-order reflection coefficient, r_1, which can be expressed as

    r_1 = \frac{R_{ss}(1)}{R_{ss}(0)}    (2.3)

where

    R_{ss}(0) = \frac{1}{N} \sum_{n=1}^{N} s(n)\,s(n)    (2.4)

    R_{ss}(1) = \frac{1}{N} \sum_{n=1}^{N-1} s(n)\,s(n+1)    (2.5)

N is the number of samples in the analysis frame and s(n) are the speech samples.

The residual energy is the energy of the signal after inverse filtering with the LPC (Linear Predictive Coding) coefficients. The chaotic nature of an unvoiced speech signal results in a low residual energy as compared to a voiced speech signal.
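A minimal sketch of Eqs. (2.3) to (2.5); the residual-energy half of the FR/RE measure (LPC inverse filtering) is omitted here, and the function name is hypothetical:

    import numpy as np

    def first_order_reflection(frame):
        """r1 = Rss(1) / Rss(0), per Eqs. 2.3-2.5."""
        s = np.asarray(frame, dtype=float)
        N = len(s)
        r_ss0 = np.dot(s, s) / N            # Eq. 2.4: zero-lag autocorrelation
        r_ss1 = np.dot(s[:-1], s[1:]) / N   # Eq. 2.5: lag-one autocorrelation
        return float(r_ss1 / r_ss0)         # near 1 for voiced (high adjacent-sample
                                            # correlation), lower for unvoiced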

Figure 2.4 below shows the first-order reflection coefficient and the residual energy of a given speech signal; the green line on the top panel is the threshold for voiced/unvoiced classification.

Figure 2.4: First-order reflection coefficient (top panel) and residual energy (bottom panel) plotted (in blue) against the corresponding speech utterance (black: voiced; red: unvoiced).

2.3. Traditional Usable/Unusable Speech Detection Measures

2.3.1. Spectral Autocorrelation Peak-to-Valley Ratio (SAPVR)

The SAPVR measure (Krishnamachari et al., 2001) was the first usable speech detection technique to be introduced. In this method, the ratio of peaks to valleys of the spectral autocorrelation of the input speech signal is computed. Voiced single-speaker speech (or co-channel speech with high TIR) is highly structured and possesses a well-defined harmonic structure in the frequency domain, as opposed to the random structure of multi-speaker speech. The spectral autocorrelation of usable co-channel speech therefore yields well-defined peaks and valleys, and hence a high peak-to-valley ratio as compared to unusable speech.

The SAPVR usable/unusable speech classification process (given in Figure 2.5 below) is as follows: A 32-point Hamming window is used to sample the input speech signal. The FFT of the windowed samples is computed. Autocorrelation is then performed on the FFT magnitude. The peaks and valleys of the resulting autocorrelation are determined using a peak-picking algorithm. The peak-to-valley ratio is computed and compared to a threshold chosen to distinguish between usable and unusable frames. Finally, frames above the threshold are considered usable and are extracted for applications such as speaker identification.

[Figure 2.5 block diagram: speech signal, Hamming window, FFT, autocorrelation, peak-picking algorithm, usable/unusable decision]

Figure 2.5: SAPVR-based usable/unusable speech classification process.
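A hedged sketch of the SAPVR chain described above; the simple local-extrema search stands in for the paper's peak-picking algorithm, and the spectral mean removal, normalization and FFT size are illustrative choices rather than details taken from the document:

    import numpy as np

    def sapvr(frame, nfft=128):
        """Peak-to-valley ratio of the autocorrelation of the FFT magnitude."""
        x = np.asarray(frame, dtype=float)
        mag = np.abs(np.fft.rfft(x * np.hamming(len(x)), nfft))
        mag -= mag.mean()                               # remove spectral mean before autocorrelation
        ac = np.correlate(mag, mag, mode="full")[len(mag) - 1:]
        ac /= ac[0]                                      # normalize by the zero-lag value
        mid = ac[1:-1]
        peaks = (mid > ac[:-2]) & (mid > ac[2:])         # local maxima beyond lag 0
        valleys = (mid < ac[:-2]) & (mid < ac[2:])       # local minima
        if not peaks.any() or not valleys.any():
            return 0.0
        return float(mid[peaks].max() / abs(mid[valleys].min()))

    # frames whose ratio exceeds a chosen threshold are declared usable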

Figure 2.6 below shows a frame of speech with its associated FFT magnitude and spectral autocorrelation. The speech signals were sampled and windowed with a 32-point Hamming window, 50% overlap and 128-point zero-padding.

[Figure 2.6 panels: speech signal, FFT magnitude and spectral autocorrelation for single-speaker and co-channel male speech]

Figure 2.6: Speech signal (top panels), FFT magnitude (middle panels) and spectral autocorrelation (bottom panels) of single-speaker (left) and co-channel (right) speech.

From the above figure, it is evident that the peaks of the spectral autocorrelation of single-speaker speech are relatively high (bottom left panel) as compared with those of co-channel speech of low TIR value (bottom right panel). This measure was capable of correctly identifying 73% of usable frames (defined based on TIR value) with about 25% false alarms (Krishnamachari et al., 2001).

2.3.2. Adjacent Pitch Period Comparison (APPC)

Voiced speech is known to be periodic in nature; therefore, its adjacent pitch periods are similar in shape. However, the presence of interfering voiced speech creates dissimilarity in the adjacent pitch periods of co-channel speech. The APPC measure (Lovekin et al., 2001b) takes advantage of this difference between the adjacent pitch periods of single-speaker and co-channel speech in the development of a usable/unusable speech detection system. The concept of this measure is the comparison of sample-by-sample variations of adjacent pitch periods of the speech signal. With single-speaker voiced speech, a comparison of adjacent pitch periods will yield minimal sample-by-sample variations, and an accurate length of the pitch period can be easily obtained. However, with the presence of interfering speech, adjacent pitch period comparison results in large variations, and the estimation of the pitch period length can also be inaccurate. Ironically, this inaccurate pitch period estimation, occurring with co-channel speech, leads to an increase in correct usable/unusable speech detection, because the more inaccurate the selected pitch period lengths are, the greater the dissimilarity between the pitch periods and, hence, the larger the variations.

The APPC process is as follows: The length, N, of each reference pitch period is computed as the distance between the zero-lag point and the next highest peak of the autocorrelation computed over the next 10ms of the signal. The adjacent pitch period is then taken as samples N+1 to 2N+1.

It should be noted that, in this method, changes in length from one pitch period to its neighboring pitch period are ignored.

Figures 2.7 and 2.8 below show the amplitude comparisons of adjacent pitch periods for single-speaker (upper) and co-channel (lower) speech signals, respectively.

[Figure 2.7 panels: usable speech waveform; amplitude comparison of adjacent pitch periods, reference versus adjacent pitch period]

Figure 2.7: Single-speaker voiced speech (top panel) and its adjacent pitch period comparison (bottom panel).

[Figure 2.8 panels: co-channel speech waveform; amplitude comparison of adjacent pitch periods, reference versus adjacent pitch period]

Figure 2.8: Co-channel voiced speech (top panel) and its adjacent pitch period comparison (bottom panel).
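A hedged sketch of the adjacent pitch period comparison; the pitch-period search range follows the 10ms window described above, while the mean-absolute-difference variation measure and the sample rate fs are illustrative assumptions:

    import numpy as np

    def appc_variation(frame, fs=8000):
        """Estimate the reference pitch period length N from the highest
        non-zero-lag autocorrelation peak within the next 10 ms, then
        compare the reference period with the adjacent one sample by sample."""
        x = np.asarray(frame, dtype=float)
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]
        search = int(0.010 * fs)                    # lags spanning the next 10 ms
        N = 1 + int(np.argmax(ac[1:search + 1]))    # lag of the highest peak
        if 2 * N > len(x):
            return float("nan")                     # frame too short for two periods
        ref, adj = x[:N], x[N:2 * N]
        return float(np.mean(np.abs(ref - adj)))    # small for usable, large for unusable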

This measure was able to correctly identify 75% of usable frames (defined based on TIR value), with about 25% false alarms (Lovekin et al., 2001b).

CHAPTER 3: CURVATURE MEASURE

3.1. Introduction to Curvature

In an attempt to obtain a mathematical quantification of the difference between embedded voiced and unvoiced signals, the curvature measure (Smolenski, 2004) was developed using the Serret-Frenet theorem (Rahman & Mulolani, 2001). This theorem states that any 3-dimensional space curve can be completely characterized by the following matrix equation:

    \frac{d}{ds} \begin{bmatrix} T \\ N \\ B \end{bmatrix}
    = \begin{bmatrix} 0 & \kappa & 0 \\ -\kappa & 0 & \tau \\ 0 & -\tau & 0 \end{bmatrix}
      \begin{bmatrix} T \\ N \\ B \end{bmatrix}    (3.1)

where κ is the curvature, τ is the torsion, T, N and B are the axes shown in Figure 3.1 below, and the derivatives are with respect to s, the arc length of the curve.

Figure 3.1: TNB frame classification.

The curvature, which is the quantity considered in this research, is defined as the rate of rotation of the tangent at a point P as P moves along a given trajectory. In other words, the curvature measures the angle between any three points on the trajectory, and can also be considered the reciprocal of the radius of curvature. Curvature can be expressed by the following equation:

    \kappa = \lim_{\Delta s \to 0} \frac{\Delta\theta}{\Delta s}    (3.2)

where Δθ is the angle between the tangents to the curve (the corresponding angle Δφ between the binormals to the curve defines the torsion). However, the space curve formed by the state-space embedding procedure is actually a sampled version of the original phase-space trajectory; therefore, the curvature, as well as the other variables in the equation, must be approximated from the discrete embedding curve. The discrete curvature estimate is given by

    K_n = \cos^{-1}\left( \frac{A_n \cdot A_{n+1}}{\|A_n\| \, \|A_{n+1}\|} \right)    (3.3)

where

    A_n = [\,x_n - x_{n-1},\; y_n - y_{n-1},\; z_n - z_{n-1}\,]    (3.4)

is the elemental arc of the discrete embedding curve, and x, y and z are the coordinates of the embedded signal.
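A minimal sketch of Eqs. (3.3) and (3.4), operating on the rows of an embedded trajectory such as the output of the takens_embed sketch in Chapter 1; the small epsilon guarding the division is an implementation convenience:

    import numpy as np

    def discrete_curvature(V):
        """K_n = arccos(A_n . A_{n+1} / (|A_n| |A_{n+1}|)), Eqs. 3.3-3.4.

        V is an (n_points x 3) array of embedded trajectory points;
        the result is one angle (in radians) per interior point.
        """
        V = np.asarray(V, dtype=float)
        A = np.diff(V, axis=0)                         # elemental arc vectors A_n (Eq. 3.4)
        a, b = A[:-1], A[1:]
        cosang = np.einsum("ij,ij->i", a, b) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12)
        return np.arccos(np.clip(cosang, -1.0, 1.0))   # clip guards against rounding error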

3.2. Preliminary Research: Voiced/Unvoiced Classification

Figure 3.2 below shows the embedded signals of a voiced and an unvoiced speech frame, each consisting of 128 sample points. Note the difference between the structure of the embedded voiced speech and that of the unvoiced speech. It is evident that the angle between any three points on the trajectory will be much greater for the embedded voiced signal than for the embedded unvoiced signal.

Figure 3.2: Embedded voiced (left panel) and embedded unvoiced (right panel) speech frames.

Figure 3.3 below shows the sample-by-sample curvature values (black) plotted against the corresponding speech segment (blue), which consists of voiced and unvoiced speech. The negative of the speech signal energy, computed using the traditional energy measure discussed earlier, is also plotted (in red) against the speech signal for comparison purposes. Note the correlation between energy and curvature in the utterance.

Figure 3.3: Curvature (black) and energy (red) plotted against the speech utterance (blue).

Usually, voiced/unvoiced classification is performed on a frame-by-frame basis, due to the difficulty of assigning a class to a single sample. Obtaining a common result for each speech frame eliminates single-sample detection errors made by the curvature algorithm. However, frame-by-frame classification of speech can lead to over-approximation of some short segments. Moreover, voiced and unvoiced start- and end-points can be inaccurately detected due to the averaging of the decision value. In this research, this problem is addressed by segmenting the speech signals into relatively small frames (about 15ms) before processing.

Figure 3.4 below shows a histogram of curvature for labeled voiced and unvoiced speech signals obtained from the TIMIT database. The blue bars represent the voiced distribution, while the red bars represent the unvoiced distribution. Note the separation between the two distributions.

Figure 3.4: Curvature distribution for clean speech (voiced: blue; unvoiced: red).

3.2.1. Noise and Filtering

It must be noted that the data used in Figure 3.4 (in the previous section) were obtained for clean speech. However, due to the presence of noise in most speech communication channels, a robust measure is required. Pink noise, a type of noise that flickers throughout the signal, is sometimes found in speech. This category of noise is sometimes referred to as 1/f noise because its power spectrum P(f), as a function of frequency, can be expressed as P(f) = 1/f^a, where a is very close to 1. The curvature distribution of speech corrupted by pink noise at 15dB SNR shows that pink noise has minimal effect on the accuracy of the curvature measure. This is illustrated in Figure 3.5 below.

Figure 3.5: Curvature distribution for clean speech (left panel) and speech + pink noise at 15dB SNR (right panel); voiced: blue, unvoiced: red.

On the other hand, white noise, the most common type of noise found in speech signals, has an adverse effect on the curvature measure, as illustrated in Figure 3.6 below.

Figure 3.6: Curvature distribution for clean speech (left panel) and speech + white noise at 15dB SNR (right panel); voiced: blue, unvoiced: red.

From Figure 3.6, it is observed that, with the presence of white noise in the speech signal, the curvature pdf for voiced speech shifts to the left; in other words, the discriminative power of the measure is reduced. This can be explained by the chaotic nature of white noise, which introduces disorganization in the well-defined structure of the embedded voiced signal, thereby decreasing the curvature value for the voiced samples, i.e., making voiced speech unvoiced-like.

In order to minimize the effect of noise on the speech signals, a 10th-order (11-point) moving average filter is used as a pre-processing block for the input speech signal. A moving average was chosen because it is very easy to implement, and yet is effective at the simple task of reducing chaotic noise while maintaining a relatively sharp impulse response. The expression for an M-point moving average filter is given by

    y(n) = \frac{1}{M} \sum_{k=0}^{M-1} x(n-k)    (3.5)

where x and y are the input and output of the filter, respectively. Since the significant information in speech signals is found in the low-frequency components of the signal, the moving average filter, which is a lowpass filter, minimizes the effects of noise on the signal while retaining the information needed for voiced/unvoiced classification.

The curvature voiced/unvoiced distributions for clean speech and speech with pink and white noise at 15dB SNR after filtering are given in Figures 3.7 and 3.8 below. It can be observed from the figures that the performance of the curvature measure is not degraded by filtering clean speech or speech with pink noise. Note, however, that, in the case of white noise, filtering causes the voiced distribution to shift towards the right, making the distribution more like the distribution of clean speech.

Therefore, the moving average filter is very effective in reducing the effect of noise on the performance of the curvature measure.

Figure 3.7: Curvature distribution for clean speech (left panel) and speech + pink noise at 15dB SNR (right panel) after filtering; voiced: blue, unvoiced: red.

Figure 3.8: Curvature distribution for clean speech (left panel) and speech + white noise at 15dB SNR (right panel) after filtering; voiced: blue, unvoiced: red.
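Since this pre-filter recurs throughout the rest of the chapter, here is a minimal sketch of the Eq. (3.5) filter; the mode="same" alignment is an implementation convenience, not something specified in the text:

    import numpy as np

    def moving_average(x, M=11):
        """M-point moving average (Eq. 3.5); M = 11 gives the
        10th-order (11-point) lowpass pre-filter used here."""
        x = np.asarray(x, dtype=float)
        return np.convolve(x, np.ones(M) / M, mode="same")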

3.2.2. Experiments and Results

All speech data used in the following experiments were obtained from the TIMIT database (13 female files and 12 male files). Curvature-based voiced/unvoiced decisions were made using the following procedure:

1. The speech signal is filtered using a 10th-order moving average filter.
2. The output of the filter is segmented into frames of 128 samples each.
3. Takens' embedding technique is applied to each frame.
4. The curvature values of each embedded frame are computed and averaged to produce a voiced or unvoiced decision for that frame, based on a threshold of 2.3.

The block diagram for the voiced/unvoiced classification process is given in Figure 3.9 below.

[Figure 3.9 block diagram: speech signal, moving average filter, framing, nonlinear embedding, curvature algorithm, voiced/unvoiced detection]

Figure 3.9: Curvature-based decision process.

In choosing a threshold, one has two options: an optimal (and therefore different) threshold for each noise condition, or a single optimal threshold for all conditions. Since prior knowledge of the noise state cannot (as yet) be determined, a single optimum threshold was chosen for all three noise states. Figures 3.10 and 3.11 below show the ROC curves for the voiced and unvoiced hits and false alarms for all three noise states.

[Figure 3.10/3.11 ROC plots: hits versus false alarms for clean speech, 15dB pink noise and 15dB white noise]

Figure 3.10: ROC curves for the different noise states, voiced speech.

Figure 3.11: ROC curves for the different noise states, unvoiced speech.

It is observed from the above figures that it is possible to achieve a minimum of 95% hits with 5% false alarms for each of the noise states; however, these values are only attainable if the optimum threshold for each individual noise state is used. Choosing one threshold that produces the best overall results for all three noise states together is more practical, even though it leads to a reduction in the maximum accuracy for each individual noise state. A threshold of 2.3 was found to yield the best overall result for all three noise states. Frames whose average curvature fell below 2.3 were considered unvoiced, and frames whose average curvature was above 2.3 were considered voiced.

Figures 3.12 to 3.14 below show curvature-based voiced/unvoiced decision values (voiced: 1, unvoiced: 0) plotted against color-coded speech data with different speech classes. The data are coded as follows: voiced, weak voiced, unvoiced, transition, and silence. It must be noted, however, that in this research only voiced/unvoiced classification is performed, and all other voicing states are ignored.

Figure 3.12: Curvature-based decisions for clean speech.

Figure 3.13: Curvature-based decisions for the corresponding speech + added pink noise at 15dB SNR.

Figure 3.14: Curvature-based decisions for the corresponding speech + added white noise at 15dB SNR.
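For reference, the following is a minimal end-to-end sketch of the decision chain of Figure 3.9; it assumes the moving_average, takens_embed and discrete_curvature sketches from earlier sections are in scope, and the non-overlapping frame loop and return format are illustrative choices:

    import numpy as np

    def classify_frames(speech, frame_len=128, m=3, d=12, threshold=2.3):
        """Frame-by-frame voiced/unvoiced decisions (1: voiced, 0: unvoiced)."""
        x = moving_average(np.asarray(speech, dtype=float), M=11)   # pre-filtering
        decisions = []
        for start in range(0, len(x) - frame_len + 1, frame_len):
            V = takens_embed(x[start:start + frame_len], m=m, d=d)  # nonlinear embedding
            k_mean = discrete_curvature(V).mean()                   # average curvature
            decisions.append(1 if k_mean > threshold else 0)        # voiced above 2.3
        return decisions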

3.2.3. End-Point Detection

Because the curvature measure cannot detect speech states other than voiced and unvoiced, an undecided region was created in an attempt to detect voiced and unvoiced end-points as accurately as possible. For the undecided band, two thresholds were chosen: one slightly above the main threshold, for voiced speech, and the other slightly below it, for unvoiced speech. The advantages of having such a band are improved accuracy in endpoint detection and a reduction of false alarms in voiced and unvoiced detection. However, some actual voiced and unvoiced frames can fall within the undecided region, resulting in a reduction in hits and an increase in misses. Figure 3.15 illustrates the endpoint detection accuracy of the curvature measure using clean speech.

[Figure 3.15 panels: speech waveform; ground-truth decisions; curvature-based decisions with undecided band (1: voiced; 0: don't care; -1: unvoiced); differences]

Figure 3.15: Speech data (top panel), ground truth (second panel), curvature-based classification (third panel), and the difference between the ground-truth and curvature-based classifications (bottom panel).
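The undecided band can be expressed as a three-way decision around the 2.3 operating threshold; in the sketch below the band edges 2.2 and 2.4 are hypothetical placeholders, since the text only specifies thresholds slightly above and slightly below the main one:

    def classify_with_undecided(mean_curvature, low=2.2, high=2.4):
        """Three-way decision: 1 = voiced, -1 = unvoiced, 0 = undecided,
        matching the coding used in Figure 3.15."""
        if mean_curvature > high:
            return 1        # confidently voiced
        if mean_curvature < low:
            return -1       # confidently unvoiced
        return 0            # undecided ("don't care") band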

The "don't care" regions in the actual classification are all speech classes other than voiced or unvoiced, while those in the curvature-based classification are the undecided regions. It should be noted that the don't care regions in both cases are almost the same. Therefore, if accurate endpoint detection is desired, the use of an undecided band can be effective.

3.2.4. Result Comparisons

Figures 3.16 and 3.17 below show the comparison of the performance of the curvature measure with the traditional voiced/unvoiced classifiers presented in the preceding chapter. These results were obtained by subtracting the average false alarms from the average hits, using 25 different speech files from the TIMIT database.

Figure 3.16: Comparison of hits minus false alarms for voiced speech.

Figure 3.17: Comparison of hits minus false alarms for unvoiced speech.

It is observed in Figures 3.16 and 3.17 that the curvature measure is comparable to the traditional measures in a noiseless environment. However, in the presence of white noise, the curvature measure performs better than the FR/RE measure and is comparable to the E/ZC measure. Furthermore, with pink noise interference, the curvature measure is decidedly better than either traditional measure for voiced/unvoiced classification.

3.3. Proposed Research: Voiced/Unvoiced/Background Classification

Separating unvoiced speech from background noise has been a challenge in speech classification due to their similarity in structure. This proposal includes some preliminary experiments exploring possible differences between the structures of embedded unvoiced speech and background noise, in order to extend the classification to voiced/unvoiced/background. Figures 3.18 and 3.19 below show the embedded signals of voiced, unvoiced and background speech frames (each consisting of 128 sample points) with added pink noise at 15dB SNR and added white noise at 15dB SNR, respectively.

Figure 3.18: Embedded voiced (left panel), unvoiced (middle panel) and background (right panel) frames with added pink noise at 15dB SNR.

Figure 3.19: Embedded voiced (left panel), unvoiced (middle panel) and background (right panel) frames with added white noise at 15dB SNR.

It is evident from the above figures that, although embedded background (or noise) is chaotic in nature, some differentiation does exist between the structures of unvoiced and background speech, and this differentiation can also be measured using the curvature algorithm. With white noise, however, the difference between the structures of embedded unvoiced speech and background is not clear; therefore, as in the case of voiced/unvoiced classification, a 10th-order moving average filter was used to pre-process the speech before applying the embedding technique. Figures 3.20 and 3.21 below show the embedded signals of voiced, unvoiced and background speech with added pink noise at 15dB SNR and added white noise at 15dB SNR, respectively, after filtering.

Figure 3.20: Embedded voiced (left panel), unvoiced (middle panel) and background (right panel) speech frames with added pink noise at 15dB SNR after filtering.

Figure 3.21: Embedded voiced (left panel), unvoiced (middle panel) and background (right panel) speech frames with added white noise at 15dB SNR after filtering.

It is readily observed that filtering increases the differentiation between unvoiced speech and background, both with added pink noise and with added white noise.

CHAPTER 4: NODAL DENSITY MEASURE

4.1. Introduction to Nodal Density

Another distinguishing feature between embedded voiced and unvoiced signals, observable in Figure 3.2 in the previous chapter, is the density of the signals. The embedded voiced signal appears much less dense than the unvoiced signal; however, the presence of an appreciable amount of interfering speech in voiced signals will introduce significant distortion in their structured pattern, thereby increasing the apparent density. Figures 4.1 and 4.2 below show 256-sample frames of usable and unusable voiced speech, respectively, embedded using Takens' method of delays with m = 3 and d = 12. The co-channel data was obtained by combining two different frames of speech from different speakers, scaling them to obtain the desired Target-to-Interferer Ratio (TIR), and then extracting the voiced portions using one of the traditional voiced/unvoiced classifiers.

Figure 4.1: Embedded data for co-channel speech at 30dB TIR.
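Although the exact node-counting procedure is detailed later in this chapter, the gridded figures suggest the basic computation: partition the embedding space into a grid and count how many cells (nodes) the trajectory visits. The sketch below follows that idea under stated assumptions; the grid resolution is illustrative, not a value taken from the document:

    import numpy as np

    def nodal_density(V, cells_per_axis=10):
        """Count the grid cells (nodes) spanned by an embedded trajectory V
        (an n_points x 3 array); denser, unstructured frames span more nodes."""
        V = np.asarray(V, dtype=float)
        lo = V.min(axis=0)
        span = np.maximum(V.max(axis=0) - lo, 1e-12)   # avoid division by zero
        idx = np.minimum((cells_per_axis * (V - lo) / span).astype(int),
                         cells_per_axis - 1)           # clamp max points into the last cell
        return len({tuple(row) for row in idx})        # number of distinct cells visited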


ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Nonuniform multi level crossing for signal reconstruction

Nonuniform multi level crossing for signal reconstruction 6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven

More information

Speech Compression Using Voice Excited Linear Predictive Coding

Speech Compression Using Voice Excited Linear Predictive Coding Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality

More information

COMPRESSIVE SAMPLING OF SPEECH SIGNALS. Mona Hussein Ramadan. BS, Sebha University, Submitted to the Graduate Faculty of

COMPRESSIVE SAMPLING OF SPEECH SIGNALS. Mona Hussein Ramadan. BS, Sebha University, Submitted to the Graduate Faculty of COMPRESSIVE SAMPLING OF SPEECH SIGNALS by Mona Hussein Ramadan BS, Sebha University, 25 Submitted to the Graduate Faculty of Swanson School of Engineering in partial fulfillment of the requirements for

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

Wavelet Speech Enhancement based on the Teager Energy Operator

Wavelet Speech Enhancement based on the Teager Energy Operator Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

SPEECH AND SPECTRAL ANALYSIS

SPEECH AND SPECTRAL ANALYSIS SPEECH AND SPECTRAL ANALYSIS 1 Sound waves: production in general: acoustic interference vibration (carried by some propagation medium) variations in air pressure speech: actions of the articulatory organs

More information

Theory of Telecommunications Networks

Theory of Telecommunications Networks Theory of Telecommunications Networks Anton Čižmár Ján Papaj Department of electronics and multimedia telecommunications CONTENTS Preface... 5 1 Introduction... 6 1.1 Mathematical models for communication

More information

APPLICATIONS OF DSP OBJECTIVES

APPLICATIONS OF DSP OBJECTIVES APPLICATIONS OF DSP OBJECTIVES This lecture will discuss the following: Introduce analog and digital waveform coding Introduce Pulse Coded Modulation Consider speech-coding principles Introduce the channel

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

ME scope Application Note 01 The FFT, Leakage, and Windowing

ME scope Application Note 01 The FFT, Leakage, and Windowing INTRODUCTION ME scope Application Note 01 The FFT, Leakage, and Windowing NOTE: The steps in this Application Note can be duplicated using any Package that includes the VES-3600 Advanced Signal Processing

More information

Voiced/nonvoiced detection based on robustness of voiced epochs

Voiced/nonvoiced detection based on robustness of voiced epochs Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Speech/Non-speech detection Rule-based method using log energy and zero crossing rate

Speech/Non-speech detection Rule-based method using log energy and zero crossing rate Digital Speech Processing- Lecture 14A Algorithms for Speech Processing Speech Processing Algorithms Speech/Non-speech detection Rule-based method using log energy and zero crossing rate Single speech

More information

Measuring the complexity of sound

Measuring the complexity of sound PRAMANA c Indian Academy of Sciences Vol. 77, No. 5 journal of November 2011 physics pp. 811 816 Measuring the complexity of sound NANDINI CHATTERJEE SINGH National Brain Research Centre, NH-8, Nainwal

More information

Long Range Acoustic Classification

Long Range Acoustic Classification Approved for public release; distribution is unlimited. Long Range Acoustic Classification Authors: Ned B. Thammakhoune, Stephen W. Lang Sanders a Lockheed Martin Company P. O. Box 868 Nashua, New Hampshire

More information

Drum Transcription Based on Independent Subspace Analysis

Drum Transcription Based on Independent Subspace Analysis Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,

More information

Basic Characteristics of Speech Signal Analysis

Basic Characteristics of Speech Signal Analysis www.ijird.com March, 2016 Vol 5 Issue 4 ISSN 2278 0211 (Online) Basic Characteristics of Speech Signal Analysis S. Poornima Assistant Professor, VlbJanakiammal College of Arts and Science, Coimbatore,

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS

CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS 66 CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS 4.1 INTRODUCTION New frontiers of speech technology are demanding increased levels of performance in many areas. In the advent of Wireless Communications

More information

Digital Signal Processing

Digital Signal Processing COMP ENG 4TL4: Digital Signal Processing Notes for Lecture #27 Tuesday, November 11, 23 6. SPECTRAL ANALYSIS AND ESTIMATION 6.1 Introduction to Spectral Analysis and Estimation The discrete-time Fourier

More information

2.1 BASIC CONCEPTS Basic Operations on Signals Time Shifting. Figure 2.2 Time shifting of a signal. Time Reversal.

2.1 BASIC CONCEPTS Basic Operations on Signals Time Shifting. Figure 2.2 Time shifting of a signal. Time Reversal. 1 2.1 BASIC CONCEPTS 2.1.1 Basic Operations on Signals Time Shifting. Figure 2.2 Time shifting of a signal. Time Reversal. 2 Time Scaling. Figure 2.4 Time scaling of a signal. 2.1.2 Classification of Signals

More information

COMP 546, Winter 2017 lecture 20 - sound 2

COMP 546, Winter 2017 lecture 20 - sound 2 Today we will examine two types of sounds that are of great interest: music and speech. We will see how a frequency domain analysis is fundamental to both. Musical sounds Let s begin by briefly considering

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

More information

TE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION

TE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION TE 302 DISCRETE SIGNALS AND SYSTEMS Study on the behavior and processing of information bearing functions as they are currently used in human communication and the systems involved. Chapter 1: INTRODUCTION

More information

ROBUST F0 ESTIMATION IN NOISY SPEECH SIGNALS USING SHIFT AUTOCORRELATION. Frank Kurth, Alessia Cornaggia-Urrigshardt and Sebastian Urrigshardt

ROBUST F0 ESTIMATION IN NOISY SPEECH SIGNALS USING SHIFT AUTOCORRELATION. Frank Kurth, Alessia Cornaggia-Urrigshardt and Sebastian Urrigshardt 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) ROBUST F0 ESTIMATION IN NOISY SPEECH SIGNALS USING SHIFT AUTOCORRELATION Frank Kurth, Alessia Cornaggia-Urrigshardt

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) Proceedings of the 2 nd International Conference on Current Trends in Engineering and Management ICCTEM -214 ISSN

More information

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

Lab 8. Signal Analysis Using Matlab Simulink

Lab 8. Signal Analysis Using Matlab Simulink E E 2 7 5 Lab June 30, 2006 Lab 8. Signal Analysis Using Matlab Simulink Introduction The Matlab Simulink software allows you to model digital signals, examine power spectra of digital signals, represent

More information

VHF Radar Target Detection in the Presence of Clutter *

VHF Radar Target Detection in the Presence of Clutter * BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 6, No 1 Sofia 2006 VHF Radar Target Detection in the Presence of Clutter * Boriana Vassileva Institute for Parallel Processing,

More information

EE228 Applications of Course Concepts. DePiero

EE228 Applications of Course Concepts. DePiero EE228 Applications of Course Concepts DePiero Purpose Describe applications of concepts in EE228. Applications may help students recall and synthesize concepts. Also discuss: Some advanced concepts Highlight

More information

Visual Interpretation of Hand Gestures as a Practical Interface Modality

Visual Interpretation of Hand Gestures as a Practical Interface Modality Visual Interpretation of Hand Gestures as a Practical Interface Modality Frederik C. M. Kjeldsen Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate

More information

Chapter 2 Direct-Sequence Systems

Chapter 2 Direct-Sequence Systems Chapter 2 Direct-Sequence Systems A spread-spectrum signal is one with an extra modulation that expands the signal bandwidth greatly beyond what is required by the underlying coded-data modulation. Spread-spectrum

More information

SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum

SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase Reassigned Spectrum Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou Analysis/Synthesis Team, 1, pl. Igor

More information

Automatic Transcription of Monophonic Audio to MIDI

Automatic Transcription of Monophonic Audio to MIDI Automatic Transcription of Monophonic Audio to MIDI Jiří Vass 1 and Hadas Ofir 2 1 Czech Technical University in Prague, Faculty of Electrical Engineering Department of Measurement vassj@fel.cvut.cz 2

More information

Chapter 2 Channel Equalization

Chapter 2 Channel Equalization Chapter 2 Channel Equalization 2.1 Introduction In wireless communication systems signal experiences distortion due to fading [17]. As signal propagates, it follows multiple paths between transmitter and

More information

Voice Excited Lpc for Speech Compression by V/Uv Classification

Voice Excited Lpc for Speech Compression by V/Uv Classification IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 3, Ver. II (May. -Jun. 2016), PP 65-69 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Voice Excited Lpc for Speech

More information

CHAPTER 6 INTRODUCTION TO SYSTEM IDENTIFICATION

CHAPTER 6 INTRODUCTION TO SYSTEM IDENTIFICATION CHAPTER 6 INTRODUCTION TO SYSTEM IDENTIFICATION Broadly speaking, system identification is the art and science of using measurements obtained from a system to characterize the system. The characterization

More information

Fourier Methods of Spectral Estimation

Fourier Methods of Spectral Estimation Department of Electrical Engineering IIT Madras Outline Definition of Power Spectrum Deterministic signal example Power Spectrum of a Random Process The Periodogram Estimator The Averaged Periodogram Blackman-Tukey

More information

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied

More information

System analysis and signal processing

System analysis and signal processing System analysis and signal processing with emphasis on the use of MATLAB PHILIP DENBIGH University of Sussex ADDISON-WESLEY Harlow, England Reading, Massachusetts Menlow Park, California New York Don Mills,

More information

Digital Speech Processing and Coding

Digital Speech Processing and Coding ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 BACKGROUND The increased use of non-linear loads and the occurrence of fault on the power system have resulted in deterioration in the quality of power supplied to the customers.

More information

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying

More information

(i) Understanding of the characteristics of linear-phase finite impulse response (FIR) filters

(i) Understanding of the characteristics of linear-phase finite impulse response (FIR) filters FIR Filter Design Chapter Intended Learning Outcomes: (i) Understanding of the characteristics of linear-phase finite impulse response (FIR) filters (ii) Ability to design linear-phase FIR filters according

More information

PR No. 119 DIGITAL SIGNAL PROCESSING XVIII. Academic Research Staff. Prof. Alan V. Oppenheim Prof. James H. McClellan.

PR No. 119 DIGITAL SIGNAL PROCESSING XVIII. Academic Research Staff. Prof. Alan V. Oppenheim Prof. James H. McClellan. XVIII. DIGITAL SIGNAL PROCESSING Academic Research Staff Prof. Alan V. Oppenheim Prof. James H. McClellan Graduate Students Bir Bhanu Gary E. Kopec Thomas F. Quatieri, Jr. Patrick W. Bosshart Jae S. Lim

More information

Improved Detection by Peak Shape Recognition Using Artificial Neural Networks

Improved Detection by Peak Shape Recognition Using Artificial Neural Networks Improved Detection by Peak Shape Recognition Using Artificial Neural Networks Stefan Wunsch, Johannes Fink, Friedrich K. Jondral Communications Engineering Lab, Karlsruhe Institute of Technology Stefan.Wunsch@student.kit.edu,

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

A Novel Technique or Blind Bandwidth Estimation of the Radio Communication Signal

A Novel Technique or Blind Bandwidth Estimation of the Radio Communication Signal International Journal of ISSN 0974-2107 Systems and Technologies IJST Vol.3, No.1, pp 11-16 KLEF 2010 A Novel Technique or Blind Bandwidth Estimation of the Radio Communication Signal Gaurav Lohiya 1,

More information

A Survey and Evaluation of Voice Activity Detection Algorithms

A Survey and Evaluation of Voice Activity Detection Algorithms A Survey and Evaluation of Voice Activity Detection Algorithms Seshashyama Sameeraj Meduri (ssme09@student.bth.se, 861003-7577) Rufus Ananth (anru09@student.bth.se, 861129-5018) Examiner: Dr. Sven Johansson

More information

Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms. Armein Z. R. Langi

Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms. Armein Z. R. Langi International Journal on Electrical Engineering and Informatics - Volume 3, Number 2, 211 Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms Armein Z. R. Langi ITB Research

More information

Jitter Analysis Techniques Using an Agilent Infiniium Oscilloscope

Jitter Analysis Techniques Using an Agilent Infiniium Oscilloscope Jitter Analysis Techniques Using an Agilent Infiniium Oscilloscope Product Note Table of Contents Introduction........................ 1 Jitter Fundamentals................. 1 Jitter Measurement Techniques......

More information

Digital Processing of Continuous-Time Signals

Digital Processing of Continuous-Time Signals Chapter 4 Digital Processing of Continuous-Time Signals 清大電機系林嘉文 cwlin@ee.nthu.edu.tw 03-5731152 Original PowerPoint slides prepared by S. K. Mitra 4-1-1 Digital Processing of Continuous-Time Signals Digital

More information

DESIGN OF GLOBAL SAW RFID TAG DEVICES C. S. Hartmann, P. Brown, and J. Bellamy RF SAW, Inc., 900 Alpha Drive Ste 400, Richardson, TX, U.S.A.

DESIGN OF GLOBAL SAW RFID TAG DEVICES C. S. Hartmann, P. Brown, and J. Bellamy RF SAW, Inc., 900 Alpha Drive Ste 400, Richardson, TX, U.S.A. DESIGN OF GLOBAL SAW RFID TAG DEVICES C. S. Hartmann, P. Brown, and J. Bellamy RF SAW, Inc., 900 Alpha Drive Ste 400, Richardson, TX, U.S.A., 75081 Abstract - The Global SAW Tag [1] is projected to be

More information

Reading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday.

Reading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday. L105/205 Phonetics Scarborough Handout 7 10/18/05 Reading: Johnson Ch.2.3.3-2.3.6, Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday Spectral Analysis 1. There are

More information

CHAPTER 6 SIGNAL PROCESSING TECHNIQUES TO IMPROVE PRECISION OF SPECTRAL FIT ALGORITHM

CHAPTER 6 SIGNAL PROCESSING TECHNIQUES TO IMPROVE PRECISION OF SPECTRAL FIT ALGORITHM CHAPTER 6 SIGNAL PROCESSING TECHNIQUES TO IMPROVE PRECISION OF SPECTRAL FIT ALGORITHM After developing the Spectral Fit algorithm, many different signal processing techniques were investigated with the

More information

A Parametric Model for Spectral Sound Synthesis of Musical Sounds

A Parametric Model for Spectral Sound Synthesis of Musical Sounds A Parametric Model for Spectral Sound Synthesis of Musical Sounds Cornelia Kreutzer University of Limerick ECE Department Limerick, Ireland cornelia.kreutzer@ul.ie Jacqueline Walker University of Limerick

More information

BIT SYNCHRONIZERS FOR PSK AND THEIR DIGITAL IMPLEMENTATION

BIT SYNCHRONIZERS FOR PSK AND THEIR DIGITAL IMPLEMENTATION BIT SYNCHRONIZERS FOR PSK AND THEIR DIGITAL IMPLEMENTATION Jack K. Holmes Holmes Associates, Inc. 1338 Comstock Avenue Los Angeles, California 90024 ABSTRACT Bit synchronizers play an important role in

More information