Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice

Yanmeng Guo, Qiang Fu, and Yonghong Yan
ThinkIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences, Beijing
(This work is partly supported by the Chinese 973 program (2004CB318106), the National Natural Science Foundation of China ( , ), and the Beijing Municipal Science and Technology Commission (Z ).)

Abstract. This paper presents an algorithm for speech endpoint detection in noisy environments, especially those with non-stationary noise. The input signal is first decomposed into several sub-bands. In each sub-band, an energy sequence is tracked and analyzed separately to decide whether a temporal segment is stationary or not. A voiced-speech detection algorithm based on the harmonic structure of voice is introduced and applied to each non-stationary segment to check whether it contains speech. The endpoints of speech are finally determined by combining the energy detection and the voice detection. Experiments in real noise environments show that the proposed approach is more reliable than some standard methods.

1 Introduction

Speech endpoint detection (EPD) is the task of detecting the beginning and ending boundaries of speech in an input signal, and it is important in many areas of speech processing. Accurate endpoint detection is crucial to recognition performance: it improves recognition accuracy and reduces computational cost.

Endpoint detection discriminates speech from noise by features of the signal, such as energy [1][2], entropy [3][4], LSPE [5], and statistical properties [6][7][8]. Some methods treat speech and noise as separate classes and detect speech using models of both. These methods perform well in specific environments, but degrade rapidly when the models mismatch the environment. If the discrimination is instead based on heuristically derived rules over the signal features, the performance relies on those properties directly, and the detector adapts more easily to unknown environments. For practical speech recognition, it is critical to detect speech reliably under diverse circumstances. This paper develops a robust endpoint detection method that combines the advantages of several features by rules.

Short-time energy is the most widely used parameter in endpoint detection [1][2][9][10], but it is not sufficient when the noise level is high.

Fortunately, the spectral energy distributions of speech and noise are often different, so speech is not corrupted equally in all frequency bands. This fact is exploited in this paper by tracking and analyzing the energy in 4 sub-bands and giving more weight to the sub-bands with drastic energy variation. Another shortcoming of the energy parameter is that it misclassifies high-level noise as speech when the noise is time-varying. In this paper, this problem is solved by involving voice detection: if a non-stationary segment contains voiced speech, speech is detected; otherwise, the segment is classified as noise.

Detecting voiced speech is also an important strategy for distinguishing speech from noise. Generally, voiced speech can be detected by tracking pitch [11], measuring periodicity [5][12][13], etc., but those methods are often disturbed by low-frequency noise or by abrupt changes of the noise. However, voice shows a clear harmonic structure in the frequency domain even in very noisy conditions, and this paper proposes a robust algorithm that detects voice by adaptively checking such structure in the frequency domain.

This paper is organized as follows. The theory of the proposed algorithm is described in Section 2. Section 3 evaluates and analyzes the performance of the algorithm. The conclusion is given in Section 4.

2 Algorithm

Assuming that speech and the additive noise are independent, the short-time energy of the input signal is given by E_x = E_s + E_n, where E_s and E_n represent the energy of speech and noise, respectively. Thus the position of the speech signal can be determined by searching for the segments where E_x > E_n. However, the noise may be non-stationary, and its energy E_n can hardly be estimated precisely. To solve this problem, we classify the input signal into two categories: the stationary part, which is assumed to exist all the time, and the non-stationary part, which may contain speech, noise, or both. The stationary component is tracked by the model described in Sect. 2.2, and the voice detection method proposed in Sect. 2.3 is applied to the non-stationary segments to detect speech. The structure of the algorithm is shown in Fig. 1.

2.1 Preprocessing

The 8 kHz sampled noisy speech is divided into L frames, each 20 ms long with 50% overlap. After applying a window function and a short-time Fourier transform (STFT) of N (N = 256) points, the energy of the kth frequency bin in the ith frame is derived from the spectrum and denoted P_i(k), where 0 ≤ k < N/2. Setting band borders at {0, 500, 1000, 2000, 4000} Hz, the signal is divided into 4 non-overlapping sub-bands. The energy of sub-band m in frame i is then obtained by summing the energy of its frequency components, and is denoted E_{x,m}(i), where m = 0, 1, 2, 3 and i = 1, 2, ..., L.
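To make the preprocessing concrete, here is a minimal sketch (assuming NumPy; the 160-sample/20 ms frames, 80-sample hop, 256-point FFT and band edges follow the description above, while the function name and the choice of a Hamming window are illustrative assumptions) that computes the per-frame sub-band energies E_{x,m}(i):

```python
import numpy as np

def subband_energies(x, fs=8000, frame_len=160, hop=80, n_fft=256,
                     edges=(0, 500, 1000, 2000, 4000)):
    """Per-frame energies of the 4 sub-bands described in Sect. 2.1.

    x is a 1-D array of 8 kHz samples; returns an array of shape (L, 4),
    one row per 20 ms frame with 50% overlap.
    """
    win = np.hamming(frame_len)                       # analysis window
    n_frames = 1 + (len(x) - frame_len) // hop
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)        # bin centre frequencies
    # boolean masks selecting the bins of each sub-band
    bands = [(freqs >= lo) & (freqs < hi)
             for lo, hi in zip(edges[:-1], edges[1:])]

    E = np.zeros((n_frames, len(bands)))
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len] * win
        P = np.abs(np.fft.rfft(frame, n_fft)) ** 2    # P_i(k): bin energies
        for m, mask in enumerate(bands):
            E[i, m] = P[mask].sum()                   # E_{x,m}(i)
    return E
```

Each column E[:, m] of the result is the energy sequence E_{x,m}(i) tracked by the sub-band noise model of Sect. 2.2.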

Fig. 1. Flowchart of the proposed algorithm: the input signal is compared with the stationary-noise model; stationary segments update the noise model, non-stationary segments are passed to the voice detection, and segments containing voice trigger the endpoint search and the output of the endpoints.

2.2 Energy Detection

According to the character of its energy sequence, additive noise is classified here into 5 classes: stable noise, slow-varying noise, impulse noise, fluctuant noise and step noise. These components are independent and additive, and their sum is the input noise. Stable noise, such as thermal noise or the noise of a running machine, has a basically stable energy distribution, and its energy sequence follows an ergodic Gaussian distribution. Slow-varying noise denotes noise whose energy distribution changes slowly, such as blowing wind or an approaching car; over a short interval it can be treated approximately as stable noise. Impulse noise covers noise whose energy rises and falls rapidly and stays non-zero only for a short period; typical examples are smacks and clicks. Fluctuant noise has varying energy all the time, and includes babble noise, continual bumps in a car and the noise of several passing vehicles. Step noise is noise whose energy distribution changes abruptly, like a step; it includes the noise of a machine being switched on as well as abrupt changes in the telecommunication channel, and it can be treated as stable, slow-varying or fluctuant noise separately before and after the step.

Accordingly, the noise energy of sub-band m in frame i is expressed as E_{n,m}(i) = E_{p,m}(i) + E_{q,m}(i) within a duration of ms, which is about the length of a syllable. {E_{p,m}(i)} is an ergodic stationary Gaussian random sequence composed of the stable noise, the slow-varying noise and the stationary sections of step noise, and {E_{q,m}(i)} is a non-stationary sequence made up of the other noise. Hence, in the total energy {E_{x,m}(i) = E_{s,m}(i) + E_{p,m}(i) + E_{q,m}(i)}, both {E_{s,m}(i)} and {E_{q,m}(i)} are non-stationary sequences that are difficult to discriminate by energy alone, and that is the reason for applying the voice detection.

Stationary noise modeling. An adaptive model is set up to track the stationary noise in each sub-band. For clarity, the sub-band index m is omitted hereafter in the description of the model initialization and update. Let {E_p(i)} denote the energy sequence of the stationary noise in sub-band m; over a short period its probability density function is f(E_p) = (1/(√(2π)σ)) exp(−(E_p − µ)²/(2σ²)), where µ and σ are the mean and standard deviation, respectively. Defining the normalized deviation λ = σ/µ, this becomes f(E_p) = (1/(√(2π)λµ)) exp(−(E_p/µ − 1)²/(2λ²)), where λ represents the relative dynamic range. {E_p(i)} is the only stationary component in {E_x(i)}, so its distribution can be estimated in the segments where {E_x(i)} is stationary. However, {E_p(i)} remains stationary and ergodic only over a short period, and it dominates the signal for even less time. Therefore, its distribution is assumed to be stable over ms (l frames), and µ and λ can be estimated from the beginning ms (r frames) of it.

Fig. 2. Strategy for updating the analysis window: the jth model is estimated from the first r frames of the jth analysis window and used to detect by energy in the remaining l − r frames; when a new frame is input, the oldest frame is deleted, the window shifts by one frame, and the (j+1)th model is estimated for the (j+1)th analysis window.

Accordingly, the analysis window is set to l frames and the model parameters are calculated from its first r frames. The energy threshold is then set to θ = µ + µλ/α and applied to test the latter l − r frames, where α is a sensitivity coefficient with 0 < α < 1. When a new frame is input, the analysis window shifts by one frame and the model is updated to calculate a new θ, as shown in Fig. 2.
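The energy test over one analysis window could then be sketched as follows; this is an interpretation rather than the authors' code, with the statistics ε and ξ computed as in the model initialization described below and alpha = 0.5 chosen arbitrarily within the stated range 0 < α < 1:

```python
import numpy as np

def window_stats(energies):
    """Sample mean and normalized deviation (epsilon, xi) of an energy segment."""
    e = np.asarray(energies, dtype=float)
    eps = float(e.mean())
    xi = float(np.sqrt(np.mean((e - eps) ** 2)) / eps)
    return eps, xi

def energy_test(window, r, alpha=0.5):
    """Test the latter l - r frames of one sub-band analysis window.

    window : energy sequence E_x(i) of length l for one sub-band.
    r      : number of leading frames used to estimate the noise model.
    alpha  : sensitivity coefficient, 0 < alpha < 1 (0.5 is an assumed value).
    Returns (theta, flags); flags[i] is True if frame r+i exceeds theta.
    """
    window = np.asarray(window, dtype=float)
    mu, lam = window_stats(window[:r])     # model from the first r frames
    theta = mu + mu * lam / alpha          # energy threshold
    flags = window[r:] > theta             # candidate non-stationary frames
    return theta, flags
```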

Model initialization and update. The model for each sub-band is initialized in the first analysis window from the energy of its first r frames: µ is set to their mean, ε_1 = (1/r) Σ_{i=1}^{r} E_x(i), and λ is set to the normalized deviation, ξ_1 = [Σ_{i=1}^{r} (E_x(i) − ε_1)²]^{1/2} / (ε_1 √r). The initial signal may contain non-stationary components, and the distribution of {E_p(i)} is itself time-varying, so the model is adjusted in all the following analysis windows to keep tracking the distribution of {E_p(i)}. Taking the jth analysis window as an example (see Fig. 2), compute the mean and normalized deviation of its first r frames, denoted ε_j and ξ_j, and then update µ and λ according to the following 5 cases.

1. The input signal occasionally contains short silences, or only a constant component caused by hardware errors. Hence, if ε_j < µ_sil, set µ = µ_sil, where µ_sil is the experimental minimum of µ.
2. For the same reason, if ξ_j < λ_sil and ε_j < ε_{j−1}, set λ = λ_sil, where λ_sil is the experimental minimum of λ.
3. If ε_j < µ·c and ξ_j < λ·c, set µ = ε_j and λ = ξ_j, where c is a constant with 1 < c < 1.5. This tracks decreasing or slowly varying noise.
4. If ε_j < µ·c and ξ_j < ξ_{j−1} < ξ_{j−2}, the noise is becoming more stationary and its level is lower, so set µ = ε_j and λ = ξ_j.
5. If ε_j·(1 + ξ_j) < µ·(1 + λ), the noise is decreasing too, so set µ = ε_j and λ = ξ_j as well.

The cases are checked one by one; as soon as one condition is met, the parameters are updated by it and the remaining cases are ignored (a sketch of this rule chain is given below). If no condition is met, the current µ and λ are kept.
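One possible reading of this update chain is sketched here; the silence floors mu_sil and lam_sil and the constant c are hypothetical values standing in for the experimentally tuned ones mentioned in the text:

```python
def update_model(mu, lam, eps_hist, xi_hist, mu_sil=1e-6, lam_sil=0.01, c=1.2):
    """Apply the five update cases to one sub-band noise model.

    mu, lam  : current model parameters.
    eps_hist : [..., eps_{j-1}, eps_j]          window means, oldest first.
    xi_hist  : [..., xi_{j-2}, xi_{j-1}, xi_j]  normalized deviations, oldest first.
    Returns the updated (mu, lam); the first matching case wins.
    """
    eps_j, eps_prev = eps_hist[-1], eps_hist[-2]
    xi_j, xi_prev, xi_prev2 = xi_hist[-1], xi_hist[-2], xi_hist[-3]

    if eps_j < mu_sil:                                # case 1: silence / constant signal
        return mu_sil, lam
    if xi_j < lam_sil and eps_j < eps_prev:           # case 2
        return mu, lam_sil
    if eps_j < mu * c and xi_j < lam * c:             # case 3: decreasing or slowly varying noise
        return eps_j, xi_j
    if eps_j < mu * c and xi_j < xi_prev < xi_prev2:  # case 4: noise becoming more stationary
        return eps_j, xi_j
    if eps_j * (1 + xi_j) < mu * (1 + lam):           # case 5: noise level decreasing
        return eps_j, xi_j
    return mu, lam                                    # no case matched: keep the model
```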

Band selection and threshold setting. The presence of speech raises the energy level in every sub-band, and in most cases this is obvious in the sub-bands dominated by speech. Hence, non-stationary signal is detected in the latter l − r frames of the current analysis window. If the mean energy of r consecutive frames in a sub-band is higher than θ, that sub-band reports non-stationary signal. For those consecutive r frames, if 2 of the 4 sub-bands report non-stationary signal and the mean energy in the other 2 sub-bands is higher than µ, non-stationary signal is detected.
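Taken literally, this band-combination rule could look like the following sketch; the per-band thresholds theta_m and means mu_m are assumed to come from the sub-band models above, and the "at least 2 of 4" reading is my interpretation:

```python
import numpy as np

def nonstationary_decision(E_recent, thetas, mus):
    """Combine the 4 sub-band tests over r consecutive frames.

    E_recent : array of shape (r, 4) with the energies of the frames under test.
    thetas   : per-band thresholds theta_m.
    mus      : per-band model means mu_m.
    Returns True if non-stationary signal is declared.
    """
    band_means = E_recent.mean(axis=0)                 # mean energy per sub-band
    over_theta = band_means > np.asarray(thetas)       # bands reporting non-stationarity
    over_mu = band_means > np.asarray(mus)             # bands at least above their noise mean
    return bool(over_theta.sum() >= 2 and over_mu[~over_theta].all())
```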

2.3 Voice Detection Based on Harmonic Structure

Voice detection is carried out in the current analysis window after non-stationary signal has been detected. In general, voice is modeled as the output of a vocal tract excited by a periodic glottal flow, so the short-term spectrum of voice has energy peaks at the pitch frequency and its harmonics. Because pitch varies slowly, this appears in narrow-band spectrograms as parallel bright lines. Most noise does not have this character, so checking for harmonics is an effective method of voice detection. The harmonic components dominate the energy of voice, so the harmonic character remains prominent even with background noise. However, the spectral energy envelope of speech varies with pitch and formants, and the energy distribution of noise is also time-varying. Hence, the speech spectrum is not corrupted equally at different frequencies, and the bands with clear harmonic character change over time as well. In this paper, voice is detected by an adaptive method that searches for clear harmonic character over a wide band, and the information of neighboring frames is considered as well. This strategy remains robust against distortion, low-frequency noise and pitch tracking errors.

Peak picking in a frame. The spectral energy of voiced speech usually has peaks at the harmonics, which are multiples of the fundamental frequency. However, some peaks are submerged by the noise, while many spurious peaks appear. Fortunately, under most circumstances at least 3-4 consecutive harmonics remain clear; that is, a frame of corrupted voice has 3-4 spectral energy peaks with a spacing of the fundamental frequency (60-450 Hz) between adjacent ones. To detect the harmonics in frame i, peaks are picked from P_i(k) as follows.

1. Extract all the local peaks in the spectrum.
2. Eliminate the peaks lower than an experimental threshold, to remove peaks caused by noise.
3. Merge the trivial (low and narrow) peaks into the dominant peaks nearby.
4. Eliminate the remaining peaks with relatively small height or width.

Matching peaks with harmonics. If frame i contains voice, there will be spectral energy peaks at multiples of the fundamental frequency F_0. Various candidate values of F_0 are tried to see whether the picked peaks {Q_i(n)} match its multiples, with F_0 incremented in steps of ΔF = 1.5 Hz within the range [60 Hz, 450 Hz]. If {Q_i(n)} contains peaks at the positions of at least 4 consecutive harmonics, or 3 peaks matching the 1st, 2nd and 3rd multiples of F_0, the peaks are recorded as potential harmonics of F_0. It is assumed that every frame contains the voice of at most one speaker. To eliminate spurious harmonics, the continuity of F_0 and of the harmonics is checked: for consecutive frames numbered i_b to i_e, if F_0 fluctuates within a limited extent and its harmonics n·F_0 to (n+3)·F_0 are all matched in those frames, then frames i_b to i_e are detected as voice. The case n = 0 means that the harmonics F_0, 2F_0 and 3F_0 are concerned.
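The per-frame harmonic matching could be sketched as below; this is an illustrative interpretation in which the peak frequencies are assumed to be available in Hz from the peak-picking steps, the matching tolerance tol_hz is a hypothetical parameter, and the cross-frame continuity check described above is left out:

```python
import numpy as np

def frame_has_harmonics(peak_freqs, f0_range=(60.0, 450.0), step=1.5, tol_hz=10.0):
    """Return a matching F0 (Hz) if the frame's peaks fit a harmonic series, else None.

    peak_freqs : frequencies (Hz) of the peaks picked from P_i(k).
    A candidate F0 is accepted if the 1st, 2nd and 3rd multiples, or at least
    4 consecutive multiples, of F0 are matched by picked peaks.
    """
    peaks = np.asarray(sorted(peak_freqs), dtype=float)
    if peaks.size < 3:
        return None
    for f0 in np.arange(f0_range[0], f0_range[1] + step, step):
        n_max = int(peaks.max() // f0) + 1
        matched = [bool(np.any(np.abs(peaks - n * f0) < tol_hz))
                   for n in range(1, n_max + 1)]
        if len(matched) >= 3 and all(matched[:3]):   # 1st, 2nd and 3rd multiples present
            return f0
        run = 0
        for ok in matched:                           # look for 4 consecutive harmonics
            run = run + 1 if ok else 0
            if run >= 4:
                return f0
    return None
```

In the full algorithm this per-frame test would be combined with the F_0 continuity check over frames i_b to i_e before the segment is declared to be voice.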

2.4 Speech Endpoint Determination

Start point determination. As shown in Fig. 1, if voice is found in the analysis window, the non-stationary signal is confirmed as speech, and its start point is searched for in every sub-band based on energy. Taking sub-band m as an example (and omitting the index m), after voice is found in the analysis window, the start point of speech is searched forward from the first voiced frame using ε_i and ξ_i. If a frame satisfies ε_i > ε_{i+1} and ε_i > θ, it is detected as the start point for this sub-band, and the earliest start point over all the sub-bands is taken as the temporary start frame b_s. The onset of speech usually raises the signal energy abruptly, so the noise model stays stable at the beginning of speech and the energy threshold stays low. Moreover, voiced speech usually follows unvoiced speech and has much higher energy, so the temporary start point probably lies in the unvoiced section or before it. To refine the start point, the second step is to find the local maximum of ξ_i near frame b_s and set it as the beginning point; if a sub-band has no ξ_i maximum within a predefined range, it keeps b_s. Finally, the earliest start frame among the sub-bands is taken as the start of speech.

End point determination. The end point is searched for after the start point has been found. The end threshold θ is initialized as θ = µ + µλ. The parameters µ and λ are updated from ε_i and ξ_i, using only criteria 1, 2 and 5, whenever a new frame shifts into the analysis window; thus the noise model can be updated from the weaker or more stationary noise in speech pauses. For each sub-band, if no run of l consecutive frames in the analysis window keeps every frame's energy higher than θ, where l is an experimental threshold in the range 8 < l < 20, then the end frame is detected as the first frame of that analysis window. If 3 sub-bands detect an end point and there is no voice in the analysis window, the endpoint is determined.
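The per-band end test could be read as in the sketch below (hypothetical names; run_len stands for the experimental threshold l with 8 < l < 20, and the 3-of-4 band vote plus the no-voice condition would wrap around this function):

```python
import numpy as np

def band_detects_end(window, theta, run_len=12):
    """True if this sub-band's analysis window contains no run of `run_len`
    consecutive frames whose energies all exceed the end threshold theta."""
    above = np.asarray(window, dtype=float) > theta
    run = 0
    for flag in above:
        run = run + 1 if flag else 0
        if run >= run_len:
            return False          # a sustained high-energy run: speech continues
    return True                   # no sustained run above theta: end point detected
```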

3 Evaluation and Analysis

The accuracy of endpoint detection has a strong influence on the performance of speech recognition, since it triggers the recognizer on and off at the speech boundaries. In this paper, the performance of the proposed algorithm is tested with a grammar-based, speaker-independent speech recognition system, and the reference endpoint detection approaches are the ETSI AFE [14] and the VAD of G.729B [9]. Because speech recognition is deployed on a wide range of equipment and in many circumstances, the test data were recorded in several environments by PDA, telephone and mobile phones, and each test file contains a segment of noisy speech with 2-6 syllables of Chinese words.

Table 1. Recognition performance: Correct Rate (%), Error Rate (%) and Rejection Rate (%) of the ETSI AFE, G.729B and the proposed method on data sets 1-11.

Table 1 shows the comparative results for the different EPDs. Data 1 and 2 were both recorded over the telephone in a quiet office, but data 2 was in hands-free mode, so it contains much more noise from clicks, an electric fan and so on. Data 3 was recorded by PDA in an office with the window open, so it contains much more impulse noise than data 1 and 2. Data 4 was recorded by PDA in a supermarket with few customers, so the main noise was their steps and clicks, occasionally with some of their voices. Data 5 was recorded in an airport lounge by PDA while the broadcast was not playing; the main interference came from people talking, walking and moving baggage nearby. Data 6 was recorded near a noisy roadside, with people talking and moving and vehicles running. Data 7 was recorded by PDA mobile outside the gate of a park, with people talking and vehicles moving around. Data 8 was recorded by GSM mobile beside a highway, so the noise from wind and vehicles was severe. Data 9 was also recorded by GSM mobile near a highway, but music was playing and the mobile telephone was in adaptive noise-cancelling mode, so whenever the speech begins the noise volume is suppressed automatically. Data 10 was recorded by CDMA mobile in an office with the window open, and the speech volume was very low because of the telecom channel. Data 11 was recorded in an office with the window open as well, but with a PAS mobile, so the volume was a little higher than in data 10.

As can be seen in Table 1, the proposed algorithm performs comparably to G.729B and the ETSI AFE in quiet environments, and outperforms them in most noisy environments, especially under time-varying noise. For the data sets with noise from vehicles, steps, clicks and other environmental sources, such as data 2, 3 and 8, the proposed algorithm is much superior to the standards. Even if the energy of the non-stationary noise is high, it is still rejected by the voice detection and does not reach the recognizer, because its spectrum has no harmonic structure. However, the voice detection is less effective against noise from other people's voices, as can be seen in data 4, 5, 6 and 7. Such noise has harmonic character as well, so some of it may be detected as voice. Fortunately, the energy detection serves as the first stage of speech detection, and noise with low energy is rejected there. The voice detection also discriminates some interfering voices from the user's voice, because the harmonics in far-away speech are usually not as continuous and clear as the user's. For interfering talkers nearby, more effective approaches are still needed.

In the quiet environment of data 1, the proposed algorithm performs better than G.729B but not as well as the ETSI AFE. This is caused by misses in the voice detection when the user's speech is short or hoarse; even so, it is still acceptable for most practical purposes. The advantage of adaptive energy tracking is clear in data 9, 10 and 11, where the volume is low or time-varying. By tracking the stationary noise in 4 sub-bands, the non-stationary signal is detected to start the voice detection, so the final detection is not affected by the level or the variation of the signal.

4 Conclusion

This paper puts forward a speech endpoint detection algorithm for real noisy environments. It performs reliably in most noisy environments, especially those with abrupt changes of noise energy, which are typical of mobile and portable use. Future research will focus on environments with interfering speech and music.

5 Acknowledgement

The authors would like to thank Heng Zhang for his helpful suggestions.

References

1. Lynch, J.F., Josenhans, J.G., Crochiere, R.E.: Speech/silence segmentation for real-time coding via rule based adaptive endpoint detection. Proc. ICASSP, Dallas (1987)
2. Li, Q.: Robust endpoint detection and energy normalization for real-time speech and speaker recognition. IEEE Transactions on Speech and Audio Processing 10(3) (2002)
3. Huang, L.S., Yung, C.H.: A novel approach to robust speech endpoint detection in car environments. Proc. ICASSP (2000)
4. Weaver, K., Waheed, K., Salem, F.M.: An entropy based robust speech boundary detection algorithm for realistic noisy environments. IEEE Conference on Neural Networks (2003)
5. Tucker, R.: Voice activity detection using a periodicity measure. IEE Proceedings 139(4) (1992)
6. Othman, H., Aboulnasr, T.: A semi-continuous state transition probability HMM-based voice activity detection. Proc. ICASSP, Vol. V (2004)
7. Gazor, S., Zhang, W.: A soft voice activity detector based on a Laplacian-Gaussian model. IEEE Transactions on Speech and Audio Processing 11(5) (2003)
8. Li, K., Swamy, M.N.S., Ahmad, M.O.: Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold. IEEE Transactions on Speech and Audio Processing 13(5) (2005)
9. Benyassine, A., Shlomot, E., Su, H.Y.: ITU recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications. IEEE Communications Magazine (1997)
10. Marzinzik, M., Kollmeier, B.: Speech pause detection for noise spectrum estimation by tracking power envelope dynamics. IEEE Transactions on Speech and Audio Processing 10(2) (2002)
11. Zhang, T., Kuo, C.C.J.: Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing 9(4) (2001)
12. Tanyer, S.G., Ozer, H.: Voice activity detection in nonstationary noise. IEEE Transactions on Speech and Audio Processing 8(4) (2000)
13. Seneff, S.: Real-time harmonic pitch detector. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-26(4) (1978)
14. ETSI: ES recommendation: Speech processing, transmission and quality aspects (STQ); distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithms (2002)
