ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE
Ramon E. Prieto et al., Robust Pitch Tracking

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

Ramon E. Prieto, Sora Kim
Electrical Engineering Department, Stanford University
rprieto@stanford.edu, kimsora@stanford.edu

ABSTRACT: A new robust algorithm for estimating the pitch period of a speech signal is depicted. This algorithm puts emphasis on the frequency components where noise and spectral leakage have less impact on the signal. At the same time, it uses smaller analysis windows to improve time resolution and avoid jitter and pitch doubling effects. As a result, experiments show lower fine pitch errors as well as better voiced/unvoiced segment detection.

1. INTRODUCTION

The robust estimation of the pitch period plays an important role in speech processing applications, and many methods to extract the pitch have been proposed. Some of the most widely accepted methods are the cepstrum method [1] and the SIFT method. In the first method, jitter and pitch doubling are shown to be a problem. The accuracy of the second method depends on how stationary the speech signal is in the analysis interval. Other methods, like the autocorrelation function method and the average magnitude difference function method, keep the formant structure of the signal, making the pitch estimation hard when there are high-energy, high-frequency harmonics in the signal.

The objective of this paper is to develop a pitch tracking algorithm that can overcome the problems of the algorithms described above: on one hand, avoid the effect of high-energy, high-frequency harmonics; on the other hand, avoid the use of long analysis windows, so as to avoid pitch doubling and jitter effects. These objectives would directly result in lower gross error counts and lower fine pitch errors [2]. For this, we use a time delay estimation technique, described in Section 2, where we provide a new theoretical framework that explains why this method should work when spectral leakage is an issue. In Sections 3 and 4 we provide robustness to our method by improving
the phase unwrapping process and by weighting the contribution of the phase of the different frequency bins to the pitch estimation. In Section 5 we evaluate the performance of our method with both clean and noisy speech. We also compare our method with the cepstrum method [1] and the autocorrelation method.

2. LINEAR REGRESSION OF THE PHASE

Let's assume two different frames of a sampled voiced speech signal,

    x_1[n] = s(n T_s),        n = 0, ..., N-1,    (1)
    x_2[n] = s(n T_s + D),    n = 0, ..., N-1,    (2)

where s is periodic with Fourier coefficients c_k, T is the pitch period in units of time (T is assumed to be a multiple of the sampling period T_s), and D is the time delay between the two frames. Let δ represent, in units of time, how far the beginning of frame x_2 is from the start of the pitch period that is next to the beginning of x_1; δ is assumed to range from 0 to T. A very similar problem is stated in [6], where they aim to find the time delay D between x_1 and x_2. The objective of this work is to find T, given that we know D, the time delay between x_1 and x_2.

Pitch Synchronous Frames

Assuming that both frames are pitch synchronous (i.e., the frame length N T_s is a multiple of T), then

    X_2(q) = X_1(q) e^{j 2π q δ / (N T_s)},    (3)

Melbourne, December 2002. © Australian Speech Science & Technology Association Inc. Accepted after full review.
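The core idea of this section — regress the unwrapped phase of X_2(q) X_1*(q) on the bin index q and read the offset off the slope — can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' implementation; the frame length, the two-harmonic test signal, and the energy mask used to skip empty bins are choices made here for the demonstration.

```python
import numpy as np

def delay_from_phase_slope(x1, x2):
    """Estimate the offset (in samples) between two frames by a linear
    regression of the unwrapped cross-spectrum phase vs. bin index q."""
    N = len(x1)
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    # keep only bins that carry real energy; near-zero bins have garbage phase
    keep = np.abs(X1) > 1e-3 * np.abs(X1).max()
    q = np.arange(len(X1))[keep]
    phase = np.unwrap(np.angle(X2[keep] * np.conj(X1[keep])))
    slope = np.polyfit(q, phase, 1)[0]          # least-squares slope
    return slope * N / (2 * np.pi)              # offset in samples

# periodic signal, period T = 100 samples; frames offset by 13 samples
n = np.arange(4000)
s = np.sin(2 * np.pi * n / 100) + 0.5 * np.sin(4 * np.pi * n / 100)
N = 400                                         # 4 periods: pitch synchronous
x1, x2 = s[:N], s[13:13 + N]
print(delay_from_phase_slope(x1, x2))           # ~13.0
```

In the pitch-synchronous case the slope recovers the shift modulo the period, which is exactly the quantity δ the regression is meant to deliver.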
where X_1(q) and X_2(q) are the DFT coefficients of x_1 and x_2, respectively. We can see that the unwrapped phase of formula (3) is bilinear in q and δ. Then, if we know x_1, x_2 and their time delay, we can do a linear regression of that unwrapped phase vs. q. The result gives us δ, which in turn gives us the value of T (for instance, as T = D - δ when T ≤ D < 2T). Although this is an unrealistic case (T must be known before calculating formula (3)), it illustrates the basics of our pitch tracking method.

Non Pitch Synchronous Frames

In our real problem we don't know T; it is what we wish to calculate. In the present section we will see that applying a linear regression of the phase when the frames are not pitch synchronous still gives us the period, or a good estimate of it. We will also suggest which modifications we have to apply to the basic method in order to make it more reliable. We wish to calculate T by applying a linear regression to the unwrapped phase of

    X_2(q) X_1*(q),    (4)

where q is the frequency bin. Formula (3) doesn't apply in this case: analyzing the DFTs of x_1 and x_2 implies a spectral leakage effect. Given formulas (1) and (2), the DFTs of x_1 and x_2 are sums of the contributions of every harmonic k through the transform of the analysis window,

    X_i(q) = Σ_k c_k^{(i)} W(q - N k T_s / T),    (5), (6)
where W is the DFT of the (rectangular) analysis window evaluated at fractional bin offsets. Figure 1a shows the magnitude of the contribution of harmonic k, c_k W(q - N k T_s / T), to X(q). The locations of the frequency bins relative to the harmonic depend on the values of N, T_s and T. We can see that the ideal case of no interference of harmonic k in the frequency bins far away from N k T_s / T happens when N k T_s / T is an integer (the pitch synchronous case). As the difference between N k T_s / T and the nearest integer becomes very small, |W| will be close to N for the frequency bin closest to N k T_s / T and close to zero for the others. Figure 1b shows the phase of W. Making the same analysis as in Figure 1a, the phase added to the contribution of harmonic k in X(q) will be close to zero for the frequency bins close to N k T_s / T when that difference is very small. That phase can be very high when N k T_s / T falls between two bins. If the phase distortion of a harmonic in our non-pitch synchronous analysis can be that high, what guarantees that the phase regression will work for non-pitch synchronous frames? The answer is given by Phase Interpolation and Subtraction of Phases.

Phase Interpolation

Imagine we have two harmonics, k and k+1, and imagine also that those two harmonics are conflicting in frequency bin q. Given the characteristics of the side lobes of W in Figure 1a, the contribution of any other harmonic is assumed to be small. As such, from formulas (5) and (6) we have

    X(q) ≈ c_k W(q - N k T_s / T) + c_{k+1} W(q - N (k+1) T_s / T).    (7)

We can see from formula (7) that, if the phases of the two terms are properly unwrapped, then the phase of X(q) is an interpolation of the phases of the contributions of harmonics k and k+1. The weights of that interpolation are driven by the magnitudes |W(q - N k T_s / T)| and |W(q - N (k+1) T_s / T)|.
Figure 1: (a) Magnitude of W(q - N k T_s / T), centered at N k T_s / T; the circles show the magnitude at the frequency bins. (b) Phase of W at the frequency bins in the same case as (a). (c) Magnitudes of two neighboring harmonics (dotted and dashed) and the DFT of their sum (solid, upscaled by 2).

Figure 2: Comparison of the three unwrapping methods. (a) Phase using no unwrapping. (b) Phase using BU. (c) Phase using SFU. (d) Phase using LRSFU. (e) Phase using SFU and (f) phase using LRSFU with different parameter settings.

If the weights are the same for both phases (i.e., |W(q - N k T_s / T)| = |W(q - N (k+1) T_s / T)|), then the interpolated phase eliminates the phase distortion, and the resulting phase of X(q) is the interpolation of the undistorted harmonic phases. The same will happen to the neighboring bin, a perfect situation, since the frequency bin q is adding zero error to the linear regression. As long as the difference between |W(q - N k T_s / T)| and |W(q - N (k+1) T_s / T)| is small, we will have small phase distortion. However, if the difference is big enough that the resulting phase of X(q) tends to be closer to the phase of one of the harmonics than to the interpolation (or the opposite), the solution to that problem is given by Subtraction of Phases.

Subtraction of Phases

This time let's assume that in formula (7) the difference between the harmonic phases, or the difference between |W(q - N k T_s / T)| and |W(q - N (k+1) T_s / T)|, makes the resulting phase of X(q) closer to the phase of harmonic k rather than to the interpolation (the opposite case works just the same). Then the resulting phases of X(q) and the neighboring bins are dominated by the phases of the nearby harmonics, allowing our regression method to work with some regression error, since the frequency bins next to the harmonics k and k+1 show a phase that is proportional to k. The analysis above uses a rectangular window. For the rest of
this work we will use Hamming windows, since the amplitude of the worst-case side lobe will be lower; the analysis done in this section can be generalized to Hamming windows. Even though Phase Interpolation and Subtraction of Phases solve the phase distortion problem for the frequency bins closest to a harmonic, there is still the problem of high-energy harmonics contributing to the phase of frequency bins far away from the harmonic itself. This problem can seriously modify the result of the regression method. It also tells us that the frequency bins with more energy have a more reliable phase. This problem is solved by using a weighted linear regression.
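The claim about the Hamming window is easy to check numerically: for a frame that is not pitch synchronous, the leakage a harmonic leaves in bins far from its center is much weaker under a Hamming window than under a rectangular one. A minimal sketch — the frame length, period, and the choice of "far" bins are arbitrary demo values, not values from the paper:

```python
import numpy as np

# A non-pitch-synchronous frame: the period (107 samples) does not divide
# the frame length (400), so the harmonic leaks into every bin.
N, T = 400, 107
n = np.arange(N)
s = np.sin(2 * np.pi * n / T)                  # harmonic near bin N/T ~ 3.7

rect = np.abs(np.fft.rfft(s))                  # rectangular window
hamm = np.abs(np.fft.rfft(s * np.hamming(N)))  # Hamming window

# worst leakage level in bins far from the harmonic, relative to the peak
rect_far = rect[20:].max() / rect.max()
hamm_far = hamm[20:].max() / hamm.max()
print(rect_far > hamm_far)                     # True
```

The far bins are exactly the ones whose phase the weighted regression of Section 4 is designed to discount, so the two mechanisms complement each other.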
3. PHASE UNWRAPPING

Work has been done in the field of phase unwrapping; for example, phase unwrapping has been used to calculate the complex cepstrum. Several methods have been proposed to unwrap the phase of one-dimensional signals, among which we chose to compare the following ones.

Basic Unwrapping (BU). If we consider the phase response as a continuous function of frequency, then unwrapping is meant to make the phase more continuous. As such, our Basic Unwrapping method (BU) adds 2π or -2π to the phase of all the frequency bins greater than or equal to q if the difference between the phases of frequency bins q and q-1 is lower than -π or greater than π, respectively.

Slope Forced Unwrapping (SFU). Given that the phases of the frequency bins closest to the first harmonic won't wrap unless the delay being measured is too large, we can consider those phases as good information from which to calculate an initial slope. Then, at frequency bin q, we calculate the slope of the line that goes from frequency bin zero to frequency bin q-1. An estimate of the phase at q is calculated using that slope, and the actual phase at frequency bin q is unwrapped around that estimate. Since we want only reliable frequency bins to modify the estimated slope, the slope is recalculated only at the frequency bins where the magnitude is greater than or equal to a fixed fraction of the maximum magnitude in the spectrum.

Linear Regression Slope Forced Unwrapping (LRSFU). The most widely used method for phase unwrapping is [4]; a less general version of it was implemented in [ ]. Here, for the intermediate estimate at frequency bin q, a window of preceding frequency bins is used to perform a linear regression. The calculated slope is used to predict an estimate of the phase of frequency bin q, and the actual phase is unwrapped around that estimate. We call this method Linear Regression Slope Forced Unwrapping (LRSFU). The magnitude threshold is used in the same way as in SFU.

A comparison of the unwrapping methods is shown in Figure 2. As an example we show the results of two frames separated
by exactly a pitch period, which, as a result, should give a slope equal to zero. Parts b and c show that the BU and SFU methods are too sensitive to spectral leakage. For this specific example LRSFU is the most robust method, since it is the only one that didn't add ±2π in any bin. From Figure 2 it is important to see how dramatic the change in the slope would be if our method were not robust enough. We can also see, for all the unwrapping methods of Figure 2, that there is no incorrect unwrap in the initial frequency bins with high magnitude in the DFT (speech usually has high energy up to 3-4 kHz). When doing the linear regression, if we put more weight on the frequencies with high amplitude, we reduce the effect of the spectral leakage not avoided by the unwrapping method.

4. WEIGHTED LINEAR REGRESSION

We want to apply a linear regression to the unwrapped phase of formula (4). The problem and its solution are stated as

    φ = βQ + ε,    β = (Q^T W Q)^{-1} Q^T W φ,

where W is an N×N diagonal matrix with the weights as the diagonal elements, Q is a vector containing the frequency bin indexes 1 to N, φ is the vector containing the unwrapped phases of each of the frequency bins of formula (4), and ε is the regression error. The work in [6] uses the magnitude squared coherence function, as defined in [7], to define a weighting scheme. However, since x_1 and x_2 come from the same microphone, the magnitude squared coherence
Figure 3: Estimated pitch period and regression error vs. the delay T+δ, for different weighting schemes and unwrapping methods. (a)-(c) Estimated period for BU, SFU and LRSFU with no weighting; (d)-(f) regression error for BU, SFU and LRSFU with no weighting; (g)-(i) estimated period for BU, SFU and LRSFU with weighting; (j)-(l) regression error for BU, SFU and LRSFU with weighting.

function will give a strong correlation of the noise. In [8], the signal is prefiltered to emphasize the frequencies where the signal-to-noise ratio is high. Following this reasoning, in this section we propose the following weighting scheme:

    w_q = (|X_1(q)| |X_2(q)|)^p,

where p is a real number greater than one, to emphasize the frequencies with high amplitude over the ones with low amplitude. In Figure 3, several plots of the estimated pitch vs. T+δ and of the regression error vs. T+δ are shown. The actual pitch period of the signal is 7 ms. If we use weighting, we can see that the estimate becomes reliable in a bigger region of δ, and that the regression error becomes a discriminant between a good estimate and a bad estimate of the pitch period.

5. RESULTS

We can see from parts i and l of Figure 3 that we can perform several iterations of our method, fixing the position of frame x_1 and shifting the time delay to the last estimated pitch period, until the regression error falls below certain thresholds. This method is what we call Iterative Linear Regressions of the Phase (ILRP). To approximate the method to the ideal pitch synchronous case, we implement a variation of ILRP where we set the frame lengths of x_1 and x_2 and their time delay to be equal to the last pitch period found at each iteration. This method is called Adaptive Frame Length Iterative Linear Regression of the Phase (AFLILRP), and it is applied only after the first pitch period has been successfully found by ILRP. This variation avoids jitter and pitch doubling effects and allows the use of a lower threshold value. A frame will be labeled as voiced if the regression error went below the thresholds before a maximum number of iterations. Otherwise, the frame
will be labeled as unvoiced.

For the results in this section we used 64 seconds of speech from male speakers and 96 seconds of speech from 2 female speakers. Table 1 shows a performance measure in each row for the different phase unwrapping methods in each column; number 1 stands for SFU and 2 stands for LRSFU. For example, method 2-1 means LRSFU in ILRP and SFU in AFLILRP. We also compared the performance of our method with the cepstrum pitch detection method [1] and the autocorrelation method. We used the same magnitude threshold for both SFU and LRSFU. The performance measures used are gross pitch error (GPE), voiced-unvoiced error rate (V-UV), unvoiced-voiced error rate (UV-V), gross error count (GEC), fine pitch error average (FPEAV) and fine pitch error standard deviation (FPESD), as defined in [2].
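The ILRP loop described in this section can be sketched as follows. This is a simplified reconstruction, not the authors' implementation: it uses a fixed frame length, a plain energy mask in place of the weighting of Section 4, the mean squared residual of the fit as the regression error, and illustrative threshold values.

```python
import numpy as np

def ilrp(signal, d0, frame=400, max_iter=10, tol=1e-4):
    """Iterate: regress the cross-spectrum phase on the bin index, turn the
    slope into a residual offset delta, shift the delay to the implied period
    d - delta, and stop when the regression error is small (frame is voiced)."""
    x1 = signal[:frame].astype(float)
    d = d0                                              # initial delay guess
    for _ in range(max_iter):
        x2 = signal[d:d + frame].astype(float)
        X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
        keep = np.abs(X1) > 1e-2 * np.abs(X1).max()     # energy mask
        q = np.arange(len(X1))[keep]
        phi = np.unwrap(np.angle(X2[keep] * np.conj(X1[keep])))
        slope, icpt = np.polyfit(q, phi, 1)
        err = np.mean((phi - (slope * q + icpt)) ** 2)  # regression error
        delta = slope * frame / (2 * np.pi)             # residual offset, samples
        d = int(round(d - delta))                       # next period estimate
        if err < tol:
            return d, True                              # voiced, period found
    return d, False                                     # unvoiced

# period T = 100 samples; start the delay at a rough guess of 113 samples
n = np.arange(2000)
s = (np.sin(2 * np.pi * n / 100) + 0.5 * np.sin(4 * np.pi * n / 100)
     + 0.25 * np.sin(6 * np.pi * n / 100))
period, voiced = ilrp(s, 113)
print(period, voiced)                                   # 100 True
```

On this ideal periodic input the phase is exactly linear over the kept bins, so the loop converges to the true period immediately; on real speech the error threshold and iteration cap do the voiced/unvoiced gating the text describes.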
Table 1: Performance of the pitch estimation for different phase unwrapping methods in ILRP and AFLILRP. Four sub-tables: male and female data, for clean speech and for noisy speech at the tested SNR. Rows report GPE (%), V-UV (%), UV-V (%), GEC (%), FPEAV (ms) and FPESD (ms); the clean-speech columns cover the method combinations plus the cepstrum and autocorrelation references, while the noisy-speech columns compare method 2-2, cepstrum and autocorrelation. [Numeric entries not recoverable.]

For clean speech, in terms of V-UV and UV-V, method 2-2 performs the best for male data, while it performs almost the same as 1-2 and the cepstrum method for female data. In terms of GEC, FPEAV and FPESD, the proposed methods are the best and perform almost the same among themselves. However, 2-2 is faster and more efficient in finding out whether a segment is voiced or unvoiced. For noisy data, 2-2 performs clearly better than the cepstrum method, and considerably better than the autocorrelation method in the UV-V, GEC and FPESD measures. The high UV-V measure of the autocorrelation method makes it hard to make a comparison regarding V-UV and GPE.

6. CONCLUSIONS

We have described a method that uses a time delay estimation technique and phase information to detect the pitch frequency of a speech signal. We brought a new theoretical explanation as to why this method should work, and we have described different approaches to phase unwrapping to arrive at a robust and fast estimate of the pitch. We have proposed to eliminate the contribution of unreliable, low-energy phase components by making a weighted linear regression of the phase. As a result, compared to the cepstrum and autocorrelation methods, method 2-2 always performs better in terms of GEC and FPESD, while it performs similarly or better in the rest of the measures, depending on whether the data is from male or female speakers, for both clean and noisy speech.

REFERENCES

[1] A. M. Noll, "Cepstrum Pitch Determination," J. Acoust. Soc. America, vol. 41, pp. 293-309, 1967.
[2] L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, C. A. McGonegal, "A comparative performance study of several pitch detection algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 399-418, 1976.
[3] B. Secrest, G. Doddington, "An integrated pitch tracking algorithm
for speech systems," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1983.
[4] J. M. Tribolet, "A new phase unwrapping algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 170-177, 1977.
[5] M. S. Brandstein, J. E. Adcock, H. F. Silverman, "A Practical Time-Delay Estimator for Localizing Speech Sources with a Microphone Array," Computer Speech and Language, pp. 153-169, April 1995.
[6] Y. Chan, R. Hattin, J. Plant, "The Least Squares Estimation of Time Delay and Its Use in Signal Detection," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 217-222, June 1978.
[7] G. C. Carter, C. H. Knapp, A. H. Nuttall, "Estimation of the Magnitude-Squared Coherence Function Via Overlapped Fast Fourier Transform Processing," IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 337-344, Aug. 1973.
[8] C. H. Knapp, G. C. Carter, "The Generalized Correlation Method for Estimation of Time Delay," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 320-327, Aug. 1976.
More informationSpeech Enhancement Using a Mixture-Maximum Model
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 6, SEPTEMBER 2002 341 Speech Enhancement Using a Mixture-Maximum Model David Burshtein, Senior Member, IEEE, and Sharon Gannot, Member, IEEE
More informationProject 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing
Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You
More informationElectronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis
International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate
More informationRobust Voice Activity Detection Based on Discrete Wavelet. Transform
Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper
More informationPrinceton ELE 201, Spring 2014 Laboratory No. 2 Shazam
Princeton ELE 201, Spring 2014 Laboratory No. 2 Shazam 1 Background In this lab we will begin to code a Shazam-like program to identify a short clip of music using a database of songs. The basic procedure
More informationEnvelope Modulation Spectrum (EMS)
Envelope Modulation Spectrum (EMS) The Envelope Modulation Spectrum (EMS) is a representation of the slow amplitude modulations in a signal and the distribution of energy in the amplitude fluctuations
More informationTRANSIENT NOISE REDUCTION BASED ON SPEECH RECONSTRUCTION
TRANSIENT NOISE REDUCTION BASED ON SPEECH RECONSTRUCTION Jian Li 1,2, Shiwei Wang 1,2, Renhua Peng 1,2, Chengshi Zheng 1,2, Xiaodong Li 1,2 1. Communication Acoustics Laboratory, Institute of Acoustics,
More informationAn Efficient Pitch Estimation Method Using Windowless and Normalized Autocorrelation Functions in Noisy Environments
An Efficient Pitch Estimation Method Using Windowless and ormalized Autocorrelation Functions in oisy Environments M. A. F. M. Rashidul Hasan, and Tetsuya Shimamura Abstract In this paper, a pitch estimation
More informationSpeech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya 2, B. Yamuna 2, H. Divya 2, B. Shiva Kumar 2, B.
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 4 April 2015, Page No. 11143-11147 Speech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationADDITIVE synthesis [1] is the original spectrum modeling
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 851 Perceptual Long-Term Variable-Rate Sinusoidal Modeling of Speech Laurent Girin, Member, IEEE, Mohammad Firouzmand,
More informationA variable step-size LMS adaptive filtering algorithm for speech denoising in VoIP
7 3rd International Conference on Computational Systems and Communications (ICCSC 7) A variable step-size LMS adaptive filtering algorithm for speech denoising in VoIP Hongyu Chen College of Information
More informationVariable Step-Size LMS Adaptive Filters for CDMA Multiuser Detection
FACTA UNIVERSITATIS (NIŠ) SER.: ELEC. ENERG. vol. 7, April 4, -3 Variable Step-Size LMS Adaptive Filters for CDMA Multiuser Detection Karen Egiazarian, Pauli Kuosmanen, and Radu Ciprian Bilcu Abstract:
More informationAccurate Delay Measurement of Coded Speech Signals with Subsample Resolution
PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,
More informationLive multi-track audio recording
Live multi-track audio recording Joao Luiz Azevedo de Carvalho EE522 Project - Spring 2007 - University of Southern California Abstract In live multi-track audio recording, each microphone perceives sound
More informationA Three-Microphone Adaptive Noise Canceller for Minimizing Reverberation and Signal Distortion
American Journal of Applied Sciences 5 (4): 30-37, 008 ISSN 1546-939 008 Science Publications A Three-Microphone Adaptive Noise Canceller for Minimizing Reverberation and Signal Distortion Zayed M. Ramadan
More informationENF PHASE DISCONTINUITY DETECTION BASED ON MULTI-HARMONICS ANALYSIS
U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 4, 2015 ISSN 2286-3540 ENF PHASE DISCONTINUITY DETECTION BASED ON MULTI-HARMONICS ANALYSIS Valentin A. NIŢĂ 1, Amelia CIOBANU 2, Robert Al. DOBRE 3, Cristian
More informationEmanuël A. P. Habets, Jacob Benesty, and Patrick A. Naylor. Presented by Amir Kiperwas
Emanuël A. P. Habets, Jacob Benesty, and Patrick A. Naylor Presented by Amir Kiperwas 1 M-element microphone array One desired source One undesired source Ambient noise field Signals: Broadband Mutually
More informationReal-time fundamental frequency estimation by least-square fitting. IEEE Transactions on Speech and Audio Processing, 1997, v. 5 n. 2, p.
Title Real-time fundamental frequency estimation by least-square fitting Author(s) Choi, AKO Citation IEEE Transactions on Speech and Audio Processing, 1997, v. 5 n. 2, p. 201-205 Issued Date 1997 URL
More information/$ IEEE
614 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 4, MAY 2009 Event-Based Instantaneous Fundamental Frequency Estimation From Speech Signals B. Yegnanarayana, Senior Member,
More informationWindows Connections. Preliminaries
Windows Connections Dale B. Dalrymple Next Annual comp.dsp Conference 21425 Corrections Preliminaries The approach in this presentation Take aways Window types Window relationships Windows tables of information
More informationSpeech and Music Discrimination based on Signal Modulation Spectrum.
Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we
More informationSeparating Voiced Segments from Music File using MFCC, ZCR and GMM
Separating Voiced Segments from Music File using MFCC, ZCR and GMM Mr. Prashant P. Zirmite 1, Mr. Mahesh K. Patil 2, Mr. Santosh P. Salgar 3,Mr. Veeresh M. Metigoudar 4 1,2,3,4Assistant Professor, Dept.
More informationUsing sound levels for location tracking
Using sound levels for location tracking Sasha Ames sasha@cs.ucsc.edu CMPE250 Multimedia Systems University of California, Santa Cruz Abstract We present an experiemnt to attempt to track the location
More informationDistortion products and the perceived pitch of harmonic complex tones
Distortion products and the perceived pitch of harmonic complex tones D. Pressnitzer and R.D. Patterson Centre for the Neural Basis of Hearing, Dept. of Physiology, Downing street, Cambridge CB2 3EG, U.K.
More informationRobust Speech Feature Extraction using RSF/DRA and Burst Noise Skipping
100 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.3, NO.2 AUGUST 2005 Robust Speech Feature Extraction using RSF/DRA and Burst Noise Skipping Naoya Wada, Shingo Yoshizawa, Noboru
More informationMidterm Examination CS 534: Computational Photography
Midterm Examination CS 534: Computational Photography November 3, 2015 NAME: SOLUTIONS Problem Score Max Score 1 8 2 8 3 9 4 4 5 3 6 4 7 6 8 13 9 7 10 4 11 7 12 10 13 9 14 8 Total 100 1 1. [8] What are
More informationON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1
ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN SPEECH SIGNALS Zied Mnasri 1, Hamid Amiri 1 1 Electrical engineering dept, National School of Engineering in Tunis, University Tunis El
More informationSingle Channel Speaker Segregation using Sinusoidal Residual Modeling
NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology
More informationDrum Transcription Based on Independent Subspace Analysis
Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,
More informationSignal Processing for Digitizers
Signal Processing for Digitizers Modular digitizers allow accurate, high resolution data acquisition that can be quickly transferred to a host computer. Signal processing functions, applied in the digitizer
More informationA system for automatic detection and correction of detuned singing
A system for automatic detection and correction of detuned singing M. Lech and B. Kostek Gdansk University of Technology, Multimedia Systems Department, /2 Gabriela Narutowicza Street, 80-952 Gdansk, Poland
More informationA Faster Method for Accurate Spectral Testing without Requiring Coherent Sampling
A Faster Method for Accurate Spectral Testing without Requiring Coherent Sampling Minshun Wu 1,2, Degang Chen 2 1 Xi an Jiaotong University, Xi an, P. R. China 2 Iowa State University, Ames, IA, USA Abstract
More informationAudio Restoration Based on DSP Tools
Audio Restoration Based on DSP Tools EECS 451 Final Project Report Nan Wu School of Electrical Engineering and Computer Science University of Michigan Ann Arbor, MI, United States wunan@umich.edu Abstract
More informationCarrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm
Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Seare H. Rezenom and Anthony D. Broadhurst, Member, IEEE Abstract-- Wideband Code Division Multiple Access (WCDMA)
More informationKONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM
KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,
More informationFOURIER analysis is a well-known method for nonparametric
386 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 54, NO. 1, FEBRUARY 2005 Resonator-Based Nonparametric Identification of Linear Systems László Sujbert, Member, IEEE, Gábor Péceli, Fellow,
More informationSynchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech
INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,
More informationDesign and Implementation on a Sub-band based Acoustic Echo Cancellation Approach
Vol., No. 6, 0 Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach Zhixin Chen ILX Lightwave Corporation Bozeman, Montana, USA chen.zhixin.mt@gmail.com Abstract This paper
More informationS PG Course in Radio Communications. Orthogonal Frequency Division Multiplexing Yu, Chia-Hao. Yu, Chia-Hao 7.2.
S-72.4210 PG Course in Radio Communications Orthogonal Frequency Division Multiplexing Yu, Chia-Hao chyu@cc.hut.fi 7.2.2006 Outline OFDM History OFDM Applications OFDM Principles Spectral shaping Synchronization
More informationA Parametric Model for Spectral Sound Synthesis of Musical Sounds
A Parametric Model for Spectral Sound Synthesis of Musical Sounds Cornelia Kreutzer University of Limerick ECE Department Limerick, Ireland cornelia.kreutzer@ul.ie Jacqueline Walker University of Limerick
More informationLecture 9: Time & Pitch Scaling
ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 9: Time & Pitch Scaling 1. Time Scale Modification (TSM) 2. Time-Domain Approaches 3. The Phase Vocoder 4. Sinusoidal Approach Dan Ellis Dept. Electrical Engineering,
More informationAudio Watermarking Based on Multiple Echoes Hiding for FM Radio
INTERSPEECH 2014 Audio Watermarking Based on Multiple Echoes Hiding for FM Radio Xuejun Zhang, Xiang Xie Beijing Institute of Technology Zhangxuejun0910@163.com,xiexiang@bit.edu.cn Abstract An audio watermarking
More informationHUMAN speech is frequently encountered in several
1948 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 7, SEPTEMBER 2012 Enhancement of Single-Channel Periodic Signals in the Time-Domain Jesper Rindom Jensen, Student Member,
More informationTime Delay Estimation: Applications and Algorithms
Time Delay Estimation: Applications and Algorithms Hing Cheung So http://www.ee.cityu.edu.hk/~hcso Department of Electronic Engineering City University of Hong Kong H. C. So Page 1 Outline Introduction
More informationReducing comb filtering on different musical instruments using time delay estimation
Reducing comb filtering on different musical instruments using time delay estimation Alice Clifford and Josh Reiss Queen Mary, University of London alice.clifford@eecs.qmul.ac.uk Abstract Comb filtering
More information