Advances in Speech Signal Processing for Voice Quality Assessment

1 Advances in Speech Signal Processing for Voice Quality Assessment, Part II. University of Crete, Computer Science Dept., Multimedia Informatics Lab. Bilbao, September 2011.

2 Outline: 1. Modulation Spectra: Multi-linear Algebra, Features selection. 2. Vocal Tremor: Introduction, Application: Vocal Fatigue.

3 Modulation Spectra: Multi-linear Algebra, Features selection.

4 Modulation Spectra in equations. First step: STFT.

X_m(k) = \sum_{n=-\infty}^{\infty} h(mM - n)\, x(n)\, W_{I_1}^{kn}, \qquad k = 0, \ldots, I_1 - 1,

where I_1 denotes the number of frequency bins in the acoustic frequency axis, W_{I_1} = \exp(-j\pi/I_1), M is the shift parameter (or hop size) in the computation of the STFT, and h(n) is the acoustic frequency analysis window.

5 Modulation Spectra in equations. Second step: estimation of the modulation frequencies of the subband envelopes.

X_l(k, i) = \sum_{m=-\infty}^{\infty} g(lL - m)\, |X_m(k)|\, W_{I_2}^{im}, \qquad i = 0, \ldots, I_2 - 1,

where I_2 is the number of frequency bins along the modulation frequency axis, W_{I_2} = \exp(-j (f_M/F_s)\, \pi / I_2), with f_M and F_s denoting the maximum modulation frequency we search for and the sampling frequency, respectively, L is the shift parameter of the second STFT, and g(m) is the modulation frequency analysis window.
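
To make the two steps concrete, the following is a minimal numpy sketch of a modulation spectrogram computed as above; it is an illustrative sketch only, and the window types, FFT sizes, and hop sizes are assumptions rather than the settings used in this work.

```python
import numpy as np
from scipy.signal import get_window

def modulation_spectrogram(x, n_fft=256, hop=64, n_mod=128, mod_hop=8):
    """Two-step STFT: acoustic STFT X_m(k), then an STFT along the frame
    index m of each subband envelope |X_m(k)| (illustrative parameters)."""
    h = get_window("hamming", n_fft)                   # acoustic analysis window h(n)
    frames = [x[m:m + n_fft] * h
              for m in range(0, len(x) - n_fft, hop)]  # hop size M
    X = np.fft.rfft(np.array(frames), n_fft, axis=1)   # X_m(k): (num_frames, I1)
    env = np.abs(X)                                    # subband envelopes |X_m(k)|
    g = get_window("hamming", n_mod)                   # modulation analysis window g(m)
    blocks = [env[l:l + n_mod] * g[:, None]
              for l in range(0, env.shape[0] - n_mod, mod_hop)]
    Xl = np.fft.rfft(np.array(blocks), n_mod, axis=1)  # X_l(k, i): (L, I2, I1)
    return np.abs(Xl).mean(axis=0).T                   # (I1, I2): acoustic x modulation freq.
```

Averaging |X_l(k, i)| over the analysis blocks gives one acoustic-by-modulation-frequency matrix per recording, which is the kind of per-sample representation stacked into the tensors discussed next.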

6 Example I: one speaker (left), mean of speakers (right). [Figure: energy over acoustic frequency (kHz) vs. modulation frequency (Hz), with pitch energy; the right panel also shows pitch (Hz) across speakers.]

7 Example II: polyps (left), spasmodic dysphonia (right). [Figure: energy over acoustic frequency (kHz) vs. modulation frequency (Hz), with pitch energy, for each pathology.]

8 Example III: keratosis (left), nodules (right). [Figure: energy over acoustic frequency (kHz) vs. modulation frequency (Hz), with pitch energy, for each pathology.]

9 Dimensionality Reduction, HO-SVD. 1. Create tensors: D \in \mathbb{R}^{I_1 \times I_2 \times I_3}. 2. Decompose tensor D into its n-mode singular vectors: D = S \times_1 U_{af} \times_2 U_{mf} \times_3 U_{samples}, where S and the U's are referred to as the core tensor and unitary matrices, respectively, and \times_n denotes the n-mode product. 3. Rank the n-mode singular values. 4. Near-optimal projections (PCs): truncate the singular matrices so that we keep a% of the energy of D.

10 Dimensionality Reduction, HO-SVD: n-mode singular vectors. Consider the tensor D \in \mathbb{R}^{I_1 \times I_2 \times I_3} and unfold D to D_{(n)}: D_{(1)} is an I_1 \times (I_2 I_3) matrix, D_{(2)} is an I_2 \times (I_3 I_1) matrix, and D_{(3)} is an I_3 \times (I_1 I_2) matrix. The n-mode singular values and vectors are obtained from the SVD of D_{(n)}.

11 Dimensionality Reduction, HO-SVD. Definition (Unitary matrix): an (I_n \times I_n) unitary matrix U^{(n)}, n = 1, 2, 3, contains the n-mode singular vectors (SVs):

U^{(n)} = [\, U^{(n)}_1 \;\; U^{(n)}_2 \;\; \ldots \;\; U^{(n)}_{I_n} \,]. \qquad (1)

Each matrix U^{(n)} can directly be obtained as the matrix of left singular vectors of the matrix unfolding D_{(n)} of D along the corresponding mode.

12 Dimensionality Reduction, HO-SVD. D = S \times_1 U_{af} \times_2 U_{mf} \times_3 U_{samples}. S is referred to as the core tensor (same dimensions as D); U_{af} \in \mathbb{R}^{I_1 \times I_1} is the unitary matrix of the acoustic frequency subspace; U_{mf} \in \mathbb{R}^{I_2 \times I_2} is the unitary matrix of the modulation frequency subspace; U_{samples} \in \mathbb{R}^{I_3 \times I_3} is the samples subspace matrix; \times_n denotes the n-mode product.

13 Dimensionality Reduction, HO-SVD. Defining the n-mode product S \times_n U^{(n)}, with S \in \mathbb{R}^{I_1 \times I_2 \times I_3} and U^{(n)} \in \mathbb{R}^{I_n \times I_n}. Example: for n = 2 this is an (I_1 \times I_2 \times I_3) tensor given by

(S \times_2 U^{(2)})_{i_1 j i_3} \overset{\mathrm{def}}{=} \sum_{i_2 = 1}^{I_2} s_{i_1 i_2 i_3}\, u_{j i_2}.
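
For reference, here is a small numpy sketch of the mode-n unfolding, the n-mode product, and the resulting HO-SVD of a 3-way tensor; this is a generic sketch of the standard procedure, not the authors' implementation.

```python
import numpy as np

def unfold(D, n):
    """Mode-n unfolding D_(n): I_n rows, product of the remaining dims as columns."""
    return np.moveaxis(D, n, 0).reshape(D.shape[n], -1)

def mode_n_product(S, U, n):
    """n-mode product (S x_n U): every mode-n fiber of S is multiplied by U."""
    rest = [s for i, s in enumerate(S.shape) if i != n]
    out = U @ unfold(S, n)
    return np.moveaxis(out.reshape([U.shape[0]] + rest), 0, n)

def hosvd(D):
    """U^(n) = left singular vectors of D_(n); core S = D x_1 U1^T x_2 U2^T x_3 U3^T."""
    Us = [np.linalg.svd(unfold(D, n), full_matrices=False)[0] for n in range(D.ndim)]
    S = D
    for n, U in enumerate(Us):
        S = mode_n_product(S, U.T, n)
    return S, Us   # D is recovered as S x_1 Us[0] x_2 Us[1] x_3 Us[2] (up to numerics)
```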

14 Dimensionality Reduction, HO-SVD. 1. Create the tensor D \in \mathbb{R}^{I_1 \times I_2 \times I_3} and decompose it into its n-mode singular vectors: D = S \times_1 U_{af} \times_2 U_{mf} \times_3 U_{samples}. 2. Rank the n-mode singular values. 3. Near-optimal projections (PCs): truncate the singular matrices so that we keep a% of the energy of D.

15 Dimensionality Reduction, HO-SVD. Contribution of the j-th n-mode singular vector U^{(n)}_j:

\alpha_{n,j} = \lambda_{n,j} \Big/ \sum_{j=1}^{I_n} \lambda_{n,j},

where \lambda_{n,j} is the corresponding singular value. Put a threshold on \alpha_{n,j} and retain the R_n (n = 1, 2) leading singular vectors. Truncate the matrices: \hat{U}^{(1)} \equiv \hat{U}_{af} \in \mathbb{R}^{I_1 \times R_1} and \hat{U}^{(2)} \equiv \hat{U}_{mf} \in \mathbb{R}^{I_2 \times R_2}. Project new MS data onto the truncated matrices: Z = \hat{U}_{af}^{T} B\, \hat{U}_{mf}, where B \equiv X_l(k, i) \in \mathbb{R}^{I_1 \times I_2} and Z \in \mathbb{R}^{R_1 \times R_2}.
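
A brief numpy sketch of the truncation and projection above; the energy threshold below is an assumption standing in for the a% criterion.

```python
import numpy as np

def truncate_basis(U, svals, energy=0.95):
    """Keep the leading R_n singular vectors whose contributions
    alpha_{n,j} = lambda_{n,j} / sum_j lambda_{n,j} add up to `energy`."""
    alpha = svals / svals.sum()
    R = int(np.searchsorted(np.cumsum(alpha), energy)) + 1
    return U[:, :R]

def project(B, U_af_hat, U_mf_hat):
    """Z = U_af_hat^T B U_mf_hat, where B is one modulation spectrum (I1 x I2)."""
    return U_af_hat.T @ B @ U_mf_hat   # Z has shape (R1, R2)
```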

16 Redundancy Reduction with HOSVD. [Figure: P.D.F. of mutual information (MI) values (redundancy) for the original vs. the packed features, with extrapolated MI.]

17 Mutual Information. The mutual information between two random variables x_i and x_j is defined as

I(x_i; x_j) = \int\!\!\int P_{ij}(x_i, x_j) \log_2 \left[ \frac{P_{ij}(x_i, x_j)}{P_i(x_i) P_j(x_j)} \right] dx_i\, dx_j,

where P_{ij}(x_i, x_j) denotes the joint probability density function (pdf), and P_i(x_i) and P_j(x_j) denote the marginal pdfs.

18 Maximal Relevance Criterion. Select the features most relevant to the target class c: 1. Compute the mutual information I(x_j; c) between feature x_j and class c. 2. Rank all the computed I(x_j; c). 3. Select the top m features.
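
The ranking procedure can be sketched directly from the MI definition above; the histogram-based estimator and the bin count below are illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(x; y) in bits."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal pdf of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal pdf of y
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def max_relevance(X, c, m=25):
    """Rank features x_j by I(x_j; c) and keep the indices of the top m."""
    scores = [mutual_information(X[:, j], c) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:m]
```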

19 Database & Conditions. Sustained vowel /AH/ from MEEI. Subset of the database (53 normophonic, 173 dysphonic speakers). Signals sampled at 25 kHz. Classifier: SVM with a radial basis function (RBF) kernel. 4-fold stratified cross-validation, repeated 400 times. Training/testing split: 75%/25%. Decision per segment. Evaluation: Detection Error Trade-off (DET) curves.
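
A minimal scikit-learn sketch of this evaluation protocol (RBF-kernel SVM, 4-fold stratified cross-validation repeated many times, one decision per segment); the SVM hyperparameters are assumptions.

```python
from sklearn.svm import SVC
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def evaluate(features, labels, n_repeats=400):
    """features: (n_segments, n_features); labels: 0 = normophonic, 1 = dysphonic."""
    clf = SVC(kernel="rbf", gamma="scale")                         # RBF-kernel SVM
    cv = RepeatedStratifiedKFold(n_splits=4, n_repeats=n_repeats)  # 75%/25% splits
    scores = cross_val_score(clf, features, labels, cv=cv)
    return scores.mean(), scores.std()
```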

20 Feature extraction. Data tensor D \in \mathbb{R}^{I_1 \times I_2 \times I_3}; truncated matrices \hat{U}_{af} and \hat{U}_{mf}; projected features Z \in \mathbb{R}^{34 \times 34}.

21 Results: Detection. Normophonic/Dysphonic: optimal detection accuracy (DCF_opt) of 94.08% (±0.86) using the top m = 25 features (AUC = 97.75% in terms of the ROC).

22 Results: Classification. Classify: vocal fold polyp, adductor spasmodic dysphonia, keratosis leukoplakia, and vocal nodules. The table compares MSMR (DCF_opt (%), AUC (%), m) against FD-GA (DR (%)) for the pairs Pol/Add, Pol/Ker, and Pol/Nod, where FD-GA stands for Fisher distance and Genetic Algorithms (Hosseini et al. 2008).

23 MEEI: comparison. [DET curves: miss probability (%) vs. false alarm probability (%) for MFCC-SVM, max-relevance (maxrel), and max-contribution (maxcontrib) features.]

24 MEEI: fusion. [DET curves: miss probability (%) vs. false alarm probability (%) for MFCC, MRMS, and their fusion.]

25 PdA: fusion. [DET curves: miss probability (%) vs. false alarm probability (%) for MFCC, MRMS, and their fusion.]

26 Cross-database experiment: train on PdA, test on MEEI. [DET curves: miss probability (%) vs. false alarm probability (%) for MFCC, MRMS, and their fusion.]

27 Cross-database experiment. MFCC vs. MRMS vs. fusion over the conditions MEEI (125), PdA (125), PdA-MEEI (125), and MEEI-PdA (450); MEEI (125): 3.63, MEEI-PdA (450): 21.86.

28 References for the work on Modulation Spectra:
1. Maria Markaki and Yannis Stylianou: Voice Pathology Detection and Discrimination based on Modulation Spectral Features. IEEE Trans. on Audio, Speech and Language Processing (TASL), Jan.
2. J.D. Arias-Londono, J.I. Godino-Llorente, M. Markaki, and Y. Stylianou: On combining information from Modulation Spectra and Mel-Frequency Cepstral Coefficients for Automatic Detection of Pathological Voices. Logopedics Phoniatrics Vocology (LPV), Nov.
3. Maria Markaki and Yannis Stylianou: Discrimination of Speech from Nonspeech in Broadcast News Based on Modulation Frequency Features. Speech Communication, Special Issue on Perceptual and Statistical Audition.

29 Define Vocal Tremor. Vocal Tremor: involuntary modulations of frequency and/or amplitude in sustained phonation. Pathological & Physiological Vocal Tremor. Pathological Tremor: from diseases like Parkinson's disease, essential tremor, etc.; strong motor synchronization. Physiological Tremor: natural stochastic modulations in the interval [2, 15] Hz with low amplitude. Acoustic Vocal Tremor Attributes: Tremor Frequency: how fast the modulations are. Tremor Level: how strong the modulations are.

36 Vocal Tremor Estimation. Use of an AM-FM decomposition algorithm based on the adaptive time-varying quasi-harmonic model (aQHM) for speech. High resolution in the time-frequency plane. Estimation of vocal tremor for any sinusoidal component of speech. Time-dependent vocal tremor estimates.

37 AM-FM Decomposition using aQHM. Speech is modeled as a sum of AM-FM sinusoids:

s(t) = \sum_{k=1}^{K} a_k(t) \cos(\phi_k(t)),

where K is the number of components, a_k(t) is the instantaneous amplitude of the k-th sinusoid, \phi_k(t) is the instantaneous phase of the k-th sinusoid, and f_k(t) = \frac{1}{2\pi} \frac{d\phi_k(t)}{dt} is the instantaneous frequency of the k-th sinusoid. The AM-FM decomposition algorithm tries to estimate these instantaneous components.
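
A toy numpy illustration of this model: synthesizing a single AM-FM component and recovering its instantaneous frequency as (1/2\pi) d\phi_k/dt. It is a generic sketch and does not implement the adaptive quasi-harmonic estimation itself.

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs                                          # 1 s of sustained phonation
a_k = 1.0 + 0.05 * np.sin(2 * np.pi * 5 * t)                    # slow amplitude modulation
phi_k = 2 * np.pi * 150 * t + 2.0 * np.sin(2 * np.pi * 4 * t)   # phase with 4 Hz FM around 150 Hz
s_k = a_k * np.cos(phi_k)                                       # one AM-FM component of s(t)

f_k = np.gradient(phi_k, 1 / fs) / (2 * np.pi)                  # instantaneous frequency f_k(t)
```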

38 Example of AM-FM decomposition on speech. [Figure: estimated component frequencies (Hz) over time (s).]

39 Preprocessing of the instantaneous component. Downsample the instantaneous component to f_s = 1000 Hz. Remove the very slow (< 2 Hz) modulations of the instantaneous component; this is performed by a Savitzky-Golay (S-G) smoothing filter. The S-G smoothing filter performs a local polynomial regression. S-G filter parameters: 4th-order polynomial & 1 s frame size. Advantage: it preserves features of the time series such as relative maxima, minima, and width.
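
A short scipy sketch of this preprocessing (downsampling to 1 kHz, then removing the slow trend with a 4th-order, 1 s Savitzky-Golay filter); the input sampling rate and the choice of resampler are assumptions.

```python
from math import gcd
from scipy.signal import resample_poly, savgol_filter

def preprocess_inst_component(f_inst, fs_in, fs_out=1000):
    """Downsample an instantaneous component to 1 kHz and remove its
    very slow (< 2 Hz) trend with a Savitzky-Golay smoothing filter."""
    g = gcd(fs_out, fs_in)
    x = resample_poly(f_inst, fs_out // g, fs_in // g)   # downsample to 1000 Hz
    trend = savgol_filter(x, window_length=fs_out + 1,   # ~1 s frame (odd length)
                          polyorder=4)                   # local 4th-order polynomial
    return x - trend                                     # keep the faster modulations
```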

40 S-G Filter Output. [Figure: (a) frequency (Hz) over time (s); (b) magnitude over frequency (Hz).]

41 Compute Tremor Frequency & Tremor Level. Assume that the processed instantaneous component has a single but time-varying modulation frequency and modulation level:

x(t) = m(t) \cos(\psi(t)).

Apply the AM-FM decomposition algorithm a second time, to the processed instantaneous component. Thus, the tremor frequency, \frac{1}{2\pi} \frac{d\psi(t)}{dt}, is estimated from the FM component of the AM-FM decomposition, and the tremor level, m(t), is estimated from the respective AM component.
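
A hedged sketch of this second pass: the detrended instantaneous component is treated as x(t) = m(t) cos(psi(t)), and m(t) and (1/2\pi) d\psi/dt are recovered. Here an analytic-signal (Hilbert) demodulation stands in for the second application of the AM-FM decomposition algorithm.

```python
import numpy as np
from scipy.signal import hilbert

def tremor_attributes(x, fs=1000):
    """x: detrended instantaneous component, modeled as m(t) cos(psi(t))."""
    z = hilbert(x)                                  # analytic signal of x(t)
    level = np.abs(z)                               # tremor (modulation) level m(t)
    psi = np.unwrap(np.angle(z))                    # instantaneous phase psi(t)
    freq = np.gradient(psi, 1 / fs) / (2 * np.pi)   # tremor (modulation) frequency, Hz
    return freq, level
```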

42 Compute Tremor Frequency & Tremor Level. [Figure: (a) frequency (Hz) over time (s); (b) magnitude over frequency (Hz).]

43 Compute Tremor Frequency & Tremor Level. [Figure: (a) tremor frequency (Hz) over time (s); (b) tremor level (%) over time (s).]

44 Voice Fatigue and Acoustic Features of Vocal Loading. Voice fatigue: strain of the laryngeal tissues; relation between occupational voice fatigue and voice pathologies. Acoustic features: fundamental frequency rise; sound pressure rise; rise of vocal tremor attributes (Boucher et al., 2008): strain of the laryngeal muscles may affect the speaker's ability to sustain constant tension of the vocal folds.

45 Examining the Relationship between Vocal Loading and Tremor Attributes. Estimating vocal tremor attributes: extract the instantaneous frequency and instantaneous amplitude. Comparing vocal tremor attributes before and after vocal loading: compare the modulation frequencies and the modulation levels of two voiced signals from the same speaker, recorded before and after vocal loading.

46 Definitions. Vocal Loading Amplitude Indicator (VLAI) = mean modulation level after loading minus mean modulation level before loading. Vocal Loading Frequency Indicator (VLFI) = mean modulation frequency after loading minus mean modulation frequency before loading. Positive value: increase of the vocal tremor attributes, i.e., possible degradation of the voice. Negative value: decrease of the vocal tremor attributes, i.e., possible enhancement of the voice.
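
The two indicators follow directly from these definitions, given the tremor-attribute tracks estimated before and after vocal loading.

```python
import numpy as np

def vocal_loading_indicators(level_before, level_after, freq_before, freq_after):
    """VLAI / VLFI: mean modulation level / frequency after loading minus before.
    Positive values point to increased tremor attributes (possible voice degradation)."""
    vlai = np.mean(level_after) - np.mean(level_before)
    vlfi = np.mean(freq_after) - np.mean(freq_before)
    return vlai, vlfi
```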

47 DB1: Comparing VLAI and VLFI to Subjective Evaluations (SE). [Table: for each speaker (female and male groups), speaker id, VLAI, VLFI, Student SE, and Trainer SE (Pre:Post).] Reminder: Student SE ranges from 0 (not tired) to -3 (very tired); Trainer SE ranges from -3 (very poor voice) to +3 (excellent).

48 Summary for this task. There seems to be no relation between vocal loading and voice tremor. There is a correlation between the objective and subjective evaluations for voice quality assessment.

49 References for the work on Vocal Tremor:
1. Yannis Pantazis, Maria Koutsoyannaki and Yannis Stylianou: A novel method for the extraction of vocal tremor, MAVEBA-2009, Florence, Italy, Dec.
2. Maria Koutsoyannaki, Yannis Pantazis, Yannis Stylianou, and Philippe Dejonckere: Vocal tremor in speakers with spasmodic dysphonia, MAVEBA-2011, Florence, Italy, Aug. 2011.

50 My students: Maria Markaki, Maria Koutsoyannaki. My ex-student: Yannis Pantazis. Prof. Juan Ignacio Godino-Llorente and J.D. Arias-Londono (PhD) (UPM, Spain). Prof. Anne-Maria Laukkanen (Univ. of Tampere, Finland), for providing the database with vocal fatigue examples.

51 THANK YOU for your attention
