Advances in Speech Signal Processing for Voice Quality Assessment
1 Advances in Speech Signal Processing for Voice Quality Assessment, Part II. University of Crete, Computer Science Dept., Multimedia Informatics Lab. Bilbao, September 2011
2 Outline
1. Multi-linear Algebra; Features selection
2. Introduction; Application: Vocal Fatigue
3 Multi-linear Algebra Features selection
4 In equations. First step: STFT
$$X_m(k) = \sum_{n=-\infty}^{\infty} h(mM - n)\, x(n)\, W_{I_1}^{kn}, \qquad k = 0, \ldots, I_1 - 1,$$
where:
$I_1$ denotes the number of frequency bins in the acoustic frequency axis,
$W_{I_1} = \exp(-j\pi / I_1)$,
$M$ is the shift parameter (or hop size) in the computation of the STFT,
$h(n)$ is the acoustic frequency analysis window.
5 In equations. Second step: modulation frequency estimation of the subband envelopes
$$X_l(k, i) = \sum_{m=-\infty}^{\infty} g(lL - m)\, |X_m(k)|\, W_{I_2}^{im}, \qquad i = 0, \ldots, I_2 - 1,$$
where:
$I_2$ is the number of frequency bins along the modulation frequency axis,
$W_{I_2} = \exp\left(-j\,(f_M / F_s)\,\pi / I_2\right)$, with $f_M$ and $F_s$ denoting the maximum modulation frequency we search for and the sampling frequency, respectively,
$L$ is the shift parameter of the second STFT,
$g(m)$ is the modulation frequency analysis window.
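The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: it uses the standard FFT kernel rather than the restricted modulation-frequency kernel $W_{I_2}$ of the slide, and the function names, Hann windows, window lengths and hop sizes are all assumptions chosen for the example.

```python
import numpy as np

def stft_frames(x, win, hop, n_bins):
    """Windowed DFT of successive frames; returns an (n_frames, n_bins) array."""
    n_frames = 1 + (len(x) - len(win)) // hop
    frames = np.stack([x[m * hop : m * hop + len(win)] * win for m in range(n_frames)])
    return np.fft.fft(frames, n=n_bins, axis=1)

def modulation_spectrum(x, win1=256, hop1=32, win2=64, hop2=16):
    """Two-stage analysis: acoustic STFT, then an STFT of every subband envelope."""
    h = np.hanning(win1)                      # acoustic analysis window h(n) (assumed Hann)
    g = np.hanning(win2)                      # modulation analysis window g(m) (assumed Hann)
    X = stft_frames(x, h, hop1, win1)[:, : win1 // 2 + 1]    # X_m(k), non-negative bins
    env = np.abs(X)                           # subband envelopes |X_m(k)|
    out = []
    for k in range(env.shape[1]):
        Xk = stft_frames(env[:, k], g, hop2, win2)[:, : win2 // 2 + 1]
        out.append(np.abs(Xk))
    # Result shape: (acoustic bins, modulation bins, modulation frames)
    return np.transpose(np.array(out), (0, 2, 1))
```

With a sampling rate of 8 kHz and the default hop of 32 samples, the subband envelopes are sampled at 250 Hz, so the 64-point second stage resolves modulation frequencies in steps of about 3.9 Hz.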
6 Example I: one speaker (left), mean of speakers (right). [Figure. Axes: Frequency (kHz), Modulation frequency (Hz), Energy, Pitch energy; right panel also: # Speakers, Pitch (Hz)]
7 Example II: polyps (left), spasmodic dysphonia (right). [Figure. Axes per panel: Frequency (kHz), Modulation frequency (Hz), Energy, Pitch energy]
8 Example III: keratosis (left), nodules (right). [Figure. Axes per panel: Frequency (kHz), Modulation frequency (Hz), Energy, Pitch energy]
9 Dimensionality Reduction, HO-SVD
1. Create tensors: $\mathcal{D} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$
2. Decompose the tensor $\mathcal{D}$ into its n-mode singular vectors:
$$\mathcal{D} = \mathcal{S} \times_1 U_{af} \times_2 U_{mf} \times_3 U_{samples}$$
where $\mathcal{S}$ and $U$ are referred to as the core tensor and unitary matrix, respectively, and $\times_n$ denotes the n-mode product.
3. Rank the n-mode singular values
4. Near-optimal projections (PCs): truncate the singular matrices so that a% of the energy of $\mathcal{D}$ is kept
10 Dimensionality Reduction, HO-SVD: n-mode singular vectors
Consider the tensor $\mathcal{D} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$. Unfold $\mathcal{D}$ to $D_{(n)}$:
1. $I_1 \times I_2 I_3$ matrix $D_{(1)}$
2. $I_2 \times I_3 I_1$ matrix $D_{(2)}$
3. $I_3 \times I_1 I_2$ matrix $D_{(3)}$
The n-mode singular values and vectors are given by the SVD of $D_{(n)}$.
11 Dimensionality Reduction, HO-SVD
Definition (Unitary matrix): An $(I_n \times I_n)$ unitary matrix $U^{(n)}$, $n = 1, 2, 3$, contains the n-mode singular vectors (SVs):
$$U^{(n)} = \left[\, U^{(n)}_1 \;\; U^{(n)}_2 \;\; \cdots \;\; U^{(n)}_{I_n} \,\right]. \qquad (1)$$
Each matrix $U^{(n)}$ can be obtained directly as the matrix of left singular vectors of the matrix unfolding $D_{(n)}$ of $\mathcal{D}$ along the corresponding mode.
12 Dimensionality Reduction, HO-SVD
$$\mathcal{D} = \mathcal{S} \times_1 U_{af} \times_2 U_{mf} \times_3 U_{samples}$$
$\mathcal{S}$ is referred to as the core tensor (same dimensions as $\mathcal{D}$).
$U_{af} \in \mathbb{R}^{I_1 \times I_1}$ is the unitary matrix of the acoustic frequency subspace.
$U_{mf} \in \mathbb{R}^{I_2 \times I_2}$ is the unitary matrix of the modulation frequency subspace.
$U_{samples} \in \mathbb{R}^{I_3 \times I_3}$ is the samples subspace matrix.
$\times_n$ denotes the n-mode product.
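13 Dimensionality Reduction, HO-SVD
Defining the n-mode product $\mathcal{S} \times_n U^{(n)}$, with $\mathcal{S} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and $U^{(n)} \in \mathbb{R}^{I_n \times I_n}$.
Example: for $n = 2$ this is an $(I_1 \times I_2 \times I_3)$ tensor given by
$$\left(\mathcal{S} \times_2 U^{(2)}\right)_{i_1 i_2' i_3} \overset{\mathrm{def}}{=} \sum_{i_2=1}^{I_2} s_{i_1 i_2 i_3}\, u_{i_2' i_2}.$$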
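As a small illustration of the unfolding and the n-mode product defined above, the sketch below uses NumPy with zero-based mode indices. The helper names `unfold` and `mode_n_product` are hypothetical, and the column ordering of the unfolding follows NumPy's reshape rather than the ordering listed on the slides; this changes the columns of $D_{(n)}$ but not its singular values or left singular vectors.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows index the chosen mode, columns the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_n_product(T, U, mode):
    """n-mode product: out[..., j, ...] = sum_i T[..., i, ...] * U[j, i] along `mode`."""
    out = np.tensordot(U, T, axes=(1, mode))   # the contracted mode becomes axis 0
    return np.moveaxis(out, 0, mode)           # move it back to its original position
```

A useful check is the identity `unfold(mode_n_product(S, U, n), n) == U @ unfold(S, n)`, which holds for any consistent unfolding convention.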
14 Dimensionality Reduction, HO-SVD
1. Create tensors: $\mathcal{D} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and decompose it into its n-mode singular vectors:
$$\mathcal{D} = \mathcal{S} \times_1 U_{af} \times_2 U_{mf} \times_3 U_{samples}$$
2. Rank the n-mode singular values
3. Near-optimal projections (PCs): truncate the singular matrices so that a% of the energy of $\mathcal{D}$ is kept
15 Dimensionality Reduction, HO-SVD
Contribution of the $j$-th n-mode singular vector $U^{(n)}_j$:
$$\alpha_{n,j} = \frac{\lambda_{n,j}}{\sum_{j'=1}^{I_n} \lambda_{n,j'}}$$
where $\lambda_{n,j}$ is the corresponding singular value.
Put a threshold on $\alpha_{n,j}$ and retain the $R_n$ ($n = 1, 2$) singular vectors.
Truncate matrices: $\hat{U}^{(1)} \equiv \hat{U}_{af} \in \mathbb{R}^{I_1 \times R_1}$ and $\hat{U}^{(2)} \equiv \hat{U}_{mf} \in \mathbb{R}^{I_2 \times R_2}$.
Project new MS data onto the truncated matrices:
$$Z = \hat{U}_{af}^{T}\, B\, \hat{U}_{mf}$$
where $B \equiv X_l(k, i) \in \mathbb{R}^{I_1 \times I_2}$ and $Z \in \mathbb{R}^{R_1 \times R_2}$.
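The truncation and projection steps can be sketched as follows. This is only an illustration under stated assumptions: the function names and the threshold value are hypothetical, and the economy-size SVD is used, so the basis is $I_n \times \min(I_n, \cdot)$ rather than always square.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding (zero-based mode index)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_basis(D, mode, threshold):
    """Keep the n-mode singular vectors whose contribution alpha_{n,j} exceeds `threshold`."""
    U, s, _ = np.linalg.svd(unfold(D, mode), full_matrices=False)
    alpha = s / s.sum()                 # contribution of each singular vector
    return U[:, alpha > threshold], alpha

def project(B, U_af, U_mf):
    """Z = U_af^T B U_mf for one I1 x I2 modulation spectrum B."""
    return U_af.T @ B @ U_mf
```

For a data tensor `D` of shape `(I1, I2, I3)`, one would call `truncated_basis(D, 0, thr)` and `truncated_basis(D, 1, thr)` to obtain the two truncated matrices, then `project` each new modulation spectrum to get its `R1 x R2` feature matrix.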
16 Redundancy Reduction with HOSVD. [Figure. Axes: Extrapolated MI, P.D.F. of MI values; curves: Redundancy: packed features, Redundancy: original features]
17 Mutual Information
The mutual information between two random variables $x_i$ and $x_j$ is defined as:
$$I(x_i; x_j) = \iint P_{ij}(x_i, x_j)\, \log_2\!\left[\frac{P_{ij}(x_i, x_j)}{P_i(x_i)\, P_j(x_j)}\right] dx_i\, dx_j$$
where $P_{ij}(x_i, x_j)$ denotes the joint probability density function (pdf), and $P_i(x_i)$ and $P_j(x_j)$ denote the marginal pdfs.
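One simple way to approximate this integral from samples is a two-dimensional histogram. The sketch below is an assumption for illustration only (the slides do not say which estimator was used); the function name and the number of bins are hypothetical, and histogram estimates of this kind are biased upwards for small sample sizes.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(x; y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                    # joint probabilities per cell
    p_x = p_xy.sum(axis=1, keepdims=True)         # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)         # marginal of y
    nz = p_xy > 0                                 # skip empty cells (0 log 0 = 0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))
```

Two strongly dependent variables give a value well above zero, whereas independent variables give a value close to zero (up to the estimator's bias).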
18 Maximal Relevance Criterion
Select the features most relevant to the target class $c$:
1. Compute the mutual information $I(x_j; c)$ between feature $x_j$ and class $c$
2. Rank all the computed $I(x_j; c)$
3. Select the top $m$ features
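The three steps of the criterion can be sketched as below, again with a histogram estimate of the mutual information as an assumed estimator. The function names, the bin count and the integer class labels are choices made for this example only.

```python
import numpy as np

def mi_feature_class(x, c, bins=16):
    """Histogram estimate, in bits, of I(x; c) for a continuous feature x and class labels c."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.digitize(x, edges[1:-1])              # bin index in 0 .. bins-1
    _, cb = np.unique(c, return_inverse=True)     # class index in 0 .. n_classes-1
    joint = np.zeros((bins, cb.max() + 1))
    np.add.at(joint, (xb, cb), 1.0)               # count samples per (bin, class) cell
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    pc = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ pc)[nz])))

def max_relevance(X, c, m):
    """Rank the columns of X by I(x_j; c) and return the indices of the top m."""
    scores = np.array([mi_feature_class(X[:, j], c) for j in range(X.shape[1])])
    ranking = np.argsort(scores)[::-1]            # descending relevance
    return ranking[:m], scores
```

Here `X` would hold one flattened feature matrix $Z$ per row and `c` the corresponding class labels.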
19 Database & Conditions
Sustained vowel /AH/ from MEEI
Subset of the database (53 normophonic, 173 dysphonic speakers)
Signals sampled at 25 kHz
Classifier: SVM with a radial basis function (RBF) kernel
4-fold stratified cross-validation, repeated 400 times
Training/Testing: 75%/25%
Decision per segment
Evaluation: Detection Error Trade-off (DET) curves
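To make the evaluation step concrete, the following sketch computes the miss and false-alarm probabilities that a DET curve plots, given classifier scores for the two classes. The function names are hypothetical, the convention that higher scores mean "dysphonic" is an assumption, and the equal-error point is included only as a convenient summary; it is not the metric reported on the slides.

```python
import numpy as np

def det_points(target_scores, nontarget_scores):
    """Miss and false-alarm probabilities for every candidate decision threshold."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    p_miss = np.array([(tgt < t).mean() for t in thresholds])   # targets rejected
    p_fa = np.array([(non >= t).mean() for t in thresholds])    # non-targets accepted
    return thresholds, p_miss, p_fa

def equal_error_rate(p_miss, p_fa):
    """Operating point where miss and false-alarm probabilities are closest."""
    i = int(np.argmin(np.abs(p_miss - p_fa)))
    return 0.5 * (p_miss[i] + p_fa[i])
```

Plotting `p_miss` against `p_fa` on normal-deviate axes gives the usual DET curve.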
20 Feature extraction
Data tensor $\mathcal{D}$; truncated matrices $\hat{U}_{af}$ and $\hat{U}_{mf}$; projected features $Z \in \mathbb{R}^{34 \times 34}$
21 Results: Detection
Normophonic/Dysphonic: optimal detection accuracy ($DCF_{opt}$): 94.08% (±0.86) using the top $m = 25$ features (AUC = 97.75% in terms of ROC)
22 Results: Classification
Classify: vocal fold polyp, adductor spasmodic dysphonia, keratosis leukoplakia, and vocal nodules.
Table comparing MSMR and FD-GA. Columns: $DCF_{opt}$ (%), AUC (%), $m$, DR (%). Rows: Pol/Add, Pol/Ker, Pol/Mod.
FD-GA stands for Fisher distance and Genetic Algorithms (Hosseini et al. 2008).
23 MEEI: comparison. [Figure: DET curves. Axes: False Alarm probability (in %), Miss probability (in %); legend: MFCC SVM, maxrel, maxcontrib]
24 MEEI: fusion. [Figure: DET curves. Axes: False Alarm probability (in %), Miss probability (in %); legend: MFCC, mrms, Fusion]
25 PdA: fusion. [Figure: DET curves. Axes: False Alarm probability (in %), Miss probability (in %); legend: MFCC, mrms, Fusion]
26 Cross-database experiment. Train on PdA, test on MEEI. [Figure: DET curves. Axes: False Alarm probability (in %), Miss probability (in %); legend: MFCC, mrms, Fusion]
27 Cross-database experiment
Table columns: MFCC, MRMS, Fusion.
MEEI (125): 3.63
PdA (125):
PdA-MEEI (125):
MEEI-PdA (450): 21.86
28 For the work on Modulation Spectra:
1. Maria Markaki and Yannis Stylianou: Voice Pathology Detection and Discrimination based on Modulation Spectral Features. IEEE Trans. on Audio, Speech and Language Processing (TASL), Jan.
2. J.D. Arias-Londono, J.I. Godino-Llorente, M. Markaki, and Y. Stylianou: On combining information from Modulation Spectra and Mel-Frequency Cepstral coefficients for Automatic Detection of Pathological Voices. Logopedics, Phoniatrics, Vocology (LPV), Nov.
3. Maria Markaki and Yannis Stylianou: Discrimination of Speech from Nonspeech in Broadcast News Based on Modulation Frequency Features. Speech Communication, Special Issue on Speech Communication on Perceptual and Statistical Audition.
29 Define Vocal Tremor
Vocal tremor: involuntary modulations of frequency and/or amplitude in sustained phonation.
Pathological and physiological vocal tremor:
Pathological tremor: arises from diseases such as Parkinson's disease, essential tremor, etc. Strong motor synchronization.
Physiological tremor: natural stochastic modulations in the interval [2, 15] Hz with low amplitude.
Acoustic vocal tremor attributes:
Frequency: how fast the modulations are.
Level: how strong the modulations are.
36 Vocal Tremor
Use of an AM-FM decomposition algorithm based on the adaptive time-varying quasi-harmonic model for speech.
High resolution in the time-frequency plane.
Vocal tremor can be obtained for any sinusoidal component of speech.
Time-dependent vocal tremor estimates.
37 AM-FM Decomposition using aQHM
Speech is modeled as a sum of AM-FM sinusoids:
$$s(t) = \sum_{k=1}^{K} a_k(t) \cos(\phi_k(t))$$
$K$ is the number of components,
$a_k(t)$ is the instantaneous amplitude of the $k$-th sinusoid,
$\phi_k(t)$ is the instantaneous phase of the $k$-th sinusoid, and
$f_k(t) = \frac{1}{2\pi} \frac{d\phi_k(t)}{dt}$ is the instantaneous frequency of the $k$-th sinusoid.
The AM-FM decomposition algorithm tries to estimate the instantaneous components.
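The model itself is easy to illustrate by synthesis. The sketch below builds a signal from $K$ AM-FM sinusoids and recovers the instantaneous frequency from the phase derivative, as in the definition above. It does not implement aQHM; the function names, the harmonic structure and the parameter values are assumptions for the example.

```python
import numpy as np

def synthesize_am_fm(f0, fs, K=3, am_depth=0.1, am_rate=5.0):
    """Sum of K harmonically related AM-FM sinusoids driven by a fundamental-frequency track f0."""
    t = np.arange(len(f0)) / fs
    phases = [2 * np.pi * np.cumsum((k + 1) * f0) / fs for k in range(K)]   # phi_k(t)
    amps = [(1.0 / (k + 1)) * (1 + am_depth * np.sin(2 * np.pi * am_rate * t))
            for k in range(K)]                                              # a_k(t)
    s = sum(a * np.cos(p) for a, p in zip(amps, phases))
    return s, amps, phases

def instantaneous_frequency(phase, fs):
    """f(t) = (1 / 2 pi) d phi / dt, approximated with finite differences."""
    return np.gradient(phase) * fs / (2 * np.pi)
```

For example, a fundamental of 120 Hz with a 5 Hz frequency modulation of ±3 Hz yields first-component instantaneous frequencies that follow the imposed track closely.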
38 Example of AM-FM decomposition on Speech. [Figure. Axes: Time (s), Frequency (Hz)]
39 Preprocessing of Inst. Component
Downsample the inst. component to $f_s = 1000$ Hz.
Remove the very slow (< 2 Hz) modulations of the instantaneous component. This is performed with a Savitzky-Golay (S-G) smoothing filter.
The S-G smoothing filter performs a local polynomial regression.
S-G filter parameters: 4th-order polynomial and 1 s frame size.
Advantage: it preserves features of the time series such as relative maxima, minima and width.
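The detrending step can be sketched with a NumPy-only Savitzky-Golay smoother. This is an illustration under assumptions: the slow trend is taken to be the S-G output and is subtracted from the signal, the signal edges are handled by reflection, and the function names are hypothetical. A 1 s frame at 1000 Hz is rounded to an odd window of 1001 samples.

```python
import numpy as np

def savgol_smooth(x, half_width, order):
    """Savitzky-Golay smoothing: local least-squares polynomial fit evaluated at the window centre."""
    u = np.arange(-half_width, half_width + 1) / half_width   # scaled abscissa for conditioning
    A = np.vander(u, order + 1, increasing=True)
    coeffs = np.linalg.pinv(A)[0]          # weights giving the fitted value at u = 0
    padded = np.pad(x, half_width, mode="reflect")
    return np.convolve(padded, coeffs[::-1], mode="valid")

def remove_slow_modulations(x, fs=1000, frame_sec=1.0, order=4):
    """Subtract the S-G trend so that only the faster modulations remain."""
    half_width = int(frame_sec * fs) // 2
    return x - savgol_smooth(np.asarray(x, dtype=float), half_width, order)
```

Because the smoothing kernel is symmetric, the subtraction introduces no phase shift in the retained modulations. The input must be longer than half a frame for the reflection padding to work.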
40 S-G Filter Output. [Figure. Panel (a) axes: Time (s), Frequency (Hz); panel (b) axes: Frequency (Hz), Magnitude]
41 Compute Frequency & Level
Assume that the processed inst. component has a single but time-varying modulation frequency and modulation level:
$$x(t) = m(t) \cos(\psi(t))$$
Apply the AM-FM decomposition algorithm a second time, now to the processed inst. component. Thus:
the modulation frequency, $\frac{1}{2\pi} \frac{d\psi(t)}{dt}$, is estimated from the FM component of the AM-FM decomposition algorithm;
the modulation level, $m(t)$, is estimated from the respective AM component.
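For a single-component signal of the form $x(t) = m(t)\cos(\psi(t))$, the two quantities can be illustrated with the analytic signal. Note that this swaps in a Hilbert-transform estimate in place of the aQHM-based decomposition used in the slides; it is only a sketch, and the function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative frequencies, double the positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    w = np.zeros(n)
    w[0] = 1.0
    if n % 2 == 0:
        w[n // 2] = 1.0
        w[1 : n // 2] = 2.0
    else:
        w[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(X * w)

def modulation_frequency_and_level(x, fs):
    """Return (frequency in Hz, level) tracks for a single AM-FM component."""
    z = analytic_signal(np.asarray(x, dtype=float))
    level = np.abs(z)                                  # m(t)
    phase = np.unwrap(np.angle(z))                     # psi(t)
    freq = np.gradient(phase) * fs / (2 * np.pi)       # (1 / 2 pi) d psi / dt
    return freq, level
```

Applied to the detrended instantaneous component sampled at 1000 Hz, this yields time-dependent modulation frequency and level tracks, provided the single-component assumption holds.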
42 Compute Frequency & Level. [Figure. Panel (a) axes: Time (s), Frequency (Hz); panel (b) axes: Frequency (Hz), Magnitude]
43 Compute Frequency & Level. [Figure. Panel (a) axes: Time (s), Freq. (Hz); panel (b) axes: Time (s), Level (%)]
44 Voice Fatigue and Acoustic Features of Vocal Loading
Voice fatigue:
Strain of the laryngeal tissues.
Relation between occupational voice fatigue and voice pathologies.
Acoustic features:
Rise in fundamental frequency.
Rise in sound pressure.
Rise in vocal tremor attributes (Boucher et al. 2008): strain of the laryngeal muscles may affect the speaker's ability to sustain constant tension of the vocal folds.
45 Examining the Relationship between Vocal Loading and Tremor Attributes
Estimating vocal tremor attributes: extract the instantaneous frequency and instantaneous amplitude.
Comparing vocal tremor attributes before and after vocal loading: compare the modulation frequencies and the modulation levels of two voiced signals of the same speaker, recorded before and after vocal loading.
46 Definitions
Vocal Loading Amplitude Indicator (VLAI) = mean modulation level after loading − mean modulation level before loading.
Vocal Loading Frequency Indicator (VLFI) = mean modulation frequency after loading − mean modulation frequency before loading.
Positive value: increase of vocal tremor attributes, possible degradation of voice.
Negative value: decrease of vocal tremor attributes, possible enhancement of voice.
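Both indicators are differences of means, which the short sketch below makes explicit. The function name and argument names are hypothetical; the inputs are assumed to be the modulation level and modulation frequency tracks of the same speaker before and after loading.

```python
import numpy as np

def vocal_loading_indicators(level_before, level_after, freq_before, freq_after):
    """Return (VLAI, VLFI): after-minus-before differences of the mean level and mean frequency."""
    vlai = float(np.mean(level_after) - np.mean(level_before))
    vlfi = float(np.mean(freq_after) - np.mean(freq_before))
    return vlai, vlfi
```

A positive VLAI or VLFI then corresponds to an increase of the respective tremor attribute after loading.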
47 DB1: Comparing VLAI and VLFI to Subjective Evaluations (SE)
Table columns: Speaker id, VLAI, VLFI, Student SE, Trainer SE (Pre:Post); rows grouped into Female and Male.
Reminder: Student SE: 0 (not tired) to -3 (very tired). Trainer SE: -3 (very poor voice) to +3 (excellent).
48 Summary for this task
There seems to be no relation between vocal loading and voice tremor.
There is a correlation between objective and subjective evaluations for voice quality assessment.
49 For the work on Tremor:
1. Yannis Pantazis, Maria Koutsoyannaki and Yannis Stylianou: A novel method for the extraction of vocal tremor, MAVEBA-2009, Florence, Italy, Dec.
2. Maria Koutsoyannaki, Yannis Pantazis, Yannis Stylianou, and Philippe Dejonckere: in speakers with spasmodic dysphonia, MAVEBA-2011, Florence, Italy, Aug 2011
50 My students: Maria Markaki, Maria Koutsoyannaki
My ex-student: Yannis Pantazis
Prof. Juan Ignacio Godino-Llorente and J.D. Arias-Londono (PhD) (UPM, Spain)
Prof. Anne-Maria Laukkanen (Univ. of Tampere, Finland), for providing the database with vocal fatigue examples.
51 THANK YOU for your attention
More informationSOUND SOURCE RECOGNITION AND MODELING
SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental
More informationPerformance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches
Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art
More informationAspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta
Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied
More informationDiscriminative methods for the detection of voice disorders 1
ISCA Archive http://www.isca-speech.org/archive ITRW on Nonlinear Speech Processing (NOLISP 05) Barcelona, Spain April 19-22, 2005 Discriminative methods for the detection of voice disorders 1 Juan Ignacio
More informationVoice Activity Detection for Speech Enhancement Applications
Voice Activity Detection for Speech Enhancement Applications E. Verteletskaya, K. Sakhnov Abstract This paper describes a study of noise-robust voice activity detection (VAD) utilizing the periodicity
More informationSpeech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter
Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,
More informationVoice Activity Detection
Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class
More informationBlind Dereverberation of Single-Channel Speech Signals Using an ICA-Based Generative Model
Blind Dereverberation of Single-Channel Speech Signals Using an ICA-Based Generative Model Jong-Hwan Lee 1, Sang-Hoon Oh 2, and Soo-Young Lee 3 1 Brain Science Research Center and Department of Electrial
More informationEfficient Target Detection from Hyperspectral Images Based On Removal of Signal Independent and Signal Dependent Noise
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 9, Issue 6, Ver. III (Nov - Dec. 2014), PP 45-49 Efficient Target Detection from Hyperspectral
More informationPerturbation analysis using a moving window for disordered voices JiYeoun Lee, Seong Hee Choi
Perturbation analysis using a moving window for disordered voices JiYeoun Lee, Seong Hee Choi Abstract Voices from patients with voice disordered tend to be less periodic and contain larger perturbations.
More informationDifferent Approaches of Spectral Subtraction Method for Speech Enhancement
ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches
More informationSeparating Voiced Segments from Music File using MFCC, ZCR and GMM
Separating Voiced Segments from Music File using MFCC, ZCR and GMM Mr. Prashant P. Zirmite 1, Mr. Mahesh K. Patil 2, Mr. Santosh P. Salgar 3,Mr. Veeresh M. Metigoudar 4 1,2,3,4Assistant Professor, Dept.
More informationUniversity of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis
More informationLong Range Acoustic Classification
Approved for public release; distribution is unlimited. Long Range Acoustic Classification Authors: Ned B. Thammakhoune, Stephen W. Lang Sanders a Lockheed Martin Company P. O. Box 868 Nashua, New Hampshire
More informationCan binary masks improve intelligibility?
Can binary masks improve intelligibility? Mike Brookes (Imperial College London) & Mark Huckvale (University College London) Apparently so... 2 How does it work? 3 Time-frequency grid of local SNR + +
More informationROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE
- @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu
More informationFAULT DETECTION OF ROTATING MACHINERY FROM BICOHERENCE ANALYSIS OF VIBRATION DATA
FAULT DETECTION OF ROTATING MACHINERY FROM BICOHERENCE ANALYSIS OF VIBRATION DATA Enayet B. Halim M. A. A. Shoukat Choudhury Sirish L. Shah, Ming J. Zuo Chemical and Materials Engineering Department, University
More informationSUB-BAND INDEPENDENT SUBSPACE ANALYSIS FOR DRUM TRANSCRIPTION. Derry FitzGerald, Eugene Coyle
SUB-BAND INDEPENDEN SUBSPACE ANALYSIS FOR DRUM RANSCRIPION Derry FitzGerald, Eugene Coyle D.I.., Rathmines Rd, Dublin, Ireland derryfitzgerald@dit.ie eugene.coyle@dit.ie Bob Lawlor Department of Electronic
More informationStructure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping
Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics
More informationMikko Myllymäki and Tuomas Virtanen
NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,
More informationVIBRATO DETECTING ALGORITHM IN REAL TIME. Minhao Zhang, Xinzhao Liu. University of Rochester Department of Electrical and Computer Engineering
VIBRATO DETECTING ALGORITHM IN REAL TIME Minhao Zhang, Xinzhao Liu University of Rochester Department of Electrical and Computer Engineering ABSTRACT Vibrato is a fundamental expressive attribute in music,
More informationSINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum
SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase Reassigned Spectrum Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou Analysis/Synthesis Team, 1, pl. Igor
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/
More informationSound Recognition. ~ CSE 352 Team 3 ~ Jason Park Evan Glover. Kevin Lui Aman Rawat. Prof. Anita Wasilewska
Sound Recognition ~ CSE 352 Team 3 ~ Jason Park Evan Glover Kevin Lui Aman Rawat Prof. Anita Wasilewska What is Sound? Sound is a vibration that propagates as a typically audible mechanical wave of pressure
More informationMUS421/EE367B Applications Lecture 9C: Time Scale Modification (TSM) and Frequency Scaling/Shifting
MUS421/EE367B Applications Lecture 9C: Time Scale Modification (TSM) and Frequency Scaling/Shifting Julius O. Smith III (jos@ccrma.stanford.edu) Center for Computer Research in Music and Acoustics (CCRMA)
More informationDiscriminative Training for Automatic Speech Recognition
Discriminative Training for Automatic Speech Recognition 22 nd April 2013 Advanced Signal Processing Seminar Article Heigold, G.; Ney, H.; Schluter, R.; Wiesler, S. Signal Processing Magazine, IEEE, vol.29,
More informationDetecting Replay Attacks from Far-Field Recordings on Speaker Verification Systems
Detecting Replay Attacks from Far-Field Recordings on Speaker Verification Systems Jesús Villalba and Eduardo Lleida Communications Technology Group (GTC), Aragon Institute for Engineering Research (I3A),
More informationAN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute
More informationTE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION
TE 302 DISCRETE SIGNALS AND SYSTEMS Study on the behavior and processing of information bearing functions as they are currently used in human communication and the systems involved. Chapter 1: INTRODUCTION
More informationSpectro-Temporal Processing of Dynamic Broadband Sounds In Auditory Cortex
Spectro-Temporal Processing of Dynamic Broadband Sounds In Auditory Cortex Shihab Shamma Jonathan Simon* Didier Depireux David Klein Institute for Systems Research & Department of Electrical Engineering
More informationSound Synthesis Methods
Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like
More informationSONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS
SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R
More informationRobust Speech Feature Extraction using RSF/DRA and Burst Noise Skipping
100 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.3, NO.2 AUGUST 2005 Robust Speech Feature Extraction using RSF/DRA and Burst Noise Skipping Naoya Wada, Shingo Yoshizawa, Noboru
More informationDetermination of Pitch Range Based on Onset and Offset Analysis in Modulation Frequency Domain
Determination o Pitch Range Based on Onset and Oset Analysis in Modulation Frequency Domain A. Mahmoodzadeh Speech Proc. Research Lab ECE Dept. Yazd University Yazd, Iran H. R. Abutalebi Speech Proc. Research
More informationA Comparative Study of Formant Frequencies Estimation Techniques
A Comparative Study of Formant Frequencies Estimation Techniques DORRA GARGOURI, Med ALI KAMMOUN and AHMED BEN HAMIDA Unité de traitement de l information et électronique médicale, ENIS University of Sfax
More informationDetection Algorithm of Target Buried in Doppler Spectrum of Clutter Using PCA
Detection Algorithm of Target Buried in Doppler Spectrum of Clutter Using PCA Muhammad WAQAS, Shouhei KIDERA, and Tetsuo KIRIMOTO Graduate School of Electro-Communications, University of Electro-Communications
More informationChapter 4 SPEECH ENHANCEMENT
44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or
More information