Automatic Speech Recognition handout (1)

Jan - Mar 2012, Revision 1.1
Speech Signal Processing and Feature Extraction
Hiroshi Shimodaira (h.shimodaira@ed.ac.uk)

Speech Communication
[Figure: the speech chain. Speaker: intention → language → motion control → articulatory organs (vocal tract) and signal source (vocal cords) → speech sound → Listener: auditory organs → auditory processing → language → understanding.]
ASR (H. Shimodaira) I : 1

Spectrogram
[Figure: waveform; spectrogram (frequency [kHz] vs. time [s]); cross-section of the spectrogram (intensity [dB] vs. frequency [kHz]).]

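Speech Production Model
[Figure: vocal organs and vocal tract (lungs, larynx, vocal folds, pharynx, oral cavity, tongue, teeth, lips, nasal cavity) and the corresponding source-filter model. The vocal folds produce a periodic source $v(t)$ with period $T_0$ ($F_0 = 1/T_0$), whose spectrum $V(\Omega)$ falls off at about -12 dB/oct. The vocal tract (larynx + pharynx, mouth cavity, nasal cavity) acts as a filter $H(\Omega)$ with formants F1, F2, F3. Radiation at the lips adds +6 dB/oct., so the speech spectrum $X(\Omega)$ falls off at about -6 dB/oct.]
Time domain: $x(t) = h(t) * v(t) = \int_{-\infty}^{\infty} h(\tau)\,v(t-\tau)\,d\tau$
Frequency domain (Fourier transform of both sides): $X(\Omega) = H(\Omega)\,V(\Omega)$
$\Omega$: angular frequency ($= 2\pi F$), $F$: frequency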

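Automatic Speech Recognition
Find the word sequence $\hat{W}$ that maximises the posterior probability given the acoustic observations $X$:
$$\hat{W} = \arg\max_W P(W \mid X) = \arg\max_W \frac{P(X \mid W)\,P(W)}{P(X)} = \arg\max_W P(X \mid W)\,P(W)$$
The last step holds because $P(X)$ does not depend on $W$.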

Signal Analysis for ASR
Front-end analysis: convert the acoustic signal into a sequence of feature vectors, e.g. MFCCs or PLP cepstral coefficients.
[Figure: front-end pipeline. $x_c(t)$ → LPF (low-pass filter) → A/D conversion (sampling frequency $F_s$) → pre-emphasis → $x[n]$ → spectral analysis (analysis window, frame shift) → feature extraction → $c_m[k]$, where m is the frame number and k the feature index.]

Feature parameters for ASR
Features should:
- contain sufficient information to distinguish phonemes / phones, with
  - good time resolution (e.g. 10 ms)
  - good frequency resolution (e.g. 20 channels on the Bark scale)
- not contain F0 and its harmonics (or keep them separated)
- be robust against speaker variation
- be robust against noise / channel distortions
- have good characteristics in terms of pattern recognition:
  - as few features as possible
  - features independent of each other

Converting analogue signals to machine-readable form
Discretisation (sampling): $x_c(t)$ (continuous time, continuous amplitude) → $x[n]$ (discrete time, discrete amplitude).
Problem: information can be lost by sampling.

Sampling of continuous-time signals
Continuous-time signal: $x_c(t)$
Signal modulated by a periodic impulse train:
$$x_s(t) = x_c(t)\sum_{n=-\infty}^{\infty}\delta(t - nT_s) = \sum_{n=-\infty}^{\infty} x_c(nT_s)\,\delta(t - nT_s)$$
Sampled (discrete-time) signal: $x[n] = x_c(nT_s)$, i.e. the weight of the n-th impulse in $x_s(t)$
$T_s$: sampling interval

Sampling of continuous-time signals (cont.)
Q: Is the C/D conversion invertible? $x_c(t)$ → C/D → $x[n]$ → D/C → $x_c(t)$?
A: Not in general, but yes under a special condition (the Nyquist sampling theorem). If $x_c(t)$ is band-limited, i.e. it has no frequency components above $F_s/2$, then $x_c(t)$ can be fully reconstructed from $x[n]$:
$$x_c(t) = h_{T_s}(t) * \sum_{k=-\infty}^{\infty} x[k]\,\delta(t - kT_s) = \sum_{k=-\infty}^{\infty} x[k]\,h_{T_s}(t - kT_s)$$
$$h_{T_s}(t) = \mathrm{sinc}(t/T_s) = \frac{\sin(\pi t/T_s)}{\pi t/T_s}$$
$F_s = 1/T_s$: sampling frequency; $F_s/2$: Nyquist frequency
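
To make the reconstruction formula concrete, here is a minimal sketch in Python (standard library only). It samples a 1 kHz tone at an assumed $F_s$ = 8 kHz and rebuilds a value between two samples with the sinc sum. The sum is truncated to a finite number of samples, so the result is only approximate. The sketch also shows that a 7 kHz tone, which violates the band-limit condition, produces exactly the same samples as an inverted 1 kHz tone. The helper names and all numeric values are illustrative assumptions.

```python
import math

# Illustrative sketch; helper names (sinc, sample, reconstruct, tone_1k, tone_7k) are hypothetical.


def sinc(u):
    """Normalised sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def sample(xc, fs, n_first, n_last):
    """x[n] = x_c(n*T_s) for n = n_first..n_last, stored as {n: x[n]}."""
    ts = 1.0 / fs
    return {n: xc(n * ts) for n in range(n_first, n_last + 1)}


def reconstruct(x, fs, t):
    """x_c(t) ~= sum_k x[k] * sinc((t - k*T_s) / T_s), truncated to the stored samples."""
    ts = 1.0 / fs
    return sum(xk * sinc((t - k * ts) / ts) for k, xk in x.items())


def tone_1k(t):
    return math.sin(2 * math.pi * 1000.0 * t)   # 1 kHz: below F_s/2 for F_s = 8 kHz


def tone_7k(t):
    return math.sin(2 * math.pi * 7000.0 * t)   # 7 kHz: above F_s/2, so it aliases


fs = 8000.0
x = sample(tone_1k, fs, -4000, 4000)
t = 0.3 / fs                                     # an instant between two samples
print(tone_1k(t), reconstruct(x, fs, t))        # nearly equal (truncation error only)

# At the sample instants the 7 kHz tone cannot be told apart from an inverted 1 kHz tone.
aliased = all(abs(tone_7k(n / fs) + tone_1k(n / fs)) < 1e-9 for n in range(100))
print(aliased)                                   # True
```

The second check is the time-domain face of aliasing: once sampled, the two tones are the same sequence, so no reconstruction can tell them apart.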

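Sampling of continuous-time signals (cont.)
Interpretation in the frequency domain:
$$X_s(\Omega) = \frac{1}{T_s}\sum_{k=-\infty}^{\infty} X_c(\Omega - k\Omega_s), \qquad \Omega_s = 2\pi F_s$$
Here $X_s(\Omega)$ is the spectrum of $x_s(t)$ and $X_c(\Omega)$ is the spectrum of $x_c(t)$. Sampling therefore repeats the original spectrum every $\Omega_s$. If $x_c(t)$ has components above $F_s/2$, the shifted copies overlap (aliasing) and $x_c(t)$ can no longer be recovered.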

Sampling of continuous-time signals (cont.)
[Figure: the front-end pipeline, as on the "Signal Analysis for ASR" slide.]
Questions
1. What sampling frequencies ($F_s$) are used for ASR? Microphone speech: 12 to 20 kHz; telephone speech: 8 kHz.
2. What are the advantages / disadvantages of using a higher $F_s$?
3. Why is pre-emphasis (+6 dB/oct.) employed? $x[n] = x_0[n] - a\,x_0[n-1]$, a =
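
Question 3 can be explored numerically. The sketch below assumes $a = 0.97$, a commonly used value, since the coefficient is not given above. It also treats the sample before the first one as zero, and the helper names are hypothetical. The filter's magnitude response $|1 - a e^{-j\omega}|$ strongly attenuates DC and rises by about 6 dB per octave. That rise boosts the high-frequency region, which is weak in the speech spectrum.

```python
import cmath
import math

# Illustrative sketch; a = 0.97 and the helper names are assumptions, not from the handout.


def pre_emphasis(x0, a=0.97):
    """x[n] = x0[n] - a*x0[n-1], taking x0[-1] = 0 (an assumption for the first sample)."""
    return [x0[n] - a * (x0[n - 1] if n > 0 else 0.0) for n in range(len(x0))]


def pre_emphasis_gain_db(omega, a=0.97):
    """|1 - a e^{-j omega}| in dB: magnitude response of the pre-emphasis filter."""
    return 20.0 * math.log10(abs(1.0 - a * cmath.exp(-1j * omega)))


print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))                # a constant (DC) input is almost cancelled
print(round(pre_emphasis_gain_db(0.0), 1))              # about -30.5 dB at DC
print(round(pre_emphasis_gain_db(math.pi), 1))          # about +5.9 dB at the Nyquist frequency
print(round(pre_emphasis_gain_db(0.4) - pre_emphasis_gain_db(0.2), 1))  # about +6 dB per octave
```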

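Spectral analysis: Fourier Transform
FT for continuous-time (and continuous-frequency) signals:
$$X_c(\Omega) = \int_{-\infty}^{\infty} x_c(t)\,e^{-j\Omega t}\,dt \quad (\text{time domain} \to \text{frequency domain})$$
$$x_c(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X_c(\Omega)\,e^{j\Omega t}\,d\Omega \quad (\text{frequency domain} \to \text{time domain})$$
FT for discrete-time (and continuous-frequency) signals:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n}, \qquad x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\,e^{j\omega n}\,d\omega$$
$|X(e^{j\omega})|^2$: power spectrum; $\log|X(e^{j\omega})|^2$: log power spectrum
Here $\omega = T_s\Omega = 2\pi f$, where $f = F/F_s$ is the normalised frequency; $e^{j\omega n} = \cos(\omega n) + j\sin(\omega n)$; and $j$ is the imaginary unit.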

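An interpretation of FT
The FT is an inner product between two vectors (linear algebra).
- 2-dimensional case: $\mathbf{a} = (a_1, a_2)^T$, $\mathbf{b} = (b_1, b_2)^T$, and
  $\mathbf{a}\cdot\mathbf{b} = \mathbf{a}^T\mathbf{b} = a_1 b_1 + a_2 b_2 = |\mathbf{a}|\,|\mathbf{b}|\cos\theta$. If $|\mathbf{b}| = 1$, this equals $|\mathbf{a}|\cos\theta$, the projection of $\mathbf{a}$ onto $\mathbf{b}$.
- Infinite-dimensional case: let $\mathbf{x} = \{x[n]\}$ and $\mathbf{e}_\omega = \{e^{j\omega n}\} = \{\cos(\omega n) + j\sin(\omega n)\} = \mathbf{c}_\omega + j\,\mathbf{s}_\omega$. Using the complex inner product (which conjugates the second vector):
  $$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n} = \mathbf{x}\cdot\mathbf{e}_\omega = \mathbf{x}\cdot\mathbf{c}_\omega - j\,\mathbf{x}\cdot\mathbf{s}_\omega$$
  $\mathbf{x}\cdot\mathbf{c}_\omega$ measures how much of the cosine component at $\omega$ is contained in $\mathbf{x}$.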
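
The inner-product view translates directly into code. In this sketch (hypothetical helper name `ft_by_inner_products`, toy signal length N = 16), each value is computed at the frequencies $\omega = 2\pi k/N$ as the inner product of x with the cosine sequence, minus j times the inner product with the sine sequence. The result is checked against the direct complex-exponential sum. A cosine at bin 3 projects only onto k = 3 and its mirror k = 13.

```python
import cmath
import math

# Illustrative sketch; the helper name and the toy signal are hypothetical.


def ft_by_inner_products(x, omega):
    """X(e^{j omega}) = (x . cos_omega) - j (x . sin_omega) for a finite-length x[n]."""
    x_dot_cos = sum(xn * math.cos(omega * n) for n, xn in enumerate(x))
    x_dot_sin = sum(xn * math.sin(omega * n) for n, xn in enumerate(x))
    return complex(x_dot_cos, -x_dot_sin)


N = 16
x = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]      # a pure cosine at omega = 2*pi*3/N
X = [ft_by_inner_products(x, 2 * math.pi * k / N) for k in range(N)]
power = [abs(Xk) ** 2 for Xk in X]                             # power spectrum |X|^2
log_power = [10 * math.log10(p + 1e-12) for p in power]        # small floor avoids log(0)
print([round(p, 3) for p in power])   # 64.0 at k = 3 and k = 13, (near) zero elsewhere

# The same values from the direct complex-exponential sum:
direct = [sum(xn * cmath.exp(-1j * 2 * math.pi * k * n / N) for n, xn in enumerate(x))
          for k in range(N)]
```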

Short-time Spectrum Analysis
Problem with the FT: it assumes that signals are stationary, i.e. that signal properties do not change over time. If signals are non-stationary, as speech is, information on time-varying features is lost.
Short-time Fourier transform (STFT), also called the time-dependent Fourier transform: divide the signal $x[n]$ into short-time segments (frames) $x_k[m]$ and apply the FT to each segment.
$x[n] \to x_1[m], x_2[m], \ldots, x_k[m], \ldots$
$X(\omega) \to X_1(\omega), X_2(\omega), \ldots, X_k(\omega), \ldots$

Short-time Spectrum Analysis (cont.)
[Figure: the signal is windowed into frames with a frame shift. A discrete Fourier transform of each frame gives a short-time power spectrum (intensity vs. frequency), and the sequence of frames forms a time-frequency representation.]

Short-time Spectrum Analysis (cont.)
Trade-off problem of short-time spectrum analysis: a short window gives good time resolution but poor frequency resolution, and a long window gives good frequency resolution but poor time resolution.
A compromise for ASR:
- window width (frame width): ms
- window shift (frame shift): 5 to 15 ms
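
A minimal framing sketch follows. It assumes the 25 ms frame width and 10 ms frame shift at $F_s$ = 16 kHz that the MFCC slide assumes. Trailing samples that do not fill a whole frame are simply dropped here; padding is the other common option. `frame_signal` is a hypothetical helper.

```python
# Illustrative sketch; frame_signal is a hypothetical helper, not from the handout.


def frame_signal(x, fs, frame_ms=25.0, shift_ms=10.0):
    """Split x[n] into overlapping frames x_k[m]; incomplete trailing samples are dropped."""
    width = int(round(fs * frame_ms / 1000.0))   # samples per frame
    shift = int(round(fs * shift_ms / 1000.0))   # samples between consecutive frame starts
    if len(x) < width:
        return []
    n_frames = 1 + (len(x) - width) // shift
    return [x[k * shift: k * shift + width] for k in range(n_frames)]


fs = 16000
x = [float(n) for n in range(fs)]     # one second of a dummy signal
frames = frame_signal(x, fs)
print(len(frames), len(frames[0]))    # 98 400
```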

The Effect of Windowing in STFT
Time domain: $y_k[n] = w_k[n]\,x[n]$, where $w_k[n]$ is the time window for the k-th frame.
Simply cutting out a short segment (frame) from $x[n]$ amounts to applying a rectangular window to $x[n]$, which causes discontinuities at the edges of the segment. A tapered window is therefore usually used instead, e.g. the Hamming ($\alpha = 0.46$) or Hanning ($\alpha = 0.5$) window:
$$w[l] = (1 - \alpha) - \alpha\cos\left(\frac{2\pi l}{N - 1}\right), \qquad l = 0, \ldots, N-1$$
N: window width
[Figure: rectangle, Hamming, Hanning, Blackman and Bartlett windows.]
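
The window formula can be sketched as follows. `generalized_hamming`, `hamming`, `hanning` and `apply_window` are hypothetical helper names, and N ≥ 2 is assumed. Both windows peak at 1 in the middle. The Hamming window ends at $1 - 2\alpha = 0.08$, while the Hanning window tapers to 0.

```python
import math

# Illustrative sketch; helper names are hypothetical. Requires N >= 2.


def generalized_hamming(N, alpha):
    """w[l] = (1 - alpha) - alpha * cos(2*pi*l / (N - 1)), l = 0..N-1."""
    return [(1 - alpha) - alpha * math.cos(2 * math.pi * l / (N - 1)) for l in range(N)]


def hamming(N):
    return generalized_hamming(N, 0.46)


def hanning(N):
    return generalized_hamming(N, 0.5)


def apply_window(frame, window):
    """y_k[n] = w_k[n] * x[n] for one frame."""
    return [w * s for w, s in zip(window, frame)]


print([round(w, 3) for w in hamming(11)])
print([round(w, 3) for w in hanning(11)])
```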

The Effect of Windowing in STFT (cont.)
Frequency domain:
$$Y_k(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W_k(e^{j\theta})\,X(e^{j(\omega-\theta)})\,d\theta \quad (\text{periodic convolution})$$
The spectrum of the frame is the periodic convolution of the spectra of $x[n]$ and $w_k[n]$.
If we want $Y_k(e^{j\omega}) = X(e^{j\omega})$, the necessary and sufficient condition is $W_k(e^{j\omega}) = 2\pi\delta(\omega)$ over one period. This means $w_k[n] = \mathcal{F}^{-1}\{2\pi\delta(\omega)\} = 1$ for all n, so $w_k[n]$ would have to be infinitely long. Hence every window function of finite length causes some distortion.
NB: hereafter $x[n]$ will also be used to denote a segmented signal, for simplicity.
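
One visible consequence of this convolution is spectral leakage. The sketch below (hypothetical helpers, toy frame length N = 64) windows a complex tone that falls halfway between two DFT bins. It then measures the fraction of spectral energy lying more than 4 bins away from the tone. A complex tone is used so that no negative-frequency image complicates the count, and the 4-bin guard is an arbitrary choice for illustration.

```python
import cmath
import math

# Illustrative sketch; helper names, N = 64 and the 4-bin guard are assumptions.


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def leakage_fraction(window, cycles, guard=4):
    """Fraction of spectral energy more than `guard` bins away from a tone at `cycles` per frame."""
    N = len(window)
    tone = [cmath.exp(2j * math.pi * cycles * n / N) for n in range(N)]   # complex tone: no mirror image
    energy = [abs(Xk) ** 2 for Xk in dft([w * s for w, s in zip(window, tone)])]

    def dist(k):  # circular distance between bin k and the tone frequency
        d = abs(k - cycles) % N
        return min(d, N - d)

    far = sum(e for k, e in enumerate(energy) if dist(k) > guard)
    return far / sum(energy)


N = 64
rect = [1.0] * N
hamm = [0.54 - 0.46 * math.cos(2 * math.pi * l / (N - 1)) for l in range(N)]
print(leakage_fraction(rect, 10.5), leakage_fraction(hamm, 10.5))
```

With the rectangular window roughly 5% of the energy leaks into distant bins. The Hamming window trades a wider main lobe for far less distant leakage. A tone lying exactly on a bin (e.g. 10 cycles per frame) shows essentially no leakage with the rectangular window.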

The Effect of Windowing in STFT (cont.)
[Figure: spectral analysis of two sine signals of close frequencies.]

Problems with STFT
- The estimated power spectrum contains the harmonics of F0, which makes it difficult to estimate the envelope of the spectrum.
- Frequency bins of the STFT are highly correlated with each other, i.e. the power-spectrum representation is highly redundant.
[Figure: log |X(ω)|.]

Cepstrum Analysis
Idea: split (deconvolve) the log power spectrum into the spectral envelope and the F0 harmonics.
[Figure: the log-spectrum log |X(ω)| [freq. domain] is passed through an inverse Fourier transform to give the cepstrum [time domain, "quefrency"]. Liftering (a lifter is a filter used in the cepstral domain) selects the low or high part, and a Fourier transform takes each part back to the frequency domain. The low part gives the smoothed spectrum, i.e. the envelope (lag = 30) [freq. domain]; the high part gives the residue.]

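Cepstrum Analysis (cont.)
Let $h[n]$ be the vocal tract (impulse response) and $v[n]$ the glottal source:
$$x[n] = h[n] * v[n] \;\xrightarrow{\mathcal{F}}\; X(e^{j\omega}) = H(e^{j\omega})\,V(e^{j\omega})$$
Taking the log of the magnitude turns the product into a sum:
$$\log|X(e^{j\omega})| = \underbrace{\log|H(e^{j\omega})|}_{\text{spectral envelope}} + \underbrace{\log|V(e^{j\omega})|}_{\text{spectral fine structure}}$$
Cepstrum:
$$c(\tau) = \mathcal{F}^{-1}\{\log|X(e^{j\omega})|\} = \mathcal{F}^{-1}\{\log|H(e^{j\omega})|\} + \mathcal{F}^{-1}\{\log|V(e^{j\omega})|\}$$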
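
The additive property can be checked numerically. The sketch below (hypothetical helpers, a naive O(N²) DFT for clarity, toy signals of length N = 32) computes the real cepstrum $c = \mathrm{IDFT}\{\log|\mathrm{DFT}\{x\}|\}$ of a circular convolution. The convolution combines a short "vocal tract" response with a "source" made of an impulse plus an echo 8 samples later. Circular convolution is used so that the DFT product relation holds exactly. The sketch assumes the DFT has no exact zeros, because the log of zero is undefined.

```python
import cmath
import math

# Illustrative sketch (hypothetical helper names); naive O(N^2) DFT for clarity.


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def real_cepstrum(x):
    """c[tau] = IDFT{ log|DFT{x}| }; assumes no DFT value is exactly zero."""
    return [c.real for c in idft([math.log(abs(Xk)) for Xk in dft(x)])]


def low_quefrency_envelope(c, L):
    """Low-time lifter (keep |tau| < L), then DFT back to a smoothed log-magnitude spectrum."""
    N = len(c)
    kept = [c[t] if (t < L or t > N - L) else 0.0 for t in range(N)]
    return [Xk.real for Xk in dft(kept)]


def circular_convolve(h, v):
    N = len(v)
    return [sum(h[m] * v[(n - m) % N] for m in range(len(h))) for n in range(N)]


N = 32
h = [1.0, 0.5, 0.25] + [0.0] * (N - 3)     # toy "vocal tract" impulse response
v = [0.0] * N
v[0], v[8] = 1.0, 0.8                      # toy "source": impulse plus an echo 8 samples later
x = circular_convolve(h, v)

c_x, c_h, c_v = real_cepstrum(x), real_cepstrum(h), real_cepstrum(v)
print(max(abs(cx - (ch + cv)) for cx, ch, cv in zip(c_x, c_h, c_v)))   # ~0: convolution -> addition

env_x = low_quefrency_envelope(c_x, 8)     # keeps quefrencies 0..7 and 25..31
env_h = low_quefrency_envelope(c_h, 8)
print([round(ex - eh, 6) for ex, eh in zip(env_x, env_h)])             # a constant offset
```

Here the source spectrum is periodic in the frequency index with period 4, so its cepstrum is non-zero only at quefrencies that are multiples of 8. A low-time lifter that keeps $|\tau| < 8$ therefore returns the vocal-tract envelope shifted by a constant. With real speech the separation is only approximate.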

LPC Analysis
Linear Predictive Coding (LPC): a model-based (parametric) spectrum estimation.
Assume a linear system for human speech production: sound source $v[n]$ → vocal tract $h[n]$ → speech $x[n]$, where $h[n]$ is the impulse response:
$$x[n] = h[n] * v[n] = \sum_{k=0}^{\infty} h[k]\,v[n-k]$$
Using a model enables us to:
- estimate the spectrum of the vocal tract from a small number of observations
- represent the spectrum with a small number of parameters
- synthesise speech from the parameters

LPC Analysis (cont.)
Predict $x[n]$ from $x[n-1], x[n-2], \ldots$:
$$\hat{x}[n] = \sum_{k=1}^{N} a_k\,x[n-k]$$
Prediction error:
$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{N} a_k\,x[n-k]$$
Optimisation problem: find the $\{a_k\}$ that minimise the mean square (MS) error
$$P_e = E\{e^2[n]\} = E\left\{\left(x[n] - \sum_{k=1}^{N} a_k\,x[n-k]\right)^2\right\}$$
$\{a_k\}$: LPC coefficients
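
Minimising $P_e$ leads to a set of linear (normal) equations in the $a_k$. The handout does not say how to solve them. The sketch below uses one standard choice, the autocorrelation method with the Levinson-Durbin recursion, which also yields the PARCOR (reflection) coefficients listed in the LPC summary. The helper names, the AR(2) test signal and its coefficients are illustrative assumptions.

```python
import random

# Illustrative sketch: autocorrelation method + Levinson-Durbin; helper names are hypothetical.


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] x[n-k] over the frame (autocorrelation method)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for x_hat[n] = sum_k a_k x[n-k].

    Returns ([a_1, ..., a_order], residual prediction-error energy)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        kappa = acc / err            # reflection (PARCOR) coefficient of stage i
        new_a = a[:]
        new_a[i] = kappa
        for k in range(1, i):
            new_a[k] = a[k] - kappa * a[i - k]
        a = new_a
        err *= 1.0 - kappa * kappa
    return a[1:], err


# Toy check: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n], driven by white Gaussian noise.
random.seed(0)
x = [0.0, 0.0]
for _ in range(20000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + random.gauss(0.0, 1.0))
coeffs, err = levinson_durbin(autocorrelation(x, 2), 2)
print(coeffs)   # close to [1.3, -0.6]
```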

Spectra estimated by FT and LPC
[Figure: comparison of spectra estimated by FT and by LPC.]
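
Given LPC coefficients, the spectrum implied by the model is the all-pole spectrum $G^2/|1 - \sum_{k} a_k e^{-j\omega k}|^2$. The sketch below evaluates it on a grid over $[0, \pi]$. The gain $G^2$ is left to the caller (a common choice is the residual prediction-error energy), and `lpc_power_spectrum` is a hypothetical helper. A two-pole example with poles at radius 0.95 and angle $\pi/4$ shows a single formant-like resonance near $\omega = \pi/4$.

```python
import cmath
import math

# Illustrative sketch; the helper name and the example poles are hypothetical.


def lpc_power_spectrum(a, gain2, n_points=257):
    """P(w) = gain2 / |1 - sum_k a_k e^{-j w k}|^2 on n_points frequencies in [0, pi]."""
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / (n_points - 1)
        A = 1.0 - sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a, start=1))
        spectrum.append(gain2 / abs(A) ** 2)
    return spectrum


# Two poles at radius r and angle theta: a_1 = 2 r cos(theta), a_2 = -r^2.
r, theta = 0.95, math.pi / 4
P = lpc_power_spectrum([2 * r * math.cos(theta), -r * r], 1.0)
peak = P.index(max(P))
print(peak, math.pi * peak / 256)      # resonance close to theta = pi/4
```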

LPC summary
- The spectrum can be modelled/coded with around 14 LPCs.
- LPC family:
  - PARCOR (partial autocorrelation coefficients)
  - LSP (line spectral pairs) / LSF (line spectral frequencies)
  - CSM (composite sinusoidal model)
- LPC can be used to estimate log-area ratio coefficients (lossless tube model).
- LPC-(mel-)cepstrum: an LPC-based cepstrum.
Drawbacks:
- LPC assumes an AR (all-pole) model, which is not suited to modelling nasal sounds, whose spectra have zeros.
- It is difficult to determine the prediction order N.

Taking Perceptual Attributes into Account
Physical quality → perceptual quality:
- Intensity → Loudness
- Fundamental frequency → Pitch
- Spectral shape → Timbre
- Onset/offset time → Timing
- Phase difference in binaural hearing → Location
Technical terms: equal-loudness contours, masking, auditory filters (critical-band filters), critical bandwidth.

Taking Perceptual Attributes into Account (cont.)
Non-linear frequency scales:
- Bark scale: $b(f) = 13\arctan(0.00076\,f) + 3.5\arctan\left((f/7500)^2\right)$ [Bark]
- Mel scale: $B(f) = 1127\ln(1 + f/700)$ [mel]
[Figure: Bark frequency [Bark] and warped normalised frequency (ln, Bark, mel) plotted against linear frequency [Hz].]
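
Both warping functions are one-liners. The Bark constants 0.00076 and 3.5 are the commonly quoted values of this formula (Zwicker and Terhardt); treat them as an assumption here. The helper names are hypothetical.

```python
import math

# Illustrative sketch; helper names are hypothetical.


def hz_to_mel(f):
    """B(f) = 1127 ln(1 + f/700)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def hz_to_bark(f):
    """b(f) = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2) (constants assumed, see text)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


for f in (100, 500, 1000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1), round(hz_to_bark(f), 2))
```

By construction, the mel formula maps 1000 Hz to (almost exactly) 1000 mel. Both scales are close to linear at low frequencies and compress the high frequencies.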

Filter Bank Analysis
[Figure: speech $x[n]$ is passed through K bandpass filters (1 to K), giving $x_1[n]$ to $x_K[n]$; the filter bands are spaced along a perceptual frequency scale.]
$$x_i[n] = h_i[n] * x[n] = \sum_{k=0}^{M_i - 1} h_i[k]\,x[n-k]$$
$h_i[n]$: impulse response of bandpass filter i (of length $M_i$)

Filter Bank Analysis (cont.)
Each channel i: $x[n]$ → bandpass filter i → $x_i[n]$ → nonlinearity → $v_i[n]$ → lowpass filter → $y_i[n]$ → down-sampling
[Figure: spectra of $x_i[n]$ and $v_i[n]$.]
Trade-off problem: frequency resolution (number of filters, length of filter) vs. time resolution. Narrower bands need longer filters, which blur events in time.
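
A filter bank built from this channel structure can be sketched as below. The choices are assumptions for illustration: each bandpass filter is a Hamming-windowed cosine with M = 101 taps, the nonlinearity is squaring, and the lowpass filter is a 50-sample moving average. The down-sampling factor is 25, and the 8 centre frequencies are spaced linearly rather than on a perceptual scale to keep the code short. All helper names are hypothetical.

```python
import math

# Illustrative sketch: filter design, nonlinearity, smoothing and helper names are assumptions.


def bandpass_fir(center, length=101):
    """Hamming-windowed cosine centred at normalised angular frequency `center`."""
    mid = (length - 1) / 2
    return [(0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))) * math.cos(center * (n - mid))
            for n in range(length)]


def fir_filter(h, x):
    """x_i[n] = sum_{k=0}^{M_i - 1} h_i[k] x[n-k], with x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def channel_output(h, x, smooth=50, step=25):
    """Bandpass -> squaring (nonlinearity) -> moving-average lowpass -> down-sampling by `step`."""
    v = [s * s for s in fir_filter(h, x)]
    y = [sum(v[max(0, n - smooth + 1): n + 1]) / smooth for n in range(len(v))]
    return y[::step]


centers = [math.pi * i / 10 for i in range(1, 9)]    # 8 channels, 0.1*pi .. 0.8*pi (linear spacing)
bank = [bandpass_fir(c) for c in centers]
x = [math.sin(centers[3] * n) for n in range(600)]   # test tone at channel 3's centre frequency
energies = [sum(channel_output(h, x)) for h in bank]
print(energies.index(max(energies)))                 # 3
```

A tone at channel 3's centre frequency puts nearly all the output energy into channel 3. Making the filters longer narrows each band but smears each output over more samples.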

Filter Bank Analysis (cont.)
Another implementation of filter banks: apply a mel-scale filter bank to the STFT power spectrum to obtain a mel-scale power spectrum.
[Figure: the DFT (STFT) power spectrum over frequency bins is weighted by triangular bandpass filters, giving the mel-scale power spectrum.]
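
A triangular mel filter bank on DFT bins might be built as follows. This is a sketch, not any particular toolkit's definition. It assumes filter edges equally spaced on the mel scale between 0 Hz and $F_s/2$, with each triangle starting at the previous centre and ending at the next. Weights are evaluated at the bin frequencies $kF_s/N_{\text{FFT}}$. The 23 filters and the 512-point DFT at 16 kHz are illustrative values, and the helper names are hypothetical.

```python
import math

# Illustrative sketch; helper names are hypothetical, not from a specific toolkit.


def hz_to_mel(f):
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def mel_filterbank(num_filters, n_fft, fs):
    """Triangular weights, num_filters rows x (n_fft//2 + 1) DFT bins."""
    top = hz_to_mel(fs / 2.0)
    # num_filters + 2 edge frequencies, equally spaced on the mel scale
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, ctr, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= ctr:
                row.append((f - lo) / (ctr - lo))     # rising slope
            elif ctr < f < hi:
                row.append((hi - f) / (hi - ctr))     # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank, power_spectrum):
    """S[m] = sum_k weight[m][k] * |X[k]|^2."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


bank = mel_filterbank(23, 512, 16000)
S = apply_filterbank(bank, [1.0] * 257)     # flat power spectrum
print([round(s, 2) for s in S])             # higher (wider) filters tend to collect more bins
```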

MFCC
MFCC: Mel-Frequency Cepstral Coefficients
$x[n]$ → DFT → $X[k]$ → $|X[k]|^2$ → mel-frequency filterbank → $S[m]$ → log → DCT → $c[n]$
DCT:
$$c[n] = \sqrt{\frac{2}{N}}\sum_{i=1}^{N} s[i]\cos\left(\frac{\pi n (i - 0.5)}{N}\right), \qquad s[i] = \log S[i]$$
where N is the number of filterbank channels.
DFT: discrete Fourier transform; DCT: discrete cosine transform.
MFCCs are widely used in HMM-based ASR systems. The first 12 MFCCs ($c[1]$ to $c[12]$) are generally used.
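
The DCT step can be sketched directly from the formula. The helper `mfcc_from_filterbank` is hypothetical, and the filter-bank energies $S[i]$ must be positive. A flat log filter-bank output has no cepstral variation, so $c[1]$ to $c[12]$ are all zero. An input shaped like the first cosine basis puts all its energy into $c[1]$.

```python
import math

# Illustrative sketch; the helper name is hypothetical.


def mfcc_from_filterbank(S, num_ceps=12):
    """c[n] = sqrt(2/N) * sum_{i=1}^{N} s[i] cos(pi*n*(i-0.5)/N), s[i] = log S[i], n = 1..num_ceps."""
    N = len(S)
    s = [math.log(Si) for Si in S]            # requires S[i] > 0
    scale = math.sqrt(2.0 / N)
    return [scale * sum(s[i - 1] * math.cos(math.pi * n * (i - 0.5) / N) for i in range(1, N + 1))
            for n in range(1, num_ceps + 1)]


N = 23
flat = [math.e] * N                                   # log S[i] = 1 for every channel
shaped = [math.exp(math.cos(math.pi * (i - 0.5) / N)) for i in range(1, N + 1)]
print([round(c, 6) for c in mfcc_from_filterbank(flat)])     # all (numerically) zero
print([round(c, 6) for c in mfcc_from_filterbank(shaped)])   # only c[1] is non-zero
```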

MFCC (cont.)
- MFCCs are less correlated with each other than DFT/filter-bank-based spectra.
- Good compression rate. Feature dimensionality per frame:
  Speech wave: 400
  DFT spectrum:
  Filter bank:
  MFCC: 12
  (assuming $F_s$ = 16 kHz, frame width = 25 ms, frame shift = 10 ms; 25 ms at 16 kHz gives 400 samples)
- MFCCs show better ASR performance than filter-bank features, but MFCCs are not robust against noise.

Perceptually-based Linear Prediction (PLP) [Hermansky, 1985, 1990]
PLP has been shown experimentally to be
- more noise-robust
- more speaker-independent
than MFCCs.

Other features with low dimensionality
Formants (F1, F2, F3, ...)
They are not used in modern ASR systems, but why?

Using temporal features: dynamic features
Feature sets used in the SP lab sessions on speech recognition with HTK:
- MFCCs and energy
- MFCCs, energy, Δ MFCCs, Δ energy
- MFCCs, energy, Δ MFCCs, Δ energy, Δ² MFCCs, Δ² energy
Δ, Δ²: delta features (dynamic features / time derivatives) [Furui, 1986]
Continuous time: $\Delta c(t) = \frac{dc(t)}{dt}$, $\Delta^2 c(t) = \frac{d^2 c(t)}{dt^2}$
Discrete time: $\Delta c[n] \approx \sum_{i=-M}^{M} w_i\,c[n+i]$, e.g. $\Delta c[n] = \frac{c[n+1] - c[n-1]}{2}$; $\Delta^2 c[n] \approx \sum_{i=-M}^{M} w'_i\,c[n+i]$ with its own weights $w'_i$
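
The delta weights $w_i$ can be chosen in several ways. The sketch below uses the linear-regression form $\Delta c[n] = \sum_{i=1}^{\Theta} i\,(c[n+i] - c[n-i]) / (2\sum_{i=1}^{\Theta} i^2)$, the form documented in the HTK Book. It reduces to $(c[n+1] - c[n-1])/2$ for $\Theta = 1$. Repeating the first and last frames at the edges is an assumption, and the helper names are hypothetical. Applying the same operation to the deltas gives $\Delta^2$.

```python
# Illustrative sketch; helper names and the edge handling are assumptions.


def deltas(c, theta=2):
    """Regression deltas of a 1-D feature track c[0..T-1]; edge frames are repeated."""
    T = len(c)
    denom = 2.0 * sum(i * i for i in range(1, theta + 1))

    def at(n):
        return c[min(max(n, 0), T - 1)]       # clamp the index: repeat the first/last frame

    return [sum(i * (at(n + i) - at(n - i)) for i in range(1, theta + 1)) / denom
            for n in range(T)]


def add_dynamic_features(frames, theta=2):
    """Append the delta and delta-delta of every dimension to each static feature vector."""
    tracks = [list(track) for track in zip(*frames)]     # one track per feature dimension
    d1 = [deltas(track, theta) for track in tracks]
    d2 = [deltas(track, theta) for track in d1]
    return [list(f) + [d[t] for d in d1] + [d[t] for d in d2] for t, f in enumerate(frames)]


ramp = [3.0 * n for n in range(10)]
print(deltas(ramp, theta=1))          # 3.0 inside, 1.5 at the two edge frames
```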

Using temporal features: dynamic features (cont.)
[Figure: a feature trajectory c(t) over time; $\Delta c(t_0)$ is the slope of c(t) at $t = t_0$.]

Using temporal features: dynamic features (cont.)
- An acoustic feature vector (e.g. MFCCs) representing part of a speech signal is highly correlated with its neighbours.
- HMM-based acoustic models assume that the observations are independent of each other given the state sequence.
- These correlations can be captured to some extent by augmenting the original set of static acoustic features (e.g. MFCCs) with dynamic features.

General Feature Transformation
- Orthogonal transformations (orthogonal bases):
  - DCT (discrete cosine transform)
  - PCA (principal component analysis)
- Transformations based on bases that maximise the separability between classes:
  - LDA (linear discriminant analysis) / Fisher's linear discriminant
  - HLDA (heteroscedastic linear discriminant analysis)
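
PCA finds the orthonormal directions of largest variance, which are the eigenvectors of the feature covariance matrix. The minimal sketch below finds the leading eigenvector by power iteration, which assumes a single dominant eigenvalue. The helper names are hypothetical, and the toy 2-D data are spread mainly along (1, 1).

```python
import math

# Illustrative sketch; helper names and toy data are hypothetical.


def covariance(X):
    """Sample covariance matrix of a list of d-dimensional vectors."""
    n, d = len(X), len(X[0])
    mean = [sum(x[j] for x in X) / n for j in range(d)]
    return [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in X) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_principal_axis(X, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    C = covariance(X)
    d = len(C)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v


# Toy 2-D data spread mainly along (1, 1), with a little spread along (1, -1).
X = [(t + 0.1 * s, t - 0.1 * s) for t in range(-5, 6) for s in (-1, 1)]
print(first_principal_axis(X))   # approximately [0.707, 0.707]
```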

A comparison of speech features
I. Mporas et al., "Comparison of Speech Features on the Speech Recognition Task", Journal of Computer Science, Vol. 3.
[Table: WER (%) and SER (%) for SBC (16), WPSR125 (16), OWPF (16), LFCC-FB, HFCC-FB, HFCC-FB, PLP-FB and MFCC-FB.]
Feature descriptions: subband-based cepstral coefficients (SBC); wavelet packet features; overlapping wavelet packet features (OWPF); wavelet packet-based speech features (WPSR); linear-spaced filter-bank-based cepstral coefficients (LFCC-FB); human factor cepstral coefficients (HFCC-FB).
These results were obtained on the TIMIT speech corpus. Results might change considerably under different conditions (e.g. noise, tasks, ASR systems).

Further topics on feature extraction
- Feature normalisation/enhancement in terms of
  - noise / environments
  - speakers / speaking styles
- speech recognition
- Pitch (F0)-adapted feature extraction

SUMMARY
- Nyquist sampling theorem
- Short-time spectrum analysis
  - Non-parametric methods: short-time Fourier transform; cepstrum, MFCC; filter bank
  - Parametric methods: LPC, PLP
- Windowing effect: trade-off between time and frequency resolution
- Dynamic features (delta features)
- There is no single best feature for every purpose, but MFCCs are widely used for ASR and TTS.

SUMMARY (cont.)
- Front-end analysis has a great influence on ASR performance.
- For robust ASR in real environments, various front-end processing techniques have been proposed, e.g. spectral subtraction (SS) and cepstral mean normalisation (CMN).
- Spectrum analysis and feature extraction involve information loss and non-linear distortions.
- There is always a trade-off between accuracy and efficiency (e.g. spectral resolution vs. temporal resolution).
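
Cepstral mean normalisation follows from the cepstrum slides. A fixed channel multiplies the spectrum, which adds a constant to the log spectrum and hence to every cepstral frame. Subtracting the per-utterance mean therefore removes the channel (along with the mean of the speech itself). Below is a minimal sketch with a hypothetical helper name and a toy two-dimensional example.

```python
# Illustrative sketch; the helper name and the toy numbers are hypothetical.


def cepstral_mean_normalisation(frames):
    """Subtract the per-dimension mean over all frames of an utterance."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    return [[f[j] - mean[j] for j in range(d)] for f in frames]


frames = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
channel = [0.5, -1.0]            # hypothetical constant cepstral offset from a fixed channel
shifted = [[f[j] + channel[j] for j in range(2)] for f in frames]
print(cepstral_mean_normalisation(frames))
print(cepstral_mean_normalisation(shifted))   # same result: the channel offset is removed
```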

References
- John N. Holmes, Wendy J. Holmes, Speech Synthesis and Recognition, 2nd edition, Taylor and Francis (2001), chapters 2, 4 and 10.
- ajr/speechanalysis/
- B. Gold, N. Morgan, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, John Wiley and Sons (1999).
- Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall (2001).
- J.-C. Junqua, J.-P. Haton, Robustness in Automatic Speech Recognition, Kluwer Academic Publishers (1996).
- Sahar Bou-Ghazale, John H. L. Hansen, A Comparative Study of Traditional and Newly Proposed Features for Recognition of Speech Under Stress, IEEE Trans. SAP, vol. 8, no. 4, July.


More information

8.3 Basic Parameters for Audio

8.3 Basic Parameters for Audio 8.3 Basic Parameters for Audio Analysis Physical audio signal: simple one-dimensional amplitude = loudness frequency = pitch Psycho-acoustic features: complex A real-life tone arises from a complex superposition

More information

CS 188: Artificial Intelligence Spring Speech in an Hour

CS 188: Artificial Intelligence Spring Speech in an Hour CS 188: Artificial Intelligence Spring 2006 Lecture 19: Speech Recognition 3/23/2006 Dan Klein UC Berkeley Many slides from Dan Jurafsky Speech in an Hour Speech input is an acoustic wave form s p ee ch

More information

Analysis/synthesis coding

Analysis/synthesis coding TSBK06 speech coding p.1/32 Analysis/synthesis coding Many speech coders are based on a principle called analysis/synthesis coding. Instead of coding a waveform, as is normally done in general audio coders

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

Filter Banks I. Prof. Dr. Gerald Schuller. Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany. Fraunhofer IDMT

Filter Banks I. Prof. Dr. Gerald Schuller. Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany. Fraunhofer IDMT Filter Banks I Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany 1 Structure of perceptual Audio Coders Encoder Decoder 2 Filter Banks essential element of most

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Laboratory Assignment 4. Fourier Sound Synthesis

Laboratory Assignment 4. Fourier Sound Synthesis Laboratory Assignment 4 Fourier Sound Synthesis PURPOSE This lab investigates how to use a computer to evaluate the Fourier series for periodic signals and to synthesize audio signals from Fourier series

More information

Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks

Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks SGN- 14006 Audio and Speech Processing Pasi PerQlä SGN- 14006 2015 Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks Slides for this lecture are based on those created by Katariina

More information

NCCF ACF. cepstrum coef. error signal > samples

NCCF ACF. cepstrum coef. error signal > samples ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based

More information

Audio processing methods on marine mammal vocalizations

Audio processing methods on marine mammal vocalizations Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio http://labrosa.ee.columbia.edu Sound to Signal sound is pressure

More information

FFT analysis in practice

FFT analysis in practice FFT analysis in practice Perception & Multimedia Computing Lecture 13 Rebecca Fiebrink Lecturer, Department of Computing Goldsmiths, University of London 1 Last Week Review of complex numbers: rectangular

More information

Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition

Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition Aadel Alatwi, Stephen So, Kuldip K. Paliwal Signal Processing Laboratory Griffith University, Brisbane, QLD, 4111,

More information

Complex Sounds. Reading: Yost Ch. 4

Complex Sounds. Reading: Yost Ch. 4 Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency

More information

Design and Implementation of Speech Recognition Systems

Design and Implementation of Speech Recognition Systems Design and Implementation of Speech Recognition Systems Spring 2013 Class 3: Feature Computation 30 Jan 2013 1 First Step: Feature Extraction Speech recognition is a type of pattern recognition problem

More information

An Improved Voice Activity Detection Based on Deep Belief Networks

An Improved Voice Activity Detection Based on Deep Belief Networks e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

University of Colorado at Boulder ECEN 4/5532. Lab 1 Lab report due on February 2, 2015

University of Colorado at Boulder ECEN 4/5532. Lab 1 Lab report due on February 2, 2015 University of Colorado at Boulder ECEN 4/5532 Lab 1 Lab report due on February 2, 2015 This is a MATLAB only lab, and therefore each student needs to turn in her/his own lab report and own programs. 1

More information

Outline. Introduction to Biosignal Processing. Overview of Signals. Measurement Systems. -Filtering -Acquisition Systems (Quantisation and Sampling)

Outline. Introduction to Biosignal Processing. Overview of Signals. Measurement Systems. -Filtering -Acquisition Systems (Quantisation and Sampling) Outline Overview of Signals Measurement Systems -Filtering -Acquisition Systems (Quantisation and Sampling) Digital Filtering Design Frequency Domain Characterisations - Fourier Analysis - Power Spectral

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

CG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003

CG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003 CG40 Advanced Dr Stuart Lawson Room A330 Tel: 23780 e-mail: ssl@eng.warwick.ac.uk 03 January 2003 Lecture : Overview INTRODUCTION What is a signal? An information-bearing quantity. Examples of -D and 2-D

More information

Discrete Fourier Transform (DFT)

Discrete Fourier Transform (DFT) Amplitude Amplitude Discrete Fourier Transform (DFT) DFT transforms the time domain signal samples to the frequency domain components. DFT Signal Spectrum Time Frequency DFT is often used to do frequency

More information

SPEECH AND SPECTRAL ANALYSIS

SPEECH AND SPECTRAL ANALYSIS SPEECH AND SPECTRAL ANALYSIS 1 Sound waves: production in general: acoustic interference vibration (carried by some propagation medium) variations in air pressure speech: actions of the articulatory organs

More information

ECE 556 BASICS OF DIGITAL SPEECH PROCESSING. Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2

ECE 556 BASICS OF DIGITAL SPEECH PROCESSING. Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2 ECE 556 BASICS OF DIGITAL SPEECH PROCESSING Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2 Analog Sound to Digital Sound Characteristics of Sound Amplitude Wavelength (w) Frequency ( ) Timbre

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Lecture Schedule: Week Date Lecture Title

Lecture Schedule: Week Date Lecture Title http://elec3004.org Sampling & More 2014 School of Information Technology and Electrical Engineering at The University of Queensland Lecture Schedule: Week Date Lecture Title 1 2-Mar Introduction 3-Mar

More information

Machine recognition of speech trained on data from New Jersey Labs

Machine recognition of speech trained on data from New Jersey Labs Machine recognition of speech trained on data from New Jersey Labs Frequency response (peak around 5 Hz) Impulse response (effective length around 200 ms) 41 RASTA filter 10 attenuation [db] 40 1 10 modulation

More information

Speech Coding using Linear Prediction

Speech Coding using Linear Prediction Speech Coding using Linear Prediction Jesper Kjær Nielsen Aalborg University and Bang & Olufsen jkn@es.aau.dk September 10, 2015 1 Background Speech is generated when air is pushed from the lungs through

More information

ADDITIVE SYNTHESIS BASED ON THE CONTINUOUS WAVELET TRANSFORM: A SINUSOIDAL PLUS TRANSIENT MODEL

ADDITIVE SYNTHESIS BASED ON THE CONTINUOUS WAVELET TRANSFORM: A SINUSOIDAL PLUS TRANSIENT MODEL ADDITIVE SYNTHESIS BASED ON THE CONTINUOUS WAVELET TRANSFORM: A SINUSOIDAL PLUS TRANSIENT MODEL José R. Beltrán and Fernando Beltrán Department of Electronic Engineering and Communications University of

More information

Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques

Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques 81 Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques Noboru Hayasaka 1, Non-member ABSTRACT

More information

Robust telephone speech recognition based on channel compensation

Robust telephone speech recognition based on channel compensation Pattern Recognition 32 (1999) 1061}1067 Robust telephone speech recognition based on channel compensation Jiqing Han*, Wen Gao Department of Computer Science and Engineering, Harbin Institute of Technology,

More information

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,

More information

SGN Audio and Speech Processing

SGN Audio and Speech Processing Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations

More information

EE216B: VLSI Signal Processing. Wavelets. Prof. Dejan Marković Shortcomings of the Fourier Transform (FT)

EE216B: VLSI Signal Processing. Wavelets. Prof. Dejan Marković Shortcomings of the Fourier Transform (FT) 5//0 EE6B: VLSI Signal Processing Wavelets Prof. Dejan Marković ee6b@gmail.com Shortcomings of the Fourier Transform (FT) FT gives information about the spectral content of the signal but loses all time

More information

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction Human performance Reverberation

More information

Signal Analysis Using Autoregressive Models of Amplitude Modulation. Sriram Ganapathy

Signal Analysis Using Autoregressive Models of Amplitude Modulation. Sriram Ganapathy Signal Analysis Using Autoregressive Models of Amplitude Modulation Sriram Ganapathy Advisor - Hynek Hermansky Johns Hopkins University 11-18-2011 Overview Introduction AR Model of Hilbert Envelopes FDLP

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information

Reading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday.

Reading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday. L105/205 Phonetics Scarborough Handout 7 10/18/05 Reading: Johnson Ch.2.3.3-2.3.6, Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday Spectral Analysis 1. There are

More information

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar Biomedical Signals Signals and Images in Medicine Dr Nabeel Anwar Noise Removal: Time Domain Techniques 1. Synchronized Averaging (covered in lecture 1) 2. Moving Average Filters (today s topic) 3. Derivative

More information

Signals and Systems Lecture 6: Fourier Applications

Signals and Systems Lecture 6: Fourier Applications Signals and Systems Lecture 6: Fourier Applications Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2012 arzaneh Abdollahi Signal and Systems Lecture 6

More information

Relative phase information for detecting human speech and spoofed speech

Relative phase information for detecting human speech and spoofed speech Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University

More information

Signals and Systems Lecture 6: Fourier Applications

Signals and Systems Lecture 6: Fourier Applications Signals and Systems Lecture 6: Fourier Applications Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2012 arzaneh Abdollahi Signal and Systems Lecture 6

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

Applications of Music Processing

Applications of Music Processing Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite

More information

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds

More information

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory

More information

Speech Compression Using Voice Excited Linear Predictive Coding

Speech Compression Using Voice Excited Linear Predictive Coding Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri

More information