Applying Models of Auditory Processing to Automatic Speech Recognition: Promise and Progress!

Size: px
Start display at page:

Download "Applying Models of Auditory Processing to Automatic Speech Recognition: Promise and Progress!"

Transcription

1 Applying Models of Auditory Processing to Automatic Speech Recognition: Promise and Progress! Richard Stern (with Chanwoo Kim, Yu-Hsiang Chiu, and others) Department of Electrical and Computer Engineering and Language Technologies Institute Carnegie Mellon University Pittsburgh, Pennsylvania Telephone: ; FAX: Frederick Jelinek Memorial Workshop on Meaning Representations in Language and Speech Processing Prague, Czech Republic July 16, 2014

2 Introduction auditory processing and automatic speech recognition n I was originally trained in auditory perception, and my original work was in binaural hearing n Over the past years, I have been spending the bulk of my time trying to improve the accuracy of automatic speech recognition systems in difficult acoustical environments n In this talk I would like to discuss some of the ways in my group (and many others) have been attempting to apply knowledge of auditory perception to improve ASR accuracy Comment: approaches can be more or less faithful to physiology and psychophysics Slide 2

3 The big questions. n How can knowledge of auditory physiology and perception improve speech recognition accuracy? n Can speech recognition results tell us anything we don t already know about auditory processing? n What aspects of the processing are most valuable for robust feature extraction? Slide 3

4 Two historical notes n Everything is changing with deep learning Is there a role for traditional robust speech technologies? n Knowledge-based versus statistically-based processing Slide 4

5 So what I will do is. n Briefly review some of the major physiological and psychophysical results that motivate the models n Briefly review and discuss the major classical auditory models of the 1980s Seneff, Lyon, and Ghitza n Review some of the major new trends in today s models n Talk about some representative issues that have driven work as of late at CMU and what we have learned from them Slide 5

6 Speech recognition as pattern classification Speech features Utterance hypotheses Feature extraction Decision making! procedure n Major functional components: Signal processing to extract features from speech waveforms Comparison of features to pre-stored representations n Important design choices: Choice of features Specific method of comparing features to stored representations Slide 6

7 Default signal processing: Mel frequency cepstral coefficients (MFCCs) input speech Multiply by Brief Window Fourier Transform Magnitude Triangular Weighting Log Inverse Fourier Transform MFCCs Comment: 20-ms time slices are modeled by smoothed spectra, with attention paid to auditory frequency selectivity Slide 7

8 What the speech recognizer sees. An original spectrogram: Spectrum recovered from MFCC: Slide 8

9 Comments on the MFCC representation n It s very blurry compared to a wideband spectrogram! n Aspects of auditory processing represented: Frequency selectivity and spectral bandwidth (but using a constant analysis window duration!)» Wavelet schemes exploit time-frequency resolution better Nonlinear amplitude response n Aspects of auditory processing NOT represented: Detailed timing structure Lateral suppression Enhancement of temporal contrast Other auditory nonlinearities Slide 9

10 Basic auditory anatomy n Structures involved in auditory processing: Slide 10

11 Excitation along the basilar membrane (courtesy James Hudspeth, HHMI) Slide 11

12 Central auditory pathways n There is a lot going on! n For the most part, we only consider the response of the auditory nerve It is in series with everything else Slide 12

13 Transient response of auditory-nerve fibers n Histograms of response to tone bursts (Kiang et al., 1965): Comment: Onsets and offsets produce overshoot Slide 13

14 Frequency response of auditory-nerve fibers: tuning curves n Threshold level for auditory-nerve response to tones: n Note dependence of bandwidth on center frequency and asymmetry of response Slide 14

15 Typical response of auditory-nerve fibers as a function of stimulus level n Typical response of auditory-nerve fibers to tones as a function of intensity: n Comment: Saturation and limited dynamic range Slide 15

16 Synchronized auditory-nerve response to low-frequency tones n Comment: response remains synchronized over a wide range of intensities Slide 16

17 Comments on synchronized auditory response n Nerve fibers synchronize to fine structure at low frequencies, signal envelopes at high frequencies n Synchrony clearly important for auditory localization n Synchrony could be important for monaural processing of complex signals as well Slide 17

18 Lateral suppression in auditory processing n Auditory-nerve response to pairs of tones: n Comment: Lateral suppression enhances local contrast in frequency Slide 18

19 Auditory frequency selectivity: critical bands n Measurements of psychophysical filter bandwidth by various methods: n Comments: Bandwidth increases with center frequency Solid curve is Equivalent Rectangular Bandwidth (ERB) Slide 19

20 Three perceptual auditory frequency scales Bark scale: (DE) Bark( f ) =.01 f, 0 f < f +1.5, 500 f < ln( f ) 32.6, 1220 f Mel scale: (USA) Mel( f ) = 2595 log 10 (1 + f 700 ) ERB scale: (UK) ERB( f ) = 24.7(4.37 f +1) Slide 20

21 Comparison of normalized perceptual frequency scales n Bark scale (in blue), Mel scale (in red), and ERB scale (in green): 100 Relative perceptual scale Bark Mel ERB Frequency, Hz Slide 21

22 Perceptual masking of adjacent spectrotemporal components n Spectral masking: Intense signals at a given frequency mask adjacent frequencies (asymmetrically) n Temporal masking: Intense signals at a given frequency can mask successive input at that frequency (and to some extent before the masker occurs!) n These phenomena are an important part of the auditory models used in perceptual audio coding (such as in creating MP3 files) Slide 22

23 The loudness of sounds n Equal loudness contours (Fletcher-Munson curves): Slide 23

24 Summary of basic auditory physiology and perception n Major monaural physiological attributes: Frequency analysis in parallel channels Preservation of temporal fine structure Limited dynamic range in individual channels Enhancement of temporal contrast (at onsets and offsets) Enhancement of spectral contrast (at adjacent frequencies) n Most major physiological attributes have psychophysical correlates n Most physiological and psychophysical effects are not preserved in conventional representations for speech recognition Slide 24

25 Auditory models in the 1980s: the Seneff model n Overall model: Envelope Detector Mean-Rate Spectrum An early well-known auditory model Critical-Band Filter Bank Stage I Hair-Cell Model Stage II n Detail of Stage II: Synchrony Detector Stage III Synchrony Spectrum In addition to mean rate, used Generalized Synchrony Detector to extract synchrony Saturating Half-Wave Rectifier Short-term AGC Lowpass Filter Rapid AGC Slide 25

26 Auditory models in the 1980s: Ghitza s EIH model COCHLEAR FILTER-1 COCHLEAR FILTER-i LEVEL CROSSINGS L-1 INTERVAL HISTOGRAMS IH-1 L-7 IH-7 Estimated timing information from ensembles of zero crossings with different thresholds COCHLEAR FILTER-85 Slide 26

27 Auditory models in the 1980s: Lyon s auditory model n Single stage of the Lyon auditory model: HALF-WAVE RECTIFIER 1-kHZ LOWPASS GAIN A GAIN B GAIN C Target LIMIT H-B + H-C Lyon model included nonlinear compression, lateral suppression, temporal effects Also added correlograms (autocorrelation and crosscorrelation of model outputs) Slide 27

28 And one more Cohen s model (1989) 512-Point FFT Loudness normalization and transient enhancement novel for the time CRITICAL-BAND FILTERS LOUDNESS NORMALIZATION POWER-LAW COMPRESSION SHORT-TERM ADAPTATION Used successfully as part of many IBM systems Slide 28

29 The other standard approach: Perceptual Linear Prediction (PLP, Hermansky 90) n Comments: A pragmatic approach to auditory modeling Pre-emphasis, loudness normalization based on threshold of hearing RASTA enhancement provides cepstral normalization and modulation filtering Widely used with success today Slide 30

30 Auditory modeling was expensive: Computational complexity of Seneff model n Number of multiplications per ms of speech (from Ohshima and Stern, 1994): 25,000 20,000 15,000 10, Seneff Model Stage III Stage II Stage I LPC Slide 31

31 Summary: early auditory models n The models developed in the 1980s included: Realistic auditory filtering Realistic auditory nonlnearity Synchrony extraction Lateral suppression Higher order processing through auto-correlation and cross-correlation n Every system developer had his or her own idea of what was important Slide 32

32 Evaluation of early auditory models (Ohshima and Stern, 1994) n Not much quantitative evaluation actually performed n General trends of results: Physiological processing did not help much (if at all) for clean speech More substantial improvements observed for degraded input Benefits generally do not exceed what could be achieved with more prosaic approaches (e.g. CDCN/VTS in our case). Slide 33

33 Other reasons why work on auditory models subsided in the late 1980s n Failure to obtain a good statistical match between characteristics of features and speech recognition system Ameliorated by subsequent development of continuous HMMs n More pressing need to solve other basic speech recognition problems Slide 34

34 Renaissance in the 1990s! By the late 1990s, physiologically-motivated and perceptuallymotivated approaches to signal processing began to flourish Some major new trends. n Computation no longer such a limiting factor n Serious attention to temporal evolution n Attention to reverberation n Binaural processing n More effective and mature approaches to information fusion Slide 35

35 Peripheral auditory modeling at CMU 2004 now n Foci of activities: Representing synchrony The shape of the rate-intensity function Revisiting analysis duration Revisiting frequency resolution Onset enhancement Modulation filtering Binaural and polyaural techniques Auditory scene analysis: common frequency modulation Slide 36

36 Speech representation using mean rate n Representation of vowels by Young and Sachs using mean rate: n Mean rate representation does not preserve spectral information Slide 37

37 Speech representation using average localized synchrony rate n Representation of vowels by Young and Sachs using ALSR: Slide 38

38 Physiologically-motivated signal processing: the Zhang-Carney model of the periphery n We used the synapse output as the basis for further processing Slide 40

39 Physiologically-motivated signal processing: synchrony and mean-rate detection (Kim/Chiu 06) n Synchrony response is smeared across frequency to remove pitch effects n Higher frequencies represented by mean rate of firing n Synchrony and mean rate combined additively n Much more processing than MFCCs Slide 41

40 Comparing auditory processing with cepstral analysis: clean speech Original spectrogram MFCC reconstruction Auditory analysis Slide 42

41 Comparing auditory processing with cepstral analysis: 20-dB SNR Slide 43

42 Comparing auditory processing with cepstral analysis: 10-dB SNR Slide 44

43 Comparing auditory processing with cepstral analysis: 0-dB SNR Slide 45

44 Auditory processing is more effective than MFCCs at low SNRs, especially in white noise Accuracy in background noise: Accuracy in background music: n Curves are shifted by db (greater improvement than obtained with VTS or CDCN) [Results from Kim et al., Interspeech 2006] Slide 46

45 Do auditory models really need to be so complex? n Model of Zhang et al. 2001: A much simpler model: P(t) Gammatone Filters Nonlinear Rectifiers Lowpass Filters s(t) Slide 47

46 Comparing simple and complex auditory models n Comparing MFCC processing, a trivial (filter rectify compress) auditory model, and the full Carney-Zhang model (Chiu 2006): Complex Auditory Simple Auditory MFCC 100 WER (%) SNR (db) Slide 48

47 The nonlinearity seems to be the most important attribute of the Seneff model (Chiu 08) Envelope Detector Mean-Rate Spectrum Critical-Band Filter Bank Hair-Cell Model Synchrony Detector Synchrony Spectrum Stage I Stage II Stage III Saturating Half-Wave Rectifier Short-term AGC Lowpass Filter Rapid AGC Slide 49

48 Why the nonlinearity seems to help rate level function compression traditional log compression clean noisy Traditional log compression output 10 8 output Count input energy histogram of clean speech input energy histogram of 20 db white noise channel index Rate level function compression 9 clean 8 noisy t output Cou unt input energy channel index Slide 50

49 Impact of auditory nonlinearity (Chiu) Accuracy 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Learned nonlinearity Baseline nonlinearity MFCC clean 20 db 15 db 10 db 5 db 0 db -5 db (a) Test set A Slide 51

50 PNCC processing (Kim and Stern, 2010,2014) n A pragmatic implementation of a number of the principles described: Gammatone filterbanks Nonlinearity shaped to follow auditory processing Medium-time environmental compensation using nonlinearity cepstral highpass filtering in each channel Enhancement of envelope onsets Computationally efficient implementation Slide 52

51 An integrated front end: power-normalized cepstral coefficients (PNCC, Kim 10) H(! ) 2 MFCC Processing STFT Triangular Freq Wtg Logarithmic Nonlinearity DCT Slide 53

52 An integrated front end: power-normalized cepstral coefficients (PNCC, Kim 10) H(! ) 2 MFCC Processing STFT Triangular Freq Wtg Logarithmic Nonlinearity DCT H(! ) 2 RASTA-PLP Processing STFT Crit-Band Freq Wtg Nonlinear Compression RASTA Filter Nonlinear Expansion Power-Law Nonlinearity IFFT Compute LPCbased Cepstra Exponent 1/3 Slide 54

53 An integrated front end: power-normalized cepstral coefficients (PNCC, Kim 10) H(! ) 2 MFCC Processing STFT Triangular Freq Wtg Logarithmic Nonlinearity DCT H(! ) 2 RASTA-PLP Processing STFT Crit-Band Freq Wtg Nonlinear Compression RASTA Filter Nonlinear Expansion Power-Law Nonlinearity IFFT Compute LPCbased Cepstra Exponent 1/3 H(! ) 2 PNCC Processing STFT Gammatone Freq Wtg Noise Reduction Temporal Masking Frequency Weighting Power Normaliz.. Power-Law Nonlinearity DCT Exponent 1/15 Slide 55

54 The nonlinearity in PNCC processing (Kim) Rate (spikes / sec) Human Rate Intensity Model Cube Root Power Law Approximation MMSE Power Law Approximation Logarithmic Approximation Pressure (Pa) Rate (spikes / sec) Human Rate Intensity Model Cube Root Power Law Approximation MMSE Power Law Approximation Logarithmic Approximation Tone Level (db SPL) Slide 56

55 Frequency resolution n Examined several types of frequency resolution MFCC triangular filters Gammatone filter shapes Truncated Gammatone filter shapes n Most results do not depend greatly on filter shape n Some sort of frequency integration is helpful when frequencybased selection algorithms are used H(e j! ) Frequency (Hz) Slide 57

56 Analysis window duration (Kim) n Typical analysis window duration for speech recognition is ~25-35 ms n Optimal analysis window duration for estimation of environmental parameters is ~ ms n Best systems measure environmental parameters (including voice activity detection over a longer time interval but apply results to a short-duration analysis frame Slide 58

57 Temporal Speech Properties: modulation filtering Output of speech and noise segments from 14 th Mel filter (1050 Hz) n Speech segment exhibits greater fluctuations Slide 59 59!

58 Nonlinear noise processing n Use nonlinear cepstral highpass filtering to pass speech but not noise n Why nonlinear? Need to keep results positive because we are dealing with manipulations of signal power Slide 60

59 Asymmetric lowpass filtering (Kim, 2010) n Overview of processing: Assume that noise components vary slowly compared to speech components Obtain a running estimate of noise level in each channel using nonlinear processing Subtract estimated noise level from speech n An example: Note: Asymmetric highpass filtering is obtained by subtracting the lowpass filter output from the input Slide 61

60 Implementing asymmetric lowpass filtering Basic equation: Dependence on parameter values: Street 5 db Power (db) Q in [ m, l] Q out [ m, l] (! a = 0.9,! b = 0.9) Time (s) Street 5 db Power (db) Q in [ m, l] Q out [ m, l] (! a = 0.5,! b = 0.95) Time (s) Street 5 db Power (db) Q in [ m, l] Q out [ m, l] (! a = 0.999,! b = 0.5) Time (s) Slide 62

61 Computational complexity of front ends Mults & Divs per Frame MFCC PLP PNCC Truncated PNCC Slide 64

62 Performance of PNCC in white noise (RM) 56678(6901#!!0!0:;/4 #!!,! +! *! B.%% <C%%0D>?=0EF- $! <C%% /5-F5!BGB!! " #! #" $! %&'() -./01234 Slide 65

63 Performance of PNCC in white noise (WSJ) 56678(6901#!!0!0:;/4 #!!,! +! *! C.%% $! DE%% /5-H5!CIC!! " #! #" $! %&'() -./01234 Slide 66

64 Performance of PNCC in background music 56678(6901#!!0!0:;/4 #!!,! +! *! B.%% $! >C%% /5-H5!BIB!! " #! #" $! %&'() -./01234 Slide 67

65 Performance of PNCC in reverberation #!! *! (! &! $!?AB!!'C58+,-,./,.01234: DEFF GHFF5I21J5K6A GHFF +;A6;!DLD!!!"#!"$!"%!"&!"'!"(!") #"$ +,-,./, ,589: Slide 68

66 Contributions of PNCC components: white noise (WSJ) -../0(.123#!!2!24567 #!!,! +! *! $! + Temporal masking! + Noise suppression! + Medium-duration processing! Baseline MFCC + CMN!!! " #! #" $! %&'() 89623:;7 Slide 69

67 Contributions of PNCC components: background music (WSJ) -../0(.123#!!2!24567 #!!,! +! *! $! + Temporal masking! + Noise suppression! + Medium-duration processing! Baseline MFCC + CMN!!! " #! #" $! %&'() 89623:;7 Slide 70

68 Contributions of PNCC components: reverberation (WSJ) #!! *! (! &! $!?AB!!'C58+,-,./,.01234: D4!EE24,5FGHH5I21J5HKG + Temporal masking! D4!EE24,5FGHH5I21J3=15K09C24L + Noise suppression! D4!EE24,5FGHH5 5I21J3=15K09C24L504M5N2E1,.24L + Medium-duration processing! O09,E24,5KNHH5I21J5HKG Baseline MFCC + CMN!!!!"#!"$!"%!"&!"'!"(!") #"$ +,-,./, ,589: Slide 71

69 Effects of onset enhancement/temporal masking (SSF processing, Kim 10) #!! /<#01<7=>60.?>='4 #!! 5A# /:;<= (6901#!!0!0:;/4,! +! *! $! /5-A5!HIH0D>EF0%<.!! " #! #" $! %&'() -./ ,,-./,012#!!1!13456 *! (! &! $!!!!"#!"$!"%!"&!"'!"(!") #"$ Slide 72

70 PNCC and Slide 73

71 Summary so what matters? n Knowledge of the auditory system can certainly improve ASR accuracy: Use of synchrony Consideration of rate-intensity function Onset enhancement Nonlinear modulation filtering Selective reconstruction Consideration of processes mediating auditory scene analysis Slide 74

72 Summary: PNCC processing n PNCC processing includes More effective nonlinearity Parameter estimation for noise compensation and analysis based on longer analysis time and frequency spread Efficient noise compensation based on modulation filtering Onset enhancement Computationally-efficient implementation n Not considered yet Synchrony representation Lateral suppression Slide 75

73

Robust Speech Recognition Group Carnegie Mellon University. Telephone: Fax:

Robust Speech Recognition Group Carnegie Mellon University. Telephone: Fax: Robust Automatic Speech Recognition In the 21 st Century Richard Stern (with Alex Acero, Yu-Hsiang Chiu, Evandro Gouvêa, Chanwoo Kim, Kshitiz Kumar, Amir Moghimi, Pedro Moreno, Hyung-Min Park, Bhiksha

More information

ROBUST SPEECH RECOGNITION. Richard Stern

ROBUST SPEECH RECOGNITION. Richard Stern ROBUST SPEECH RECOGNITION Richard Stern Robust Speech Recognition Group Mellon University Telephone: (412) 268-2535 Fax: (412) 268-3890 rms@cs.cmu.edu http://www.cs.cmu.edu/~rms Short Course at Universidad

More information

IN recent decades following the introduction of hidden. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition

IN recent decades following the introduction of hidden. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. X, NO. X, MONTH, YEAR 1 Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition Chanwoo Kim and Richard M. Stern, Member,

More information

Signal Processing for Robust Speech Recognition Motivated by Auditory Processing

Signal Processing for Robust Speech Recognition Motivated by Auditory Processing Signal Processing for Robust Speech Recognition Motivated by Auditory Processing Chanwoo Kim CMU-LTI-1-17 Language Technologies Institute School of Computer Science Carnegie Mellon University 5 Forbes

More information

Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition Chanwoo Kim, Member, IEEE, and Richard M. Stern, Fellow, IEEE

Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition Chanwoo Kim, Member, IEEE, and Richard M. Stern, Fellow, IEEE IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 24, NO. 7, JULY 2016 1315 Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition Chanwoo Kim, Member, IEEE, and

More information

SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM

SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM MAY 21 ABSTRACT Although automatic speech recognition systems have dramatically improved in recent decades,

More information

Robust speech recognition using temporal masking and thresholding algorithm

Robust speech recognition using temporal masking and thresholding algorithm Robust speech recognition using temporal masking and thresholding algorithm Chanwoo Kim 1, Kean K. Chin 1, Michiel Bacchiani 1, Richard M. Stern 2 Google, Mountain View CA 9443 USA 1 Carnegie Mellon University,

More information

Speech Signal Analysis

Speech Signal Analysis Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 MODELING SPECTRAL AND TEMPORAL MASKING IN THE HUMAN AUDITORY SYSTEM PACS: 43.66.Ba, 43.66.Dc Dau, Torsten; Jepsen, Morten L.; Ewert,

More information

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991 RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response

More information

Robust Speech Recognition Based on Binaural Auditory Processing

Robust Speech Recognition Based on Binaural Auditory Processing INTERSPEECH 2017 August 20 24, 2017, Stockholm, Sweden Robust Speech Recognition Based on Binaural Auditory Processing Anjali Menon 1, Chanwoo Kim 2, Richard M. Stern 1 1 Department of Electrical and Computer

More information

Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition

Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Chanwoo Kim 1 and Richard M. Stern Department of Electrical and Computer Engineering and Language Technologies

More information

Robust Speech Recognition Based on Binaural Auditory Processing

Robust Speech Recognition Based on Binaural Auditory Processing Robust Speech Recognition Based on Binaural Auditory Processing Anjali Menon 1, Chanwoo Kim 2, Richard M. Stern 1 1 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh,

More information

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Power-Normalized Cepstral Coefficients (PNCC) for Robust

More information

MOST MODERN automatic speech recognition (ASR)

MOST MODERN automatic speech recognition (ASR) IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 5, NO. 5, SEPTEMBER 1997 451 A Model of Dynamic Auditory Perception and Its Application to Robust Word Recognition Brian Strope and Abeer Alwan, Member,

More information

Auditory Based Feature Vectors for Speech Recognition Systems

Auditory Based Feature Vectors for Speech Recognition Systems Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines

More information

Using the Gammachirp Filter for Auditory Analysis of Speech

Using the Gammachirp Filter for Auditory Analysis of Speech Using the Gammachirp Filter for Auditory Analysis of Speech 18.327: Wavelets and Filterbanks Alex Park malex@sls.lcs.mit.edu May 14, 2003 Abstract Modern automatic speech recognition (ASR) systems typically

More information

Perception of pitch. Importance of pitch: 2. mother hemp horse. scold. Definitions. Why is pitch important? AUDL4007: 11 Feb A. Faulkner.

Perception of pitch. Importance of pitch: 2. mother hemp horse. scold. Definitions. Why is pitch important? AUDL4007: 11 Feb A. Faulkner. Perception of pitch AUDL4007: 11 Feb 2010. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum, 2005 Chapter 7 1 Definitions

More information

Psycho-acoustics (Sound characteristics, Masking, and Loudness)

Psycho-acoustics (Sound characteristics, Masking, and Loudness) Psycho-acoustics (Sound characteristics, Masking, and Loudness) Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University Mar. 20, 2008 Pure tones Mathematics of the pure

More information

You know about adding up waves, e.g. from two loudspeakers. AUDL 4007 Auditory Perception. Week 2½. Mathematical prelude: Adding up levels

You know about adding up waves, e.g. from two loudspeakers. AUDL 4007 Auditory Perception. Week 2½. Mathematical prelude: Adding up levels AUDL 47 Auditory Perception You know about adding up waves, e.g. from two loudspeakers Week 2½ Mathematical prelude: Adding up levels 2 But how do you get the total rms from the rms values of two signals

More information

Speech Synthesis; Pitch Detection and Vocoders

Speech Synthesis; Pitch Detection and Vocoders Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech

More information

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb A. Faulkner.

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb A. Faulkner. Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb 2009. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence

More information

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner.

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner. Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb 2008. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum,

More information

AUDL 4007 Auditory Perception. Week 1. The cochlea & auditory nerve: Obligatory stages of auditory processing

AUDL 4007 Auditory Perception. Week 1. The cochlea & auditory nerve: Obligatory stages of auditory processing AUDL 4007 Auditory Perception Week 1 The cochlea & auditory nerve: Obligatory stages of auditory processing 1 Think of the ear as a collection of systems, transforming sounds to be sent to the brain 25

More information

AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES

AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-), Verona, Italy, December 7-9,2 AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES Tapio Lokki Telecommunications

More information

ROBUST SPEECH RECOGNITION BASED ON HUMAN BINAURAL PERCEPTION

ROBUST SPEECH RECOGNITION BASED ON HUMAN BINAURAL PERCEPTION ROBUST SPEECH RECOGNITION BASED ON HUMAN BINAURAL PERCEPTION Richard M. Stern and Thomas M. Sullivan Department of Electrical and Computer Engineering School of Computer Science Carnegie Mellon University

More information

Imagine the cochlea unrolled

Imagine the cochlea unrolled 2 2 1 1 1 1 1 Cochlea & Auditory Nerve: obligatory stages of auditory processing Think of the auditory periphery as a processor of signals 2 2 1 1 1 1 1 Imagine the cochlea unrolled Basilar membrane motion

More information

Signals & Systems for Speech & Hearing. Week 6. Practical spectral analysis. Bandpass filters & filterbanks. Try this out on an old friend

Signals & Systems for Speech & Hearing. Week 6. Practical spectral analysis. Bandpass filters & filterbanks. Try this out on an old friend Signals & Systems for Speech & Hearing Week 6 Bandpass filters & filterbanks Practical spectral analysis Most analogue signals of interest are not easily mathematically specified so applying a Fourier

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction Human performance Reverberation

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

HCS 7367 Speech Perception

HCS 7367 Speech Perception HCS 7367 Speech Perception Dr. Peter Assmann Fall 212 Power spectrum model of masking Assumptions: Only frequencies within the passband of the auditory filter contribute to masking. Detection is based

More information

Hearing and Deafness 2. Ear as a frequency analyzer. Chris Darwin

Hearing and Deafness 2. Ear as a frequency analyzer. Chris Darwin Hearing and Deafness 2. Ear as a analyzer Chris Darwin Frequency: -Hz Sine Wave. Spectrum Amplitude against -..5 Time (s) Waveform Amplitude against time amp Hz Frequency: 5-Hz Sine Wave. Spectrum Amplitude

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

DERIVATION OF TRAPS IN AUDITORY DOMAIN

DERIVATION OF TRAPS IN AUDITORY DOMAIN DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.

More information

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012 Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

THE MATLAB IMPLEMENTATION OF BINAURAL PROCESSING MODEL SIMULATING LATERAL POSITION OF TONES WITH INTERAURAL TIME DIFFERENCES

THE MATLAB IMPLEMENTATION OF BINAURAL PROCESSING MODEL SIMULATING LATERAL POSITION OF TONES WITH INTERAURAL TIME DIFFERENCES THE MATLAB IMPLEMENTATION OF BINAURAL PROCESSING MODEL SIMULATING LATERAL POSITION OF TONES WITH INTERAURAL TIME DIFFERENCES J. Bouše, V. Vencovský Department of Radioelectronics, Faculty of Electrical

More information

Phase and Feedback in the Nonlinear Brain. Malcolm Slaney (IBM and Stanford) Hiroko Shiraiwa-Terasawa (Stanford) Regaip Sen (Stanford)

Phase and Feedback in the Nonlinear Brain. Malcolm Slaney (IBM and Stanford) Hiroko Shiraiwa-Terasawa (Stanford) Regaip Sen (Stanford) Phase and Feedback in the Nonlinear Brain Malcolm Slaney (IBM and Stanford) Hiroko Shiraiwa-Terasawa (Stanford) Regaip Sen (Stanford) Auditory processing pre-cosyne workshop March 23, 2004 Simplistic Models

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

Acoustics, signals & systems for audiology. Week 4. Signals through Systems

Acoustics, signals & systems for audiology. Week 4. Signals through Systems Acoustics, signals & systems for audiology Week 4 Signals through Systems Crucial ideas Any signal can be constructed as a sum of sine waves In a linear time-invariant (LTI) system, the response to a sinusoid

More information

Introduction of Audio and Music

Introduction of Audio and Music 1 Introduction of Audio and Music Wei-Ta Chu 2009/12/3 Outline 2 Introduction of Audio Signals Introduction of Music 3 Introduction of Audio Signals Wei-Ta Chu 2009/12/3 Li and Drew, Fundamentals of Multimedia,

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS SUMMARY INTRODUCTION

SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS SUMMARY INTRODUCTION SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS Roland SOTTEK, Klaus GENUIT HEAD acoustics GmbH, Ebertstr. 30a 52134 Herzogenrath, GERMANY SUMMARY Sound quality evaluation of

More information

SGN Audio and Speech Processing

SGN Audio and Speech Processing Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

T Automatic Speech Recognition: From Theory to Practice

T Automatic Speech Recognition: From Theory to Practice Automatic Speech Recognition: From Theory to Practice http://www.cis.hut.fi/opinnot// September 27, 2004 Prof. Bryan Pellom Department of Computer Science Center for Spoken Language Research University

More information

Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks

Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks SGN- 14006 Audio and Speech Processing Pasi PerQlä SGN- 14006 2015 Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks Slides for this lecture are based on those created by Katariina

More information

Machine recognition of speech trained on data from New Jersey Labs

Machine recognition of speech trained on data from New Jersey Labs Machine recognition of speech trained on data from New Jersey Labs Frequency response (peak around 5 Hz) Impulse response (effective length around 200 ms) 41 RASTA filter 10 attenuation [db] 40 1 10 modulation

More information

AUDL Final exam page 1/7 Please answer all of the following questions.

AUDL Final exam page 1/7 Please answer all of the following questions. AUDL 11 28 Final exam page 1/7 Please answer all of the following questions. 1) Consider 8 harmonics of a sawtooth wave which has a fundamental period of 1 ms and a fundamental component with a level of

More information

Damped Oscillator Cepstral Coefficients for Robust Speech Recognition

Damped Oscillator Cepstral Coefficients for Robust Speech Recognition Damped Oscillator Cepstral Coefficients for Robust Speech Recognition Vikramjit Mitra, Horacio Franco, Martin Graciarena Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA.

More information

AUDL GS08/GAV1 Signals, systems, acoustics and the ear. Loudness & Temporal resolution

AUDL GS08/GAV1 Signals, systems, acoustics and the ear. Loudness & Temporal resolution AUDL GS08/GAV1 Signals, systems, acoustics and the ear Loudness & Temporal resolution Absolute thresholds & Loudness Name some ways these concepts are crucial to audiologists Sivian & White (1933) JASA

More information

Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts

Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts POSTER 25, PRAGUE MAY 4 Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts Bc. Martin Zalabák Department of Radioelectronics, Czech Technical University in Prague, Technická

More information

COM325 Computer Speech and Hearing

COM325 Computer Speech and Hearing COM325 Computer Speech and Hearing Part III : Theories and Models of Pitch Perception Dr. Guy Brown Room 145 Regent Court Department of Computer Science University of Sheffield Email: g.brown@dcs.shef.ac.uk

More information

Spectral and temporal processing in the human auditory system

Spectral and temporal processing in the human auditory system Spectral and temporal processing in the human auditory system To r s t e n Da u 1, Mo rt e n L. Jepsen 1, a n d St e p h a n D. Ew e r t 2 1Centre for Applied Hearing Research, Ørsted DTU, Technical University

More information

SGN Audio and Speech Processing

SGN Audio and Speech Processing SGN 14006 Audio and Speech Processing Introduction 1 Course goals Introduction 2! Learn basics of audio signal processing Basic operations and their underlying ideas and principles Give basic skills although

More information

3D Distortion Measurement (DIS)

3D Distortion Measurement (DIS) 3D Distortion Measurement (DIS) Module of the R&D SYSTEM S4 FEATURES Voltage and frequency sweep Steady-state measurement Single-tone or two-tone excitation signal DC-component, magnitude and phase of

More information

Monaural and Binaural Speech Separation

Monaural and Binaural Speech Separation Monaural and Binaural Speech Separation DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction CASA approach to sound separation Ideal binary mask as

More information

Learning-Based Auditory Encoding for Robust Speech. Recognition

Learning-Based Auditory Encoding for Robust Speech. Recognition Learning-Based Auditory Encoding for Robust Speech Recognition Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering Yu-Hsiang

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

IN a natural environment, speech often occurs simultaneously. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation

IN a natural environment, speech often occurs simultaneously. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 15, NO. 5, SEPTEMBER 2004 1135 Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation Guoning Hu and DeLiang Wang, Fellow, IEEE Abstract

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation

An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation Aisvarya V 1, Suganthy M 2 PG Student [Comm. Systems], Dept. of ECE, Sree Sastha Institute of Engg. & Tech., Chennai,

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

A CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL

A CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL 9th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, -7 SEPTEMBER 7 A CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL PACS: PACS:. Pn Nicolas Le Goff ; Armin Kohlrausch ; Jeroen

More information

The psychoacoustics of reverberation

The psychoacoustics of reverberation The psychoacoustics of reverberation Steven van de Par Steven.van.de.Par@uni-oldenburg.de July 19, 2016 Thanks to Julian Grosse and Andreas Häußler 2016 AES International Conference on Sound Field Control

More information

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

Tone-in-noise detection: Observed discrepancies in spectral integration. Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O.

Tone-in-noise detection: Observed discrepancies in spectral integration. Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O. Tone-in-noise detection: Observed discrepancies in spectral integration Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands Armin Kohlrausch b) and

More information

REAL-TIME BROADBAND NOISE REDUCTION

REAL-TIME BROADBAND NOISE REDUCTION REAL-TIME BROADBAND NOISE REDUCTION Robert Hoeldrich and Markus Lorber Institute of Electronic Music Graz Jakoministrasse 3-5, A-8010 Graz, Austria email: robert.hoeldrich@mhsg.ac.at Abstract A real-time

More information

Spectro-Temporal Methods in Primary Auditory Cortex David Klein Didier Depireux Jonathan Simon Shihab Shamma

Spectro-Temporal Methods in Primary Auditory Cortex David Klein Didier Depireux Jonathan Simon Shihab Shamma Spectro-Temporal Methods in Primary Auditory Cortex David Klein Didier Depireux Jonathan Simon Shihab Shamma & Department of Electrical Engineering Supported in part by a MURI grant from the Office of

More information

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Rhythmic Similarity -- a quick paper review Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Contents Introduction Three examples J. Foote 2001, 2002 J. Paulus 2002 S. Dixon 2004

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

Signals, Sound, and Sensation

Signals, Sound, and Sensation Signals, Sound, and Sensation William M. Hartmann Department of Physics and Astronomy Michigan State University East Lansing, Michigan Л1Р Contents Preface xv Chapter 1: Pure Tones 1 Mathematics of the

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015

International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015 RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,

More information

SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction

SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction by Xi Li A thesis submitted to the Faculty of Graduate School, Marquette University, in Partial Fulfillment of the Requirements

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

Robust telephone speech recognition based on channel compensation

Robust telephone speech recognition based on channel compensation Pattern Recognition 32 (1999) 1061}1067 Robust telephone speech recognition based on channel compensation Jiqing Han*, Wen Gao Department of Computer Science and Engineering, Harbin Institute of Technology,

More information

BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH

BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH Anjali Menon 1, Chanwoo Kim 2, Umpei Kurokawa 1, Richard M. Stern 1 1 Department of Electrical and Computer Engineering, Carnegie Mellon University,

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Speech/Music Change Point Detection using Sonogram and AANN

Speech/Music Change Point Detection using Sonogram and AANN International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 6, Number 1 (2016), pp. 45-49 International Research Publications House http://www. irphouse.com Speech/Music Change

More information

VQ Source Models: Perceptual & Phase Issues

VQ Source Models: Perceptual & Phase Issues VQ Source Models: Perceptual & Phase Issues Dan Ellis & Ron Weiss Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,ronw}@ee.columbia.edu

More information

A cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking

A cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking A cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking Courtney C. Lane 1, Norbert Kopco 2, Bertrand Delgutte 1, Barbara G. Shinn- Cunningham

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Calibration of Microphone Arrays for Improved Speech Recognition

Calibration of Microphone Arrays for Improved Speech Recognition MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Calibration of Microphone Arrays for Improved Speech Recognition Michael L. Seltzer, Bhiksha Raj TR-2001-43 December 2001 Abstract We present

More information

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o

More information

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech

More information

1. Introduction. Keywords: speech enhancement, spectral subtraction, binary masking, Gamma-tone filter bank, musical noise.

1. Introduction. Keywords: speech enhancement, spectral subtraction, binary masking, Gamma-tone filter bank, musical noise. Journal of Advances in Computer Research Quarterly pissn: 2345-606x eissn: 2345-6078 Sari Branch, Islamic Azad University, Sari, I.R.Iran (Vol. 6, No. 3, August 2015), Pages: 87-95 www.jacr.iausari.ac.ir

More information

Complex Sounds. Reading: Yost Ch. 4

Complex Sounds. Reading: Yost Ch. 4 Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Adaptive Filters Application of Linear Prediction

Adaptive Filters Application of Linear Prediction Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing

More information

Binaural Hearing. Reading: Yost Ch. 12

Binaural Hearing. Reading: Yost Ch. 12 Binaural Hearing Reading: Yost Ch. 12 Binaural Advantages Sounds in our environment are usually complex, and occur either simultaneously or close together in time. Studies have shown that the ability to

More information

FFT analysis in practice

FFT analysis in practice FFT analysis in practice Perception & Multimedia Computing Lecture 13 Rebecca Fiebrink Lecturer, Department of Computing Goldsmiths, University of London 1 Last Week Review of complex numbers: rectangular

More information