Machine recognition of speech trained on data from New Jersey Labs


Slide 41: Machine recognition of speech trained on data from New Jersey Labs
- Frequency response (peak around 5 Hz)
- Impulse response (effective length around 200 ms)

Slide 42: RASTA filter
[Figure: RASTA filter frequency response; attenuation [dB] versus modulation frequency [Hz]]

Slide 43 (06/04/14): 1-12 Hz passband
- sensitivity of hearing to modulation peaks at about 4 Hz (Riesz 1928, Zwicker 1952)
- modulation transfer function of primary auditory cortex peaks at about 4 Hz (Schreiner via Greenberg, personal communication 1997)
- modulation spectrum of speech peaks at about 4 Hz (Houtgast and Steeneken 1978)
- intelligibility of speech is significantly impaired when the 4 Hz modulation component is attenuated (Drullman et al. 1992, Arai et al. 1996)
[Figure: relative importance of various components of the modulation spectrum of speech for speech intelligibility and for ASR]
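To make "the modulation spectrum peaks at about 4 Hz" concrete, here is a minimal sketch that measures the modulation spectrum of a synthetic band envelope. Everything in it is an assumption chosen for illustration: the 100 Hz envelope sampling rate, the 2 s duration, the artificial 4 Hz modulation, and the helper name `modulation_spectrum`. It is not taken from the slides.

```python
import cmath
import math

def modulation_spectrum(envelope, frame_rate_hz):
    """Return (frequencies_hz, magnitudes) of the mean-removed envelope via a plain DFT."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                  for i, c in enumerate(centered))
        freqs.append(k * frame_rate_hz / n)
        mags.append(abs(acc))
    return freqs, mags

FRAME_RATE_HZ = 100.0  # assumed: one envelope sample every 10 ms
N_FRAMES = 200         # assumed: 2 s of signal

# Synthetic envelope with a single 4 Hz modulation component (illustrative only).
envelope = [1.0 + 0.5 * math.cos(2 * math.pi * 4.0 * i / FRAME_RATE_HZ)
            for i in range(N_FRAMES)]

freqs, mags = modulation_spectrum(envelope, FRAME_RATE_HZ)
peak_hz = freqs[mags.index(max(mags))]
print(peak_hz)
```

For real speech the envelope would come from a band-pass filter followed by envelope detection; the synthetic cosine only stands in for that step.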

Slide 44: RASTA filter
[Figure: impulse response, 0 to 300 ms]
- average four neighboring frames
- subtract exponentially decaying past values (τ = 170 ms)

Masking in Time
[Figure: masking timelines marked t0, t0 + Δt, t0 + 250 ms]
- suggests ~250 ms buffer (critical interval) in auditory system
- what happens outside the critical interval does not affect detection of signal within the critical interval
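The filter described above can be sketched as a short IIR recursion on each log-energy trajectory. The slide gives only the time constant, so the rest is assumed: a 10 ms frame step (which turns τ = 170 ms into a pole of about 0.94) and the numerator 0.1·(2 + z⁻¹ − z⁻³ − 2z⁻⁴) from the commonly published RASTA filter. The names `rasta_filter` and `rasta_gain` are hypothetical.

```python
import cmath
import math

FRAME_STEP_MS = 10.0  # assumed frame step; not stated on the slide
TAU_MS = 170.0        # time constant quoted on the slide
POLE = math.exp(-FRAME_STEP_MS / TAU_MS)  # about 0.94

# Assumed numerator: weighted difference over four neighboring frames
# (the center frame has weight zero), as in the published RASTA filter.
NUMERATOR = [0.2, 0.1, 0.0, -0.1, -0.2]

def rasta_filter(trajectory, pole=POLE):
    """Filter one log-energy trajectory: y[n] = pole*y[n-1] + sum_k b[k]*x[n-k]."""
    out = []
    prev = 0.0
    for n in range(len(trajectory)):
        acc = pole * prev
        for k, b in enumerate(NUMERATOR):
            if n - k >= 0:
                acc += b * trajectory[n - k]
        out.append(acc)
        prev = acc
    return out

def rasta_gain(mod_freq_hz, pole=POLE):
    """Magnitude response of the filter at a modulation frequency given in Hz."""
    w = 2 * math.pi * mod_freq_hz * FRAME_STEP_MS / 1000.0
    num = sum(b * cmath.exp(-1j * w * k) for k, b in enumerate(NUMERATOR))
    den = 1 - pole * cmath.exp(-1j * w)
    return abs(num / den)

# A constant offset in the log domain (e.g. a fixed channel) is removed over time.
constant_offset = rasta_filter([5.0] * 400)
print(constant_offset[-1])
print(rasta_gain(0.1), rasta_gain(4.0), rasta_gain(40.0))
```

Under these assumptions the response is band-pass: it is zero at 0 Hz, so slowly varying components are suppressed, and modulations around a few Hz pass with much less attenuation than very slow or very fast ones.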

Slide 45: ~200 ms length of impulse response
- discrimination of short stimuli improves up to about 200 ms
- loudness of equal-energy stimuli grows up to about 200 ms
- minimum detectable silent interval indicates constant of about 200 ms
- effect of forward masking lasts about 200 ms
- syllable-length buffer of human hearing?
[Figure panels, time axis in s: spectrogram (short-term Fourier spectrum); Perceptual Linear Prediction (PLP) (12th order model); RASTA-PLP]

Slide 46: Formant-Less Vowel
[Figure labels: original speech, filter, filtered speech; original speech spectrogram; filtered speech spectrogram]
from RASTA

Slide 47: Data Do Not Lie
- Prof. Frederick Jelinek: Airplanes don't flap their wings. (S. Lohr, New York Times, March 6, 2011)
- Airplanes do not flap wings but have wings nevertheless, ...
- Of course, we should try to incorporate the knowledge that we have of hearing, speech production, etc., into our systems, ... (F. Jelinek, Five speculations (and a divertimento) on the themes of H. Bourlard, H. Hermansky, and N. Morgan, Speech Communication 18, 1996, 242)

Linear Discriminant Analysis (LDA)
- Linear discriminants: eigenvectors of $S_W^{-1} S_B$
- $S_W$: within-class covariance matrix
- $S_B$: between-class covariance matrix
- Needs labeled data
- Within-class distributions assumed Gaussian with equal σ (take log of power spectrum)
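For reference, the eigenvector statement can be written out in the textbook form of LDA. The scatter-matrix definitions below are standard (up to normalization constants) and are not taken from the slide; the symbols $\mu_c$, $\mu$, $N_c$, $v_k$ and $\lambda_k$ are introduced here for illustration.

```latex
S_W = \sum_{c} \sum_{x \in c} (x - \mu_c)(x - \mu_c)^{\mathsf T},
\qquad
S_B = \sum_{c} N_c \, (\mu_c - \mu)(\mu_c - \mu)^{\mathsf T},
\qquad
S_W^{-1} S_B \, v_k = \lambda_k \, v_k .
```

Here $x$ would be a (log) spectral vector, $c$ ranges over the labeled classes, $\mu_c$ is the mean of class $c$ with $N_c$ members, and $\mu$ is the overall mean. The discriminants $v_k$ are ordered by decreasing $\lambda_k$.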

Slide 48: Spectral Basis from LDA
- LDA gives basis for projection of spectral space
- LDA vectors from Fourier spectrum (OGI 3-hour stories hand-labeled database)
[Figure: four LDA-derived basis vectors, labeled 63 %, 16 %, 12 %, 2 %]
- Spectral resolution of LDA-derived spectral basis is higher at low frequencies
- Psychophysics: critical bands of human hearing are broader at higher frequencies
- Physiology: position of maximum of traveling wave on basilar membrane is proportional to logarithm of frequency

Slide 49: Sensitivity to Spectral Change (Malayath 1999)
[Figure labels: cosine basis, LDA-derived basis, critical-band]

Two ways of using LDA
- LDA gives basis for projection of spectral space
- LDA gives FIR filters for processing trajectories of spectral energies
[Figure: ~1 s spectral trajectory segments labeled /j/ /u/ /a r/ /j/ /o/ /j/ /o/]

Slide 50: RASTA Filters from Speech Data
[Figure: ~1 s impulse responses labeled 77%, 10%, 7%, 2%; time axes from -500 to 500 ms; segments labeled /j/ /u/ /a r/ /j/ /o/ /j/ /o/]
[Figure: responses (1st discriminant in all channels); attenuation [dB] versus modulation frequency [Hz], 0.1 to 10 Hz; annotation: higher carrier frequencies]
[Figure: original RASTA filter; impulse response over 0 to 300 ms; attenuation [dB] versus modulation frequency [Hz]]
[Diagram labels: engineering, effect(signal); cognitive, signal, effect(signal), perception]
- good engineering could be consistent with biology
- physiology of sensory organs
- psychophysics of perception
- emulation of the knowledge in engineering

Slide 51: C. Shannon: Communication in Presence of Noise
- combination of channel and signal spectrum should be as flat (as random-like) as possible
[Figure labels: energy of the signal, level of noise in the channel]

Forces of Nature
[Figure labels: energy of the signal, level of noise in the channel]
- if signal could be controlled (e.g. speech), put more signal where there is less noise
- sensory signal optimized for a given communication channel
[Figure labels: resource space]
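One way to read "put more signal where there is less noise" is the classical water-filling allocation for parallel noisy channels. The slide states the idea only qualitatively, so the sketch below is an interpretation: the function name `water_fill`, the noise levels and the signal budget are all hypothetical.

```python
def water_fill(noise_levels, signal_budget):
    """Spread signal_budget over channels so that signal + noise is flat wherever signal is used.

    Returns (water_level, allocation), with allocation in the same order as noise_levels.
    """
    if signal_budget <= 0:
        raise ValueError("signal_budget must be positive")
    order = sorted(noise_levels)
    level = None
    # Try using all channels first, then drop the noisiest ones until the
    # common level lies above the noise of every channel still in use.
    for k in range(len(order), 0, -1):
        candidate = (signal_budget + sum(order[:k])) / k
        if candidate > order[k - 1]:
            level = candidate
            break
    allocation = [max(level - n, 0.0) for n in noise_levels]
    return level, allocation

# Illustrative values only: three channels, the first one very noisy.
level, allocation = water_fill([5.0, 1.0, 2.0], 4.0)
print(level, allocation)
```

In this example the noisiest channel receives no signal at all, and in the two remaining channels signal plus noise adds up to the same level, which is the "as flat as possible" condition.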

Slide 52:
[Figure: time windows of ~10 ms and ~400 ms]
- Evaluate spectra within a given speech sound relative to neighboring sounds

Mutual Information Between Phoneme Labels and Measurement(s) in Time
(H. Yang et al. 2000; F. Li, unpublished)
[Figure axes: first measurement, second measurement]
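As a small illustration of the quantity named on this slide, the sketch below computes the mutual information between phoneme labels and one quantized measurement from a table of joint counts. The discretization into bins, the toy counts and the name `mutual_information_bits` are assumptions; the cited studies may have estimated it differently.

```python
import math

def mutual_information_bits(joint_counts):
    """joint_counts[label][value] is a co-occurrence count; returns I(label; value) in bits."""
    total = sum(sum(row.values()) for row in joint_counts.values())
    p_label = {lab: sum(row.values()) / total for lab, row in joint_counts.items()}
    p_value = {}
    for row in joint_counts.values():
        for val, count in row.items():
            p_value[val] = p_value.get(val, 0.0) + count / total
    mi = 0.0
    for lab, row in joint_counts.items():
        for val, count in row.items():
            if count > 0:
                p = count / total
                mi += p * math.log2(p / (p_label[lab] * p_value[val]))
    return mi

# Toy tables (hypothetical): two phoneme labels, one measurement quantized to "lo"/"hi".
informative = {"a": {"lo": 50, "hi": 0}, "u": {"lo": 0, "hi": 50}}
uninformative = {"a": {"lo": 25, "hi": 25}, "u": {"lo": 25, "hi": 25}}
print(mutual_information_bits(informative))
print(mutual_information_bits(uninformative))
```

Repeating such an estimate for measurements taken at different time offsets from the labeled frame would show how far in time the spectrum still carries information about the phoneme.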

Slide 53: Auditory cortical receptive fields (from N. Mesgarani)
- Time-frequency patterns that optimally excite a given cortical neuron
- different carrier frequencies
- different temporal resolutions
- different spectral resolutions
- Most often localized and often rather long
[Figure: 1st principal component along temporal axis from about 300 STRFs (41% of variance); time axis in s; Nima Mesgarani (in preparation)]

Short Term Spectral Envelope?

Ear is selective!
- Simultaneous masking: sound elements outside a critical band do not corrupt decoding of elements inside the band
- Temporal masking: sound elements outside a critical interval (about 250 ms) do not corrupt decoding of elements inside the interval

$P(\varepsilon) = \prod_i P(\varepsilon_i)$
- Human listeners recognize speech in independent bands (Jont Allen's interpretation of earlier works of Fletcher et al. at the 1993 Summer Workshop at Rutgers University)
- To recognize a phoneme one needs to collect information distributed over the whole syllable (Kozhevnikov and Chistovich, Speech: Articulation and Perception, 1965)
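A short worked example of the product rule for band errors: if recognition in each band fails independently, the overall error is the product of the per-band errors. The band error values and the name `overall_error` are made up for illustration.

```python
def overall_error(band_errors):
    """Overall error probability when independent bands must all fail together."""
    result = 1.0
    for err in band_errors:
        result *= err
    return result

# Hypothetical per-band error probabilities for three bands.
band_errors = [0.5, 0.4, 0.5]
total = overall_error(band_errors)
print(total)
```

With these numbers each band alone is wrong 40 to 50 percent of the time, yet the combined error is only 0.1, because an error requires every band to fail at once.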

Slide 54: Power of Experimental Results
[Figure labels: Ptolemy, Galileo]

Ear is selective!! However, it is NOT to derive spectrum of the signal but to yield frequency-localized temporal patterns, which carry the information about underlying acoustic events.

Slide 55: Away from Short-Term Spectrum (back to human hearing)
[Figure: short segment of duration ΔT at time t0; Fourier transform; s(f, t0), spectrum of the short segment]

Frequency Domain Linear Prediction (FDLP)
- FDLP: means for all-pole estimation of Hilbert envelopes (instantaneous spectral energies) in individual channels
[Diagram labels: speech signal, preprocessing, autoregressive model, PLP spectrum (at time t0); cosine transform, time autoregressive model, FDLP estimate of Hilbert envelope (at frequency f0)]
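The FDLP idea can be sketched in a few lines: take a cosine transform of the signal, fit an autoregressive model to the transform coefficients, and read the model's "spectrum" as an envelope over time. This is a minimal full-band sketch under stated assumptions, not the implementation behind the slides: it uses an unnormalized DCT-II, the autocorrelation method with Levinson-Durbin, and no sub-band windowing of the DCT coefficients. All function names are hypothetical.

```python
import cmath
import math

def dct_ii(x):
    """Plain O(N^2) unnormalized DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, err) with a[0] == 1 and A(z) = sum_j a[j] z^-j."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope of x, one value per input sample."""
    n = len(x)
    c = dct_ii(x)
    # Autocorrelation of the DCT sequence (linear prediction runs along frequency).
    r = [sum(c[k] * c[k + m] for k in range(n - m)) for m in range(order + 1)]
    a, err = levinson(r, order)
    env = []
    for i in range(n):
        theta = math.pi * (i + 0.5) / n  # sample index i mapped to an angle in (0, pi)
        denom = sum(a[j] * cmath.exp(-1j * theta * j) for j in range(order + 1))
        env.append(err / abs(denom) ** 2)
    return env

# Illustrative input: a single click at sample 60 of a 128-sample segment.
signal = [0.0] * 128
signal[60] = 1.0
envelope = fdlp_envelope(signal, order=2)
peak_index = envelope.index(max(envelope))
print(peak_index)
```

The click makes the DCT sequence a sinusoid whose frequency encodes the click position, so the all-pole model places its peak near the click in time. Longer segments with higher model orders would trace several envelope peaks in the same way.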