Spectro-Temporal Methods in Primary Auditory Cortex David Klein Didier Depireux Jonathan Simon Shihab Shamma

Department of Electrical Engineering
Supported in part by a MURI grant from the Office of Naval Research and by a grant from the National Science Foundation.
Also available at <http://www.isr.umd.edu/caar/pubs.html>

Summary
In primary auditory cortex (AI) of ferrets, we characterize the spectro-temporal properties of cells' responses.
We find that the responses correspond to temporal modulations from [...] to [...] Hz, and spectral modulations from [...] to [...] cycles/octave, in the stimulus spectro-temporal envelope.
The Spectro-Temporal Response Function (STRF) is the linear component of the response. It is an excellent predictor of the response.
Different methods of determining the STRF are in good agreement and make similar (and similarly accurate) predictions.
Spike-triggered averaging is an effective method to measure the STRF when used with Temporally Orthogonal Ripple Combinations (TORCs) as stimuli.
Spike-triggered averaging methods do not depend on quadrant separability, and provide a good method for seeking non-quadrant-separable responses.

Auditory Stages and Representations
Cochlea: Sound is band-passed, half-wave rectified, and low-passed.
Inferior Colliculus (IC): The envelope of the sound is band-passed or low-passed, extracting the first (fast) modulation of the sound near the best frequency (BF).
Cortex: The band-passed envelope from IC is low-passed, extracting the second (slow) modulation of the sound near BF.
Each stage of the auditory pathway represents the auditory stimulus differently. After cochlear processing, the acoustic waveform is represented by parallel, frequency-ordered neural signals, which can encode different characteristics of the sound in different areas. By primary auditory cortex, responses represent the slow modulations (a few to a few tens of Hertz) of the envelope of the signal.
Spike-triggered averaging, a method for extracting the linear component of the response, can be done on neural signals at any stage, and the averaging can be on any representation of the signal. The stimuli must be rich for the final average to fully characterize the response.
[Figure: spikes trigger an average of the acoustic waveform, the fast spectro-temporal envelope, or the slow spectro-temporal envelope; the stages are labeled "envelope" and "envelope of envelope"; axes are frequency f and time t.]

Introduction to Cortical Responses I
Spike-triggered average waveform: The inability of cortical cells to lock to the waveform usually renders this statistic useless.
Spike-triggered average envelope: Cortical cells lock to the slow(er) modulations of the envelope. The average does not give much frequency selectivity information.
Spike-triggered average spectrogram: Band-passing at various frequencies before computing the spike-triggered average of the envelope gives more detail, such as differentiating between inhibitory and excitatory frequency bands, and shows different temporal responses for different frequency bands.
[Figure: spike-triggered averages; frequency (kHz) vs. time (ms).]

Introduction to Cortical Responses II
Spike-triggered spectro-temporal envelope average: Spike-triggered averaging of the spectro-temporal envelope directly gives a spectro-temporal response field similar to the spike-triggered average of the filter-bank envelopes.
STRF from Ripple Transfer Function: The spectro-temporal transfer function is compiled by measuring the response amplitude and phase to single sinusoidally modulated spectra that move in time (ripples). The spectro-temporal response function is its 2-D inverse Fourier transform.
[Figure: the two response fields; frequency (kHz) vs. time (ms).]
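Spike-triggered averaging of a spectro-temporal envelope can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the poster: the white-noise envelope, the planted `true_strf`, the baseline rate, and the Poisson spike generator are all assumptions made so that the recovered average can be compared with a known answer.

```python
import numpy as np

def spike_triggered_average(stim, spikes, n_lags):
    """Average the n_lags stimulus frames up to and including each spike.

    stim   : (n_chan, n_time) spectro-temporal envelope
    spikes : (n_time,) spike counts per time bin
    Column k of the result is the stimulus n_lags-1-k bins before the spike.
    """
    n_chan, n_time = stim.shape
    sta = np.zeros((n_chan, n_lags))
    for t in np.flatnonzero(spikes):
        if t >= n_lags - 1:
            sta += spikes[t] * stim[:, t - n_lags + 1 : t + 1]
    return sta / spikes[n_lags - 1 :].sum()

rng = np.random.default_rng(0)
n_chan, n_lags, n_time = 8, 10, 20000

# Hypothetical "true" STRF: short-latency excitation in channel 3,
# longer-latency inhibition in channel 5.
true_strf = np.zeros((n_chan, n_lags))
true_strf[3, 6:9] = 1.0
true_strf[5, 3:6] = -0.5
true_strf /= np.linalg.norm(true_strf)

stim = rng.standard_normal((n_chan, n_time))    # white-noise envelope (assumption)
drive = np.zeros(n_time)
for t in range(n_lags - 1, n_time):
    drive[t] = np.sum(true_strf * stim[:, t - n_lags + 1 : t + 1])
rate = np.maximum(2.0 + drive, 0.0)             # half-wave rectified rate per bin
spikes = rng.poisson(rate)

sta = spike_triggered_average(stim, spikes, n_lags)
similarity = np.corrcoef(sta.ravel(), true_strf.ravel())[0, 1]
```

With a rich (here, white) stimulus the average is proportional to the planted linear filter, so `similarity` should be close to 1. With a stimulus that lacks some modulations, the corresponding parts of the filter would not be recovered.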

Motivation and Methods
Primary cortical cells prefer spectral envelope modulations from [...] to [...] Hz, and [...] to [...] cycles/octave.
Ripple transfer functions, the Fourier transform of the Spectro-Temporal Response Field (STRF), are time-consuming to measure.
Assuming separability (or at least quadrant separability), measurement time is reduced, but how universal is separability?
The STRF can also be measured by spike-triggered averaging of spectro-temporal representations of the stimulus. The stimulus must be rich in the spectro-temporal modulations that characterize the response.
Spike-triggered averaging does not rely on separability and can be used to test separability directly.
As stimuli, Temporally Orthogonal Ripple Combinations (TORCs) cover large regions of spectro-temporal modulation space efficiently.
Each stimulus is composed of a superposition of moving sinusoids (ripples).
Stimuli have only one spectral modulation for each temporal modulation, but many temporal modulations, each a multiple of the base.
Stimulus components are orthogonal over the averaging interval.
The one-to-one correspondence between stimulus components and spectral modulations removes ambiguity about which component evokes which aspect of the response dynamics.
Duplicating each stimulus with opposite polarity (overall sign) strongly reduces the half-wave rectification non-linearity (in fact, all even-order non-linearities).
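The two stimulus properties listed above, temporal orthogonality and cancellation of even-order non-linearities by polarity inversion, can be checked numerically. The sketch below is illustrative only: the base rate `base_hz`, the ripple frequencies, the random phases, and the quadratic `toy_response` are assumed values, not the parameters used in the experiments.

```python
import numpy as np

base_hz = 4.0                                       # hypothetical base temporal modulation
n_comp = 6
velocities = base_hz * np.arange(1, n_comp + 1)     # w_k: multiples of the base
ripple_freqs = np.linspace(-1.0, 1.0, n_comp)       # one Omega_k per w_k (assumed values)
rng = np.random.default_rng(1)
phases = rng.uniform(0.0, 2.0 * np.pi, n_comp)

n_t, n_x = 1000, 32
t = np.arange(n_t) / (n_t * base_hz)    # one averaging interval of 1/base_hz seconds
x = np.linspace(0.0, 5.0, n_x)          # log-frequency axis in octaves

# components[k] has shape (n_x, n_t): ripple k of the envelope
components = np.cos(
    2.0 * np.pi * (velocities[:, None, None] * t[None, None, :]
                   + ripple_freqs[:, None, None] * x[None, :, None])
    + phases[:, None, None]
)
torc = components.sum(axis=0)

# Orthogonality over the averaging interval, checked at one frequency channel.
gram = components[:, 0, :] @ components[:, 0, :].T / n_t

# An even-order term cancels in the difference of opposite-polarity responses.
def toy_response(envelope, a=1.0, b=0.4):
    return a * envelope + b * envelope ** 2     # linear plus quadratic (illustrative)

odd_part = 0.5 * (toy_response(torc) - toy_response(-torc))
```

Here `gram` comes out as 0.5 times the identity matrix, and `odd_part` equals the linear term alone, because the quadratic contribution is identical for both polarities and drops out of the difference.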

Spectro-Temporal Modulations
The Fourier transform of a moving sinusoid (ripple) has support only on a single point (and its complex conjugate):
$\exp(\pm 2\pi j \Omega x \pm 2\pi j w t)$
[Figure: a moving ripple in spectro-temporal space (spectrogram; frequency in kHz vs. time) and in Fourier space (ripple velocity w in Hz vs. ripple frequency Ω in cyc/oct).]
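The single-point support of a moving ripple is easy to confirm with a discrete 2-D Fourier transform. In this sketch the grid sizes and the ripple parameters (0.4 cyc/oct, 8 Hz) are arbitrary choices, picked so that a whole number of cycles fits on each axis.

```python
import numpy as np

n_x, n_t = 64, 128
x = np.arange(n_x) * 5.0 / n_x      # 5 octaves of log frequency
t = np.arange(n_t) / n_t            # 1 second
ripple_freq = 0.4                   # Omega in cycles/octave: 2 cycles over 5 octaves
ripple_vel = 8.0                    # w in Hz: 8 cycles over 1 second

ripple = np.cos(2.0 * np.pi * (ripple_freq * x[:, None] + ripple_vel * t[None, :]))

magnitude = np.abs(np.fft.fft2(ripple))
significant = np.argwhere(magnitude > 1e-6 * magnitude.max())
peaks = sorted((int(i), int(j)) for i, j in significant)
print(peaks)   # -> [(2, 8), (62, 120)]
```

The two bins are the ripple itself and its complex conjugate: indices 62 and 120 are the negative-frequency aliases of 2 and 8 on grids of 64 and 128 points.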

2-D Decomposition of Broadband Sound
Frequencies are mapped along the cochlea on a log frequency axis.
Natural sounds are dynamic, so a time axis is required.
Use two-dimensional functions of log(freq) and time.
Analysis is often conceptually simpler in the Fourier domain.
(A) A speech fragment has its envelope (B) Fourier transformed. The Fourier transform is then approximated by its largest components in (C) and then inverted back in (D), giving an excellent approximation to the original envelope.
[Figure: (A) spectrogram (log frequency) of the utterance "water all year", frequency (kHz) vs. time (ms); (B) LPC envelope, frequency (kHz) vs. time (ms); (C) ripple transform with its largest peaks, ripple velocity w (Hz) vs. ripple frequency Ω (cycles/octave); (D) reconstruction, frequency (kHz) vs. time (ms).]
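The approximate-and-invert step from panels (B) to (D) can be sketched as keeping only the largest 2-D Fourier components of an envelope. The envelope below is synthetic (three ripples plus a little noise), standing in for the speech envelope, and `keep_largest` is a hypothetical helper name.

```python
import numpy as np

def keep_largest(envelope, n_keep):
    """Zero all but the n_keep largest-magnitude 2-D Fourier components, then invert."""
    spectrum = np.fft.fft2(envelope)
    order = np.argsort(np.abs(spectrum), axis=None)[::-1]
    mask = np.zeros(spectrum.size, dtype=bool)
    mask[order[:n_keep]] = True
    sparse = np.where(mask.reshape(spectrum.shape), spectrum, 0.0)
    return np.fft.ifft2(sparse).real

n_x, n_t = 64, 128
x = (np.arange(n_x) * 5.0 / n_x)[:, None]     # octaves
t = (np.arange(n_t) / n_t)[None, :]           # seconds
rng = np.random.default_rng(4)

# Synthetic envelope: three ripples of decreasing amplitude plus weak noise.
envelope = (1.00 * np.cos(2.0 * np.pi * (0.4 * x + 8.0 * t))
            + 0.50 * np.cos(2.0 * np.pi * (0.2 * x - 4.0 * t))
            + 0.25 * np.cos(2.0 * np.pi * (0.8 * x + 16.0 * t))
            + 0.05 * rng.standard_normal((n_x, n_t)))

def relative_error(n_keep):
    approx = keep_largest(envelope, n_keep)
    return np.linalg.norm(approx - envelope) / np.linalg.norm(envelope)

err_two, err_six = relative_error(2), relative_error(6)
```

Each real ripple occupies a conjugate pair of bins, so six components capture all three ripples and `err_six` is left with little more than the noise, while `err_two` keeps only the strongest ripple.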

Spike Averaged Spectro-Temporal Envelope
[Figure: Stimulus No. 8, frequency (kHz) vs. time; Response No. 8 (normalized PSTH), spikes per presentation per ms; spike-triggered average of the past of Stimulus No. 8; the resulting Spectro-Temporal Response Field, frequency (kHz) vs. time.]

Cochlear Spectrogram
The cochlear spectrogram of the stimulus is obtained by passing the stimulus waveform through a bank of cochlear filters. The temporal envelopes of the filter outputs form the rows of the cochlear spectrogram.
[Figure: stimulus waveform, cochlear filters, and the resulting spectrogram on a log frequency axis.]
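A rough sketch of such a filter-bank spectrogram follows. It is not the cochlear model used for the poster: the brick-wall FFT band-pass filters, the quarter-octave spacing from 250 Hz, and the moving-average low-pass are simplifying assumptions, chosen only to show the band-pass, half-wave rectify, low-pass sequence.

```python
import numpy as np

def cochlear_spectrogram(wave, fs, f_lo=250.0, n_oct=5, bands_per_oct=4, smooth_ms=8.0):
    """Return (band edges in Hz, envelope rows) for a log-spaced filter bank."""
    n_bands = n_oct * bands_per_oct
    edges = f_lo * 2.0 ** (np.arange(n_bands + 1) / bands_per_oct)
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(wave.size, d=1.0 / fs)
    win = max(1, int(round(smooth_ms * 1e-3 * fs)))
    kernel = np.ones(win) / win                       # moving-average low-pass
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (freqs >= lo) & (freqs < hi)        # brick-wall band-pass (assumption)
        band = np.fft.irfft(np.where(in_band, spectrum, 0.0), n=wave.size)
        rectified = np.maximum(band, 0.0)             # half-wave rectification
        rows.append(np.convolve(rectified, kernel, mode="same"))
    return edges, np.array(rows)

fs = 20000
t = np.arange(fs // 2) / fs                           # 0.5 s
wave = np.sin(2.0 * np.pi * 1100.0 * t)               # test tone at 1.1 kHz
edges, spec = cochlear_spectrogram(wave, fs)
best_band = int(np.argmax(spec.mean(axis=1)))
```

For the 1.1 kHz test tone the energy lands in the band spanning 1000 Hz to about 1189 Hz, which is row 8 with these settings.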

Spike-Triggered Average Spectrogram
[Figure: Stimulus No. 8, frequency (kHz) vs. time; Response No. 8 (normalized PSTH), spikes per presentation per ms; spike-triggered average of the past of Stimulus No. 8; the resulting Spectro-Temporal Response Field, frequency (kHz) vs. time.]

Measurements by Ripple Velocity
Spike events in (A) are turned into period histograms in (B). The amplitudes and phases give the transfer function in (C), which can be inverse Fourier transformed to give impulse responses in (D).
[Figure: (A) spike rasters at a fixed ripple frequency for a range of ripple velocities (Hz); (B) period histograms; (C) transfer function amplitude and phase vs. ripple velocity (Hz); (D) impulse responses (IR) for positive and negative ripple frequency.]
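The step from spike events to one point of the transfer function can be sketched as follows: fold the spike times on the ripple period, build the period histogram, and read the amplitude and phase of its fundamental. The spike train here is simulated, and the rate, modulation depth, and phase are hypothetical test values; `ripple_transfer_point` is an illustrative name.

```python
import numpy as np

def ripple_transfer_point(spike_times, velocity_hz, n_bins=16):
    """Fold spikes on the ripple period; return (modulation depth, phase in rad)."""
    cycle_pos = (spike_times * velocity_hz) % 1.0
    hist, edges = np.histogram(cycle_pos, bins=n_bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    first = np.sum(hist * np.exp(-2j * np.pi * centers))   # fundamental component
    return 2.0 * np.abs(first) / hist.sum(), np.angle(first)

rng = np.random.default_rng(2)
velocity_hz, true_depth, true_phase = 8.0, 0.8, 1.0        # hypothetical test values
dt, duration = 1e-3, 20.0
t = np.arange(0.0, duration, dt)
rate = 100.0 * (1.0 + true_depth * np.cos(2.0 * np.pi * velocity_hz * t + true_phase))
fired = rng.random(t.size) < rate * dt                     # Bernoulli approximation to Poisson
spike_times = t[fired] + dt * rng.random(int(fired.sum())) # jitter within each time bin

depth, phase = ripple_transfer_point(spike_times, velocity_hz)
phase_error = np.angle(np.exp(1j * (phase - true_phase)))  # wrapped to (-pi, pi]
```

Repeating this for each ripple velocity (and, in the companion measurement, each ripple frequency) fills in the complex transfer function one point at a time.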

Measurements by Ripple Frequency
Spike events in (A) are turned into period histograms in (B). The amplitudes and phases give the transfer function in (C), which can be inverse Fourier transformed to give response fields in (D).
[Figure: (A) spike rasters at a fixed ripple velocity for a range of ripple frequencies (cyc/oct); (B) period histograms (spike counts); (C) transfer function amplitude and phase vs. ripple frequency (cyc/oct); (D) response fields (RF) for upward and downward ripple velocities compared with the RF from pure tones, amplitude vs. octaves.]

Spectro-Temporal Transfer Function
The spectro-temporal response field of a neuron is the usual response field made time-dependent. Its Fourier transform is the transfer function. Either can be used to predict the response to any broadband dynamic sound.
[Figure: the Spectro-Temporal Response Function (STRF) of a neuron, plotted against x = log f and time t, and the 2-dimensional transfer function of the same neuron, plotted against ripple velocity w (Hz) and ripple frequency Ω (cycles/octave). The two are related by the Fourier transform and its inverse, with kernel $\exp(\pm 2\pi j \Omega x \pm 2\pi j w t)$.]

Spectro-Temporal Responses Compared I
[Figure: STRFs of two units obtained by three methods: inverse Fourier transform of the ripple transfer function (QSP), spike-triggered average spectro-temporal envelope (MM), and spike-triggered average low-passed spectrogram (MMc); frequency (Hz) vs. time (ms).]

Spectro-Temporal Responses Compared II
[Figure: STRFs of two further units obtained by the same three methods: inverse Fourier transform of the ripple transfer function (QSP), spike-triggered average spectro-temporal envelope (MM), and spike-triggered average low-passed spectrogram (MMc); frequency (Hz) vs. time (ms).]

Spectro-Temporal Responses Compared III
[Figure: STRFs of two further units obtained by the same three methods: inverse Fourier transform of the ripple transfer function (QSP), spike-triggered average spectro-temporal envelope (MM), and spike-triggered average low-passed spectrogram (MMc); frequency (Hz) vs. time (ms).]

Linearity in Theory
Assuming linearity, the STRF predicts the response to any broadband dynamic stimulus, including single ripples moving in either direction (first two rows) and combinations of upward and downward moving ripples.
[Figure: three rows, each showing a stimulus spectrogram convolved along time with the STRF to give the expected response; frequency (kHz) vs. time.]

Linearity in Practice
The correlation between predicted and actual response is quite good for most cells. Since cells cannot fire at negative rates, any prediction should be half-wave rectified before comparing to the actual response.
[Figure: for two units, the stimulus spectrogram convolved along time with the STRF, and the prediction overlaid on the measured response, with the no-spikes and spontaneous levels marked; frequency (kHz) vs. time (ms).]
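The prediction step can be sketched as a convolution along time in each frequency channel, a sum over channels, and half-wave rectification. Everything numerical below is assumed for illustration: the STRF shape, the ripple stimulus, and in particular the "measured" trace, which is simulated from the model itself, so the correlations only show the effect of rectification and say nothing about real cells.

```python
import numpy as np

def linear_prediction(strf, spectrogram):
    """Convolve each frequency channel with its STRF row along time, then sum channels.

    strf[ch, k] weights the stimulus k time bins in the past.
    """
    n_chan, n_time = spectrogram.shape
    out = np.zeros(n_time)
    for ch in range(n_chan):
        out += np.convolve(spectrogram[ch], strf[ch], mode="full")[:n_time]
    return out

n_chan, n_lags, n_time, dt = 16, 50, 1000, 1e-3
x = np.linspace(0.0, 5.0, n_chan)            # octaves
lag = np.arange(n_lags) * dt                 # seconds
t = np.arange(n_time) * dt

# Hypothetical STRF: Gaussian tuning in frequency, damped oscillation in time.
spectral = np.exp(-0.5 * ((x[:, None] - 2.5) / 0.5) ** 2)
temporal = np.exp(-lag[None, :] / 0.02) * np.sin(2.0 * np.pi * 10.0 * lag[None, :])
strf = spectral * temporal

stimulus = np.cos(2.0 * np.pi * (0.4 * x[:, None] + 8.0 * t[None, :]))  # one moving ripple

linear = linear_prediction(strf, stimulus)
rectified = np.maximum(linear, 0.0)          # cells cannot fire at negative rates

# Simulated "measured" rate: rectified drive plus a little noise (illustrative only).
rng = np.random.default_rng(3)
measured = rectified + 0.05 * np.abs(linear).max() * rng.standard_normal(n_time)

corr_linear = np.corrcoef(linear, measured)[0, 1]
corr_rectified = np.corrcoef(rectified, measured)[0, 1]
```

In this toy setting the rectified prediction correlates better with the simulated response than the raw linear prediction does, which is the reason for rectifying before the comparison.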

Fast Responses
Some units respond well at time scales as fast as ~[...] ms. This is seen both in the raster plot and in the STRF. When the output of the filter bank is low-passed at 5 Hz, the resulting STRF looks much more like the STRF generated from the spectro-temporal envelope, which contains temporal modulation only up to [...] Hz (in this case).
[Figure: raster plot; spike-triggered cochlear filter average; spike-triggered low-passed cochlear filter average; spike-triggered spectro-temporal envelope average; cross-section at best frequency.]

Quadrant Separability
An STRF can fall into one of three categories:
Non-separable: The transfer function is an arbitrary (complex-conjugate symmetric) function of ripple frequency and ripple velocity.
Quadrant separable: The transfer function within each quadrant is a product of a function of ripple frequency and a function of ripple velocity. The envelope of the STRF is a simple product of a function of spectrum and a function of time.
Fully separable: The transfer function is the product of a function of ripple frequency and a function of ripple velocity everywhere. The resulting STRF is a product of a function of spectrum and a function of time.
[Figure: for each category, the STRF in the spectro-temporal domain (x vs. t, with RF(x) and IR(t) shown for the separable cases) and its 2-D Fourier transform in the Fourier domain (w vs. Ω), with the left-moving and right-moving quadrants marked.]
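One way to test these categories numerically is with the singular value decomposition: a separable matrix has rank 1, so the energy outside the first singular value measures inseparability. This is a sketch of that idea, not necessarily the test used for the poster. The tuning curves `f` and `g` and the quadrant weight 0.3 are hypothetical, chosen to build an STRF that is quadrant separable but not fully separable.

```python
import numpy as np

def inseparability(matrix):
    """0 for a rank-1 (separable) matrix; grows as more singular values matter."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return 1.0 - s[0] ** 2 / np.sum(s ** 2)

n_x, n_t = 16, 32
kx = np.arange(1, n_x // 2)                  # positive ripple-frequency bins 1..7
kt = np.arange(1, n_t // 2)                  # positive ripple-velocity bins 1..15
f = np.exp(-0.5 * ((kx - 3.0) / 1.5) ** 2)   # hypothetical tuning to ripple frequency
g = np.exp(-0.5 * ((kt - 5.0) / 2.0) ** 2)   # hypothetical tuning to ripple velocity

# Transfer function: separable within each quadrant, with unequal quadrant weights.
tf = np.zeros((n_x, n_t), dtype=complex)
tf[1 : n_x // 2, 1 : n_t // 2] = np.outer(f, g)                  # Omega > 0, w > 0
tf[n_x // 2 + 1 :, 1 : n_t // 2] = 0.3 * np.outer(f[::-1], g)    # Omega < 0, w > 0
tf = tf + np.conj(np.roll(np.flip(tf), 1, axis=(0, 1)))          # conjugate symmetry
strf = np.fft.ifft2(tf).real                                     # real-valued STRF

# Analysis, starting from the STRF as if it had been measured.
measured_tf = np.fft.fft2(strf)
q1 = measured_tf[1 : n_x // 2, 1 : n_t // 2]
q2 = measured_tf[n_x // 2 + 1 :, 1 : n_t // 2]
full_index = inseparability(strf)
quadrant_indices = (inseparability(q1), inseparability(q2))
```

The STRF as a whole is clearly not rank 1 (`full_index` is well above zero), while each quadrant of its transfer function is, which is what quadrant separability means. A fully separable STRF would give zero for all three indices.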

Separability in STRFs
Examples of experimentally obtained STRFs. Note the variety of spectral and temporal behaviors.
[Figure: separable and non-separable STRFs from several units; frequency (kHz) vs. time (ms).]

Cortical Filter Model
Response fields in AI have characteristic shapes both spectrally and temporally.
AI cells respond well only to a small set of moving ripples around a particular spectral peak spacing and velocity.
We find cortical cells with all center frequencies, spectral symmetries, bandwidths, latencies and temporal impulse response symmetries.
Therefore AI decomposes the input spectrum into different spectrally and temporally tuned channels.
Equivalently, a population of cells, tuned around different moving ripple parameters, can effectively represent the input spectrum at multiple scales.
[Figure: theoretical ripple filters used to generate a cortical representation, at two ripple frequencies (cyc/oct) and at upward and downward ripple velocities (Hz); frequency (octaves) vs. time.]

The Cortical Representation
Spectrally narrow cells pick out the fine features of the spectral profile, whereas broadly tuned cells pick out the coarse outlines of the spectrum. Similarly, dynamically sluggish cells will respond to the slow changes in the spectrum, whereas fast cells respond to rapid onsets and transitions. In this manner, AI is able to encode multiple views of the same dynamic spectrum.
[Figure: auditory spectrogram of the utterance "Come home right away", frequency (kHz) vs. time, and its cortical representation through ripple filters at several ripple frequencies (cyc/oct) and at upward and downward ripple velocities (Hz).]
