Effect of filter spacing and correct tonotopic representation on melody recognition: Implications for cochlear implants


Effect of filter spacing and correct tonotopic representation on melody recognition: Implications for cochlear implants Kalyan S. Kasturi and Philipos C. Loizou Dept. of Electrical Engineering The University of Texas at Dallas Research supported by NIDCD/NIH (R01 DC 3421)

Introduction Several studies reported that cochlear implant listeners perform poorly (near chance) on melody identification tasks. This is partly due to the fact that current implant processors convey primarily envelope information and no fine-structure cues. Most devices use a logarithmic filter spacing, which is appropriate for speech, but not for music. Unlike speech, music is based on a highly-structured semitone scale. We therefore hypothesize that a filter spacing scheme that corresponds to a musical semitone structure might better capture pitch information for music perception (Exp 1).

Introduction (cont'd) A corollary to the above hypothesis is that the signal bandwidth might be critical for melody recognition, as it affects the number of filters that fall within the low-frequency region (Exp 2).

Experiment 1 Two different filter spacings were investigated: logarithmic and semitone-based.
Semitone spacing: the number of channels was varied from 2 to 12, with the following filter bandwidths:
- 12 channels: each filter had a bandwidth of 1 semitone
- 6 channels: each filter had a bandwidth of 2 semitones
- 4 channels: each filter had a bandwidth of 3 semitones
- 2 channels: each filter had a bandwidth of 6 semitones
Logarithmic spacing (currently used by commercial devices): filters were logarithmically spaced, and the number of channels was varied from 2 to 40.
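As a sketch, the two spacing schemes above can be expressed through their band edges. The 300 Hz starting frequency used in the example is an assumption taken from the filter-spacing figure (the 12-channel semitone spacing spans roughly 300-600 Hz), not a prescribed value:

```python
import numpy as np

def semitone_edges(f_lo, n_channels, semitones_per_channel=1):
    """Band edges where each filter spans a fixed number of semitones
    (one semitone = a factor of 2**(1/12) in frequency)."""
    k = np.arange(n_channels + 1) * semitones_per_channel
    return f_lo * 2.0 ** (k / 12.0)

def log_edges(f_lo, f_hi, n_channels):
    """Logarithmically spaced band edges between f_lo and f_hi."""
    return np.geomspace(f_lo, f_hi, n_channels + 1)

# Twelve one-semitone channels starting at 300 Hz span exactly one octave:
print(semitone_edges(300.0, 12)[-1])   # 600.0
```

With `semitones_per_channel=3` and four channels the same octave is covered by wider filters, which is the 4-channel semitone condition of Experiment 1.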

[Figure: Filter spacing diagrams over the 20 Hz-4 kHz range, comparing 4-channel logarithmic spacing with 12-channel and 4-channel semitone spacings spanning 300-600 Hz around middle C.]

Signal Processing
- Melodies were band-pass filtered into N channels using 6th-order Butterworth filters.
- The output of each channel was passed through a rectifier followed by a second-order Butterworth low-pass filter with a cut-off frequency of 120 Hz to obtain the channel envelope.
- The envelope of each band-pass filter was used to modulate white noise.
- The noise-modulated envelopes were passed through synthesis filters identical to the analysis filters.
- The outputs of all channels were summed to obtain the synthesized melodies.
- Synthesized melodies were presented to 10 normal-hearing subjects for identification in a closed-set format.
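The processing chain above is a standard noise-excited vocoder. A minimal sketch follows, using SciPy's `butter`/`sosfilt`; the band edges in the example and the seeded noise generator are illustrative assumptions, not the authors' exact implementation (note that an order-3 band-pass design yields the 6th-order filters the slides describe):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def noise_vocode(x, fs, edges, seed=0):
    """Noise-excited vocoder: band-pass analysis, envelope extraction
    (rectification + 120 Hz low-pass), white-noise modulation, synthesis
    filtering with the same filters, and summation across channels."""
    rng = np.random.default_rng(seed)
    env_sos = butter(2, 120.0, btype="low", fs=fs, output="sos")
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        # order-3 design of a band-pass section gives a 6th-order filter
        band_sos = butter(3, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfilt(band_sos, x)
        env = sosfilt(env_sos, np.abs(band))      # rectify, then low-pass
        noise = rng.standard_normal(len(x))
        out += sosfilt(band_sos, env * noise)     # synthesis = analysis filter
    return out
```

Feeding in a melody waveform together with semitone- or log-spaced band edges reproduces the kind of stimuli used in Experiments 1 and 2.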

Melodies The melody test used thirty-four common melodies, each consisting of sixteen isochronous notes, as in Hartmann [7]. Isochronous notes were used to remove the rhythm cues from the melodies. The notes were synthesized using samples of an acoustic grand piano.

[Figure: Effect of filter spacing. Percent correct melody identification (0-100%) as a function of the number of channels (2-40), for logarithmic vs. semitone spacing.]

Analysis and Discussion Two-way ANOVA (repeated measures) indicated a significant effect of spectral resolution (number of channels), a significant effect of frequency spacing, and a significant interaction (p<0.005). Semitone spacing: post-hoc tests (Fisher's LSD) showed that performance asymptoted (p>0.5) at 4 channels. Performance with 4 channels based on semitone filter spacing was as good as performance with 12 channels based on logarithmic filter spacing. Conclusion: filter spacing is extremely important in melody recognition.

Experiment 2 Investigated the effect of signal bandwidth on identification of melodies. Hypothesis: if a smaller signal bandwidth is used, then more filters fall in the low-frequency region and melody recognition should improve. One more condition was added, in which the filters were logarithmically spaced within a smaller bandwidth spanning 225-4500 Hz. Five normal-hearing listeners participated in this experiment.

[Figure: Effect of bandwidth. Percent correct melody identification (0-100%) as a function of the number of channels (2-40), for log spacing with the large bandwidth, semitone spacing, and log spacing with the small bandwidth.]

Analysis and Discussion Two-way ANOVA (repeated measures) indicated a significant effect of spectral resolution (number of channels), a significant effect of bandwidth, and a significant interaction (p<0.005). Post-hoc tests (Fisher's LSD) indicated that:
- 4 channels: performance with the small bandwidth > large bandwidth (p=0.013)
- 6 channels: semitone spacing > small bandwidth (p=0.029); small bandwidth > large bandwidth (p<0.005)
For a small number of channels, using a small bandwidth brings significant benefits for melody recognition. Semitone spacing remains superior.

Experiment 3 In cochlear implants, acoustic information is rarely presented at the correct place in the cochlea because of shallow insertion depths; CI patients typically receive frequency-upshifted stimuli. With speech, it is known that patients can tolerate large amounts of shift, but the effect of frequency upshifting on melody identification has not been thoroughly investigated. In the present experiment, we investigate the upshifting effect using frequency-transposed melodies, i.e., melodies transposed to higher frequencies (1 and 3 kHz).

Experiment 3: Transposed Stimuli The transposed stimuli preserve the temporal structure of the signal and can thus be used to assess the importance of presenting the music stimuli at the correct tonotopic place in the cochlea (Oxenham et al., Proc. Natl. Acad. Sci., 2004). More specifically, the present experiment examines whether pitch perception can be accounted for by a purely temporal code or whether a tonotopic representation of frequency (place code) is necessary. The transposed stimuli were generated by multiplying the original 12-channel stimuli (semitone spacing) by high-frequency sinusoidal carriers at 1 and 3 kHz.
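The transposition step described above amounts to simple amplitude modulation of a sinusoidal carrier; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def transpose_stimulus(x, fs, f_carrier):
    """Move a low-frequency stimulus to a higher cochlear place by
    multiplying it with a sinusoidal carrier: the temporal envelope is
    preserved, but the excitation pattern shifts up toward f_carrier."""
    t = np.arange(len(x)) / fs
    return x * np.sin(2.0 * np.pi * f_carrier * t)
```

Multiplying a vocoded melody by a 1 kHz carrier centres its energy near 1 kHz while leaving the amplitude envelope, and hence the temporal pitch cues, intact.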

[Figure: Frequency-transposed melodies. Percent correct melody identification (0-100%) for the 12-channel semitone baseline and for the 1 kHz and 3 kHz carrier conditions.]

Analysis and Discussion ANOVA (repeated measures) indicated a significant effect [F(2,18)=21.2, p<0.005] of correct tonotopic representation on melody recognition. Post-hoc tests (Fisher's LSD) indicated that performance with the 1 kHz carrier was significantly (p=0.005) lower than baseline, and performance with the 3 kHz carrier was significantly (p=0.003) lower than performance with the 1 kHz carrier. Correct tonotopic representation is critically important for complex pitch perception.

Conclusions The semitone-based filter spacing yielded the best performance among all the filter spacings investigated: nearly perfect melody recognition (~98%) was achieved using only four channels. The distribution of filters in the low-frequency region is very important for melody recognition, as filters based on a smaller signal bandwidth yielded significantly higher scores. Correct tonotopic representation is necessary for complex pitch perception and melody recognition.

Discussion These results show that a finer filter spacing around the melody spectrum better captures the fine-structure cues and hence yields better melody recognition. As the carrier frequency was increased, melody recognition dropped, indicating that preserving the place of stimulation is important. Upshifting the synthesized melodies (semitone spacing, four channels) still resulted in nearly perfect recognition; a shift of 6.5 mm did not degrade performance.

Bibliography 1. Gfeller, K. and Lansing, C. R. (1991). Melodic, rhythmic, and timbral perception of adult cochlear implant users, Journal of Speech and Hearing Research, 34, 916-920. 2. Schulz, E. and Kerber, M. (1994). Music perception with the MED-EL implants, Advances in Cochlear Implants, 326-332. 3. Loizou, P. (1998). Mimicking the human ear: An overview of signal processing techniques for converting sound to electrical signals in cochlear implants, IEEE Signal Process. Mag., 15(5), 101-130.

Bibliography 4. Lobo, A., Toledo, F., Loizou, P. and Dorman, M. (2002). Effect of envelope low-pass filtering on melody recognition, 33rd Neural Prosthesis Workshop, Bethesda, MD. 5. Shannon, R. V., Zeng, F.-G., Kamath, V., Wygonski, J., and Ekelid, M. (1995). Speech recognition with primarily temporal cues, Science, 270, 303-304. 6. Kong, Y.-Y., Cruz, R., Jones, J. A., and Zeng, F.-G. (2004). Music perception with temporal cues in acoustic and electric hearing, Ear and Hearing, 25(2), 173-185.

Bibliography 7. Hartmann, W. M. and Johnson, D. (1991). Stream segregation and peripheral channeling, Music Perception, 9(2), 155-184.