Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues
1 Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues. DeLiang Wang, Perception & Neurodynamics Lab, The Ohio State University
2 Outline of presentation: Introduction; Human performance; Reverberation effects (on pitch, on onset/offset, on binaural cues); Monaural enhancement of reverberant signal; Binaural segregation of reverberant signal; Discussion and summary
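3 Reverberation as a linear transmission system: $x(t) = \int h(\tau)\, s(t - \tau)\, d\tau$, where x(t) is the reverberant signal, s(t) is the source signal, and h(τ) is the room impulse response function. [Figure: room impulse response versus time (ms), with early reflections and late reflections marked]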
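The convolution model on this slide can be sketched in discrete time. This is a minimal illustration, not the author's code: the helper names `synthetic_rir` and `reverberate` are hypothetical, and the room impulse response is assumed to be exponentially decaying white noise with a unit direct path, a common stand-in when no measured response is available.

```python
import numpy as np

def synthetic_rir(t60, fs, seed=0):
    """Hypothetical room impulse response: decaying white noise.

    The amplitude envelope 10^(-3 t / t60) makes the energy fall by
    60 dB at t = t60. Assumed model, not taken from the slides.
    """
    rng = np.random.default_rng(seed)
    n = int(1.5 * t60 * fs)
    t = np.arange(n) / fs
    h = rng.standard_normal(n) * 10.0 ** (-3.0 * t / t60)
    h[0] = 1.0  # direct path
    return h

def reverberate(s, h):
    """Discrete form of x(t) = integral of h(tau) s(t - tau) d tau."""
    return np.convolve(s, h)

fs = 16000
s = np.zeros(fs // 10)
s[0] = 1.0                      # unit impulse as a toy source
h = synthetic_rir(t60=0.3, fs=fs)
x = reverberate(s, h)           # an impulse source simply reproduces h
```

Replacing `s` with a speech waveform sampled at `fs` gives a reverberant signal under the same assumptions.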
4 Reverberation and speech quality. Room reverberation causes two distinct perceptual effects on speech quality. Early reflections lead to coloration, or spectral deviation, determined by the signal-to-reverberant energy ratio; they also boost loudness. Late reflections (long-term reverberation) smear the time-frequency components of speech and are characterized by the reverberation time (T60).
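The slides do not say how T60 is measured. One common approach, assumed here, is Schroeder backward integration: integrate the squared impulse response from the end, fit a line to the decay curve in dB, and extrapolate to a 60 dB drop. The function names, the fit range of -5 to -25 dB, and the synthetic impulse response are all illustrative choices.

```python
import numpy as np

def energy_decay_curve_db(h):
    """Schroeder backward integration of the squared impulse response, in dB."""
    tail_energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(tail_energy / tail_energy[0])

def estimate_t60(h, fs, fit_range_db=(-5.0, -25.0)):
    """Fit a line to the decay curve and extrapolate to -60 dB."""
    edc = energy_decay_curve_db(h)
    upper, lower = fit_range_db
    idx = np.flatnonzero((edc <= upper) & (edc >= lower))
    slope, _ = np.polyfit(idx / fs, edc[idx], 1)  # slope in dB per second
    return -60.0 / slope

# Synthetic test response (assumed model): decaying white noise.
fs = 16000
true_t60 = 0.4
t = np.arange(int(1.5 * true_t60 * fs)) / fs
rng = np.random.default_rng(0)
h = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / true_t60)

t60_hat = estimate_t60(h, fs)
```

On a measured impulse response the fit range usually needs adjusting so that it stays above the noise floor.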
5 Human performance. Although speech perception in quiet seems robust to reverberation, speech intelligibility in noise suffers in the presence of reverberation (Plomp 1976; Culling et al. 2003). Culling et al. showed that reverberation (T60 = 0.4 s) produces a 5 dB increase in speech reception threshold when naturally intonated speech is presented together with a competing talker. Hearing-impaired listeners are particularly susceptible to reverberation. The binaural advantage for speech perception in noise is diminished by reverberation; the Culling et al. study found no advantage at all. [Figure: from Culling et al. (2003)]
6 Human performance. Darwin and Hukin (2000) compared reverberation effects on spatial, pitch, and vocal-tract size cues for sequential organization and found that ITD cues are seriously impaired by reverberation, that pitch cues (F0 trajectory) are more resistant, and that a combination of pitch and vocal-tract size cues is very resistant to reverberation.
7 Outline of presentation: Introduction; Human performance; Reverberation effects (on pitch, on onset/offset, on binaural cues); Monaural enhancement of reverberant signal; Binaural segregation of reverberant signal; Discussion and summary
8 Pitch tracking of a single utterance. [Figure: time-frequency panels for a clean male utterance, a clean female utterance, and their reverberant versions (T60 = 0.3 s), with pitch tracks (pitch as time lag versus time in seconds) for the clean and reverberant conditions] Pitch is fairly robust to reverberation, especially for slowly changing pitch tracks and long voiced speech segments. Noticeable artifacts: elongated pitch tracks.
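The pitch tracks on this slide are plotted as time lags. As a rough illustration of what a lag-domain pitch estimate is, the sketch below picks the autocorrelation peak of a single frame. It is not the tracker used for the figures; the function name `pitch_lag`, the 80 to 400 Hz search range, and the synthetic harmonic tone are assumptions made for the example.

```python
import numpy as np

def pitch_lag(frame, fs, fmin=80.0, fmax=400.0):
    """Return the autocorrelation peak lag (in samples) within a pitch range."""
    frame = frame - frame.mean()
    # One-sided autocorrelation: index 0 is lag 0.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(fs / fmax)          # shortest period considered
    hi = int(fs / fmin)          # longest period considered
    return lo + int(np.argmax(ac[lo:hi + 1]))

fs = 16000
f0 = 125.0                       # period of 128 samples at 16 kHz
t = np.arange(640) / fs          # one 40 ms frame
frame = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))

lag = pitch_lag(frame, fs)
f0_hat = fs / lag
```

Because convolution with a room response keeps a steady periodic signal periodic, an estimator of this kind tends to degrade mainly where pitch changes quickly or voiced segments are short, in line with the observation above.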
9 Pitch tracking of two utterances. [Figure: reverberant mixture (T60 = 0.3 s), frequency (Hz) versus time, with pitch tracks (time lag versus time in seconds) for one-source and two-source tracking] Multipitch tracking uses the Wu et al. (2003) algorithm. Even with multiple reverberant sources, pitch tracking works reasonably well.
10 Reverberation effects on harmonic structure. [Figure: from Darwin and Hukin (2000); the utterance is "Could you please write the word bead down now"; T60 = 0.4 s] Primarily in the low-frequency range.
11 Implications for pitch-based grouping. [Figure: histograms of selected peaks for the clean and reverberant (T60 = 0.3 s) conditions, and clean versus reverberant pitch tracks (time lag versus time in seconds)] Smearing of harmonic structure is worse in the high-frequency range. The figure shows the histogram of peak positions that are nearest to the detected pitch periods for frequencies greater than 8 Hz. This smearing effect would degrade the performance of pitch-based grouping.
12 Reverberation effects on temporal envelope. [Figure: smoothed temporal envelope, amplitude (dB) versus time (s), of (a) the anechoic utterance and (b) the reverberant utterance] Response envelope of a gammatone filter centered near 1 kHz to the utterance "That noise problem grows more annoying each day." (a) T60 = 0 s and (b) T60 = 0.3 s. Amplitude modulation (AM) depth is reduced, but the AM pattern is reasonably maintained.
13 Onset and offset detection. [Figure: cochleogram representation, frequency (Hz) versus time (s), of (a) the anechoic utterance and (b) the reverberant utterance] Red/black marks indicate detected onsets/offsets. The utterance: "That noise problem grows more annoying each day."
14 Reverberation effects on onset/offset detection. Both the times and strengths of onsets and offsets are affected. Onset times are slightly shifted. Onsets of weak phones (e.g. unvoiced stops) are smeared. Offset times are shifted forward (delayed). Reverberation introduces spurious offsets.
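To make the notions of onset and offset concrete, the sketch below marks the steepest rise and the steepest fall of a smoothed log-energy envelope. It is a simplified single-channel stand-in, not the detector behind the cochleogram figures; the 20 ms smoothing window, the energy floor, and the function names are assumptions.

```python
import numpy as np

def smoothed_log_envelope(x, fs, win_s=0.02, floor=1e-4):
    """Moving-average power in dB; `floor` avoids log of zero in silence."""
    win = max(1, int(round(win_s * fs)))
    power = np.convolve(x ** 2, np.ones(win) / win, mode="same")
    return 10.0 * np.log10(power + floor)

def strongest_onset_offset(x, fs, win_s=0.02):
    """Times (s) of the steepest rise and steepest fall of the envelope."""
    env_db = smoothed_log_envelope(x, fs, win_s)
    slope = np.gradient(env_db) * fs      # dB per second
    return np.argmax(slope) / fs, np.argmin(slope) / fs

fs = 16000
t = np.arange(fs) / fs
# Toy signal: a 1 kHz tone burst between 0.2 s and 0.5 s.
x = np.where((t >= 0.2) & (t < 0.5), np.sin(2 * np.pi * 1000 * t), 0.0)

onset, offset = strongest_onset_offset(x, fs)
```

Running the same function on a reverberated copy of `x` is one way to observe the shifts listed above, since the decaying tail replaces the abrupt fall at the true offset.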
15 Reverberation effects on binaural cues: ITD. Shinn-Cunningham and Kawakyu (2003) showed that the responses of a neural model to ITD (interaural time difference) are poor indicators of source azimuth in the presence of reverberation. Integration over time enhances the estimation robustness.
16 ITD estimation in time-frequency (T-F) units. [Figure: azimuth histograms (channel center frequency in Hz versus azimuth in degrees) for a target source at 45°, anechoic and with T60 = 0.3 s, together with across-frequency integration of azimuth over time (sec) for the clean and T60 = 0.3 s conditions] ITD is estimated in individual T-F units using a cross-correlation model (Roman et al. 2003). The input is natural speech. The distribution of local azimuth estimates is much noisier in the reverberant condition.
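The cross-correlation idea can be sketched for one broadband frame. The slide's model works per T-F unit after auditory filtering, which this example omits. The function name `estimate_itd`, the 1 ms search range, and the sign convention (positive ITD means the left ear leads) are assumptions made for illustration.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3):
    """ITD (s) as the lag that maximizes the interaural cross-correlation.

    Positive values mean the right-ear signal lags the left-ear signal.
    """
    max_lag = int(round(max_itd_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    # Trim the edges so that np.roll's wrap-around never enters the sum.
    core = slice(max_lag, len(left) - max_lag)
    cc = np.array([np.dot(left[core], np.roll(right, -lag)[core])
                   for lag in lags])
    return lags[np.argmax(cc)] / fs

fs = 16000
rng = np.random.default_rng(0)
left = rng.standard_normal(1600)                      # 100 ms of noise
delay = 8                                             # samples, 0.5 ms
right = np.concatenate([np.zeros(delay), left[:-delay]])

itd = estimate_itd(left, right, fs)
```

With reverberant inputs the cross-correlation function acquires additional peaks from reflections, which is one way to picture why local estimates become noisy.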
17 Interaural intensity difference estimation in T-F units. [Figure: IID histograms (channel center frequency in Hz versus IID in dB) for a target source at 45°, anechoic and with T60 = 0.3 s, and mean IID (dB) versus channel center frequency (Hz) for one utterance, clean and reverberant] The distribution of IID (interaural intensity difference) is also much noisier in reverberation, and the mean IID values lose their characteristics.
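A frame-wise IID can be written as an energy ratio in dB. The sketch below is broadband and omits the filterbank implied by "T-F units"; the function name and the sign convention (right-ear energy over left-ear energy) are assumptions.

```python
import numpy as np

def framewise_iid_db(left, right, frame_len, eps=1e-12):
    """IID per frame in dB: 10 log10(right energy / left energy)."""
    n = (len(left) // frame_len) * frame_len
    l = left[:n].reshape(-1, frame_len)
    r = right[:n].reshape(-1, frame_len)
    e_left = np.sum(l ** 2, axis=1) + eps    # eps guards silent frames
    e_right = np.sum(r ** 2, axis=1) + eps
    return 10.0 * np.log10(e_right / e_left)

rng = np.random.default_rng(0)
left = rng.standard_normal(3200)
right = 0.5 * left                 # right ear at half the amplitude

iid = framewise_iid_db(left, right, frame_len=320)
```

Halving the amplitude corresponds to 20 log10(0.5), about -6 dB, in every frame of this toy example.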
18 Outline of presentation: Introduction; Human performance; Reverberation effects (on pitch, on onset/offset, on binaural cues); Monaural enhancement of reverberant signal; Binaural segregation of reverberant signal; Discussion and summary
19 A two-stage enhancement algorithm (Wu 2003). Stage 1: identify an inverse filter to reduce coloration distortion by maximizing the kurtosis of the LPC residue (Gillespie et al. 2001). [Figure: LPC residue versus time (ms) for clean speech (kurtosis = 12.2) and reverberant speech (kurtosis = 3.6)] Stage 2: estimate and subtract the effects of long-term reverberation.
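The first stage relies on reverberation pushing the LPC residue toward a Gaussian-like distribution, which lowers its kurtosis. The sketch below illustrates only that tendency. It does not implement LPC analysis or the inverse-filter search, and it substitutes a sparse random excitation for a real residue; the impulse response model and all names are assumptions.

```python
import numpy as np

def kurtosis(x):
    """Fourth central moment over squared variance (Gaussian gives about 3)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2

rng = np.random.default_rng(0)
fs = 16000
n = fs

# Stand-in for a peaky residue: mostly zeros with occasional pulses.
excitation = rng.standard_normal(n) * (rng.random(n) < 0.01)

# Assumed room response: decaying white noise with T60 = 0.3 s.
t = np.arange(int(0.3 * fs)) / fs
h = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / 0.3)
h[0] = 1.0

reverberant = np.convolve(excitation, h)[:n]

k_clean = kurtosis(excitation)
k_rev = kurtosis(reverberant)      # lower than k_clean
```

An inverse filter chosen to raise the kurtosis of the filtered residue would, under this view, undo part of the smearing.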
20 Results of Wu's enhancement algorithm. [Figure: original speech, reverberant speech, inverse-filtered speech, and enhanced speech]
21 Binaural segregation of reverberant speech. Roman and Wang (2004) proposed a figure-ground segregation strategy to identify the T-F units dominated by the target using spatial information, without imposing restrictions on the number, location, or content of interfering sources. Basic idea: first, perform cancellation of the reverberant target (with detected target location) using adaptive filtering; then label those T-F units that have been largely attenuated in the first stage, since they are more likely to originate from the target location. [Figure: block diagram with inputs H1 S + N1 and H2 S + N2, an adaptive filter W followed by subtraction, DFT matrices, and a binary mask output]
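The labeling step can be sketched as a threshold on how much each T-F unit is attenuated by the cancellation stage. The adaptive filter itself is not shown, and the 6 dB threshold, the function name, and the toy energy values are hypothetical; the slide does not give the criterion used.

```python
import numpy as np

def attenuation_mask(energy_before, energy_after, thresh_db=6.0, eps=1e-12):
    """Binary mask: 1 where cancellation attenuated the unit by > thresh_db.

    Inputs are arrays of T-F unit energies (channels x frames) measured
    before and after the target-cancellation stage.
    """
    atten_db = 10.0 * np.log10((energy_before + eps) / (energy_after + eps))
    return (atten_db > thresh_db).astype(int)

# Toy energies for a 2 x 2 grid of T-F units.
energy_before = np.array([[1.0, 1.0],
                          [1.0, 1.0]])
energy_after = np.array([[0.01, 0.9],
                         [0.5, 0.1]])

mask = attenuation_mask(energy_before, energy_after)
```

Units attenuated by 20 dB and 10 dB are labeled as target here, while those attenuated by about 0.5 dB and 3 dB are not.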
22 Segregation results. An example with a target speaker at 0° and 4 other interfering speakers at (-135°, -45°, 45°, 135°), with T60 = 0.3 s.
23 ASR results. The segregation output is fed to a missing-data recognizer (Cooke et al. 2001). [Figure: (a) 5-speaker configuration, comparing baseline performance, estimated binary mask, and ideal binary mask; (b) nonspeech intrusion: rock music at 45°]
24 Summary and discussion. Reverberation corrupts auditory cues: pitch estimation is relatively robust, but harmonic structure is smeared, particularly at high frequencies; AM depth is reduced but the AM pattern is reasonably maintained; onset times, and especially offset times, are shifted, and onset and offset synchrony is weakened; binaural cues become unreliable. Algorithms presented: a two-stage monaural algorithm for reverberant speech enhancement, and a binaural algorithm for segregating reverberant speech. Issues: What is the ground-truth pitch for a reverberant signal? Dereverberation versus enhancement. How to deal with both segregation and reverberation monaurally?
25 Acknowledgment. Thanks to N. Roman and G. Hu for performing some of the computer experiments. Funding by AFOSR/AFRL and NSF.
BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH Anjali Menon 1, Chanwoo Kim 2, Umpei Kurokawa 1, Richard M. Stern 1 1 Department of Electrical and Computer Engineering, Carnegie Mellon University,
More informationAuditory System For a Mobile Robot
Auditory System For a Mobile Robot PhD Thesis Jean-Marc Valin Department of Electrical Engineering and Computer Engineering Université de Sherbrooke, Québec, Canada Jean-Marc.Valin@USherbrooke.ca Motivations
More informationACOUSTICS AND PERCEPTION OF SOUND IN EVERYDAY ENVIRONMENTS. Barbara Shinn-Cunningham
ACOUSTICS AND PERCEPTION OF SOUND IN EVERYDAY ENVIRONMENTS Barbara Shinn-Cunningham Boston University 677 Beacon St. Boston, MA 02215 shinn@cns.bu.edu ABSTRACT One aspect of hearing that has received relatively
More informationStructure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping
Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics
More informationNOISE ESTIMATION IN A SINGLE CHANNEL
SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina
More informationFinal Exam Study Guide: Introduction to Computer Music Course Staff April 24, 2015
Final Exam Study Guide: 15-322 Introduction to Computer Music Course Staff April 24, 2015 This document is intended to help you identify and master the main concepts of 15-322, which is also what we intend
More informationSNR Estimation Based on Amplitude Modulation Analysis With Applications to Noise Suppression
184 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 11, NO. 3, MAY 2003 SNR Estimation Based on Amplitude Modulation Analysis With Applications to Noise Suppression Jürgen Tchorz and Birger Kollmeier
More informationPsychology of Language
PSYCH 150 / LIN 155 UCI COGNITIVE SCIENCES syn lab Psychology of Language Prof. Jon Sprouse 01.10.13: The Mental Representation of Speech Sounds 1 A logical organization For clarity s sake, we ll organize
More informationA generalized framework for binaural spectral subtraction dereverberation
A generalized framework for binaural spectral subtraction dereverberation Alexandros Tsilfidis, Eleftheria Georganti, John Mourjopoulos Audio and Acoustic Technology Group, Department of Electrical and
More informationComputational Perception. Sound localization 2
Computational Perception 15-485/785 January 22, 2008 Sound localization 2 Last lecture sound propagation: reflection, diffraction, shadowing sound intensity (db) defining computational problems sound lateralization
More informationAnalytical Analysis of Disturbed Radio Broadcast
th International Workshop on Perceptual Quality of Systems (PQS 0) - September 0, Vienna, Austria Analysis of Disturbed Radio Broadcast Jan Reimes, Marc Lepage, Frank Kettler Jörg Zerlik, Frank Homann,
More informationEstimates based on a model of room acoustics. Arthur Boothroyd 2003 Used and distributed with permission for 2003 ACCESS conference
Estimates based on a model of room acoustics Arthur Boothroyd 2003 Used and distributed with permission for 2003 ACCESS conference Basic model Direct signal (level falls by 6 db per doubling of distance)
More information