Applying Models of Auditory Processing to Automatic Speech Recognition: Promise and Progress!
1 Applying Models of Auditory Processing to Automatic Speech Recognition: Promise and Progress! Richard Stern (with Chanwoo Kim, Yu-Hsiang Chiu, and others) Department of Electrical and Computer Engineering and Language Technologies Institute Carnegie Mellon University Pittsburgh, Pennsylvania Frederick Jelinek Memorial Workshop on Meaning Representations in Language and Speech Processing Prague, Czech Republic July 16, 2014
2 Introduction auditory processing and automatic speech recognition n I was originally trained in auditory perception, and my original work was in binaural hearing n Over the years, I have been spending the bulk of my time trying to improve the accuracy of automatic speech recognition systems in difficult acoustical environments n In this talk I would like to discuss some of the ways in which my group (and many others) have been attempting to apply knowledge of auditory perception to improve ASR accuracy Comment: approaches can be more or less faithful to physiology and psychophysics Slide 2
3 The big questions. n How can knowledge of auditory physiology and perception improve speech recognition accuracy? n Can speech recognition results tell us anything we don't already know about auditory processing? n What aspects of the processing are most valuable for robust feature extraction? Slide 3
4 Two historical notes n Everything is changing with deep learning: is there a role for traditional robust speech technologies? n Knowledge-based versus statistically-based processing Slide 4
5 So what I will do is. n Briefly review some of the major physiological and psychophysical results that motivate the models n Briefly review and discuss the major classical auditory models of the 1980s: Seneff, Lyon, and Ghitza n Review some of the major new trends in today's models n Talk about some representative issues that have driven work as of late at CMU and what we have learned from them Slide 5
6 Speech recognition as pattern classification Speech features Utterance hypotheses Feature extraction Decision-making procedure n Major functional components: Signal processing to extract features from speech waveforms Comparison of features to pre-stored representations n Important design choices: Choice of features Specific method of comparing features to stored representations Slide 6
7 Default signal processing: Mel frequency cepstral coefficients (MFCCs) input speech Multiply by Brief Window Fourier Transform Magnitude Triangular Weighting Log Inverse Fourier Transform MFCCs Comment: 20-ms time slices are modeled by smoothed spectra, with attention paid to auditory frequency selectivity Slide 7
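The MFCC pipeline on this slide (window, Fourier transform, magnitude, triangular weighting, log, inverse transform) can be sketched in a few lines. A minimal single-frame version in Python; the frame length, filter count, and cepstral order are illustrative defaults, not values from the talk:

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_filters=26, n_ceps=13):
    """MFCCs for one frame of speech: window -> FFT magnitude ->
    triangular mel weighting -> log -> inverse transform (DCT).
    Parameter defaults are illustrative assumptions."""
    nfft = len(frame)
    # Multiply by a brief window and take the magnitude spectrum
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(nfft)))

    # Triangular filters spaced uniformly on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_inv(np.linspace(0.0, mel(sr / 2.0), n_filters + 2))
    bins = np.floor((nfft + 1) * edges / sr).astype(int)

    fbank = np.zeros(n_filters)
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):          # rising edge of triangle
            fbank[i] += spectrum[k] * (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):          # falling edge of triangle
            fbank[i] += spectrum[k] * (hi - k) / max(hi - mid, 1)

    # Log nonlinearity, then the inverse transform realized as a DCT
    log_energy = np.log(fbank + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ log_energy
```

A 25-ms frame at 16 kHz is 400 samples; the first coefficient behaves as an overall energy term, and the remaining coefficients describe the smoothed spectral shape the recognizer actually sees.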
8 What the speech recognizer sees. An original spectrogram: Spectrum recovered from MFCC: Slide 8
9 Comments on the MFCC representation n It's very blurry compared to a wideband spectrogram! n Aspects of auditory processing represented: Frequency selectivity and spectral bandwidth (but using a constant analysis window duration!)» Wavelet schemes exploit time-frequency resolution better Nonlinear amplitude response n Aspects of auditory processing NOT represented: Detailed timing structure Lateral suppression Enhancement of temporal contrast Other auditory nonlinearities Slide 9
10 Basic auditory anatomy n Structures involved in auditory processing: Slide 10
11 Excitation along the basilar membrane (courtesy James Hudspeth, HHMI) Slide 11
12 Central auditory pathways n There is a lot going on! n For the most part, we only consider the response of the auditory nerve It is in series with everything else Slide 12
13 Transient response of auditory-nerve fibers n Histograms of response to tone bursts (Kiang et al., 1965): Comment: Onsets and offsets produce overshoot Slide 13
14 Frequency response of auditory-nerve fibers: tuning curves n Threshold level for auditory-nerve response to tones: n Note dependence of bandwidth on center frequency and asymmetry of response Slide 14
15 Typical response of auditory-nerve fibers as a function of stimulus level n Typical response of auditory-nerve fibers to tones as a function of intensity: n Comment: Saturation and limited dynamic range Slide 15
16 Synchronized auditory-nerve response to low-frequency tones n Comment: response remains synchronized over a wide range of intensities Slide 16
17 Comments on synchronized auditory response n Nerve fibers synchronize to fine structure at low frequencies, signal envelopes at high frequencies n Synchrony clearly important for auditory localization n Synchrony could be important for monaural processing of complex signals as well Slide 17
18 Lateral suppression in auditory processing n Auditory-nerve response to pairs of tones: n Comment: Lateral suppression enhances local contrast in frequency Slide 18
19 Auditory frequency selectivity: critical bands n Measurements of psychophysical filter bandwidth by various methods: n Comments: Bandwidth increases with center frequency Solid curve is Equivalent Rectangular Bandwidth (ERB) Slide 19
20 Three perceptual auditory frequency scales n Bark scale (DE): Bark(f) = 0.01f for 0 <= f < 500; 0.007f + 1.5 for 500 <= f < 1220; 6 ln(f) - 32.6 for f >= 1220 n Mel scale (USA): Mel(f) = 2595 log10(1 + f/700) n ERB scale (UK): ERB(f) = 24.7(4.37f + 1), with f in kHz Slide 20
21 Comparison of normalized perceptual frequency scales n Bark scale (in blue), Mel scale (in red), and ERB scale (in green), plotted as relative perceptual scale versus frequency in Hz Slide 21
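The three scales can be written as small functions. A sketch in Python; the Bark coefficients 0.007 and 6 are chosen so the piecewise curve joins continuously at 500 Hz and 1220 Hz, and the ERB formula takes its argument in kHz (it gives the bandwidth of the auditory filter at that center frequency):

```python
import numpy as np

def bark(f):
    """Piecewise Bark approximation; f in Hz.  The three pieces
    join continuously at 500 and 1220 Hz."""
    f = np.asarray(f, dtype=float)
    return np.where(f < 500.0, 0.01 * f,
           np.where(f < 1220.0, 0.007 * f + 1.5,
                    6.0 * np.log(f) - 32.6))

def mel(f):
    """Mel scale; f in Hz."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def erb_bandwidth(f_khz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter
    centered at f_khz kilohertz."""
    return 24.7 * (4.37 * f_khz + 1.0)
```

For example, the ERB at 1 kHz comes out near 133 Hz, consistent with the solid ERB curve on the critical-band slide.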
22 Perceptual masking of adjacent spectrotemporal components n Spectral masking: Intense signals at a given frequency mask adjacent frequencies (asymmetrically) n Temporal masking: Intense signals at a given frequency can mask successive input at that frequency (and to some extent before the masker occurs!) n These phenomena are an important part of the auditory models used in perceptual audio coding (such as in creating MP3 files) Slide 22
23 The loudness of sounds n Equal loudness contours (Fletcher-Munson curves): Slide 23
24 Summary of basic auditory physiology and perception n Major monaural physiological attributes: Frequency analysis in parallel channels Preservation of temporal fine structure Limited dynamic range in individual channels Enhancement of temporal contrast (at onsets and offsets) Enhancement of spectral contrast (at adjacent frequencies) n Most major physiological attributes have psychophysical correlates n Most physiological and psychophysical effects are not preserved in conventional representations for speech recognition Slide 24
25 Auditory models in the 1980s: the Seneff model n Overall model (an early well-known auditory model): Critical-Band Filter Bank (Stage I), Hair-Cell Model (Stage II), then in Stage III an Envelope Detector producing the Mean-Rate Spectrum and a Synchrony Detector producing the Synchrony Spectrum n Detail of Stage II: Saturating Half-Wave Rectifier, Short-term AGC, Lowpass Filter, Rapid AGC n In addition to mean rate, used a Generalized Synchrony Detector to extract synchrony Slide 25
26 Auditory models in the 1980s: Ghitza's EIH model n A bank of 85 cochlear filters, each followed by level-crossing detectors at multiple thresholds (L-1 through L-7) feeding interval histograms (IH-1 through IH-7) n Estimated timing information from ensembles of zero crossings with different thresholds Slide 26
27 Auditory models in the 1980s: Lyon's auditory model n Single stage of the Lyon auditory model: half-wave rectifier and 1-kHz lowpass filter, with coupled AGC gains shared across adjacent channels n The Lyon model included nonlinear compression, lateral suppression, and temporal effects n Also added correlograms (autocorrelation and crosscorrelation of model outputs) Slide 27
28 And one more: Cohen's model (1989) n Processing chain: 512-point FFT, critical-band filters, loudness normalization, power-law compression, short-term adaptation n Loudness normalization and transient enhancement were novel for the time n Used successfully as part of many IBM systems Slide 28
29 The other standard approach: Perceptual Linear Prediction (PLP, Hermansky 90) n Comments: A pragmatic approach to auditory modeling Pre-emphasis, loudness normalization based on threshold of hearing RASTA enhancement provides cepstral normalization and modulation filtering Widely used with success today Slide 30
30 Auditory modeling was expensive: computational complexity of the Seneff model n Number of multiplications per ms of speech (from Ohshima and Stern, 1994): the chart compares LPC with Stages I, II, and III of the Seneff model on a scale of tens of thousands of multiplications per ms Slide 31
31 Summary: early auditory models n The models developed in the 1980s included: Realistic auditory filtering Realistic auditory nonlinearity Synchrony extraction Lateral suppression Higher-order processing through auto-correlation and cross-correlation n Every system developer had his or her own idea of what was important Slide 32
32 Evaluation of early auditory models (Ohshima and Stern, 1994) n Not much quantitative evaluation actually performed n General trends of results: Physiological processing did not help much (if at all) for clean speech More substantial improvements observed for degraded input Benefits generally do not exceed what could be achieved with more prosaic approaches (e.g. CDCN/VTS in our case). Slide 33
33 Other reasons why work on auditory models subsided in the late 1980s n Failure to obtain a good statistical match between characteristics of features and speech recognition system Ameliorated by subsequent development of continuous HMMs n More pressing need to solve other basic speech recognition problems Slide 34
34 Renaissance in the 1990s! By the late 1990s, physiologically-motivated and perceptually-motivated approaches to signal processing began to flourish Some major new trends: n Computation no longer such a limiting factor n Serious attention to temporal evolution n Attention to reverberation n Binaural processing n More effective and mature approaches to information fusion Slide 35
35 Peripheral auditory modeling at CMU, 2004 to the present n Foci of activities: Representing synchrony The shape of the rate-intensity function Revisiting analysis duration Revisiting frequency resolution Onset enhancement Modulation filtering Binaural and polyaural techniques Auditory scene analysis: common frequency modulation Slide 36
36 Speech representation using mean rate n Representation of vowels by Young and Sachs using mean rate: n Mean rate representation does not preserve spectral information Slide 37
37 Speech representation using average localized synchrony rate n Representation of vowels by Young and Sachs using ALSR: Slide 38
38 Physiologically-motivated signal processing: the Zhang-Carney model of the periphery n We used the synapse output as the basis for further processing Slide 40
39 Physiologically-motivated signal processing: synchrony and mean-rate detection (Kim/Chiu 06) n Synchrony response is smeared across frequency to remove pitch effects n Higher frequencies represented by mean rate of firing n Synchrony and mean rate combined additively n Much more processing than MFCCs Slide 41
40 Comparing auditory processing with cepstral analysis: clean speech Original spectrogram MFCC reconstruction Auditory analysis Slide 42
41 Comparing auditory processing with cepstral analysis: 20-dB SNR Slide 43
42 Comparing auditory processing with cepstral analysis: 10-dB SNR Slide 44
43 Comparing auditory processing with cepstral analysis: 0-dB SNR Slide 45
44 Auditory processing is more effective than MFCCs at low SNRs, especially in white noise Accuracy in background noise: Accuracy in background music: n The curves are shifted toward lower SNRs (greater improvement than obtained with VTS or CDCN) [Results from Kim et al., Interspeech 2006] Slide 46
45 Do auditory models really need to be so complex? n Model of Zhang et al. 2001: A much simpler model: P(t) Gammatone Filters Nonlinear Rectifiers Lowpass Filters s(t) Slide 47
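The "much simpler model" on this slide (gammatone filters, nonlinear rectifiers, lowpass filters) can be sketched directly. A minimal version using FIR gammatone impulse responses; the 4th-order filter, the Glasberg-Moore ERB bandwidth constants, and the 10-ms smoothing window are assumptions for illustration, not details from the talk:

```python
import numpy as np

def gammatone_ir(fc, sr, dur=0.025, order=4):
    """Impulse response of a 4th-order gammatone filter at center
    frequency fc, with bandwidth tied to the ERB at fc (the 1.019
    factor is the usual Glasberg-Moore value, assumed here)."""
    t = np.arange(int(dur * sr)) / sr
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    return (t ** (order - 1) * np.exp(-2.0 * np.pi * b * t)
            * np.cos(2.0 * np.pi * fc * t))

def simple_auditory_model(x, sr, centers):
    """Filter -> half-wave rectify -> lowpass, one channel per
    center frequency: the trivial model on the slide."""
    win = int(0.010 * sr)                      # ~10-ms smoothing window
    smooth = np.ones(win) / win
    channels = []
    for fc in centers:
        band = np.convolve(x, gammatone_ir(fc, sr), mode='same')
        rectified = np.maximum(band, 0.0)      # nonlinear rectifier
        channels.append(np.convolve(rectified, smooth, mode='same'))
    return np.array(channels)
```

Feeding a pure tone through three channels concentrates the smoothed output energy in the channel centered on the tone, which is the basic frequency-analysis behavior the full physiological model also provides.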
46 Comparing simple and complex auditory models n Comparing MFCC processing, a trivial (filter rectify compress) auditory model, and the full Carney-Zhang model (Chiu 2006): WER (%) versus SNR (dB) for the three front ends Slide 48
47 The nonlinearity seems to be the most important attribute of the Seneff model (Chiu 08) n Critical-Band Filter Bank (Stage I), Hair-Cell Model (Stage II), Envelope Detector (Mean-Rate Spectrum) and Synchrony Detector (Synchrony Spectrum) in Stage III n Stage II detail: Saturating Half-Wave Rectifier, Short-term AGC, Lowpass Filter, Rapid AGC Slide 49
48 Why the nonlinearity seems to help n Figures: input-energy histograms of clean speech and of speech in 20-dB white noise, passed through traditional log compression and through rate-level function compression, with compressed output plotted by channel index Slide 50
49 Impact of auditory nonlinearity (Chiu) n Recognition accuracy from clean speech down to -5 dB SNR on Test Set A, comparing the learned nonlinearity, the baseline nonlinearity, and MFCC features Slide 51
50 PNCC processing (Kim and Stern, 2010, 2014) n A pragmatic implementation of a number of the principles described: Gammatone filterbanks Nonlinearity shaped to follow auditory processing Medium-time environmental compensation using nonlinear cepstral highpass filtering in each channel Enhancement of envelope onsets Computationally efficient implementation Slide 52
51 An integrated front end: power-normalized cepstral coefficients (PNCC, Kim 10) n MFCC processing: STFT, |H(ω)|^2, triangular frequency weighting, logarithmic nonlinearity, DCT Slide 53
52 An integrated front end: power-normalized cepstral coefficients (PNCC, Kim 10) n MFCC processing: STFT, |H(ω)|^2, triangular frequency weighting, logarithmic nonlinearity, DCT n RASTA-PLP processing: STFT, |H(ω)|^2, critical-band frequency weighting, nonlinear compression, RASTA filter, nonlinear expansion, power-law nonlinearity (exponent 1/3), IFFT, LPC-based cepstra Slide 54
53 An integrated front end: power-normalized cepstral coefficients (PNCC, Kim 10) n MFCC processing: STFT, |H(ω)|^2, triangular frequency weighting, logarithmic nonlinearity, DCT n RASTA-PLP processing: STFT, |H(ω)|^2, critical-band frequency weighting, nonlinear compression, RASTA filter, nonlinear expansion, power-law nonlinearity (exponent 1/3), IFFT, LPC-based cepstra n PNCC processing: STFT, |H(ω)|^2, gammatone frequency weighting, noise reduction, temporal masking, frequency weighting, power normalization, power-law nonlinearity (exponent 1/15), DCT Slide 55
54 The nonlinearity in PNCC processing (Kim) n Figures: rate (spikes/sec) versus pressure (Pa) and versus tone level (dB SPL), comparing the human rate-intensity model with cube-root power-law, MMSE power-law, and logarithmic approximations Slide 56
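A simple way to see why the power-law nonlinearity behaves better than the logarithm at low levels is to compare their outputs on near-zero channel powers. A small numerical sketch; the exponent 1/15 is from the PNCC slides, while the test powers and the log floor are arbitrary illustrative values:

```python
import numpy as np

def power_law(p, exponent=1.0 / 15.0):
    """Power-law nonlinearity of the kind used in PNCC processing."""
    return np.asarray(p, dtype=float) ** exponent

def log_compress(p, floor=1e-20):
    """Logarithmic nonlinearity of MFCC processing, with a floor
    to avoid taking the log of zero."""
    return np.log(np.maximum(np.asarray(p, dtype=float), floor))

# Doubling a tiny channel power changes the log output by ln(2) no
# matter how small the power is, while the power-law output barely
# moves -- one reason the power law is better behaved in low-energy,
# noise-dominated channels.
p = np.array([1e-10, 2e-10])
log_change = float(np.diff(log_compress(p))[0])
power_change = float(np.diff(power_law(p))[0])
```

The unbounded sensitivity of the logarithm near zero is what makes MFCC features so volatile in channels dominated by noise, which the rate-intensity curves on this slide illustrate graphically.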
55 Frequency resolution n Examined several types of frequency resolution MFCC triangular filters Gammatone filter shapes Truncated Gammatone filter shapes n Most results do not depend greatly on filter shape n Some sort of frequency integration is helpful when frequency-based selection algorithms are used Slide 57
56 Analysis window duration (Kim) n Typical analysis window duration for speech recognition is ~25-35 ms n Optimal analysis window duration for estimation of environmental parameters is longer n Best systems measure environmental parameters (including voice activity detection) over a longer time interval but apply the results to a short-duration analysis frame Slide 58
57 Temporal speech properties: modulation filtering Output of speech and noise segments from the 14th Mel filter (1050 Hz) n Speech segment exhibits greater fluctuations Slide 59
58 Nonlinear noise processing n Use nonlinear cepstral highpass filtering to pass speech but not noise n Why nonlinear? Need to keep results positive because we are dealing with manipulations of signal power Slide 60
59 Asymmetric lowpass filtering (Kim, 2010) n Overview of processing: Assume that noise components vary slowly compared to speech components Obtain a running estimate of noise level in each channel using nonlinear processing Subtract estimated noise level from speech n An example: Note: Asymmetric highpass filtering is obtained by subtracting the lowpass filter output from the input Slide 61
60 Implementing asymmetric lowpass filtering n Basic equation: a first-order lowpass relating Qout[m, l] to Qin[m, l], with separate smoothing constants λa and λb for rising and falling inputs n Dependence on parameter values (street noise, 5 dB), power (dB) versus time (s): (λa = 0.9, λb = 0.9), (λa = 0.5, λb = 0.95), (λa = 0.999, λb = 0.5) Slide 62
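The basic equation on this slide (shown only as an image in this transcription) is a first-order recursion whose smoothing constant switches depending on whether the input power is rising or falling. A sketch following the asymmetric-filter form described in Kim and Stern's PNCC papers, using the parameter names λa and λb from the slide; treat the details as an assumption rather than the exact implementation:

```python
import numpy as np

def asymmetric_lowpass(q_in, lam_a=0.999, lam_b=0.5):
    """Asymmetric first-order lowpass of a per-channel power sequence.

    Uses smoothing constant lam_a when the input is at or above the
    current output and lam_b when it is below.  With lam_a near 1 and
    a smaller lam_b the output rises slowly but falls quickly, so it
    tracks a slowly varying floor of the envelope (a running noise
    estimate, as described on the previous slide).
    """
    q_in = np.asarray(q_in, dtype=float)
    q_out = np.empty_like(q_in)
    q_out[0] = q_in[0]
    for m in range(1, len(q_in)):
        lam = lam_a if q_in[m] >= q_out[m - 1] else lam_b
        q_out[m] = lam * q_out[m - 1] + (1.0 - lam) * q_in[m]
    return q_out
```

Subtracting this lowpass output from the input power then realizes the asymmetric highpass filtering mentioned on the slide: brief speech bursts pass through while the slowly varying noise floor is removed.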
61 Computational complexity of front ends n Chart: multiplications and divisions per frame for MFCC, PLP, PNCC, and truncated PNCC Slide 64
62 Performance of PNCC in white noise (RM) n Chart: recognition accuracy (100 - WER) versus SNR (dB) for PNCC and the comparison front ends Slide 65
63 Performance of PNCC in white noise (WSJ) n Chart: recognition accuracy (100 - WER) versus SNR (dB) for PNCC and the comparison front ends Slide 66
64 Performance of PNCC in background music n Chart: recognition accuracy (100 - WER) versus SNR (dB) for PNCC and the comparison front ends Slide 67
65 Performance of PNCC in reverberation n Chart: recognition accuracy versus reverberation time (s) for PNCC and the comparison front ends Slide 68
66 Contributions of PNCC components: white noise (WSJ) n Chart: accuracy versus SNR (dB) as components are added: baseline MFCC + CMN, + medium-duration processing, + noise suppression, + temporal masking Slide 69
67 Contributions of PNCC components: background music (WSJ) n Chart: accuracy versus SNR (dB) as components are added: baseline MFCC + CMN, + medium-duration processing, + noise suppression, + temporal masking Slide 70
68 Contributions of PNCC components: reverberation (WSJ) n Chart: accuracy versus reverberation time (s) as components are added: baseline MFCC + CMN, + medium-duration processing, + noise suppression, + temporal masking Slide 71
69 Effects of onset enhancement/temporal masking (SSF processing, Kim 10) n Charts: recognition accuracy versus SNR (dB) and versus reverberation time (s), with and without SSF processing Slide 72
70 PNCC and Slide 73
71 Summary: so what matters? n Knowledge of the auditory system can certainly improve ASR accuracy: Use of synchrony Consideration of rate-intensity function Onset enhancement Nonlinear modulation filtering Selective reconstruction Consideration of processes mediating auditory scene analysis Slide 74
72 Summary: PNCC processing n PNCC processing includes More effective nonlinearity Parameter estimation for noise compensation and analysis based on longer analysis time and frequency spread Efficient noise compensation based on modulation filtering Onset enhancement Computationally-efficient implementation n Not considered yet Synchrony representation Lateral suppression Slide 75
73 Robust Speech Recognition Group Carnegie Mellon University
More informationMachine recognition of speech trained on data from New Jersey Labs
Machine recognition of speech trained on data from New Jersey Labs Frequency response (peak around 5 Hz) Impulse response (effective length around 200 ms) 41 RASTA filter 10 attenuation [db] 40 1 10 modulation
More informationAUDL Final exam page 1/7 Please answer all of the following questions.
AUDL 11 28 Final exam page 1/7 Please answer all of the following questions. 1) Consider 8 harmonics of a sawtooth wave which has a fundamental period of 1 ms and a fundamental component with a level of
More informationDamped Oscillator Cepstral Coefficients for Robust Speech Recognition
Damped Oscillator Cepstral Coefficients for Robust Speech Recognition Vikramjit Mitra, Horacio Franco, Martin Graciarena Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA.
More informationAUDL GS08/GAV1 Signals, systems, acoustics and the ear. Loudness & Temporal resolution
AUDL GS08/GAV1 Signals, systems, acoustics and the ear Loudness & Temporal resolution Absolute thresholds & Loudness Name some ways these concepts are crucial to audiologists Sivian & White (1933) JASA
More informationTesting of Objective Audio Quality Assessment Models on Archive Recordings Artifacts
POSTER 25, PRAGUE MAY 4 Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts Bc. Martin Zalabák Department of Radioelectronics, Czech Technical University in Prague, Technická
More informationCOM325 Computer Speech and Hearing
COM325 Computer Speech and Hearing Part III : Theories and Models of Pitch Perception Dr. Guy Brown Room 145 Regent Court Department of Computer Science University of Sheffield Email: g.brown@dcs.shef.ac.uk
More informationSpectral and temporal processing in the human auditory system
Spectral and temporal processing in the human auditory system To r s t e n Da u 1, Mo rt e n L. Jepsen 1, a n d St e p h a n D. Ew e r t 2 1Centre for Applied Hearing Research, Ørsted DTU, Technical University
More informationSGN Audio and Speech Processing
SGN 14006 Audio and Speech Processing Introduction 1 Course goals Introduction 2! Learn basics of audio signal processing Basic operations and their underlying ideas and principles Give basic skills although
More information3D Distortion Measurement (DIS)
3D Distortion Measurement (DIS) Module of the R&D SYSTEM S4 FEATURES Voltage and frequency sweep Steady-state measurement Single-tone or two-tone excitation signal DC-component, magnitude and phase of
More informationMonaural and Binaural Speech Separation
Monaural and Binaural Speech Separation DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction CASA approach to sound separation Ideal binary mask as
More informationLearning-Based Auditory Encoding for Robust Speech. Recognition
Learning-Based Auditory Encoding for Robust Speech Recognition Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering Yu-Hsiang
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationIN a natural environment, speech often occurs simultaneously. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 15, NO. 5, SEPTEMBER 2004 1135 Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation Guoning Hu and DeLiang Wang, Fellow, IEEE Abstract
More informationSOUND SOURCE RECOGNITION AND MODELING
SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental
More informationAn Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation
An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation Aisvarya V 1, Suganthy M 2 PG Student [Comm. Systems], Dept. of ECE, Sree Sastha Institute of Engg. & Tech., Chennai,
More informationSpeech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,
More informationA CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL
9th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, -7 SEPTEMBER 7 A CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL PACS: PACS:. Pn Nicolas Le Goff ; Armin Kohlrausch ; Jeroen
More informationThe psychoacoustics of reverberation
The psychoacoustics of reverberation Steven van de Par Steven.van.de.Par@uni-oldenburg.de July 19, 2016 Thanks to Julian Grosse and Andreas Häußler 2016 AES International Conference on Sound Field Control
More informationLearning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives
Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri
More informationSPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes
SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,
More informationTone-in-noise detection: Observed discrepancies in spectral integration. Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O.
Tone-in-noise detection: Observed discrepancies in spectral integration Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands Armin Kohlrausch b) and
More informationREAL-TIME BROADBAND NOISE REDUCTION
REAL-TIME BROADBAND NOISE REDUCTION Robert Hoeldrich and Markus Lorber Institute of Electronic Music Graz Jakoministrasse 3-5, A-8010 Graz, Austria email: robert.hoeldrich@mhsg.ac.at Abstract A real-time
More informationSpectro-Temporal Methods in Primary Auditory Cortex David Klein Didier Depireux Jonathan Simon Shihab Shamma
Spectro-Temporal Methods in Primary Auditory Cortex David Klein Didier Depireux Jonathan Simon Shihab Shamma & Department of Electrical Engineering Supported in part by a MURI grant from the Office of
More informationRhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University
Rhythmic Similarity -- a quick paper review Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Contents Introduction Three examples J. Foote 2001, 2002 J. Paulus 2002 S. Dixon 2004
More informationSound Synthesis Methods
Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like
More informationSignals, Sound, and Sensation
Signals, Sound, and Sensation William M. Hartmann Department of Physics and Astronomy Michigan State University East Lansing, Michigan Л1Р Contents Preface xv Chapter 1: Pure Tones 1 Mathematics of the
More informationUniversity of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis
More informationEffective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a
R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,
More informationReduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter
Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
More informationInternational Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015
RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,
More informationSPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction
SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction by Xi Li A thesis submitted to the Faculty of Graduate School, Marquette University, in Partial Fulfillment of the Requirements
More informationAN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute
More informationClassification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise
Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to
More informationRobust telephone speech recognition based on channel compensation
Pattern Recognition 32 (1999) 1061}1067 Robust telephone speech recognition based on channel compensation Jiqing Han*, Wen Gao Department of Computer Science and Engineering, Harbin Institute of Technology,
More informationBINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH
BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH Anjali Menon 1, Chanwoo Kim 2, Umpei Kurokawa 1, Richard M. Stern 1 1 Department of Electrical and Computer Engineering, Carnegie Mellon University,
More informationSynchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech
INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,
More informationSpeech/Music Change Point Detection using Sonogram and AANN
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 6, Number 1 (2016), pp. 45-49 International Research Publications House http://www. irphouse.com Speech/Music Change
More informationVQ Source Models: Perceptual & Phase Issues
VQ Source Models: Perceptual & Phase Issues Dan Ellis & Ron Weiss Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,ronw}@ee.columbia.edu
More informationA cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking
A cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking Courtney C. Lane 1, Norbert Kopco 2, Bertrand Delgutte 1, Barbara G. Shinn- Cunningham
More informationSpeech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter
Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,
More informationCalibration of Microphone Arrays for Improved Speech Recognition
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Calibration of Microphone Arrays for Improved Speech Recognition Michael L. Seltzer, Bhiksha Raj TR-2001-43 December 2001 Abstract We present
More informationPreeti Rao 2 nd CompMusicWorkshop, Istanbul 2012
Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o
More informationScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech
More information1. Introduction. Keywords: speech enhancement, spectral subtraction, binary masking, Gamma-tone filter bank, musical noise.
Journal of Advances in Computer Research Quarterly pissn: 2345-606x eissn: 2345-6078 Sari Branch, Islamic Azad University, Sari, I.R.Iran (Vol. 6, No. 3, August 2015), Pages: 87-95 www.jacr.iausari.ac.ir
More informationComplex Sounds. Reading: Yost Ch. 4
Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency
More informationEnhanced Waveform Interpolative Coding at 4 kbps
Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression
More informationSpeech Synthesis using Mel-Cepstral Coefficient Feature
Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract
More informationAdaptive Filters Application of Linear Prediction
Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing
More informationBinaural Hearing. Reading: Yost Ch. 12
Binaural Hearing Reading: Yost Ch. 12 Binaural Advantages Sounds in our environment are usually complex, and occur either simultaneously or close together in time. Studies have shown that the ability to
More informationFFT analysis in practice
FFT analysis in practice Perception & Multimedia Computing Lecture 13 Rebecca Fiebrink Lecturer, Department of Computing Goldsmiths, University of London 1 Last Week Review of complex numbers: rectangular
More information