arxiv: v1 [eess.as] 30 Dec 2017


LOGARITHMIC FREQUENCY SCALING AND CONSISTENT FREQUENCY COVERAGE FOR THE SELECTION OF AUDITORY FILTERBANK CENTER FREQUENCIES

Shoufeng Lin

Department of Electrical and Computer Engineering, Curtin University
Kent Street, Bentley, Perth, Western Australia, 6102
shoufeng.lin@postgrad.curtin.edu.au; ee.linsf@gmail.com

arXiv [eess.AS], 30 Dec 2017

ABSTRACT

This paper provides new insights into the problem of selecting filter center frequencies for auditory filterbanks. We propose a constant frequency distance and a consistent frequency coverage as the two metrics that motivate the logarithmic frequency scaling and a regularized selection of center frequencies. The frequency scaling and the consistent frequency coverage are derived from a common harmonic speaker signal model. Furthermore, we find that the existing linear equivalent rectangular bandwidth (ERB) function, as well as any possible linear ERB approximation, can also lead to a consistent frequency coverage. The results are verified and demonstrated using the gammatone filterbank.

Index Terms: auditory filterbank, speech signal processing, frequency scaling, frequency coverage, ERB.

1. INTRODUCTION

Auditory filterbanks have been widely accepted and applied in numerous speech signal processing algorithms, especially in the computational auditory scene analysis (CASA) area [1], for applications including speech enhancement, recognition and transcription. A typical auditory filterbank is specified by two parts, i.e. the filter type and the center frequencies of the filters. Common filter types include the gammatone, the gammachirp, and their variants [2], which simulate the auditory response of human listeners. The choice of center frequencies of the auditory filters has evolved from the earlier critical bandwidth and critical-band-rate scale [3], to the polynomial approximation of the equivalent rectangular bandwidth (ERB) [4], and the currently well-accepted linear ERB [5], as well as their corresponding ERB-rate scales (ERBS).
Although the linear ERB approximation in [5] has been found useful in practical implementations, it is based on experimental findings from psychoacoustic measurement and curve-fitting. Logarithmic frequency scales have also been applied [6, 7, 8]. However, the selection of the number of subbands for a given frequency range remains empirical for both the ERB-rate scale and the logarithmic scale. In this paper, we further investigate the frequency scaling and provide new insights, including a newly proposed frequency coverage metric and the derivation of a new frequency scaling function that leads to consistent frequency coverage for auditory filterbanks. Moreover, based on the proposed definition of frequency coverage, we also derive an expression for the frequency coverage metric from the existing linear ERB.

2. EQUIVALENT RECTANGULAR BANDWIDTH SCALE

The ERB of a particular filter is defined as the bandwidth of a rectangular filter that passes the same energy as the filter [4, 5]. The relationship between the ERB of the human auditory filter and the center frequency has been studied extensively, using analytical expressions to approximate measurement data from psychoacoustic experiments. An early approximation has the polynomial form [4]

$$\widehat{\mathrm{ERB}}(f) = a f^2 + b f + c, \tag{1}$$

where $f$ is the frequency in Hz, and $a, b, c \in \mathbb{R}$ are parameters. However, one of the most widely accepted analytical approximations over the past decades has been the linear form [5]

$$\widetilde{\mathrm{ERB}}(f) = 24.7\,(0.00437 f + 1) = 24.7 + 0.108 f. \tag{2}$$

Each ERB corresponds to a constant distance along the basilar membrane [9, 5] in the cochlea. The ERB-rate scale (ERBS) has been developed to scale frequency in units of the ERB, by solving the integral [4, 5]

$$\widetilde{\mathrm{ERBS}}(f) = \int \frac{1}{\widetilde{\mathrm{ERB}}(f)}\, df, \tag{3}$$

with the boundary condition

$$\widetilde{\mathrm{ERBS}}(0) = 0. \tag{4}$$
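As a rough numerical illustration of (2) to (4), the sketch below integrates the reciprocal of the linear ERB with a trapezoidal rule and compares the result with the exact antiderivative of $1/(D + E f)$. The function names and the step count are assumptions made for this example only; they do not come from the paper.

```python
import math

def erb_linear(f_hz):
    """Linear ERB approximation of equation (2): 24.7 * (0.00437 f + 1)."""
    return 24.7 * (0.00437 * f_hz + 1.0)

def erbs_numeric(f_hz, steps=20000):
    """ERB-rate value obtained by integrating 1/ERB from 0 to f (equations (3)-(4)).

    Uses the composite trapezoidal rule; `steps` is an arbitrary choice.
    """
    if f_hz == 0:
        return 0.0
    h = f_hz / steps
    total = 0.5 * (1.0 / erb_linear(0.0) + 1.0 / erb_linear(f_hz))
    for i in range(1, steps):
        total += 1.0 / erb_linear(i * h)
    return total * h

def erbs_closed_form(f_hz):
    """Exact antiderivative of 1/(D + E f), with D = 24.7 and E = 24.7 * 0.00437."""
    d, e = 24.7, 24.7 * 0.00437
    return math.log(1.0 + e * f_hz / d) / e

for f in (100.0, 1000.0, 4000.0):
    print(f, round(erbs_numeric(f), 3), round(erbs_closed_form(f), 3))
```

The two columns should agree closely, which is a quick sanity check that the boundary condition (4) fixes the integration constant to zero.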

Using (2) in (3) and (4) yields [5]

$$\widetilde{\mathrm{ERBS}}(f) = 21.4 \lg(0.00437 f + 1). \tag{5}$$

The ERB and ERBS given in (2) and (5) have been applied in numerous auditory studies for selecting the center frequencies of the auditory filterbank [10], yet the ERB approximation is still obtained by curve-fitting to experiments, and the number of subbands for a given frequency range is still an empirical parameter.

3. SUGGESTED FREQUENCY SCALING AND COVERAGE

3.1. Speaker Signal Model

Based on the source excitation and vocal tract models for the process of speech production [11], as well as the amplitude modulation (AM) and frequency modulation (FM) structure [12], a harmonic model is used for the speaker signal:

$$s_q(t) = \sum_{h=1}^{H_q} s_q^{(h)}(t), \tag{6}$$

$$s_q^{(h)}(t) = A_q^{(h)}(t) \cos\left(h\,\omega_q t + \varphi_q^{(h)}(t)\right), \tag{7}$$

where $t \in \mathbb{R}$ is continuous time, $s_q(t)$ the speech signal from the $q$-th speaker, $q = 1, \ldots, Q$, integer $Q$ the number of concurrent speakers, $s_q^{(h)}(t)$ the $h$-th harmonic of speaker $q$, integer $h$ the order of harmonics for a speaker, integer $H_q$ the maximum order of harmonics for speaker $q$, $A_q^{(h)}(t)$ the envelope of each harmonic, $\varphi_q^{(h)}(t) \in \mathbb{R}$ the phase (which is short-time constant for speech signals), and $\omega_q > 0$ the (angular) fundamental frequency.

With appropriate selection of filter center frequencies, the auditory filterbank ideally separates into subbands the harmonic components of not only a single speaker, but also multiple concurrent speakers, based on the time-frequency sparsity assumption of speech signals [13].

3.2. Logarithmic Frequency Scaling

In practice, concurrent speakers usually have different fundamental frequencies. We denote the fundamental frequencies of two speakers as $f_1, f_2$ ($f_1 = \omega_1 / 2\pi$, $f_2 = \omega_2 / 2\pi$, $f_1 \neq f_2$), and their difference as

$$\Delta f = f_1 - f_2. \tag{8}$$

Thus from (7) the frequency difference of their $h$-th harmonics is $h\,\Delta f$.
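The growth of the harmonic spacing with the harmonic order can be checked with a few lines of arithmetic. In this minimal sketch the two fundamental frequencies (220 Hz and 180 Hz) are hypothetical values chosen for illustration. The last column shows the same gap measured on a natural-log frequency axis, included only for comparison.

```python
import math

def harmonic_spacing(f1_hz, f2_hz, max_order):
    """Return (order, linear spacing in Hz, log-frequency spacing) for each harmonic."""
    rows = []
    for h in range(1, max_order + 1):
        linear = h * f1_hz - h * f2_hz                        # equals h * (f1 - f2)
        log_gap = math.log(h * f1_hz) - math.log(h * f2_hz)   # equals ln(f1 / f2)
        rows.append((h, linear, log_gap))
    return rows

for h, linear, log_gap in harmonic_spacing(220.0, 180.0, 5):
    print(h, linear, round(log_gap, 6))
```

The linear spacing scales with the order $h$, whereas the log-frequency spacing does not change with $h$.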
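This means that their harmonics (of the same order) are more distant at higher frequencies on the linear frequency scale, which makes selection of the filterbank center frequencies difficult for a regular per-speaker estimate. We thus propose a frequency scaling function $\Upsilon(\cdot)$ that satisfies (9), so that speech components of separate speakers appear equidistantly with respect to (w.r.t.) $h$:

$$\Upsilon(h f_1) - \Upsilon(h f_2) \equiv \text{Constant}, \quad \text{w.r.t. } h. \tag{9}$$

The logarithmic functions are functional solutions to (9):

$$\Upsilon(f) = A \log_B(f) + C, \tag{10}$$

where $A > 0$, $B > 1$, $C \in \mathbb{R}$. They also have better resolution at lower frequencies, which aligns with the fact that most speech energy falls in low frequencies (e.g. fundamental frequencies and their lower-order harmonics). We can easily verify from (10) that $\Upsilon(h f_1) - \Upsilon(h f_2) \equiv A \log_B(f_1 / f_2)$, which is constant with respect to $h$.

Denote the ratio of center frequency to bandwidth as $\eta^{(b)}$ for filter band $b$ ($b = 1, \ldots, N_b$, where integer $N_b > 1$ is the number of filter bands), i.e.

$$\eta^{(b)} \triangleq \frac{f_c^{(b)}}{f_{\mathrm{bw}}^{(b)}}, \tag{11}$$

where $f_{\mathrm{bw}}^{(b)}$ and $f_c^{(b)}$ denote the bandwidth and center frequency of filter band $b$, respectively. $\eta^{(b)}$ is also referred to as the quality factor (Q-factor) of subband $b$.

Denote the frequency range that we are interested in as $[f_{\min}, f_{\max}]$, where $f_{\max} > f_{\min} > 0$. Assuming that the center frequencies of the filter bands are equidistantly spaced on the proposed frequency scale over this range, we have

$$\Upsilon\left(f_c^{(b)}\right) = \frac{(N_b - b)\,\Upsilon(f_{\min}) + (b - 1)\,\Upsilon(f_{\max})}{N_b - 1}, \tag{12}$$

and

$$f_c^{(b)} = \Upsilon^{-1}\left(\Upsilon\left(f_c^{(b)}\right)\right), \tag{13}$$

where $\Upsilon^{-1}(\cdot)$ denotes the inverse function of $\Upsilon(\cdot)$. From (10), (12) and (13), we get for the new logarithmic frequency scaling

$$f_c^{(b)} = \Upsilon^{-1}\left(\frac{(N_b - b)\,\Upsilon(f_{\min}) + (b - 1)\,\Upsilon(f_{\max})}{N_b - 1}\right) = B^{\frac{1}{A}\left(\frac{(N_b - b)\,\Upsilon(f_{\min}) + (b - 1)\,\Upsilon(f_{\max})}{N_b - 1} - C\right)}. \tag{14}$$

3.3. Proposed Frequency Coverage

The auditory filterbank requires sufficient frequency coverage to capture all harmonic components of concurrent speakers. Here we propose to define the frequency coverage of the filterbank on the proposed frequency scale as

$$\mathcal{C}^{(b)} \triangleq \frac{\Sigma_f^{(b)}}{\Delta_f^{(b)}}, \tag{15}$$

where $\Delta_f^{(b)}$ and $\Sigma_f^{(b)}$ denote the distance between consecutive filter bands and half of the sum of their bandwidths, as shown in (16) and (17), respectively:

$$\Delta_f^{(b)} \triangleq f_c^{(b+1)} - f_c^{(b)}, \tag{16}$$

and

$$\Sigma_f^{(b)} \triangleq \frac{1}{2}\left(f_{\mathrm{bw}}^{(b+1)} + f_{\mathrm{bw}}^{(b)}\right). \tag{17}$$

Apparently $\mathcal{C}^{(b)} = 1$ gives full coverage for ideal brickwall bandpass filters with no overlap. For a practical auditory filterbank, however, the filters always have a finite rolloff rate, so reasonable overlap is required for full coverage, leading to $\mathcal{C}^{(b)} > 1$. Depending on the application, we may also have $\mathcal{C}^{(b)} < 1$ when full coverage is not required.

Therefore from (11), (14) and (15), when $\eta^{(b)} = \eta^{(b+1)} = \eta$, we have

$$\begin{aligned}
\mathcal{C}^{(b)} &= \frac{1}{2}\,\frac{f_{\mathrm{bw}}^{(b+1)} + f_{\mathrm{bw}}^{(b)}}{f_c^{(b+1)} - f_c^{(b)}}
= \frac{1}{2\eta}\,\frac{f_c^{(b+1)} + f_c^{(b)}}{f_c^{(b+1)} - f_c^{(b)}}
= \frac{1}{2\eta}\,\frac{f_c^{(b+1)} / f_c^{(b)} + 1}{f_c^{(b+1)} / f_c^{(b)} - 1} \\
&= \frac{1}{2\eta}\,\frac{B^{\frac{\Upsilon(f_{\max}) - \Upsilon(f_{\min})}{A (N_b - 1)}} + 1}{B^{\frac{\Upsilon(f_{\max}) - \Upsilon(f_{\min})}{A (N_b - 1)}} - 1}
= \frac{1}{2\eta}\,\frac{\left(f_{\max} / f_{\min}\right)^{\frac{1}{N_b - 1}} + 1}{\left(f_{\max} / f_{\min}\right)^{\frac{1}{N_b - 1}} - 1},
\end{aligned} \tag{18}$$

which clearly shows that the frequency coverage on the logarithmic frequency scaling is consistent over the frequency range, i.e. if the Q-factor $\eta$ is constant w.r.t. the subband index $b$, the resulting $\mathcal{C}^{(b)}$ is also a constant value.

3.4. Frequency Coverage of the Existing ERBS

The existing ERB function (2) does not lead to a constant Q-factor $\eta$; here we investigate its corresponding frequency coverage by applying the definition in (15). Denote the general form of the ERB in (2) as

$$\hat{\upsilon}(f) = D + E f, \tag{19}$$

where $D, E > 0$. When $D = 24.7$ and $E = 0.108$ we have (2). The resulting ERBS following the process of (3) and (4) becomes

$$\hat{\Upsilon}(f) = E' \lg(1 + D' f), \tag{20}$$

where

$$D' \triangleq \frac{E}{D}, \tag{21}$$

and

$$E' \triangleq \frac{1}{E \lg e}. \tag{22}$$

We assume that the filter bandwidth is a constant multiple of the ERB, which is true for some auditory filters, e.g. the gammatone filter [2], i.e.

$$f_{\mathrm{bw}}^{(b)} = K\,\hat{\upsilon}\left(f_c^{(b)}\right), \tag{23}$$

where $K > 0$ is a constant. Note here that the Q-factor $\eta$ is not constant, as $D \neq 0$. Therefore, selecting $f_c^{(b)}$ equidistantly on the scale $\hat{\Upsilon}(f)$, similar to (14), we have

$$\begin{aligned}
f_c^{(b)} &= \hat{\Upsilon}^{-1}\left(\hat{\Upsilon}\left(f_c^{(b)}\right)\right)
= \hat{\Upsilon}^{-1}\left(\frac{(N_b - b)\,\hat{\Upsilon}(f_{\min}) + (b - 1)\,\hat{\Upsilon}(f_{\max})}{N_b - 1}\right) \\
&= \frac{1}{D'}\left[(1 + D' f_{\min})^{\frac{N_b - b}{N_b - 1}}\,(1 + D' f_{\max})^{\frac{b - 1}{N_b - 1}} - 1\right].
\end{aligned} \tag{24}$$

Thus from (15) and (19) we have

$$\begin{aligned}
\mathcal{C}_{\hat{\upsilon}}^{(b)} &= \frac{1}{2}\,\frac{f_{\mathrm{bw}}^{(b+1)} + f_{\mathrm{bw}}^{(b)}}{f_c^{(b+1)} - f_c^{(b)}}
= K\,\frac{D + \frac{E}{2}\left(f_c^{(b+1)} + f_c^{(b)}\right)}{f_c^{(b+1)} - f_c^{(b)}} \\
&= \frac{E K}{2}\,\frac{(1 + D' f_{\min})^{\frac{N_b - b - 1}{N_b - 1}} (1 + D' f_{\max})^{\frac{b}{N_b - 1}} + (1 + D' f_{\min})^{\frac{N_b - b}{N_b - 1}} (1 + D' f_{\max})^{\frac{b - 1}{N_b - 1}}}{(1 + D' f_{\min})^{\frac{N_b - b - 1}{N_b - 1}} (1 + D' f_{\max})^{\frac{b}{N_b - 1}} - (1 + D' f_{\min})^{\frac{N_b - b}{N_b - 1}} (1 + D' f_{\max})^{\frac{b - 1}{N_b - 1}}} \\
&= \frac{E K}{2}\,\frac{\left(\frac{1 + D' f_{\max}}{1 + D' f_{\min}}\right)^{\frac{1}{N_b - 1}} + 1}{\left(\frac{1 + D' f_{\max}}{1 + D' f_{\min}}\right)^{\frac{1}{N_b - 1}} - 1}
= \frac{E K}{2}\,\frac{\left(\frac{D + E f_{\max}}{D + E f_{\min}}\right)^{\frac{1}{N_b - 1}} + 1}{\left(\frac{D + E f_{\max}}{D + E f_{\min}}\right)^{\frac{1}{N_b - 1}} - 1},
\end{aligned} \tag{25}$$

which is also constant over filter subbands. Thus, as long as the ERB has the linear form of (19) and (23) holds, the resulting frequency coverage is constant over frequency for given $f_{\min}$, $f_{\max}$ and $N_b$. The number of subbands $N_b$ for a given frequency range can therefore be derived from the required frequency coverage using (25), and the subband center frequencies can then be calculated from (14) or (24).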
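The constancy of the coverage across subbands can be checked numerically. The sketch below places center frequencies equidistantly on a logarithmic scale and on the ERBS of a linear ERB, then evaluates the ratio of half-summed bandwidths to center distance for every adjacent pair. The Q-factor `ETA`, the ratio `K`, the frequency range and the band count are hypothetical values chosen for illustration, and the function names are not from the paper.

```python
import math

def centers_log(f_min, f_max, n_bands):
    """Center frequencies equidistant on a logarithmic scale.

    The constants A, B and C of the scaling cancel, leaving a geometric sequence.
    """
    ratio = (f_max / f_min) ** (1.0 / (n_bands - 1))
    return [f_min * ratio ** b for b in range(n_bands)]

def centers_erbs(f_min, f_max, n_bands, d=24.7, e=0.108):
    """Center frequencies equidistant on the ERBS of a linear ERB D + E f."""
    dp = e / d  # D' = E / D
    lo, hi = 1.0 + dp * f_min, 1.0 + dp * f_max
    n = n_bands - 1
    # Zero-based index b here corresponds to band b + 1 in the text.
    return [(lo ** ((n - b) / n) * hi ** (b / n) - 1.0) / dp for b in range(n_bands)]

def coverage(centers, bandwidth):
    """Coverage per adjacent pair: half-sum of bandwidths over center distance."""
    return [0.5 * (bandwidth(upper) + bandwidth(lower)) / (upper - lower)
            for lower, upper in zip(centers, centers[1:])]

ETA = 8.0                                   # hypothetical constant Q-factor
K = 1.0                                     # hypothetical bandwidth-to-ERB ratio
F_MIN, F_MAX, N_BANDS = 200.0, 3600.0, 16   # illustrative range and band count

cov_log = coverage(centers_log(F_MIN, F_MAX, N_BANDS), lambda f: f / ETA)
cov_erb = coverage(centers_erbs(F_MIN, F_MAX, N_BANDS),
                   lambda f: K * (24.7 + 0.108 * f))
print(min(cov_log), max(cov_log))
print(min(cov_erb), max(cov_erb))
```

In both cases the minimum and maximum over subbands should coincide up to rounding, in line with the closed forms derived above. Sweeping `N_BANDS` until the printed value reaches a target is one simple way to choose the number of subbands for a required coverage.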

4. NUMERICAL STUDIES

4.1. New ERB and ERBS Functions

From (10) we have a new frequency scaling function that can lead to consistent frequency coverage for the auditory filterbank, as well as a constant Q-factor. Now we calculate the parameters. Denote the maximum inaudible frequency as $f_m$, usually $f_m \approx 20$ Hz. We use the boundary condition

$$\Upsilon(f_m) = 0, \tag{26}$$

instead of (4). Thus from (10) we have

$$C = -A \log_B(f_m). \tag{27}$$

From (3) and (10) we have a new approximation of the ERB:

$$\upsilon(f) = 1 \Big/ \frac{d\Upsilon(f)}{df} = \frac{\ln B}{A}\, f. \tag{28}$$

Choosing the natural logarithm, i.e. $B = e$, where $e$ is Euler's number, we can get $A$ from linear fitting of experimental readings from the literature [14, 15, 16, 17, 18, 19], as shown in Fig. 1. We can see that

$$\upsilon(f) = \frac{f}{A}, \tag{29}$$

where $A = 7.7$, fits the data well. Then we have

$$\Upsilon(f) = \begin{cases} A \ln(f) + C, & f > f_m \\ 0, & f \leq f_m \end{cases}, \tag{30}$$

where $C = -23.1$. Equations (29) and (30) are the proposed new ERB and ERBS functions. Note here that the ERB of the human auditory system may vary with age and sound level, and from one listener to another [4]. Thus the precise values of $A$ and $C$ may vary. However, the derivation from (10) to (18) shows that, as long as the ERB function has the proposed form of (28) or (29), the resulting frequency scaling always satisfies the frequency coverage as (18) shows.

[Figure 1. ERB (Hz) versus center frequency (Hz). Data: Fidell 1983, Shailer 1983, Houtgast 1977, Patterson 1976, Patterson 1982, Weber 1977. Curves: $\widehat{\mathrm{ERB}}(f)$, $\widetilde{\mathrm{ERB}}(f)$, $\upsilon(f)$.]

Fig. 1: Measured equivalent rectangular bandwidth versus center frequency, and ERB curves.

The existing and proposed ERBS functions are plotted in Fig. 2. We can see that the proposed scaling follows the proposed logarithmic scaling, and is steeper at frequencies lower than about 1000 Hz. In this section we use $f_{\min} = 200$ Hz and $f_{\max} = 3600$ Hz. The center frequencies that correspond to equidistant points on the respective ERBS for $N_b = 16$ are plotted in Fig. 2. We can see that the proposed ERBS has more points at low frequencies. This can provide better frequency resolution at the lower frequencies, as most speaker fundamental frequencies are below 500 Hz, and usually most speech energy is in the fundamental frequency or its lower-order harmonics [11].

[Figure 2. Top panel: ERBS versus frequency (Hz). Bottom panel: ERBS versus frequency on a logarithmic scale (Hz). Curves in both panels: $\widetilde{\mathrm{ERBS}}(f)$, $\Upsilon(f)$.]

Fig. 2: The existing and proposed ERBS and corresponding selected center frequencies.

4.2. Frequency Coverage of the Gammatone Filterbank

The frequency coverage is the property that we propose for the selection of center frequencies of an auditory filterbank. Here we use the gammatone filter to demonstrate the feature. We can see from [2] that the bandwidth of the gammatone filter depends only on the filter order $n$ ($n \in \mathbb{N}$) and the ERB, i.e.

$$f_{\mathrm{bw},\gamma}^{(b)} = k(n)\,\widetilde{\mathrm{ERB}}\left(f_c^{(b)}\right), \tag{31}$$

where

$$k(n) = 2\sqrt{2^{1/n} - 1}\,\left[\frac{\pi\,(2n - 2)!\;2^{-(2n - 2)}}{\left((n - 1)!\right)^2}\right]^{-1}. \tag{32}$$

This satisfies the assumptions in (18) and (23). Thus, using the new ERB function (29) instead of (2) in (31), we have the Q-factor for the gammatone filter

$$\eta_\gamma = \frac{A}{k(n)}, \tag{33}$$

which is constant over frequency, e.g. when $n = 4$, we have $k(4) = 0.8865$ and $\eta_\gamma = A / k(4)$. Thus, given $N_b = 16$, we can get the frequency coverage $\mathcal{C}^{(b)}$ from (18), and $\mathcal{C}_{\hat{\upsilon}}^{(b)} \approx 0.66$ from (25).

[Figure 3. Top panel: $\eta^{(b)}$ versus frequency (Hz), for $\widetilde{\mathrm{ERB}}(f)$ and the proposed $\upsilon(f)$. Bottom panel: frequency coverage versus number of subbands, for $\widetilde{\mathrm{ERBS}}(f)$ and the proposed $\Upsilon(f)$.]

Fig. 3: Q-factor and frequency coverage for a 4-th order gammatone filterbank.

Fig. 3 further provides the frequency coverage of the proposed and existing ERBS over the number of subbands of the 4-th order gammatone auditory filterbank for the frequency range of [200, 3600] Hz. We can see from the top panel that for frequencies above about 500 Hz, both ERBs align well with each other. However, the existing ERB has decreasing Q-factors as frequencies decrease below about 500 Hz, while the proposed ERB is consistent across the entire frequency range. We can also see from the bottom panel that, for both ERB scaling functions, the frequency coverage is constant for a given number of subbands $N_b$, and increases almost linearly with the number of subbands. The frequency coverage reaches about 1 at $N_b = 24$ for both scalings. However, it can also be noted that, for the same frequency range, the ERBS requires fewer subbands than the new logarithmic scale for a desired frequency coverage.

5. CONCLUSIONS

This paper investigates the frequency scaling of auditory filterbanks, and proposes a novel frequency coverage metric for the selection of center frequencies of auditory filterbanks. We also propose a new ERB that aligns with the logarithmic frequency scaling, and show that equidistant frequencies on the logarithmic frequency scale provide a consistent frequency coverage for the filterbanks. Moreover, we show that the existing linear ERB, and any possible linear ERB, can also provide consistent frequency coverage. The suggested frequency coverage is demonstrated using the gammatone filterbank.
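The gammatone figures discussed in Section 4.2 can be roughly reproduced from the closed-form expressions. The sketch below evaluates the order-dependent bandwidth factor $k(n)$ and tabulates both closed-form coverages against the number of subbands. It assumes the frequency range [200, 3600] Hz, the fitted value $A = 7.7$, and the linear ERB constants 24.7 and 0.108; the function names are introduced here for illustration only.

```python
import math

def gammatone_k(n):
    """Ratio of the gammatone bandwidth to the ERB for filter order n."""
    erb_factor = (math.pi * math.factorial(2 * n - 2) * 2.0 ** (-(2 * n - 2))
                  / math.factorial(n - 1) ** 2)
    return 2.0 * math.sqrt(2.0 ** (1.0 / n) - 1.0) / erb_factor

def coverage_log(n_bands, f_min, f_max, eta):
    """Closed-form coverage on the logarithmic scale for a constant Q-factor eta."""
    r = (f_max / f_min) ** (1.0 / (n_bands - 1))
    return (r + 1.0) / (r - 1.0) / (2.0 * eta)

def coverage_linear_erb(n_bands, f_min, f_max, k, d=24.7, e=0.108):
    """Closed-form coverage for a linear ERB D + E f with bandwidth k * ERB."""
    r = ((d + e * f_max) / (d + e * f_min)) ** (1.0 / (n_bands - 1))
    return 0.5 * e * k * (r + 1.0) / (r - 1.0)

A = 7.7                         # assumed fitted parameter of the proposed ERB
F_MIN, F_MAX = 200.0, 3600.0    # assumed frequency range in Hz
k4 = gammatone_k(4)
eta_gamma = A / k4              # constant Q-factor of the gammatone filter

for n_bands in (8, 16, 24, 32):
    print(n_bands,
          round(coverage_log(n_bands, F_MIN, F_MAX, eta_gamma), 3),
          round(coverage_linear_erb(n_bands, F_MIN, F_MAX, k4), 3))
```

Under these assumptions the linear-ERB coverage evaluates to about 0.66 at 16 subbands and to about 1 at 24 subbands, and both columns grow with the number of subbands.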
Acknowledgment

The author would like to acknowledge the contribution of the Australian Postgraduate Award and the Australian Government Research Training Program Scholarship in supporting this research. Due thanks are given to Professor S. Nordholm and the anonymous reviewers for their review comments on early revisions of the manuscript.

6. REFERENCES

[1] D. Wang and G. J. Brown, Computational auditory scene analysis: Principles, algorithms, and applications. Wiley-IEEE Press, 2006.
[2] J. Holdsworth, I. Nimmo-Smith, R. Patterson, and P. Rice, "Implementing a gammatone filter bank," Annex C of the SVOS Final Report: Part A: The Auditory Filterbank, vol. 1, pp. 1-5, 1988.
[3] E. Zwicker and E. Terhardt, "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency," The Journal of the Acoustical Society of America, vol. 68, no. 5, 1980.
[4] B. C. Moore and B. R. Glasberg, "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns," The Journal of the Acoustical Society of America, vol. 74, no. 3, 1983.
[5] B. R. Glasberg and B. C. Moore, "Derivation of auditory filter shapes from notched-noise data," Hearing Research, vol. 47, no. 1, pp. 103-138, 1990.
[6] X. Sun, "Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio," in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, vol. 1. IEEE, 2002, pp. I-333.
[7] F. Nolan, "Intonational equivalence: an experimental evaluation of pitch scales," in Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, vol. 39, 2003.
[8] W. Biesmans, N. Das, T. Francart, and A. Bertrand, "Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, no. 5, 2017.
[9] B. Moore, "Parallels between frequency selectivity measured psychophysically and in cochlear mechanics," 1986.
[10] R. Patterson, I. Nimmo-Smith, J. Holdsworth, and P. Rice, "An efficient auditory filterbank based on the gammatone function," in a meeting of the IOC Speech Group on Auditory Modelling at RSRE, vol. 2, no. 7, 1987.
[11] J. R. Deller Jr, J. G. Proakis, and J. H. Hansen, Discrete time processing of speech signals. Prentice Hall PTR, 1993.
[12] P. Maragos, J. F. Kaiser, and T. F. Quatieri, "Energy separation in signal modulations with application to speech analysis," IEEE Transactions on Signal Processing, vol. 41, no. 10, 1993.
[13] O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Transactions on Signal Processing, vol. 52, no. 7, 2004.
[14] R. D. Patterson, "Auditory filter shapes derived with noise stimuli," The Journal of the Acoustical Society of America, vol. 59, no. 3, 1976.
[15] D. L. Weber, "Growth of masking and the auditory filter," The Journal of the Acoustical Society of America, vol. 62, no. 2, 1977.
[16] T. Houtgast, "Auditory-filter characteristics derived from direct-masking data and pulsation-threshold data with a rippled-noise masker," The Journal of the Acoustical Society of America, vol. 62, no. 2, 1977.
[17] R. D. Patterson, I. Nimmo-Smith, D. L. Weber, and R. Milroy, "The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech threshold," The Journal of the Acoustical Society of America, vol. 72, no. 6, 1982.
[18] S. Fidell, R. Horonjeff, S. Teffeteller, and D. M. Green, "Effective masking bandwidths at low frequencies," The Journal of the Acoustical Society of America, vol. 73, no. 2, 1983.
[19] M. J. Shailer and B. C. Moore, "Gap detection as a function of frequency, bandwidth, and level," The Journal of the Acoustical Society of America, vol. 74, no. 2, 1983.

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Auditory Based Feature Vectors for Speech Recognition Systems

Auditory Based Feature Vectors for Speech Recognition Systems Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 MODELING SPECTRAL AND TEMPORAL MASKING IN THE HUMAN AUDITORY SYSTEM PACS: 43.66.Ba, 43.66.Dc Dau, Torsten; Jepsen, Morten L.; Ewert,

More information

You know about adding up waves, e.g. from two loudspeakers. AUDL 4007 Auditory Perception. Week 2½. Mathematical prelude: Adding up levels

You know about adding up waves, e.g. from two loudspeakers. AUDL 4007 Auditory Perception. Week 2½. Mathematical prelude: Adding up levels AUDL 47 Auditory Perception You know about adding up waves, e.g. from two loudspeakers Week 2½ Mathematical prelude: Adding up levels 2 But how do you get the total rms from the rms values of two signals

More information

AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES

AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-), Verona, Italy, December 7-9,2 AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES Tapio Lokki Telecommunications

More information

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner.

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner. Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb 2008. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum,

More information

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb A. Faulkner.

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb A. Faulkner. Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb 2009. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence

More information

HCS 7367 Speech Perception

HCS 7367 Speech Perception HCS 7367 Speech Perception Dr. Peter Assmann Fall 212 Power spectrum model of masking Assumptions: Only frequencies within the passband of the auditory filter contribute to masking. Detection is based

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

Psycho-acoustics (Sound characteristics, Masking, and Loudness)

Psycho-acoustics (Sound characteristics, Masking, and Loudness) Psycho-acoustics (Sound characteristics, Masking, and Loudness) Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University Mar. 20, 2008 Pure tones Mathematics of the pure

More information

Journal of the Acoustical Society of America 88

Journal of the Acoustical Society of America 88 The following article appeared in Journal of the Acoustical Society of America 88: 97 100 and may be found at http://scitation.aip.org/content/asa/journal/jasa/88/1/10121/1.399849. Copyright (1990) Acoustical

More information

Recurrent Timing Neural Networks for Joint F0-Localisation Estimation

Recurrent Timing Neural Networks for Joint F0-Localisation Estimation Recurrent Timing Neural Networks for Joint F0-Localisation Estimation Stuart N. Wrigley and Guy J. Brown Department of Computer Science, University of Sheffield Regent Court, 211 Portobello Street, Sheffield

More information

Perception of pitch. Importance of pitch: 2. mother hemp horse. scold. Definitions. Why is pitch important? AUDL4007: 11 Feb A. Faulkner.

Perception of pitch. Importance of pitch: 2. mother hemp horse. scold. Definitions. Why is pitch important? AUDL4007: 11 Feb A. Faulkner. Perception of pitch AUDL4007: 11 Feb 2010. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum, 2005 Chapter 7 1 Definitions

More information

Change Point Determination in Audio Data Using Auditory Features

Change Point Determination in Audio Data Using Auditory Features INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 0, VOL., NO., PP. 8 90 Manuscript received April, 0; revised June, 0. DOI: /eletel-0-00 Change Point Determination in Audio Data Using Auditory Features

More information

The psychoacoustics of reverberation

The psychoacoustics of reverberation The psychoacoustics of reverberation Steven van de Par Steven.van.de.Par@uni-oldenburg.de July 19, 2016 Thanks to Julian Grosse and Andreas Häußler 2016 AES International Conference on Sound Field Control

More information

INSTANTANEOUS FREQUENCY ESTIMATION FOR A SINUSOIDAL SIGNAL COMBINING DESA-2 AND NOTCH FILTER. Yosuke SUGIURA, Keisuke USUKURA, Naoyuki AIKAWA

INSTANTANEOUS FREQUENCY ESTIMATION FOR A SINUSOIDAL SIGNAL COMBINING DESA-2 AND NOTCH FILTER. Yosuke SUGIURA, Keisuke USUKURA, Naoyuki AIKAWA INSTANTANEOUS FREQUENCY ESTIMATION FOR A SINUSOIDAL SIGNAL COMBINING AND NOTCH FILTER Yosuke SUGIURA, Keisuke USUKURA, Naoyuki AIKAWA Tokyo University of Science Faculty of Science and Technology ABSTRACT

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter

Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter Sana Alaya, Novlène Zoghlami and Zied Lachiri Signal, Image and Information Technology Laboratory National Engineering School

More information

AUDL GS08/GAV1 Signals, systems, acoustics and the ear. Loudness & Temporal resolution

AUDL GS08/GAV1 Signals, systems, acoustics and the ear. Loudness & Temporal resolution AUDL GS08/GAV1 Signals, systems, acoustics and the ear Loudness & Temporal resolution Absolute thresholds & Loudness Name some ways these concepts are crucial to audiologists Sivian & White (1933) JASA

More information

Hearing and Deafness 2. Ear as a frequency analyzer. Chris Darwin

Hearing and Deafness 2. Ear as a frequency analyzer. Chris Darwin Hearing and Deafness 2. Ear as a analyzer Chris Darwin Frequency: -Hz Sine Wave. Spectrum Amplitude against -..5 Time (s) Waveform Amplitude against time amp Hz Frequency: 5-Hz Sine Wave. Spectrum Amplitude

More information

IN a natural environment, speech often occurs simultaneously. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation

IN a natural environment, speech often occurs simultaneously. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 15, NO. 5, SEPTEMBER 2004 1135 Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation Guoning Hu and DeLiang Wang, Fellow, IEEE Abstract

More information

A Pole Zero Filter Cascade Provides Good Fits to Human Masking Data and to Basilar Membrane and Neural Data

A Pole Zero Filter Cascade Provides Good Fits to Human Masking Data and to Basilar Membrane and Neural Data A Pole Zero Filter Cascade Provides Good Fits to Human Masking Data and to Basilar Membrane and Neural Data Richard F. Lyon Google, Inc. Abstract. A cascade of two-pole two-zero filters with level-dependent

More information

Experiments in two-tone interference

Experiments in two-tone interference Experiments in two-tone interference Using zero-based encoding An alternative look at combination tones and the critical band John K. Bates Time/Space Systems Functions of the experimental system: Variable

More information

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied

More information

Phase and Feedback in the Nonlinear Brain. Malcolm Slaney (IBM and Stanford) Hiroko Shiraiwa-Terasawa (Stanford) Regaip Sen (Stanford)

Phase and Feedback in the Nonlinear Brain. Malcolm Slaney (IBM and Stanford) Hiroko Shiraiwa-Terasawa (Stanford) Regaip Sen (Stanford) Phase and Feedback in the Nonlinear Brain Malcolm Slaney (IBM and Stanford) Hiroko Shiraiwa-Terasawa (Stanford) Regaip Sen (Stanford) Auditory processing pre-cosyne workshop March 23, 2004 Simplistic Models

More information

On the relationship between multi-channel envelope and temporal fine structure

On the relationship between multi-channel envelope and temporal fine structure On the relationship between multi-channel envelope and temporal fine structure PETER L. SØNDERGAARD 1, RÉMI DECORSIÈRE 1 AND TORSTEN DAU 1 1 Centre for Applied Hearing Research, Technical University of

More information

Results of Egan and Hake using a single sinusoidal masker [reprinted with permission from J. Acoust. Soc. Am. 22, 622 (1950)].

Results of Egan and Hake using a single sinusoidal masker [reprinted with permission from J. Acoust. Soc. Am. 22, 622 (1950)]. XVI. SIGNAL DETECTION BY HUMAN OBSERVERS Prof. J. A. Swets Prof. D. M. Green Linda E. Branneman P. D. Donahue Susan T. Sewall A. MASKING WITH TWO CONTINUOUS TONES One of the earliest studies in the modern

More information

Time-Frequency Distributions for Automatic Speech Recognition

Time-Frequency Distributions for Automatic Speech Recognition 196 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 3, MARCH 2001 Time-Frequency Distributions for Automatic Speech Recognition Alexandros Potamianos, Member, IEEE, and Petros Maragos, Fellow,

More information

Oreja: A MATLAB environment for the design of psychoacoustic stimuli

Oreja: A MATLAB environment for the design of psychoacoustic stimuli Journal Behavior Research Methods 2006,?? 38 (?), (4),???-??? 574-578 Oreja: A MATLAB environment for the design of psychoacoustic stimuli ELVIRA PÉREZ University of Liverpool, Liverpool, England and RAUL

More information

Gammatone Cepstral Coefficient for Speaker Identification

Gammatone Cepstral Coefficient for Speaker Identification Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

Complex Sounds. Reading: Yost Ch. 4

Complex Sounds. Reading: Yost Ch. 4 Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency

More information

Using the Gammachirp Filter for Auditory Analysis of Speech

Using the Gammachirp Filter for Auditory Analysis of Speech Using the Gammachirp Filter for Auditory Analysis of Speech 18.327: Wavelets and Filterbanks Alex Park malex@sls.lcs.mit.edu May 14, 2003 Abstract Modern automatic speech recognition (ASR) systems typically

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

COM325 Computer Speech and Hearing

COM325 Computer Speech and Hearing COM325 Computer Speech and Hearing Part III : Theories and Models of Pitch Perception Dr. Guy Brown Room 145 Regent Court Department of Computer Science University of Sheffield Email: g.brown@dcs.shef.ac.uk

More information

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt

Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory

Human Auditory Periphery (HAP)

Human Auditory Periphery (HAP) Ray Meddis Department of Human Sciences, University of Essex Colchester, CO4 3SQ, UK. rmeddis@essex.ac.uk A demonstrator for a human auditory modelling approach. 23/11/2003

An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation

An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation Aisvarya V 1, Suganthy M 2 PG Student [Comm. Systems], Dept. of ECE, Sree Sastha Institute of Engg. & Tech., Chennai,

1. Introduction. Keywords: speech enhancement, spectral subtraction, binary masking, Gamma-tone filter bank, musical noise.

Journal of Advances in Computer Research Quarterly pissn: 2345-606x eissn: 2345-6078 Sari Branch, Islamic Azad University, Sari, I.R.Iran (Vol. 6, No. 3, August 2015), Pages: 87-95 www.jacr.iausari.ac.ir

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution

PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26th, 2005 Outline of Today's Lecture Announcements Filter-bank analysis

PROJECT REPORT. Monaural voiced speech segregation based on elaborate harmonic grouping strategies. INSTRUCTOR: Dr. Rajesh Hegde GROUP Members:

PROJECT REPORT Monaural voiced speech segregation based on elaborate harmonic grouping strategies INSTRUCTOR: Dr. Rajesh Hegde GROUP Members: Gaurav Solanki Shouvik Ganguly Shiv Prakash Prashant Khokhar

Spectral and temporal processing in the human auditory system

Spectral and temporal processing in the human auditory system Torsten Dau 1, Morten L. Jepsen 1, and Stephan D. Ewert 2 1 Centre for Applied Hearing Research, Ørsted DTU, Technical University

Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation

1 Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation Zhangli Chen* and Volker Hohmann Abstract This paper describes an online algorithm for enhancing monaural

Design of a Sharp Linear-Phase FIR Filter Using the α-scaled Sampling Kernel

Proceedings of the 6th WSEAS International Conference on SIGNAL PROCESSING, Dallas, Texas, USA, March 22-24, 2007 129 Design of a Sharp Linear-Phase FIR Filter Using the α-scaled Sampling Kernel K.J. Kim,

Boldt, Jesper Bünsow; Kjems, Ulrik; Pedersen, Michael Syskind; Lunner, Thomas; Wang, DeLiang

Downloaded from vbn.aau.dk on: januar 14, 19 Aalborg Universitet Estimation of the Ideal Binary Mask using Directional Systems Boldt, Jesper Bünsow; Kjems, Ulrik; Pedersen, Michael Syskind; Lunner, Thomas;

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands

Audio Engineering Society Convention Paper Presented at the th Convention May 5 Amsterdam, The Netherlands This convention paper has been reproduced from the author's advance manuscript, without editing,

Audible Aliasing Distortion in Digital Audio Synthesis

56 J. SCHIMMEL, AUDIBLE ALIASING DISTORTION IN DIGITAL AUDIO SYNTHESIS Audible Aliasing Distortion in Digital Audio Synthesis Jiri SCHIMMEL Dept. of Telecommunications, Faculty of Electrical Engineering

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks

SGN-14006 Audio and Speech Processing Pasi Pertilä SGN-14006 2015 Mel-frequency cepstral coefficients (MFCCs) and gammatone filter banks Slides for this lecture are based on those created by Katariina

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

Subband Analysis of Time Delay Estimation in STFT Domain

PAGE 211 Subband Analysis of Time Delay Estimation in STFT Domain S. Wang, D. Sen and W. Lu School of Electrical Engineering & Telecommunications University of New South Wales, Sydney, Australia sh.wang@student.unsw.edu.au,

Acoustics, signals & systems for audiology. Week 4. Signals through Systems

Acoustics, signals & systems for audiology Week 4 Signals through Systems Crucial ideas Any signal can be constructed as a sum of sine waves In a linear time-invariant (LTI) system, the response to a sinusoid

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking

Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015) 122-126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech

ABSTRACT. Title of Document: SPECTROTEMPORAL MODULATION LISTENERS. Professor, Dr.Shihab Shamma, Department of. Electrical Engineering

ABSTRACT Title of Document: SPECTROTEMPORAL MODULATION SENSITIVITY IN HEARING-IMPAIRED LISTENERS Golbarg Mehraei, Master of Science, 29 Directed By: Professor, Dr.Shihab Shamma, Department of Electrical

arxiv: v1 [cs.it] 9 Mar 2016

A Novel Design of Linear Phase Non-uniform Digital Filter Banks arxiv:163.78v1 [cs.it] 9 Mar 16 Sakthivel V, Elizabeth Elias Department of Electronics and Communication Engineering, National Institute

Bark and ERB Bilinear Transforms

Bark and ERB Bilinear Transforms Julius O. Smith III Center for Computer Research in Music and Acoustics (CCRMA), Stanford University Stanford, CA 9435 USA Jonathan S. Abel Human Factors Research Division

Signals & Systems for Speech & Hearing. Week 6. Practical spectral analysis. Bandpass filters & filterbanks. Try this out on an old friend

Signals & Systems for Speech & Hearing Week 6 Bandpass filters & filterbanks Practical spectral analysis Most analogue signals of interest are not easily mathematically specified so applying a Fourier

Nonuniform multi level crossing for signal reconstruction

6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven

Binaural Hearing. Reading: Yost Ch. 12

Binaural Hearing Reading: Yost Ch. 12 Binaural Advantages Sounds in our environment are usually complex, and occur either simultaneously or close together in time. Studies have shown that the ability to

Acoustics, signals & systems for audiology. Week 9. Basic Psychoacoustic Phenomena: Temporal resolution

Acoustics, signals & systems for audiology Week 9 Basic Psychoacoustic Phenomena: Temporal resolution Modulating a sinusoid carrier at 1 kHz (fine structure) x modulator at 100 Hz (envelope) = amplitude-modulated

DERIVATION OF TRAPS IN AUDITORY DOMAIN

DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Vikram Ramesh Lakkavalli, K V Vijay Girish, A G Ramakrishnan Medical Intelligence and Language Engineering (MILE) Laboratory

A Novel Control Method to Minimize Distortion in AC Inverters. Dennis Gyma

A Novel Control Method to Minimize Distortion in AC Inverters Dennis Gyma Hewlett-Packard Company 150 Green Pond Road Rockaway, NJ 07866 ABSTRACT In PWM AC inverters, the duty-cycle modulator transfer

FFT 1 /n octave analysis wavelet

06/16 For most acoustic examinations, a simple sound level analysis is insufficient, as not only the overall sound pressure level, but also the frequency-dependent distribution of the level has a significant

CHAPTER 2 FIR ARCHITECTURE FOR THE FILTER BANK OF SPEECH PROCESSOR

22 CHAPTER 2 FIR ARCHITECTURE FOR THE FILTER BANK OF SPEECH PROCESSOR 2.1 INTRODUCTION A CI is a device that can provide a sense of sound to people who are deaf or profoundly hearing-impaired. Filters

2920 J. Acoust. Soc. Am. 102 (5), Pt. 1, November /97/102(5)/2920/5/$ Acoustical Society of America 2920

Detection and discrimination of frequency glides as a function of direction, duration, frequency span, and center frequency John P. Madden and Kevin M. Fire Department of Communication Sciences and Disorders,

SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS SUMMARY INTRODUCTION

SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS Roland SOTTEK, Klaus GENUIT HEAD acoustics GmbH, Ebertstr. 30a 52134 Herzogenrath, GERMANY SUMMARY Sound quality evaluation of

Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts

POSTER 25, PRAGUE MAY 4 Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts Bc. Martin Zalabák Department of Radioelectronics, Czech Technical University in Prague, Technická

Tone-in-noise detection: Observed discrepancies in spectral integration. Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O.

Tone-in-noise detection: Observed discrepancies in spectral integration Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands Armin Kohlrausch b) and

III. Publication III. c 2005 Toni Hirvonen.

III Publication III Hirvonen, T., Segregation of Two Simultaneously Arriving Narrowband Noise Signals as a Function of Spatial and Frequency Separation, in Proceedings of th International Conference on

Influence of fine structure and envelope variability on gap-duration discrimination thresholds Münkner, S.; Kohlrausch, A.G.; Püschel, D.

Influence of fine structure and envelope variability on gap-duration discrimination thresholds Münkner, S.; Kohlrausch, A.G.; Püschel, D. Published in: Journal of the Acoustical Society of America DOI:

RECENTLY, there has been an increasing interest in noisy

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In

Auditory filters at low frequencies: ERB and filter shape

Auditory filters at low frequencies: ERB and filter shape Spring - 2007 Acoustics - 07gr1061 Carlos Jurado David Robledano Spring 2007 AALBORG UNIVERSITY 2 Preface The report contains all relevant information

Speech Enhancement Based on Audible Noise Suppression

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 5, NO. 6, NOVEMBER 1997 497 Speech Enhancement Based on Audible Noise Suppression Dionysis E. Tsoukalas, John N. Mourjopoulos, Member, IEEE, and George

Signals, Sound, and Sensation

Signals, Sound, and Sensation William M. Hartmann Department of Physics and Astronomy Michigan State University East Lansing, Michigan Contents Preface xv Chapter 1: Pure Tones 1 Mathematics of the

arxiv: v1 [cs.sd] 4 Dec 2018

LOCALIZATION AND TRACKING OF AN ACOUSTIC SOURCE USING A DIAGONAL UNLOADING BEAMFORMING AND A KALMAN FILTER Daniele Salvati, Carlo Drioli, Gian Luca Foresti Department of Mathematics, Computer Science and

Intensity Discrimination and Binaural Interaction

Technical University of Denmark Intensity Discrimination and Binaural Interaction 2nd semester project DTU Electrical Engineering Acoustic Technology Spring semester 2008 Group 5 Troels Schmidt Lindgreen

THE MATLAB IMPLEMENTATION OF BINAURAL PROCESSING MODEL SIMULATING LATERAL POSITION OF TONES WITH INTERAURAL TIME DIFFERENCES

THE MATLAB IMPLEMENTATION OF BINAURAL PROCESSING MODEL SIMULATING LATERAL POSITION OF TONES WITH INTERAURAL TIME DIFFERENCES J. Bouše, V. Vencovský Department of Radioelectronics, Faculty of Electrical

Quantification of glottal and voiced speech harmonics-to-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonics-to-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

CHAPTER 6 INTRODUCTION TO SYSTEM IDENTIFICATION

CHAPTER 6 INTRODUCTION TO SYSTEM IDENTIFICATION Broadly speaking, system identification is the art and science of using measurements obtained from a system to characterize the system. The characterization

IMPROVED COCKTAIL-PARTY PROCESSING

IMPROVED COCKTAIL-PARTY PROCESSING Alexis Favrot, Markus Erne Scopein Research Aarau, Switzerland postmaster@scopein.ch Christof Faller Audiovisual Communications Laboratory, LCAV Swiss Institute of Technology

IIR Ultra-Wideband Pulse Shaper Design

IIR Ultra-Wideband Pulse Shaper Design Chun-Yang Chen and P. P. Vaidyanathan Dept. of Electrical Engineering, MC 36-93 California Institute of Technology, Pasadena, CA 95, USA E-mail: cyc@caltech.edu, ppvnath@systems.caltech.edu

Wavelet Speech Enhancement based on the Teager Energy Operator

Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose

IMPROVING MICROPHONE ARRAY SPEECH RECOGNITION WITH COCHLEAR IMPLANT-LIKE SPECTRALLY REDUCED SPEECH

RESEARCH REPORT IDIAP IMPROVING MICROPHONE ARRAY SPEECH RECOGNITION WITH COCHLEAR IMPLANT-LIKE SPECTRALLY REDUCED SPEECH Cong-Thanh Do Mohammad J. Taghizadeh Philip N. Garner Idiap-RR-40-2011 DECEMBER

Laboratory Assignment 4. Fourier Sound Synthesis

Laboratory Assignment 4 Fourier Sound Synthesis PURPOSE This lab investigates how to use a computer to evaluate the Fourier series for periodic signals and to synthesize audio signals from Fourier series

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

THE STATISTICAL ANALYSIS OF AUDIO WATERMARKING USING THE DISCRETE WAVELETS TRANSFORM AND SINGULAR VALUE DECOMPOSITION

THE STATISTICAL ANALYSIS OF AUDIO WATERMARKING USING THE DISCRETE WAVELETS TRANSFORM AND SINGULAR VALUE DECOMPOSITION Mr. Jaykumar. S. Dhage Assistant Professor, Department of Computer Science & Engineering

An auditory model that can account for frequency selectivity and phase effects on masking

Acoust. Sci. & Tech. 2, (24) PAPER An auditory model that can account for frequency selectivity and phase effects on masking Akira Nishimura 1; 1 Department of Media and Cultural Studies, Faculty of Informatics,

Advanced audio analysis. Martin Gasser

Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

Frequency-Response Masking FIR Filters

Frequency-Response Masking FIR Filters Georg Holzmann June 14, 2007 With the frequency-response masking technique it is possible to design sharp and linear phase FIR filters. Therefore a model filter and

Applying the Filtered Back-Projection Method to Extract Signal at Specific Position

Applying the Filtered Back-Projection Method to Extract Signal at Specific Position 1 Chia-Ming Chang and Chun-Hao Peng Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

Analysis on Acoustic Attenuation by Periodic Array Structure EH KWEE DOE 1, WIN PA PA MYO 2

www.semargroup.org, www.ijsetr.com ISSN 2319-8885 Vol.03,Issue.24 September-2014, Pages:4885-4889 Analysis on Acoustic Attenuation by Periodic Array Structure EH KWEE DOE 1, WIN PA PA MYO 2 1 Dept of Mechanical

MULTIPLE F0 ESTIMATION IN THE TRANSFORM DOMAIN

10th International Society for Music Information Retrieval Conference (ISMIR 2009) MULTIPLE F0 ESTIMATION IN THE TRANSFORM DOMAIN Christopher A. Santoro +* Corey I. Cheng *# + LSB Audio Tampa, FL 33610

Machine recognition of speech trained on data from New Jersey Labs

Machine recognition of speech trained on data from New Jersey Labs Frequency response (peak around 5 Hz) Impulse response (effective length around 200 ms) 41 RASTA filter 10 attenuation [db] 40 1 10 modulation

Voice Excited Lpc for Speech Compression by V/Uv Classification

IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 3, Ver. II (May. -Jun. 2016), PP 65-69 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Voice Excited Lpc for Speech

REAL-TIME BROADBAND NOISE REDUCTION

REAL-TIME BROADBAND NOISE REDUCTION Robert Hoeldrich and Markus Lorber Institute of Electronic Music Graz Jakoministrasse 3-5, A-8010 Graz, Austria email: robert.hoeldrich@mhsg.ac.at Abstract A real-time