
ANZIAM J. 45 (E) pp. C964–C980, 2004

Auditory modelling for speech processing in the perceptual domain

L. Lin, E. Ambikairajah, W. H. Holmes
School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, Australia. mailto:ll.lin@ee.unsw.edu.au

(Received 8 August 2003; revised 28 January 2004)

Abstract: The human auditory system is a remarkably robust speech processor, even in noisy environments. This work presents a new computational model of the auditory system that exploits its psychoacoustical masking properties. The model is first applied to speech coding in the perceptual domain; the coding algorithm produces high quality coded speech and audio, accounting for temporal as well as spectral detail. The proposed filterbank is also applied to speech denoising in the perceptual domain, and the enhanced speech is of good perceptual quality.

See http://anziamj.austms.org.au/v45/ctac2003/lin2/home.html for this article. © Austral. Mathematical Soc. 2004. Published September 1, 2004. ISSN 1446-8735

Contents

1 Introduction
2 A critical band scale auditory filterbank
3 Application of an auditory filterbank to speech processing
   3.1 Speech coding using an auditory filterbank
   3.2 Speech denoising using an auditory filterbank
4 Conclusions
References

1 Introduction

When the ear is excited by an input stimulus, different regions of the basilar membrane respond maximally to different frequencies; that is, a frequency tuning occurs along the membrane. We can therefore think of the response patterns as arising from a bank of cochlear filters along the basilar membrane. Adequate modelling of the principal behaviour of the peripheral auditory system is a very difficult problem. Earlier models used transmission line representations to simulate basilar membrane motion [6]. More recently, parallel auditory filterbanks such as the Gammatone filterbank [7] have become popular as a reasonably accurate alternative for auditory filtering. A parallel auditory filterbank is easily inverted and hence has applications in auditory-based speech and audio processing. In this work we present a new parallel auditory filterbank on the critical band scale. The filterbank models psychoacoustic tuning curves obtained from the well known masking curves. Current applications of speech and audio coding algorithms include cellular and personal communications, teleconferencing, and secure communications.

Low bit rate speech coders provide impressive performance above 4 kbps for speech signals, but do not perform well on musical signals. Similarly, transform coders perform well for music signals, but not for speech signals at lower bit rates. There is therefore a need for high quality coders that work equally well with either speech or general audio signals. In this work we propose a scheme for a universal coder, based on an auditory filterbank model, that handles both wideband speech and audio signals.

Speech noise reduction is a very important research field with applications in many areas such as voice communication and automatic speech recognition. The most popular methods, with many variants, are Wiener filtering and spectral subtraction [4]. Although these methods reduce the noise, they also reduce the speech power and hence introduce speech distortion. In this work we propose a denoising technique based on an auditory filterbank and a new perceptual modification of Wiener filtering. Speech distortion is reduced and speech intelligibility is improved.

2 A critical band scale auditory filterbank

This section presents a parallel auditory filterbank model that matches psychoacoustical tuning curves. The tuning curves are obtained by exploring the relation between auditory masking and tuning curves, and the similarity of the masking curves on the critical band scale. Details are described by Lin, Ambikairajah and Holmes [5]. The transfer function of the critical-band auditory filter that models the psychoacoustical tuning curves is developed in the z-domain [5]:

$$G(z) = \frac{(1 - r_0 z^{-1})\,(1 - 2 r_B \cos(2\pi f_B/f_s)\, z^{-1} + r_B^2 z^{-2})}{(1 - 2 r_A \cos(2\pi f_A/f_s)\, z^{-1} + r_A^2 z^{-2})^4}, \qquad (1)$$

where $f_s = 16$ kHz is the sampling frequency, and the parameters are $f_A = \sqrt{f_c^2 + B_w^2}$ and $r_A = e^{-2\pi B_w/f_s}$.

The parameter $B_w$ is calculated using the formulas in [8]:

$$B_w = 25 + 75\,[1 + 1.4 (f_c/1000)^2]^{0.69},$$
$$Z_c = 13 \arctan(0.76 f_c/1000) + 3.5 \arctan\!\big((f_c/7500)^2\big),$$

where $Z_c$ is the critical band rate corresponding to $f_c$. The parameters $r_0$ and $r_B$ are chosen as $r_0 = 0.955$ and $r_B = 0.985$. We use the following empirical formula to choose $f_B$:

$$f_B = 117.5 (f_c/1000)^2 + 1135.5 (f_c/1000) + 277.0.$$

The frequency response of the 21 critical band auditory filters in the frequency range 0 to 8 kHz is shown in Figure 1 by the dashed lines. The proposed critical-band auditory filterbank is also approximately power-complementary; that is,

$$\sum_{i=1}^{M} |G_i(e^{j\omega})|^2 \approx C, \qquad (2)$$

where $C$ is a constant, $G_i(e^{j\omega})$ is the frequency response of the analysis filter at the $i$th channel, and $M$ is the total number of channels. If we choose the synthesis filters as

$$h_i(n) = g_i(-n) \quad \text{for } i = 1, \dots, M, \qquad (3)$$

then the synthesis filterbank is implemented using FIR filters obtained by time-reversal of the impulse responses of the corresponding analysis filters. The signal reconstruction is nearly perfect; that is, $\sum_{i=1}^{M} g_i(n) * h_i(n) \approx C\,\delta(n)$. Figure 1 shows the overall analysis/synthesis frequency response by the solid line; it resembles the frequency response of an all-pass filter. The implementation of the analysis/synthesis filterbank scheme is shown in Figure 2. Each analysis filter is implemented as an IIR filter with 8 poles and 3 zeros. Each synthesis filter is implemented as an FIR filter with 128 coefficients. An 8 ms delay is required to make the synthesis filters causal when $f_s = 16$ kHz. Between the analysis and synthesis sections is the processing block that carries out the speech coding or denoising algorithms, described next.
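To make the construction concrete, here is a minimal numerical sketch of one analysis filter from equation (1), using the parameter formulas above. The function name and the use of numpy/scipy are illustrative assumptions, and any per-band gain normalisation the authors may apply is omitted.

```python
# Sketch of one critical-band analysis filter per equation (1).
# Assumes fs = 16 kHz; the paper uses 21 such bands over 0..8 kHz.
import numpy as np
from scipy import signal

fs = 16000.0

def filter_coefficients(fc):
    """Numerator/denominator of G(z) for one band centred at fc (Hz)."""
    Bw = 25.0 + 75.0 * (1.0 + 1.4 * (fc / 1000.0) ** 2) ** 0.69   # bandwidth [8]
    fB = 117.5 * (fc / 1000.0) ** 2 + 1135.5 * (fc / 1000.0) + 277.0  # empirical
    fA = np.sqrt(fc ** 2 + Bw ** 2)
    rA = np.exp(-2.0 * np.pi * Bw / fs)
    r0, rB = 0.955, 0.985

    # Numerator: (1 - r0 z^-1)(1 - 2 rB cos(2 pi fB/fs) z^-1 + rB^2 z^-2) -> 3 zeros
    b = np.convolve([1.0, -r0],
                    [1.0, -2.0 * rB * np.cos(2.0 * np.pi * fB / fs), rB ** 2])
    # Denominator: a second-order section raised to the 4th power -> 8 poles
    a2 = [1.0, -2.0 * rA * np.cos(2.0 * np.pi * fA / fs), rA ** 2]
    a = a2
    for _ in range(3):
        a = np.convolve(a, a2)
    return b, a

b, a = filter_coefficients(1000.0)           # band centred at 1 kHz
w, h = signal.freqz(b, a, worN=1024, fs=fs)  # inspect the tuning-curve shape
```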

Figure 1: Frequency response of the auditory filterbank; dashed: analysis filters, solid: overall analysis/synthesis response.

Figure 2: Speech processing based on an auditory filterbank. The input $x(n)$ passes through analysis filters $g_1(n), \dots, g_M(n)$ to give band signals $x_i(n)$; a processing block (coding or denoising) operates on each band; synthesis filters $h_1(n), \dots, h_M(n)$ produce $\hat{x}_i(n)$, which sum to the output $\hat{x}(n)$.

3 Application of an auditory filterbank to speech processing

3.1 Speech coding using an auditory filterbank

The first step of the coding scheme is to filter the speech/audio signal by the critical-band analysis filters $g_i(n)$. The output of each filter, $x_i(n)$, is then half-wave rectified, and the positive peaks of the critical band signals are located. Physically, the half-wave rectification corresponds to the action of the inner hair cells, which respond to movement of the basilar membrane in one direction only. Peaks correspond to higher rates of neural firing at larger displacements of the inner hair cell from its position at rest [2, 3]. This process results in a series of critical band pulse trains, where the pulses retain the amplitudes of the critical band signals from which they were derived. Figure 3 shows, using spikes, a sequence of such pulses for the critical band centred at 1 kHz.

The masking properties of the human auditory system are applied to eliminate redundant pulses. A simultaneous masking model is employed because lower power components of the critical band signals are rendered inaudible by the presence of larger power components in neighbouring critical bands. Weak signal components are also rendered inaudible by stronger signal components in the same critical band that precede or follow them in time; this is called temporal masking. When the signal precedes the masker in time, the condition is called pre-masking; when the signal follows the masker, it is called post-masking [1, 9, 10]. Thus a strong signal can mask a weaker signal that occurs after it, as well as a weaker signal that occurs before it. Both temporal pre-masking and temporal post-masking are employed in this work to reduce the number of pulses. Figure 3 shows an example of post-masking, with the masking thresholds shown by the dashed line. All pulses with amplitudes less than the masking threshold are discarded; the darkened spikes are the pulses kept after applying post-masking.
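As a rough illustration of this pulse-extraction stage, the sketch below half-wave rectifies one band signal, picks its positive peaks, and discards pulses falling under a decaying post-masking threshold. The exponential threshold decay and its time constant are assumptions made for illustration; the paper does not give its masking-threshold curve in closed form here.

```python
# Hedged sketch: peak picking and post-masking pruning for one band signal.
import numpy as np

def extract_pulses(xi):
    """Return (positions, amplitudes) of the positive peaks of one band signal."""
    rect = np.maximum(xi, 0.0)  # half-wave rectification (inner hair cell action)
    peaks = np.where((rect[1:-1] > rect[:-2]) & (rect[1:-1] >= rect[2:]))[0] + 1
    return peaks, rect[peaks]

def post_mask(positions, amplitudes, tau=40.0):
    """Keep a pulse only if it exceeds the decaying threshold of the last masker.

    tau (samples) is an assumed post-masking decay constant, not from the paper.
    """
    kept_pos, kept_amp = [], []
    thr_amp, thr_pos = 0.0, -np.inf
    for p, a in zip(positions, amplitudes):
        threshold = thr_amp * np.exp(-(p - thr_pos) / tau)
        if a > threshold:
            kept_pos.append(p)
            kept_amp.append(a)
            thr_amp, thr_pos = a, p  # this pulse becomes the new masker
    return np.array(kept_pos), np.array(kept_amp)
```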

Figure 3: Pulse reduction using post-masking; solid lines: pulses, dashed lines: thresholds (centre frequency 1 kHz).

The upper panel of Figure 4 shows the pulse locations for the 21 channels obtained at the peak-picking stage; the lower panel shows the pulses retained after applying auditory masking. The purpose of applying masking is to produce a more efficient and perceptually accurate parameterization of the firing pulses occurring in each band. The pulse train in each critical band, after redundancy reduction, is finally normalized by the mean of its non-zero pulse amplitudes across the frame. For each frame, the signal parameters required for coding are the gains of the critical bands and the amplitudes and positions of the pulses. Each critical band gain is quantized to 6 bits and the amplitude of each pulse is quantized to 1 bit. The pulse positions are coded using a new run-length coding technique (a generic stand-in is sketched below). The overall average bit rate resulting from this coding scheme is 58 kbps.

The synthesis process starts with decoding to obtain the pulse train for each channel, and then filtering the pulse train by the corresponding FIR synthesis filter $h_i(n)$. Summing the outputs from all filters yields the reconstructed speech or audio signal, which is perceptually the same as the original. The lower panel of Figure 5 shows one frame of the resynthesised speech based on the decoded pulse trains; the corresponding original speech is shown in the upper panel. The duration of the speech frame is 32 ms (512 samples at $f_s = 16$ kHz). The advantages of this coder are that it works equally well with either speech or general audio signals, is highly scalable, and is of moderate complexity. Further research is required to examine the statistical correlation and redundancy among the pulses, and to investigate the use of Huffman or arithmetic coding techniques to reduce the bit rate further.
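The details of the paper's run-length position code are not given, so the following is a generic, hypothetical stand-in: it codes the gaps between successive pulse positions within a frame, which is the usual starting point for such schemes and where entropy coding of the gaps would later apply.

```python
# Generic stand-in for pulse-position coding (not the paper's actual scheme):
# differential (gap) encoding of sorted pulse positions within a frame.
import numpy as np

def encode_positions(positions):
    """Gaps between successive pulse positions; the first gap is from frame start."""
    positions = np.asarray(positions)
    return np.diff(np.concatenate(([0], positions)))

def decode_positions(gaps):
    """Invert the gap encoding by cumulative summation."""
    return np.cumsum(gaps)

gaps = encode_positions([12, 40, 41, 95])          # -> [12, 28, 1, 54]
assert list(decode_positions(gaps)) == [12, 40, 41, 95]
```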

Figure 4: Pulse trains of 21 critical bands; (a) before auditory masking, (b) after auditory masking.

Figure 5: (a) A frame of the original speech and (b) its reconstruction.

3.2 Speech denoising using an auditory filterbank

Assume that the input speech to the filterbank is corrupted by additive noise; that is, $x(n) = s(n) + w(n)$, where $s(n)$ is the clean speech and $w(n)$ is the additive noise. Both $s(n)$ and $w(n)$ are assumed zero-mean and uncorrelated. The first part of our speech denoising scheme decomposes the noisy speech $x(n)$ into noisy critical band signals (Figure 2):

$$x_i(n) = g_i(n) * x(n) = s_i(n) + w_i(n), \qquad (4)$$

where $s_i(n) = g_i(n) * s(n)$ is the output of the $i$th critical band filter when the input to the filterbank is the clean speech only, and $w_i(n) = g_i(n) * w(n)$ is the corresponding output when the input is the noise only. The band signals $s_i(n)$ and $w_i(n)$ are zero-mean and uncorrelated, since each auditory filter is a narrow bandpass filter and the clean speech $s(n)$ and the noise $w(n)$ are uncorrelated. The denoised subband signal is then

$$\hat{s}_i(n) = K_i x_i(n), \qquad (5)$$

where the $K_i$ ($i = 1, \dots, M$) are the denoising gains to be determined. Define $\sigma_{s_i}^2 = E\{s_i^2(n)\}$ and $\sigma_{w_i}^2 = E\{w_i^2(n)\}$. The denoising gain $K_i$ is obtained by minimising

$$J_i = (K_i - 1)^2 \sigma_{s_i}^2 + \mu K_i^2 \max\{\sigma_{w_i}^2 - T_i, 0\}. \qquad (6)$$

The first term, $(K_i - 1)^2 \sigma_{s_i}^2$, represents the speech distortion due to denoising; the second term, $K_i^2 \max\{\sigma_{w_i}^2 - T_i, 0\}$, represents the residual noise. The parameter $\mu$ allows a trade-off between signal distortion and noise: if $\mu$ is large the noise is reduced, but there is greater signal distortion. $T_i$ is the estimated masking threshold due to the speech signal; the noise enters this perceptual criterion only if it exceeds the masking threshold. The denoising gain is then

$$K_i = \frac{\sigma_{s_i}^2}{\sigma_{s_i}^2 + \mu \max\{\sigma_{w_i}^2 - T_i, 0\}}. \qquad (7)$$
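Equation (7) transcribes directly into code. The sketch below assumes the band powers and masking thresholds are supplied by separate speech and noise estimators that the paper does not detail here; the function name is illustrative.

```python
# Per-band perceptual Wiener gain of equation (7).
import numpy as np

def perceptual_wiener_gain(sig_s2, sig_w2, T, mu=1.0):
    """K = sig_s2 / (sig_s2 + mu * max(sig_w2 - T, 0)).

    sig_s2, sig_w2: estimated speech and noise powers in one critical band;
    T: estimated masking threshold; mu: distortion/noise trade-off.
    """
    excess = np.maximum(sig_w2 - T, 0.0)  # only supra-threshold noise counts
    return sig_s2 / (sig_s2 + mu * excess)

# Noise below the masking threshold -> unity gain, i.e. no speech distortion.
assert perceptual_wiener_gain(1.0, 0.5, T=0.6) == 1.0
```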

When the noise power $\sigma_{w_i}^2$ is below the masking threshold $T_i$, the gain $K_i$ is exactly 1. The gain decreases as the noise exceeds this level, but it is always larger than the optimum gain of the conventional Wiener problem [4]. The speech distortion is therefore always smaller than that achieved with the Wiener solution (that is, when masking is not allowed for). The residual noise is always larger than with the Wiener solution, but the difference is not audible, due to auditory masking effects. The synthesis process filters each $\hat{s}_i(n)$ by the corresponding FIR synthesis filter $h_i(n)$; summing the outputs from all filters yields the denoised speech.

The proposed denoising technique was tested on a variety of noises including pink noise, car noise and tank noise. Informal listening demonstrates that the perceptually modified Wiener filter gives denoised speech with better intelligibility than the traditional Wiener filter. An example of speech denoising with car noise at a signal-to-noise ratio of 5 dB is shown in Figures 6 and 7. The clean, noisy and denoised sentences are plotted in Figure 6. The denoising gains obtained using the perceptual Wiener filtering in two channels are shown by the solid lines in Figure 7, and the conventional Wiener filtering gains by the dotted lines. The gain resulting from the proposed denoising approach is always higher than the gain from the conventional Wiener filter, and hence the speech distortion is reduced.

4 Conclusions

We present a new parallel auditory filterbank that models the psychoacoustical tuning curves. The model is applied to speech coding and speech denoising in the perceptual domain. The decomposition of the speech signal into critical band signals enables easy application of auditory masking properties to reduce the bit rate in coding and the speech distortion in denoising. The auditory-system-based coding paradigm produces high quality coded speech or audio, is highly scalable, and is of moderate complexity.

Figure 6: (a) Clean, (b) noisy and (c) denoised speech sentences.

Figure 7: Denoising gains for channels 5 and 15; solid: perceptual Wiener filtering, dotted: conventional Wiener filtering.

The perceptually modified Wiener filter yields denoised speech with better intelligibility and less speech distortion than the conventional Wiener filter.

References

[1] E. Ambikairajah, A. G. Davis and W. T. K. Wong. Auditory masking and MPEG-1 audio compression. Electronics & Communication Engineering Journal, 9(4):165–197.

[2] E. Ambikairajah, J. Epps and L. Lin. Wideband speech and audio coding using Gammatone filter banks. Proceedings of the 2001 International Conference on Acoustics, Speech, and Signal Processing, pages 773–776, 2001.

[3] G. Kubin and W. B. Kleijn. On speech coding in a perceptual domain. Proceedings of the 1999 International Conference on Acoustics, Speech, and Signal Processing, pages 205–208, 1999.

[4] J. S. Lim and A. V. Oppenheim. Enhancement and bandwidth compression of noisy speech. Proc. IEEE, 67(12):1586–1604, 1979.

[5] L. Lin, E. Ambikairajah and W. H. Holmes. Auditory filterbank design using masking curves. Proceedings of the 7th European Conference on Speech Communication and Technology, pages 411–414, 2001.

[6] R. F. Lyon. A computational model of filtering, detection and compression in the cochlea. Proceedings of the 1982 International Conference on Acoustics, Speech, and Signal Processing, pages 1282–1285, 1982.

[7] R. D. Patterson, M. Allerhand and C. Giguère. Time-domain modelling of peripheral auditory processing: a modular architecture and a software platform. J. Acoust. Soc. Am., 98:1890–1894, 1995.

[8] E. Zwicker and E. Terhardt. Analytical expressions for critical-band rate and critical bandwidth as a function of frequency. J. Acoust. Soc. Am., 68:1523–1525, 1980.

[9] E. Zwicker and U. T. Zwicker. Audio engineering and psychoacoustics: matching signals to the final receiver, the human auditory system. J. Audio Eng. Soc., 39(3):115–125, 1991.

[10] E. Zwicker and H. Fastl. Psychoacoustics: Facts and Models. Springer-Verlag, 1999.