Subband Analysis of Time Delay Estimation in STFT Domain


S. Wang, D. Sen and W. Lu
School of Electrical Engineering & Telecommunications
University of New South Wales, Sydney, Australia
sh.wang@student.unsw.edu.au, dsen@ee.unsw.edu.au, wenliang.lu@student.unsw.edu.au

Abstract

For decades, time delay estimation (TDE) has been a significant issue in radar, speech and audio processing. In recent years, accurate TDE has become an important topic in multichannel audio compression, where the inter-aural time delay (ITD) between each channel and a reference channel is a natural parameter used to represent multiple audio channels. This paper analyzes and evaluates various TDE algorithms in terms of their accuracy and computational complexity.

1. Introduction

Time delay estimation, also known as Time Difference of Arrival (TDOA) estimation in various fields (Carter 1987), is used extensively in radar, sonar, auditory localization (Blauert 1983) and a variety of other signal processing and telecommunications applications. Recent research on perceptual compression of multichannel audio (Baumgarte and Faller 2003) requires TDE algorithms to compute the interaural time delay (ITD) within each critical band.

The most popular TDE algorithms are variations of the concept of Cross-Correlation (CC) (Hertz 1986). Among these, Generalized Cross-Correlation (GCC) (Knapp and Carter 1976) has been well accepted as an efficient method, but it offers only integer-sample resolution. Subsample resolution is typically required in various applications, including high-fidelity audio applications where the effect of reverberation and sound spatialization is highly dependent on a correct representation of delay as a function of frequency. Estimating the time delay within frequency subbands is challenging in terms of both accuracy and computational complexity.
This paper focuses on the sub-band analysis of TDE in the frequency domain and aims to improve the accuracy and reduce the complexity of the method used in the Binaural Cue Coding algorithm (Baumgarte and Faller 2003). Several algorithms with subsample resolution are investigated. We first discuss the theoretical basis of the methods, such as Circular Cross-Correlation (CXC) and Linear Regression Modeling, and then present detailed analysis and results in subsequent sections. Test signals consisting of single and multiple sinusoids are used to evaluate the methods.

2. Methodology

The following four techniques for TDE, all implemented in the frequency domain, are investigated:

2.1. Cross-correlation (CC) in frequency domain

Consider two discrete signals $x[k]$ and $y[k]$. The signals may be the two channels of a stereo recording or two signals from opposite sides of a reverberant chamber. The signals can be expressed as:

$$x[k] = s[k] + n_1[k] \tag{1}$$

$$y[k] = s[k-d] + n_2[k] \tag{2}$$

where $s[k-d]$ is the original signal $s[k]$ delayed by $d$ samples. The ambient noise terms $n_1[k]$ and $n_2[k]$ can be modeled as Additive White Gaussian Noise. The cross-correlation of the two signals $x[k]$ and $y[k]$ is then given by

$$c_{xy}[\tau] = \sum_{k} x[k]\, y[k+\tau] \tag{3}$$

Since both $x[k]$ and $y[k]$ are real, the above equation can also be written as:

$$c_{xy}[\tau] = x[-\tau] * y[\tau] \tag{4}$$

where $*$ denotes linear convolution, meaning that the cross-correlation of the two signals can be expressed as a linear convolution. Hence, its Discrete Time Fourier Transform (DTFT) can be expressed in terms of $X(\theta)$ and $Y(\theta)$, the DTFTs of $x[k]$ and $y[k]$ respectively:

$$C_{xy}(\theta) = \mathcal{F}\{c_{xy}[\tau]\} = \overline{X(\theta)}\, Y(\theta) \tag{5}$$

where $\overline{X(\theta)}$ stands for the complex conjugate of $X(\theta)$. Thus the inverse DTFT, $c_{xy}[\tau]$ in the time domain, is given by

$$c_{xy}[\tau] = \frac{1}{2\pi}\, \Re\left\{ \int_{-\pi}^{\pi} \overline{X(\theta)}\, Y(\theta)\, e^{j\tau\theta}\, d\theta \right\} \tag{6}$$

The delay $d$ corresponds to the maximum of $c_{xy}[\tau]$, and thus can be computed from Equation (6).
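As a rough illustration of Equations (5) and (6), the sketch below replaces the DTFT with a DFT, which makes the correlation circular, and then searches for the lag of the maximum. The helper names `dft`, `cross_correlation_freq` and `estimate_delay`, the chirp-like test signal and the circular delay are assumptions made for this example and do not come from the paper. A naive O(N^2) DFT is used only to keep the example free of dependencies; an FFT would normally be used instead.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * i * k / n) for k in range(n))
            for i in range(n)]


def cross_correlation_freq(x, y):
    """Circular cross-correlation c[tau] = sum_k x[k] * y[(k + tau) mod N],
    computed as the inverse DFT of conj(X) * Y (DFT counterpart of Eq. (6))."""
    n = len(x)
    X, Y = dft(x), dft(y)
    C = [Xi.conjugate() * Yi for Xi, Yi in zip(X, Y)]
    return [sum(C[i] * cmath.exp(2j * math.pi * i * tau / n) for i in range(n)).real / n
            for tau in range(n)]


def estimate_delay(x, y):
    """Integer delay estimate: the lag at which the correlation is largest."""
    c = cross_correlation_freq(x, y)
    return max(range(len(c)), key=lambda tau: c[tau])


# Hypothetical test signal: a chirp-like sequence, delayed circularly by 5 samples.
N = 64
s = [math.sin(0.37 * k * k) for k in range(N)]
d_true = 5
x = s
y = [s[(k - d_true) % N] for k in range(N)]
d_hat = estimate_delay(x, y)
```

Because the lag grid is the integer sample grid, this form of the estimator can only return integer delays.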
Equation (6) thus allows the cross-correlation of the two signals to be evaluated in the frequency domain. The delay $d$ is then determined by

$$d = \arg\max_{\tau} \{c_{xy}[\tau]\} \tag{7}$$

Proceedings of the 11th Australian International Conference on Speech Science & Technology, ed. Paul Warren & Catherine I. Watson. ISBN 0 9581946 2 9. University of Auckland, New Zealand. December 6-8, 2006. Copyright, Australian Speech Science & Technology Association Inc.

2.2. Subband analysis of TDE in frequency domain

In the case of Binaural Cue Coding and other parametric audio compression techniques, the time delay needs to be determined within a relatively small frequency range rather than over the entire frequency domain (Baumgarte and Faller 2002). When the DTFT is replaced with a Discrete Fourier Transform (DFT), this subband analysis implies that only a few frequency components are available per band to compute the delay. The above technique lends itself very easily to this subband analysis. Time-domain cross-correlation would require converting each subband to the time domain and would thus prove computationally cumbersome.

Consider a subband with boundaries $A_l$ and $A_h$, which are DFT indices with $A_h > A_l$. The subband is thus a frequency band containing components ranging from $(A_l/N)F_s$ to $(A_h/N)F_s$, where $F_s$ is the sampling frequency and $N$ is the frame size, i.e. the length of the DFT. From Equation (6), the cross-correlation for the subband can be written as:

$$c_{lh}[\tau] = \frac{1}{N}\, \Re\left\{ \sum_{k=A_l}^{A_h-1} \overline{X[\theta_k]}\, Y[\theta_k]\, e^{j\tau\theta_k} \right\} \tag{8}$$

where $\theta_k = 2\pi k/N$ is the frequency of DFT bin $k$. It should be noted that $A_h$, the upper boundary of the subband, is omitted from the sum. From Equation (7), the time delay in the subband can be determined from the following equation:

$$d_{lh} = \arg\max_{\tau} \{c_{lh}[\tau]\} \tag{9}$$

2.3. Circular cross-correlation (CXC)

This second method, which we call Circular Cross-Correlation (CXC), is a variation of the above frequency-domain cross-correlation technique. The method is derived in order to improve on the integer resolution of the delay estimation in the previous technique while keeping computational complexity penalties as low as possible.

Determining a time delay with non-integer (sub-sample) resolution would generally require interpolation of the original signals, in other words upsampling the original sequences by a factor of $m$ before computing the cross-correlation. The upsampled delay $m \cdot d$ would then be estimated as

$$m \cdot d = \arg\max_{\tau} \{c_{\hat{x}\hat{y}}[\tau]\} \tag{10}$$

where $c_{\hat{x}\hat{y}}[\tau]$ stands for the cross-correlation of the upsampled signals $\hat{x}[k]$ and $\hat{y}[k]$. Consequently, the precision is increased to $1/m$ samples when using this method.
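One way to picture the subband estimate of Equations (8) and (9) at a precision of 1/m samples is to evaluate the subband sum on a lag grid with step 1/m instead of upsampling the signals in time. The sketch below takes that route. It is an illustration under stated assumptions, not the authors' implementation: the names `dft_bins` and `subband_delay`, the lag range of 0 to 2 samples and the two-component test signal are all hypothetical.

```python
import cmath
import math


def dft_bins(x, bins):
    """DFT of x evaluated only at the requested bin indices."""
    n = len(x)
    return {b: sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            for b in bins}


def subband_delay(x, y, a_l, a_h, m=8, max_lag=2.0):
    """Estimate the delay in the subband [a_l, a_h) by maximising the sum of
    Eq. (8) over the lags 0, 1/m, 2/m, ..., max_lag (Eq. (9) on a fine grid)."""
    n = len(x)
    bins = range(a_l, a_h)              # the upper boundary a_h is omitted, as in Eq. (8)
    X, Y = dft_bins(x, bins), dft_bins(y, bins)
    cross = {b: X[b].conjugate() * Y[b] for b in bins}

    def c_lh(tau):
        # Real part of the subband sum, with theta_k = 2*pi*k/n.
        return sum(cross[b] * cmath.exp(2j * math.pi * b * tau / n) for b in bins).real / n

    lags = [i / m for i in range(int(max_lag * m) + 1)]
    return max(lags, key=c_lh)


# Hypothetical example: two on-bin sinusoids inside the subband, delayed by 1.25 samples.
N = 128
d_true = 1.25
freqs = [9, 11]                         # DFT bin indices of the two components
x = [sum(math.cos(2 * math.pi * f * k / N) for f in freqs) for k in range(N)]
y = [sum(math.cos(2 * math.pi * f * (k - d_true) / N) for f in freqs) for k in range(N)]
d_hat = subband_delay(x, y, a_l=8, a_h=13)
```

Only the bins inside the subband are transformed, so the cost per band scales with the number of bins in the band times the number of candidate lags.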
If, instead of computing the cross-correlation in the time domain, a frequency-domain approach is taken in which one signal can be shifted arbitrarily by introducing a phase delay, the method still achieves higher precision than a simple cross-correlation method while maintaining a much lower complexity than an interpolated time-domain approach. The technique may however be prone to circular shifts, and our paper discusses whether this produces any appreciable error in the delay computation.

2.4. Non-circular cross-correlation (NCXC)

This method is designed to avoid the effects of circular shifting in the previous method while maintaining the same level of accuracy as the circular cross-correlation (CXC) approach. To avoid any circular shifts, we pad the original signals with zeros at the end. While eliminating circular shifts, this approach could result in a dramatic increase in computational complexity, especially when converting between the time domain and the frequency domain, as the signal length is increased by the padded samples. Later sections of the paper show results which make it possible to discuss whether the extra computation is warranted.

2.5. Linear regression modeling (LRM)

The DFTs of the original signals $x[k]$ and $y[k]$ are related as follows in the frequency domain (assuming the Gaussian noise terms $n_1[k]$ and $n_2[k]$ have relatively low spectral energy and can thus be ignored):

$$x[k] \leftrightarrow X[\theta] \tag{11}$$

$$y[k] \approx x[k-d] \leftrightarrow X[\theta]\, e^{-jd\theta} \tag{12}$$

In other words, a time delay changes the phase spectrum rather than the magnitude spectrum. Thus, the time delay $d$ can be derived as:

$$d = -\frac{\Psi[\theta]}{\theta} \tag{13}$$

where $\Psi[\theta]$ is the phase difference between the two signals (taken as the phase of $Y[\theta]$ relative to $X[\theta]$, so that Equation (12) gives $\Psi[\theta] = -d\theta$). Multiple Linear Regression methods can thus be employed to estimate the slope of the curve of $\Psi[\theta]$ against $\theta$, which can also be regarded as the group delay between the two signals.
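The slope estimate behind Equation (13) can be sketched with an ordinary least-squares line fit of phase difference against frequency over the bins of one subband. This is a minimal sketch under several assumptions: the phase difference is taken as angle(Y) minus angle(X), no phase unwrapping is done (so the product of delay and frequency must stay below pi), every bin in the band carries signal energy, and the names `dft_bin` and `lrm_delay` and the test signal are hypothetical.

```python
import cmath
import math


def dft_bin(x, b):
    """Single DFT coefficient of x at bin index b."""
    n = len(x)
    return sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))


def lrm_delay(x, y, a_l, a_h):
    """Fit phase difference against frequency over the bins [a_l, a_h) by
    ordinary least squares and return minus the fitted slope as the delay."""
    n = len(x)
    thetas, psis = [], []
    for b in range(a_l, a_h):
        X, Y = dft_bin(x, b), dft_bin(y, b)
        thetas.append(2 * math.pi * b / n)
        # angle(Y) - angle(X), wrapped to (-pi, pi]; no unwrapping is attempted.
        psis.append(cmath.phase(Y * X.conjugate()))
    mean_t = sum(thetas) / len(thetas)
    mean_p = sum(psis) / len(psis)
    slope = (sum((t - mean_t) * (p - mean_p) for t, p in zip(thetas, psis))
             / sum((t - mean_t) ** 2 for t in thetas))
    return -slope


# Hypothetical example: four on-bin sinusoids with arbitrary phases, delayed by 1.3 samples.
N = 64
d_true = 1.3
components = {4: 0.3, 5: 1.1, 6: 2.0, 7: 0.7}    # bin index -> initial phase in radians
x = [sum(math.cos(2 * math.pi * f * k / N + p) for f, p in components.items())
     for k in range(N)]
y = [sum(math.cos(2 * math.pi * f * (k - d_true) / N + p) for f, p in components.items())
     for k in range(N)]
d_hat = lrm_delay(x, y, a_l=4, a_h=8)
```

With only four bins in the band the fit is exact here because the test signal is noise-free and on-bin; bins without signal energy would contribute essentially random phases to the regression.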
Because only a limited number of frequency samples is available, errors of the linear regression estimate can be significant: when the subbands represent critical bands, and coding-delay considerations preclude the use of long time frames, only a couple of frequency samples are available at the lower end of the spectrum.

2.6. Zero-padded linear regression modeling (ZPLRM)

To alleviate the above problem of a limited number of frequency samples without imposing a coding-delay penalty, we consider zero padding the original sequences in the time domain. This has the effect of interpolating samples in the frequency domain. Hence, the precision is expected to improve, especially in the low frequency region, at the expense of computational complexity.

3. Experiments, results and discussion

Sinusoidal signals, which can be perceived by the human ear, are fundamental elements of natural or artificial sound signals (Smith III 2003). As a result, signals composed of single or multiple sinusoids are employed to evaluate each of the above algorithms. Additionally, the sub-band boundaries $A_b$ discussed previously can be chosen either by uniform bandwidth segmentation, as designed in MPEG-I (Pan 1995), or by a critical band partition, as shown in Table 1. The latter, whose bandwidth approximates 2 ERB, was designed by Faller (Faller and Baumgarte 2003). The other parameters used in this paper are assigned as follows: original sampling frequency $F_s$ = 32 kHz, signal length or frame size $N$ = 1024, and upsampling factor $m$ = 8. Delay is artificially introduced into our signals and is chosen from a random vector whose elements are real numbers varying between 0 and 2. The limits of this artificial delay are chosen with the highest subbands in mind, to avoid phase ambiguity of the sinusoidal signals.

Table 1: Critical Band Boundaries

A_0    A_1    A_2    A_3    A_4    A_5    A_6
0      2      4      7      11     15     20

A_7    A_8    A_9    A_10   A_11   A_12   A_13
26     34     44     56     71     90     113

A_14   A_15   A_16   A_17   A_18   A_19   A_20
142    178    222    277    345    430    513

[Figure 1: TDE for Single Sinusoid, Mean Error vs Frequency, Uniform Subband BW. Four panels: TDE for Single Sin Wave by CXC, NCXC, LRM and ZPLRM.]

3.1. Tests on single sinusoids

For this experiment, the 1024-point DFT is divided into 32 subbands with uniform bandwidth. The experiment tests the performance of the different algorithms as a function of the frequency of the sinusoid. All four methods described above, CXC, NCXC, LRM and ZPLRM, are tested. The frequency of the single sinusoid is varied from $F_s/N$ to $512\,F_s/N$ in steps of $0.5\,F_s/N$. Thus the discrete spectrum of the sinusoid can be located exactly on an integer sample or between samples. Results are shown in Fig. 1, which contains curves of mean error versus frequency for each of the four methods.

From Fig. 1, the CXC and NCXC algorithms are clearly more accurate than the other two methods across the tested time delays. The mean error achieved by ZPLRM is much lower than that of LRM. Thus, as expected, zero padding does produce better performance. However, even with zero-padding, the linear regression methods do not compare with the cross-correlation methods.
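The subband layouts used in these experiments follow from simple arithmetic on the DFT bin spacing, which the sketch below works through for a 32 kHz sampling frequency and a 1024-point DFT. The assumption that the 32 uniform subbands span bins 0 to 512, and the helper names `uniform_edges` and `bin_to_hz`, are illustrative choices that the paper does not state.

```python
FS = 32000.0    # sampling frequency in Hz
N = 1024        # DFT length (frame size)

# Critical band boundaries A_0 ... A_20 of Table 1 (DFT bin indices).
CRITICAL_EDGES = [0, 2, 4, 7, 11, 15, 20, 26, 34, 44, 56, 71, 90, 113,
                  142, 178, 222, 277, 345, 430, 513]


def uniform_edges(num_bands, top_bin):
    """Bin indices splitting [0, top_bin) into num_bands equal-width subbands."""
    width = top_bin // num_bands
    return [b * width for b in range(num_bands + 1)]


def bin_to_hz(index):
    """Frequency of DFT bin `index`, i.e. (index / N) * F_s."""
    return index * FS / N


uniform = uniform_edges(32, N // 2)             # assumed to span bins 0..512
uniform_hz = [bin_to_hz(a) for a in uniform]    # band edges in Hz, 500 Hz apart
critical_hz = [bin_to_hz(a) for a in CRITICAL_EDGES]
bins_per_critical_band = [hi - lo for lo, hi in zip(CRITICAL_EDGES, CRITICAL_EDGES[1:])]
```

Each bin is 31.25 Hz wide, so a uniform subband holds 16 bins (500 Hz), whereas the two lowest critical bands hold only 2 bins each, which is the situation in which the regression-based estimates have the fewest samples to work with.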
The sharp increase in mean error when using CXC and NCXC at extremely high frequencies can be explained by the periodicity of the cross-correlation. Since the periods of high frequency sinusoids are close to the maximum delay value, the peak is likely to be found in another period, which introduces the error at extremely high frequencies. The error for LRM and ZPLRM tends to increase with increasing frequency, a direct result of the limits of linear regression for high frequency sinusoids.

[Figure 2: TDE for Single Sinusoid, Mean Error vs Delay Values, Uniform Subband BW. Panel title: TDE for Single Sin Wave; horizontal axis: delay in samples.]

The relationship between mean error and the given delay value is shown in Figure 2. The performance of CXC and NCXC is almost invariant over the various delay values. The results for LRM and ZPLRM also bear testament to the fact that zero padding is capable of reducing estimation error, especially in methods using the linear regression model. Similar results are achieved when the subbands have the non-uniform bandwidths given by Table 1.

3.2. Tests on multiple sinusoids

Signals consisting of multiple sinusoids are employed in this set of experiments. In addition to the same delay set as used in the previous section, several other conditions are addressed. First, uniform bandwidth subbands are considered, with 32 subbands. The frequency components of the sinusoid complex can either be chosen to correspond exactly to DFT indices or, alternatively, to lie between the integer bins. Figure 3 represents the former case and shows similar results to the single sinusoid case presented in the previous section. Figure 4 presents the latter case with non-integer frequency components. For this case, the mean error is larger than in any of the previous cases, especially at low frequencies when using CXC. This is because the energy of each component is spread to adjacent frequency bins.
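The spreading of energy into adjacent bins can be checked numerically. The sketch below compares a cosine placed exactly on a DFT index with one placed halfway between two indices and reports how much of the one-sided spectral energy the strongest bin holds. The frame length of 64, the bin positions 10 and 10.5, and the function names are assumptions chosen for this illustration, with no window applied other than the implicit rectangular one.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the DFT of x for bins 0 .. N/2."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]


def energy_fraction_in_peak(freq_bin, n=64):
    """Fraction of the one-sided spectral energy held by the strongest bin for a
    unit-amplitude cosine at the (possibly fractional) bin position freq_bin."""
    x = [math.cos(2 * math.pi * freq_bin * k / n) for k in range(n)]
    energy = [mag * mag for mag in dft_magnitudes(x)]
    return max(energy) / sum(energy)


on_bin = energy_fraction_in_peak(10.0)     # component exactly on a DFT index
off_bin = energy_fraction_in_peak(10.5)    # component halfway between two indices
```

For the on-bin component essentially all of the energy sits in a single bin, while for the half-bin offset the strongest bin holds well under half of it, the remainder leaking into neighbouring bins and therefore possibly into neighbouring subbands.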
The non-integer case is perhaps a more realistic representation of what happens with natural audio signals, whose frequency components will invariably lie on the frequency continuum rather than at distinct frequency bins.

A second case is considered, where the number of frequency components per subband is gradually increased in order to investigate how the performance of the different techniques depends on the number of frequency components per subband. As shown in Fig. 5, both CXC and NCXC maintain good TDE at relatively high frequencies above 4 kHz, while at low frequencies the mean error tends to increase with more sinusoidal components.

[Figure 3: TDE for Multi-Sinusoids, Integer Samples. Four panels: TDE for Multi, Integer 32/32 by CXC, NCXC, LRM and ZPLRM.]

[Figure 4: TDE for Multi-Sinusoids, Non-integer Samples. Four panels: TDE for Multi, Non integer 32/32 by CXC, NCXC, LRM and ZPLRM.]

[Figure 5: TDE for Multi-Sinusoids. Two panels: TDE for Multi Sin Waves by CXC and by NCXC; legend: 32/32, 64/32, 128/32, 256/32, 512/32, 1024/32, 2048/32.]

4. Computational complexity

The increase in signal length due to upsampling results in a dramatic increase in computational complexity compared with the simple cross-correlation (CC) method. The CXC method, however, does not require upsampling but still produces time delays with sub-sample resolution, meaning that its complexity is comparable with that of the CC method while its accuracy is higher. Padding zeros in the time domain to interpolate samples in the frequency domain requires more computation, and thus NCXC is computationally more expensive than both CC and CXC. The same holds for the LRM and ZPLRM methods: ZPLRM slightly improves the performance of LRM at the expense of computation.

5. Conclusions

In this paper, four different techniques, CXC, NCXC, LRM and ZPLRM, have been evaluated in terms of their ability to estimate the time difference between two signals. The two signals are composed of additive sinusoids. The performance of the algorithms is compared in terms of accuracy and computational complexity. Generally speaking, CXC and NCXC perform better than LRM and ZPLRM, which means Multiple Linear Regression might not be a suitable model for estimating ICTDs (inter-channel time differences) for multichannel audio compression. By using CXC and NCXC, the precision can be improved to 1/m samples, where m is the upsampling factor. Additionally, zero padding in the time domain, applied in NCXC and ZPLRM, is shown to reduce the mean error, especially at high frequencies. Upsampling is also able to improve precision, as expected. Zero-padding imposes a complexity penalty, and thus the NCXC and ZPLRM algorithms are computationally more expensive than the CXC and LRM methods.

References

Baumgarte, F. and C. Faller (2003). Binaural Cue Coding Part I: Psychoacoustic fundamentals and design principles. IEEE Transactions on Speech and Audio Processing, 11, 509-519.

Baumgarte, F. and C. Faller (May 2002). Estimation of Auditory Spatial Cues for Binaural Cue Coding. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 02), 2, 1801-1804.

Blauert, J. (1983). Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, Mass.: MIT Press.

Carter, G. C. (1987). Coherence and Time Delay Estimation. Proceedings of the IEEE, 75, 236-255.

Faller, C. and F. Baumgarte (2003). Binaural Cue Coding Part II: Schemes and applications. IEEE Transactions on Speech and Audio Processing, 11, 520-531.

Hertz, D. (1986). Time Delay Estimation by Combining Efficient Algorithms and Generalized Cross-Correlation Methods. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34, 1-7.

Knapp, C. H. and G. C. Carter (1976). The Generalized Correlation Method for Estimation of Time Delay. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24, 320-327.

Pan, D. (1995). A Tutorial on MPEG/Audio Compression. IEEE Multimedia Journal, Summer, 60-74.

Smith III, J. O. (2003). Mathematics of the Discrete Fourier Transform (DFT), with Music and Audio Applications. Menlo Park, California: W3K Publishing.