Investigation of Several Types of Nonlinearities for Use in Stereo Acoustic Echo Cancellation

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 6, SEPTEMBER 2001, pp. 686-696

Dennis R. Morgan, Senior Member, IEEE, Joseph L. Hall, and Jacob Benesty, Member, IEEE

Abstract: In this paper, we investigate several types of nonlinearities used for the unique identification of receiving room impulse responses in stereo acoustic echo cancellation. The effectiveness is quantified by the mutual coherence of the transformed signals. The perceptual degradation is studied by psychoacoustic experiments in terms of subjective quality and localization accuracy in the medial plane. The results indicate that, of the several nonlinearities considered, ideal half-wave rectification and smoothed half-wave rectification appear to be the best choices for speech. For music, the nonlinearity parameter of the ideal rectifier must be readjusted. The smoothed rectifier does not require this readjustment, but is a little more difficult to implement.

Index Terms: Acoustic echo cancellation, adaptive filters, nonlinearity, psychoacoustics, stereo.

Manuscript received June 2, 2000; revised April 30, 2001. The associate editor coordinating the review of this paper and approving it for publication was Dr. Michael S. Brandstein. D. R. Morgan and J. Benesty are with Bell Laboratories, Lucent Technologies, Murray Hill, NJ, USA (e-mail: drrm@bell-labs.com; jbenesty@bell-labs.com). J. L. Hall, retired, was with Bell Laboratories, Lucent Technologies, Murray Hill, NJ, USA.

I. INTRODUCTION

In stereo acoustic echo cancellation, there is a fundamental problem in uniquely identifying the receiving room impulse responses because of the coherence between the two loudspeaker signals [1]. This is of particular concern because, lacking proper identification, echo cancellation will depend on the impulse responses in the (actual or synthesized) transmission room. This means that one must track not only changes in the receiving room but also changes in the transmission room, which can be very rapid (e.g., when one person stops talking and another person starts).

One successful solution to the fundamental problem is to deliberately add a small amount of distortion to each channel, e.g., through half-wave rectification [2]. This distortion is effective in reducing the coherence and thereby enabling correct identification of the room responses, both for fullband stereo room-to-room conferencing [2]-[4] and for low frequencies only in a hybrid arrangement [5]. The technique has also been proposed for synthesized stereo in desktop conferencing [6], where the same identification problem arises even though one knows the synthesizing transfer functions. In all of these applications, the nonlinearity technique has been shown to be effective for enabling unique identification of the receiving room impulse responses, yet it is hardly audible for speech signals because of self-masking effects. Until now, however, the perceptual degradation has not been quantified.

In this paper, we compare the effectiveness of several nonlinearities for achieving the above objectives. It is known that the conditioning of the multichannel covariance matrix, i.e., the ratio of largest to smallest eigenvalues, determines the misalignment of the solution and the speed of convergence of any adaptive algorithm. In [2, App. A], a link is established between the coherence and the covariance matrix, whereby the eigenvalues are lower bounded by a factor that shrinks as the interchannel coherence grows.
Accordingly, the closer the coherence is to 1, the higher the misalignment and the slower the convergence. Therefore, we choose coherence reduction as a proxy for performance. In Section II, the coherences are derived theoretically for the half-wave rectifier as well as several other memoryless transformations. Parameter values of the nonlinear functions are selected such that they result in similar coherence reduction. These results are supported by simulations described in Section III. Then, in Section IV, we compare, for equivalent coherence reduction, the perceptual quality for both speech and music using formal psychoacoustic methods. We also investigate psychoacoustic effects on localization in the medial plane to see if the nonlinearity results in any impairment. Conclusions are summarized in Section V.

II. THEORETICAL COHERENCE CALCULATIONS

In this section, we derive mathematical expressions for the coherence of two signals modified by several types of nonlinear transformations and calculate some explicit examples using an actual measured room response.

A. Formal Description of Nonlinear Distortion

The starting point is the general memoryless nonlinear transformation

x'(t) = f[x(t)],   (1)

where f(.) is an arbitrary function. This distorted signal is added to the original signal to form the modified signal

x̃(t) = x(t) + α f[x(t)],   (2)

where the parameter α controls the amount of distortion added. For two signals, such as in stereo teleconferencing, we similarly define

x̃1(t) = x1(t) + α f[x1(t)],   (3a)
x̃2(t) = x2(t) + α f[x2(t)].   (3b)
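As a concrete reading of (1)-(3), the following Python/NumPy sketch builds the modified stereo signals for the memoryless nonlinearities considered in this paper. It is an illustration, not the authors' code; the input signals and the α = 0.3 setting are placeholder examples (the parameter values actually used appear in Fig. 1).

```python
import numpy as np

# Memoryless nonlinearities of the type listed in Table I.
def half_wave(x):
    """Ideal half-wave rectifier: passes positive samples, zeroes negative ones."""
    return 0.5 * (x + np.abs(x))

def full_wave(x):
    """Full-wave rectifier (absolute value)."""
    return np.abs(x)

def hard_limiter(x):
    """Hard limiter (signum)."""
    return np.sign(x)

def square_law(x):
    """Square-law device."""
    return x ** 2

def square_sign(x):
    """Squarer that preserves the sign of the input."""
    return np.sign(x) * x ** 2

def cubic(x):
    """Cubic nonlinearity."""
    return x ** 3

def add_distortion(x, f, alpha):
    """Modified signal of (2) and (3): x_tilde = x + alpha * f(x)."""
    return x + alpha * f(x)

# Placeholder stereo signals standing in for the two transmission-room
# microphone signals; in the application they are the same source filtered
# by two different transmission-room responses.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(16000)
x2 = 0.9 * x1 + 0.1 * rng.standard_normal(16000)

# Apply the same nonlinearity independently to each channel, as in (3a), (3b).
x1_mod = add_distortion(x1, half_wave, alpha=0.3)
x2_mod = add_distortion(x2, half_wave, alpha=0.3)
```

The key point is that the same memoryless function is applied to each channel independently, so the added components are only weakly correlated across channels even though the underlying signals are highly coherent.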

B. General Formulation of Coherence

We assume here that the speech signal can be represented as a stationary random Gaussian process over short intervals. This assumption is appropriate for the memoryless nonlinear transformations considered in this paper and will be further justified by the simulation results. If x1 and x2 are jointly Gaussian stationary processes with variances σ1² and σ2², respectively, then, using Price's theorem [7], the cross correlation functions between the original and distorted signals take the form of (4a) and (4b), where the constants are given by (5a) and (5b). Expressions (4) and (5a) are a slight generalization of Bussgang's theorem [7] to the case of cross-channel relations. The second form of (5), which is more convenient for our purposes, is derived from the first using integration by parts.

Expressions for the correlation function at the output of the nonlinearity can be obtained as a function of the normalized input correlation function [8] and are used with the above to compute the autocorrelation and cross correlation functions of the modified signals (3), as given in (6). By taking the Fourier transform, we can express these quantities in the frequency domain (7), and these spectra are then used to calculate the coherence (8).

For balanced operation, we assume that the two channels have the same statistics and define the common constants of (9) and (10). (In general, each such quantity is to be interpreted here as a single constant; we choose this notation because it is suggestive of the completely matched conditions.) With these definitions, the coherence of the modified signals (8) is expressed as (11). For the completely matched case, the same expressions as above are obtained, only now with (12).

We note that the quantities in (7) are dimensionless; therefore, the units (and consequently the power scaling) of the distorted signal, and hence the values of the constants defined above, will depend on the explicit form of the nonlinearity.

C. Coherence with Example Nonlinear Functions

Table I, adapted from [8], lists some representative expressions of the output correlation function for several simple nonlinearities in terms of the normalized input correlation function.

TABLE I
OUTPUT CORRELATION FUNCTION OF SEVERAL NONLINEARITIES AS A FUNCTION OF THE NORMALIZED INPUT CORRELATION FUNCTION

To compute the associated coefficients for these nonlinearities, it is useful to evaluate integrals of the form (13), which are listed in Table II for n = 1, 2, 3, 4.

TABLE II
CALCULATED VALUES OF (13) FOR n = 1, 2, 3, 4

Applying these results in the defining equations (5), (9), and (10), we obtain the values listed in Table III, where, for convenience, the constants are defined in terms of a common factor.

TABLE III
CROSS CORRELATION CONSTANTS AND COHERENCE PARAMETERS CALCULATED FOR SEVERAL NONLINEARITIES

Finally, substituting these coefficients together with the Fourier transforms of the correlation functions into (11) determines the coherence.
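The coherence that the closed-form expressions predict can also be estimated directly from signals. The sketch below (Python/SciPy, not the authors' Matlab code) estimates the magnitude coherence of the two modified channels with Welch averaging and reduces it to two frequency-averaged summaries of the kind listed in Table IV; the segment length is an implementation choice, and the 1/(1 - |γ|²) form used for the "inverse eigenvalue bound" column is our assumption, since its exact definition is not reproduced here.

```python
import numpy as np
from scipy.signal import coherence

def coherence_measures(x1_mod, x2_mod, fs=16000, nperseg=4096):
    """Estimate the magnitude coherence |gamma(f)| between the two modified
    channels (Welch averaging) and reduce it to two scalar summaries:
      * the coherence magnitude averaged over frequency, and
      * a frequency-averaged 1 / (1 - |gamma|^2), used here only as a plausible
        stand-in for the 'inverse eigenvalue bound' column of Table IV."""
    f, gamma_sq = coherence(x1_mod, x2_mod, fs=fs, nperseg=nperseg)
    gamma_mag = np.sqrt(gamma_sq)
    avg_coherence = gamma_mag.mean()
    inv_bound = np.mean(1.0 / np.maximum(1.0 - gamma_sq, 1e-6))  # guard |gamma| ~ 1
    return f, gamma_mag, avg_coherence, inv_bound
```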

D. Parameter Values to Obtain Similar Coherence Properties

Measurements were taken of a room impulse response (HuMaNet room B [9]) using SYSid [10] over an 8-kHz bandwidth (4096 points with a 16-kHz sampling rate). The front and back walls of the listening room are 16.6 ft (5.06 m) long, the side walls are 11.2 ft (3.41 m) long, and the room is 8.0 ft (2.44 m) high. The walls of the room are covered with acoustic-absorption panels. A white noise source was assumed, so that products of the room response Fourier transforms determine the spectra. These spectra were inverse transformed (4096-point IFFT), modified using Table I, and retransformed (4096-point FFT); the results were then substituted into (11), along with the results of Table III, to compute the coherence. Matlab was used to perform all of these calculations and plot the results, which appear in Fig. 1 for the various nonlinearities.

Fig. 1. Theoretical coherence of white noise through measured room responses for various nonlinearities. (a) Half-wave, α = 0.3. (b) Full-wave, α = 0.3/2.3. (c) Hard limiter, 0.15. (d) Square-law, α = 0.05. (e) Square-sign, α = 0.15. (f) Cubic, α = 0.03.

In each case, the value of α was selected to produce approximately the same amount of coherence reduction averaged over frequency, as shown in the second column of Table IV (the entry in the bottom row will be discussed in the next section).

TABLE IV
WHITE NOISE COHERENCE MAGNITUDE AND INVERSE EIGENVALUE BOUND, AVERAGED OVER FREQUENCY, FOR NONLINEARITIES AND PARAMETER VALUES OF FIGS. 1 AND 3

As mentioned in the Introduction, the misalignment and convergence properties are more properly determined by the eigenvalues of the covariance matrix, which are lower bounded in terms of the coherence. Thus, the inverse of this eigenvalue bound, averaged over frequency, is a measure of misalignment and convergence time. We also show these values in the third column of Table IV. These measures are likewise roughly comparable across the set of nonlinearities.

We note that adding a half-wave rectified signal with factor α results in a gain of 1 + α to positive signals and a gain of 1 to negative signals. For a full-wave rectifier with factor β, the corresponding gains are 1 + β and 1 - β. Therefore, adding a half-wave rectified signal with factor α is exactly equivalent to adding a full-wave rectified signal with factor β = α/(2 + α) and rescaling the composite signal by 1 + α/2. Thus, Fig. 1(a) and (b) are identical.
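The equivalence between the two rectifiers is easy to check numerically. In the sketch below, the factor β = α/(2 + α) and the overall rescaling 1 + α/2 follow from matching the gains applied to positive and negative samples; this is our own working of the relation stated above, verified to machine precision.

```python
import numpy as np

def half_wave(x):
    return 0.5 * (x + np.abs(x))

alpha = 0.3
beta = alpha / (2.0 + alpha)            # full-wave factor, as in Fig. 1(b)
rescale = 1.0 + alpha / 2.0             # composite gain matching the two forms

rng = np.random.default_rng(1)
x = rng.standard_normal(10000)

hw = x + alpha * half_wave(x)           # half-wave composite with factor alpha
fw = rescale * (x + beta * np.abs(x))   # rescaled full-wave composite with factor beta

print(np.max(np.abs(hw - fw)))          # ~1e-16: the two composites are identical
```

Apart from the fixed overall gain of 1 + α/2 on the composite signal, the two preprocessors therefore produce the same waveform, which is why their coherence curves coincide.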

III. COHERENCE CALCULATIONS FROM SIMULATIONS

Computer simulations were performed using Gaussian white noise, speech, and music signals. The white noise was generated using a standard Matlab routine. This case is used as a cross-check on the theoretical results of the last section. The speech signal was compiled from a digital speech database sampled at 16 kHz. It consists of the following three sentences spoken by a male talker: "Bobby did a good deed." "Do you abide by your bid?" "A teacher patched it up." The signal extends over about 5.3 s at the 16-kHz sampling rate. The music signal was of a piano playing the first few bars of Beethoven's Moonlight Sonata.

These signals were convolved with the aforementioned measured room impulse responses and processed (using the Matlab spectrum function) to obtain the magnitude-squared coherence. After taking the square root, we smoothed the coherence magnitude over 100 of 8193 frequency points (approximately 100 Hz). For white noise, the coherence of Fig. 1(a) was likewise smoothed over 50 of 4097 frequency points (approximately 100 Hz) and is plotted in Fig. 2(a). The simulation produced the coherence plotted in Fig. 2(b), which is seen to be in reasonable agreement, thereby verifying the methodology.

Fig. 2. Coherence of white noise through measured room responses for the half-wave nonlinearity with α = 0.3, smoothed by averaging over 100-Hz blocks. (a) Theoretical. (b) Simulation.

Having established close agreement between theoretical and simulated coherence, we next use the simulation to compare the coherence for all three signal sources, as shown on the left side of Fig. 3 for ideal half-wave rectification. The simulation was also used to compute the coherence of smoothed half-wave rectification [11], given by (14), in which a parameter is used to round the edge of the discontinuous derivative at the break point. This function would be difficult to treat on a theoretical basis. The simulated coherence plots for this function appear on the right side of Fig. 3 for the parameter values given in the caption.

Fig. 3. Simulated coherence of white noise [top panels (a), (b)], speech [middle panels (c), (d)], and music [bottom panels (e), (f)] through measured room responses for ideal rectification [left panels (a), (c), (e)] with α = 0.3 and smoothed rectification [right panels (b), (d), (f)] with smoothing parameter 1.0 and normalized break point 0.65.

For purposes of comparison, we list the white noise coherence measures that were computed from the smoothed half-wave rectifier simulation in the bottom row of Table IV. As can be seen, the white noise coherence measures for the smoothed half-wave rectifier are both lower than those obtained for the other nonlinearities. However, these differences tend to diminish for speech and music, as is evident in Fig. 3. We have not listed the speech and music coherence measures because they are very dependent on the particular sample used, and meaningful results would require much more extensive evaluation over a comprehensive database relating to the ultimate application. We prefer to separate the variabilities so that, on the one hand, we generally characterize the intrinsic capability of the nonlinearity by the white noise coherence, while on the other hand we determine the psychoacoustic degradations with a representative speech and music sample.

On the basis of the coherence calculations, we can say at this point that, for the nonlinear functions considered here, roughly comparable coherence reduction is achieved for the parameter values in Figs. 1 and 3 [the exception being for music, Fig. 3(e) and (f), to be discussed later]. As previously discussed, this reduction would lead to comparable misalignment and convergence performance for the stereo acoustic echo cancellation problem. Now, with this as a basis of comparison, we go on to evaluate the perceptual degradation introduced by these various nonlinearities with respect to subjective quality and auditory localization.
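As a rough illustration of the simulation procedure just described, the sketch below convolves a white-noise source with two room impulse responses (placeholder random decays here, standing in for the measured 4096-point responses), applies the ideal half-wave preprocessing with α = 0.3, and smooths the resulting coherence magnitude over roughly 100-Hz blocks. It follows the description in the text rather than the authors' Matlab code.

```python
import numpy as np
from scipy.signal import fftconvolve, coherence

fs = 16000
rng = np.random.default_rng(2)

# Placeholder exponentially decaying random sequences standing in for the
# two measured 4096-point room impulse responses.
decay = np.exp(-np.arange(4096) / 800.0)
h1 = rng.standard_normal(4096) * decay
h2 = rng.standard_normal(4096) * decay

source = rng.standard_normal(5 * fs)            # ~5 s of white noise
x1 = fftconvolve(source, h1)[: len(source)]     # two fully coherent channels
x2 = fftconvolve(source, h2)[: len(source)]

def half_wave(x):
    return 0.5 * (x + np.abs(x))

alpha = 0.3
x1m = x1 + alpha * half_wave(x1)                # ideal half-wave preprocessing
x2m = x2 + alpha * half_wave(x2)

f, gamma_sq = coherence(x1m, x2m, fs=fs, nperseg=16384)   # 8193 frequency points
gamma_mag = np.sqrt(gamma_sq)

# Smooth the coherence magnitude over roughly 100-Hz blocks, as in Figs. 2 and 3.
bin_hz = f[1] - f[0]
block = max(1, int(round(100.0 / bin_hz)))
smoothed = np.convolve(gamma_mag, np.ones(block) / block, mode="same")
```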

IV. PERCEPTUAL DEGRADATION

A. Subjective Quality

The psychoacoustic listening experiment described in this section determines the effect of the above nonlinear transformations on the subjective quality of speech and of music. Three audio tokens were used: a male talker and a female talker uttering the sentence "A teacher patched it up," and the first few bars of the Moonlight Sonata as used in Section III. All three tokens were stored as 16-bit PCM with a sampling rate of 16 kHz and were normalized to an rms level of 1528 units. The male-talker token had a duration of 1.8 s, the female-talker token a duration of 1.5 s, and the music token a duration of 5.5 s.

Signals were presented and controlled by a Concurrent MC5400 computer fitted with a DA04H 16-bit D/A converter. The D/A output was lowpass filtered at 5 kHz and presented to a subject inside a double-walled Industrial Acoustics Company soundproof booth. Signals were presented diotically through Sennheiser HD-250 headphones at a comfortable listening level of approximately 80 dB SPL.

The transformation used for these experiments was to replace the original signals by the modified signals in (3), with one of the following four nonlinear functions of Table I: (A) half-wave, (C) hard limiter, (D) square-law, and (E) square-sign. (For ease of comparison, this designation was chosen to match that used in Fig. 1.) The full-wave nonlinearity was not considered since, with appropriate scaling, it is identical to the half-wave nonlinearity, as previously noted; the cubic nonlinearity was not used here because it produced overflow of the 16-bit D/A converter. The same parameter values as in Fig. 1 were used for these four nonlinearities. In addition, a fifth nonlinearity, designated (G), smoothed half-wave rectification (14) with the same parameter values as used in Fig. 3, was also included. For control purposes, an additional processing condition (O) was defined as no processing [α = 0 in (3)].

Thirteen subjects took part in this experiment. The ages of the subjects ranged from 33 to 65 years. Some of the subjects had a moderate amount of presbycusis (normal age-related hearing loss), but all subjects had audiologically normal hearing according to the 1964 ISO reference of average hearing loss at 500, 1000, and 2000 Hz of less than 26 dB [12].

There was no apparent relationship between age or amount of presbycusis and performance in this experiment, and results from all 13 subjects are pooled.

Each subject took part in a single experimental session. A session consisted of two replications of the 18 stimuli, consisting of the three audio tokens either undistorted or processed as in (3) with one of the five nonlinearities described above. Subjects were instructed to indicate the quality of each stimulus by pushing one of five pushbuttons labeled excellent, good, fair, poor, and bad. (Note that while these categories are identical to the listening-quality scale recommended for MOS testing in ITU-T Recommendation P.800, the present test should not be regarded as an MOS test; we did not include reference conditions, and many of the listeners had previous experience in speech coding.) Printed instructions, which the subjects read before the experiment, are reproduced here as Appendix A.

The five possible response ratings, bad to excellent, were assigned numerical values one to five, and statistical analysis was done on the resulting set of 468 numbers (three audio tokens × six processing conditions × thirteen subjects × two replications). As was stated above, results from all 13 subjects were pooled. In addition, since analysis revealed no significant differences between results for male and for female talkers, results from the two talkers were pooled. An analysis of variance assuming 13 subject effects plus 39 token-by-processing-condition effects (18 for the subset of experiments reported here plus another 21 for seven additional conditions included in a larger experiment) gave 95% confidence intervals of 0.18 response units for speech and 0.26 response units for music.

Experimental results for speech and for music are summarized in Fig. 4.

Fig. 4. Average response ratings for speech and for music. The horizontal bars indicate 95% confidence intervals. Processing conditions: (A) half-wave, α = 0.3; (C) hard limiter, 0.15; (D) square-law, α = 0.05; (E) square-sign, α = 0.15; (G) smoothed half-wave rectification with the parameters of Fig. 3; (O) no processing (α = 0).

The average response rating is shown on the horizontal axis, and the two types of source material are shown on the vertical axis. The horizontal bars indicate 95% confidence intervals. The key relating symbols in Fig. 4 to processing condition appears in the caption: A, C, D, and E relate directly to the labels in the subplots of Fig. 1, with the same parameter value in each case; G is smoothed half-wave rectification (14) with the parameters of Fig. 3; and O designates unprocessed signals. As previously stated, these values were selected to produce approximately the same amount of coherence reduction.

Of the distortion conditions reported in Fig. 4, hard limiting (C) and square-sign distortion (E) can be rejected out of hand: they both received average ratings of fair or worse for both speech and music. For speech, the other three conditions all appear to be quite satisfactory: they all received average ratings between good and excellent. Square-law distortion (D) and smoothed rectification (G) fared almost as well for music as for speech, but half-wave distortion (A) did substantially worse. However, it must be recalled from Fig. 3(e) and (f) that the half-wave rectifier with α = 0.3 is overly aggressive for music. Therefore, the value of α could be reduced for music, which would improve the perceptual quality.
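For orientation only, the following is a minimal sketch of how such category ratings can be reduced to mean ratings with approximate 95% confidence intervals. It uses a simple pooled normal approximation rather than the analysis of variance described above, and the ratings listed are made-up placeholders, not the experimental data.

```python
import numpy as np

# Map the five response categories to the numerical values used in the text.
SCALE = {"bad": 1, "poor": 2, "fair": 3, "good": 4, "excellent": 5}

def mean_and_ci(ratings, z=1.96):
    """Pooled mean rating and an approximate 95% confidence half-width.
    (The paper derives its intervals from an analysis of variance with
    subject effects; this normal approximation is only a rough stand-in.)"""
    r = np.asarray(ratings, dtype=float)
    return r.mean(), z * r.std(ddof=1) / np.sqrt(len(r))

# Hypothetical ratings for one token/condition (13 subjects x 2 replications).
example = ["good", "excellent", "good", "fair", "good"] * 5 + ["excellent"]
mean, ci = mean_and_ci([SCALE[label] for label in example])
print(f"average rating {mean:.2f} +/- {ci:.2f}")
```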
It is interesting to note that the differences between average ratings for speech and music were greatest for the ideal half-wave rectifier (A) and the hard limiter (C), whereas conditions O, D, E, and G produced differences of less than 0.5. Note that the latter are exactly the four conditions that do not have a sharp discontinuity at the origin. Our speculation is that the sharp discontinuity that occurs with half-wave rectification and with hard limiting produces distortion that was more detrimental for the sustained tonal musical sample than it was for speech.

B. Auditory Localization

The second experiment reported here demonstrates that moderate nonlinear processing has essentially no effect on auditory localization.

It differs from the first experiment in that stimuli were presented over a pair of loudspeakers rather than through headphones. The original speech token was the same male-talker token used in the first experiment. The left- and right-channel signals were produced in a simulated transmission room with a sound source and two microphones using the image model [13]. The simulated room was specified so as to model the actual room used in Sections II and III. A top view of the simulated room is shown in Fig. 5(a). (We use units of feet, as designated in the original room specifications.) The reflection coefficients of the walls, ceiling, and floor were 0.85, 0.65, and 0.80, respectively.

Fig. 5. Room layouts used for auditory localization experiments (coordinate units in feet). (a) Simulated room used to generate signals. (b) Actual room used to present nonlinearly transformed signals to the listener.

Two conditions were investigated. In one, the left- and right-channel signals were produced by a widely spaced pair of omnidirectional microphones ["Left Mike" and "Right Mike" in Fig. 5(a)]. In the other, the left- and right-channel signals were produced by a closely spaced pair of cardioid microphones directed at right angles to each other ["Crossed Cardioid Mikes" in Fig. 5(a)]. The position of the centered talker was the same for both conditions. Note that the centered talker was not equidistant from the left and right walls.

This asymmetry was introduced deliberately to eliminate atypical artifacts. The height of the room was 8.0 ft. The speech source was 3.25 ft above the floor, and all microphones were 2.25 ft above the floor. Each cardioid microphone was implemented by means of a closely spaced pair of omnidirectional microphones (2.0 cm between microphones) with appropriately delayed and integrated outputs [14]. This spacing is small enough to provide an acceptable frequency response and large enough to provide an acceptable spatial resolution. The left-channel microphone was directed 45° toward the left and the right-channel microphone was directed 45° toward the right.

The transformation used for these experiments was to replace the left- and right-channel signals by the modified signals of (3), using the half-wave rectifier of Table I with nonlinearity parameter α = 0.45. This value was chosen to be somewhat larger than the value used in the coherence calculations and subjective quality experiments in order to better evoke any possible impairment of localization performance. The transformed signals were amplified and presented at a comfortable listening level of approximately 75 dB SPL over a pair of Quad ESL-63 electrostatic loudspeakers. The listening room, shown in Fig. 5(b), was the same room described earlier for the coherence measurements.

In each experimental trial, the subject listened to a pair of stimulus presentations that differed in the location of the talker in the simulated transmission room. The subject was required to judge whether the perceived location of the talker in the second presentation was to the left of or to the right of the perceived location of the talker in the first presentation. Talker positions for the two stimulus presentations comprising a trial were symmetrical about the centered position: one was displaced to the left of the centered position and the other by an equal amount to the right, so that the two presentations differed by a specified total change of talker position. The order of presentation was randomized from trial to trial.

We carried out a series of preliminary listening experiments to determine what values of this position change would be included in the experiment. The criterion was to select values that would cover the range from difficult (probability of correct response slightly better than chance) to easy (probability of correct response close to unity). These preliminary experiments revealed that a much smaller change is detectable with the omnidirectional microphone configuration than with the cardioid microphone configuration. We selected four position changes for the omnidirectional microphone configuration (0.02, 0.04, 0.1, and 0.2 ft) and four for the cardioid microphone configuration (0.2, 0.4, 1.0, and 2.0 ft). There were thus 16 different stimulus conditions: two microphone types (omnidirectional or cardioid) × four position changes (as described above) × two distortion conditions [unmodified (α = 0) and modified (α = 0.45)]. An experimental session consisted of five replications of the 16 conditions, for a total of 80 trials, with the order of conditions randomized within each replication. An experimental session took about seven minutes to complete.
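The cardioid pickups described above (each realized as a 2.0-cm pair of omnidirectional microphones with appropriately delayed and integrated outputs [14]) can be sketched as a standard delay-and-subtract first-order differential pair. The delay value, the leaky-integrator equalization, and the parameter choices below are our own simplified reading of that construction, not the exact processing used for the experiment.

```python
import numpy as np

def fractional_delay(x, delay_samples):
    """Delay a signal by a (possibly fractional) number of samples using
    linear interpolation; adequate for this illustrative sketch."""
    n = np.arange(len(x))
    return np.interp(n - delay_samples, n, x, left=0.0, right=0.0)

def cardioid_from_omni_pair(front, rear, fs=16000, d=0.02, c=343.0, leak=0.995):
    """Delay-and-subtract cardioid: delay the rear omni by the acoustic travel
    time d/c across the 2-cm spacing, subtract it from the front omni, then
    apply a leaky integrator to equalize the differentiation inherent in the
    subtraction.  Parameter values here are assumptions for illustration."""
    delay = d / c * fs                     # about 0.93 samples at 16 kHz
    diff = front - fractional_delay(rear, delay)
    out = np.empty_like(diff)
    acc = 0.0
    for i, v in enumerate(diff):           # first-order leaky integrator
        acc = leak * acc + v
        out[i] = acc
    return out
```

Pointing such a pair 45° left and another 45° right yields the crossed-cardioid configuration, whose directional cue is mainly an interchannel level difference rather than a time difference.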
Four subjects participated in the experiment. The ages of the subjects ranged from 33 to 63 years. As in the previous experiment, some of the subjects exhibited a moderate amount of presbycusis, all subjects had audiologically normal hearing, and there was no apparent relationship between age or amount of presbycusis and performance in the auditory localization task. Each subject read a sheet of printed instructions, reproduced here as Appendix B, and then participated in one or two practice sessions followed, on separate days, by two data-collection sessions, so the experimental results are based on ten observations per condition for each subject.

The results for the four subjects were averaged and are plotted in Fig. 6. Each panel shows the proportion of correct responses, P(C), versus the amount by which the talker position changed. The left panel shows results with omnidirectional microphones and the right panel shows results with crossed cardioid microphones. In each panel, the points labeled O are for unmodified speech [α = 0 in (3)] and the points labeled A are for speech modified using half-wave rectification [α = 0.45 in (3)]. Each point shows the average of 40 trials. The error bars between the left and right panels show 95% confidence intervals of the probability of correct response for P(C) = 0.5 and P(C) = 0.8.

The results indicate that introduction of this nonlinear distortion produces little or no degradation of stereophonic localization. Of the 32 cases (four subjects × two types of microphone × four position changes), the proportion of correct responses was equal for distorted and undistorted speech in 14 cases, was higher for undistorted than for distorted speech in seven cases, and was higher for distorted than for undistorted speech in 11 cases. A simple nonparametric sign test based on these numbers shows that the difference between distorted and undistorted speech over the ensemble of experimental subjects is not significant. Fig. 6 supports this conclusion. In only one of the eight cases (0.02-ft position change, omnidirectional microphones) is the difference between the proportions of correct responses for distorted and undistorted speech significant at the 95% level.
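The nonparametric sign test quoted above can be checked with an exact binomial test on the non-tied cases; the sketch below assumes SciPy's binomtest and reproduces the stated conclusion.

```python
from scipy.stats import binomtest

# Of the 32 subject/microphone/position-change cases, 14 were ties,
# undistorted speech was better in 7, and distorted speech was better in 11.
# A sign test keeps only the 18 non-tied cases and asks whether 7 "wins"
# out of 18 is consistent with a fair coin.
result = binomtest(7, n=18, p=0.5, alternative="two-sided")
print(result.pvalue)   # about 0.48, i.e., no significant overall difference
```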

There were also some intersubject differences. Not all subjects did equally well overall, and in addition two of the subjects showed more of a difference between crossed cardioid and omnidirectional microphones than did the two other subjects. This was true for both undistorted and distorted speech. The primary directional cue produced by the omnidirectional microphone configuration is interchannel time difference, while the primary directional cue produced by the cardioid microphone configuration is interchannel intensity difference. We know [15] that some subjects are more dependent on interaural time difference in a lateralization task while other subjects are more dependent on interaural intensity difference, so it comes as no surprise that the effect of microphone configuration is different for different subjects.

Fig. 6. Proportion of correct responses versus change of talker position, averaged over the four subjects. Each point shows the average of 40 trials. Left panel: omnidirectional microphones. Right panel: crossed cardioid microphones. Processing conditions: (O) unmodified [α = 0 in (3)]; (A) modified with half-wave rectification [α = 0.45 in (3)]. The error bars between the left and right panels show 95% confidence intervals for P(C) = 0.5 and P(C) = 0.8.

Fig. 6 shows that the threshold change of talker position, averaged over the four subjects, was more than an order of magnitude smaller for the omnidirectional configuration (0.05 ft) than it was for the cardioid configuration (1.5 ft). However, the omnidirectional configuration gives a distorted representation of talker position: the perceived position of the talker moves abruptly from extreme right to extreme left as the actual talker position crosses the midline between the two microphones. This distortion occurs because the primary directional cue produced by the omnidirectional microphone configuration is interchannel time difference; because of the precedence effect [16], [17], the perceived source in this case is localized at the loudspeaker receiving the earlier signal. With the cardioid microphone configuration, the directional cue is interchannel intensity difference, and the perceived position of the talker changes gradually from left to right as the actual position of the talker changes. In our experiment, the simulated talker and listener are effectively 15.1 ft apart (7.2 ft plus 7.9 ft from the front walls), so the 1.5-ft threshold we measured corresponds to an angle shift of about 5.7°. This compares favorably with results presented by Mills [18], who reports minimum audible angles for tone bursts in the range of 1°-4°, depending on the frequency of the stimulus.

V. CONCLUSIONS

We investigated several types of nonlinearities for reducing the mutual coherence of stereo signals for the purpose of uniquely identifying the impulse responses in acoustic echo cancellation applications. The intention is that this reduction in coherence, while being effective for its intended purpose, does not seriously degrade the subjective quality of the audio source or the ability to localize the direction of sound. First, the parameters of the nonlinearities were selected so as to produce approximately equal reduction of coherence, and then psychoacoustic experiments were conducted to quantify the subjective loss of quality and to determine whether localization is compromised.

Of the types of nonlinearities evaluated, half-wave rectification is the simplest to implement and only minimally affects the speech quality. However, for music the nonlinearity parameter must be reduced to maintain the same level of decoherence and, we presume, the same perceptual quality. The smoothed rectifier also provides good speech quality but is a little more difficult to implement because a running estimate of the standard deviation must be computed and used to normalize the break point. However, its performance seems to affect speech and music more uniformly, therefore not requiring readjustment of the nonlinearity parameter.

We found no statistically meaningful effect of half-wave rectification on localization performance. Informal listening also did not reveal any localization impairment for any of the other nonlinearities; mild nonlinearities of any kind seem to have no effect whatsoever.

APPENDIX A
PRINTED INSTRUCTIONS FOR SUBJECTIVE QUALITY EXPERIMENT

APPENDIX B
PRINTED INSTRUCTIONS FOR AUDITORY LOCALIZATION EXPERIMENT

ACKNOWLEDGMENT

The authors would like to thank M. M. Sondhi and G. W. Elko for helpful discussions, and T. Gaensler and the reviewers for providing useful comments to improve the text.

REFERENCES

[1] M. M. Sondhi, D. R. Morgan, and J. L. Hall, "Stereophonic acoustic echo cancellation - An overview of the fundamental problem," IEEE Signal Processing Lett., vol. 2, Aug. 1995.
[2] J. Benesty, D. R. Morgan, and M. M. Sondhi, "A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation," IEEE Trans. Speech Audio Processing, vol. 6, Mar. 1998.
[3] A. Gilloire and V. Turbin, "Using auditory properties to improve the behavior of stereophonic acoustic cancellers," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1998.
[4] S. Shimauchi, Y. Haneda, S. Makino, and Y. Kaneda, "New configuration for a stereo echo canceller with nonlinear pre-processing," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1998.
[5] J. Benesty, D. R. Morgan, and M. M. Sondhi, "A hybrid mono/stereo acoustic echo canceler," IEEE Trans. Speech Audio Processing, vol. 6, Sept. 1998.
[6] J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi, "Synthesized stereo combined with acoustic echo cancellation for desktop conferencing," Bell Labs Tech. J., vol. 3, July-Sept. 1998.
[7] A. Papoulis, Probability, Random Variables and Stochastic Processes. New York: McGraw-Hill, 1984.
[8] R. F. Baum, "The correlation function of Gaussian noise passed through nonlinear devices," IEEE Trans. Inform. Theory, vol. IT-15, July 1969.
[9] D. A. Berkley and J. L. Flanagan, "HuMaNet: An experimental human-machine communications network based on ISDN wideband audio," AT&T Tech. J., vol. 69, Sept./Oct. 1990.
[10] User's Manual for the SYSid Audio-Band Measurement and Analysis System, Version 4.0. Highland Park, NJ: Ariel Corporation.
[11] M. R. Schroeder and J. L. Hall, "Model for mechanical to neural transduction in the auditory receptor," J. Acoust. Soc. Am., vol. 55, May 1974.
[12] D. S. Green, "Pure tone air conduction thresholds," in Handbook of Clinical Audiology, J. Katz, Ed. Baltimore, MD: Williams & Wilkins, 1972, ch. 5.
[13] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Am., vol. 65, pp. 943-950, Apr. 1979.
[14] H. F. Olsen, Modern Sound Reproduction. New York: Van Nostrand Reinhold, 1972.
[15] L. A. Jeffress and D. McFadden, "Differences of interaural phase and level in detection and lateralization," J. Acoust. Soc. Am., vol. 49, 1971.
[16] W. M. Hall, "A method for maintaining in a public address system the illusion that the sound comes from the speaker's mouth," J. Acoust. Soc. Am., vol. 7, p. 239.
[17] M. B. Gardner, "Historical background of the Haas and/or precedence effect," J. Acoust. Soc. Am., vol. 43, 1968.
[18] A. W. Mills, "On the minimum audible angle," J. Acoust. Soc. Am., vol. 30, pp. 237-246, 1958.

Dennis R. Morgan (S'63-M'69-SM'92) was born in Cincinnati, OH, on February 19. He received the B.S. degree in 1965 from the University of Cincinnati, and the M.S. and Ph.D. degrees from Syracuse University, Syracuse, NY, in 1968 and 1970, respectively, all in electrical engineering. From 1965 to 1984, he was with the Electronics Laboratory, General Electric Company, Syracuse, specializing in the analysis and design of signal processing systems used in radar, sonar, and communications.
He is now a Distinguished Member of Technical Staff at Bell Laboratories, Lucent Technologies (formerly AT&T), Murray Hill, NJ, where he has been employed since 1984. From 1984 to 1990, he was with the Special Systems Analysis Department, Whippany, NJ, where he was involved in the analysis and development of advanced signal processing techniques associated with communications, array processing, detection and estimation, and adaptive systems. Since 1990, he has been with the Acoustics Research Department, Murray Hill, NJ, where he is engaged in research on adaptive signal processing techniques applied to communication systems. He has authored numerous journal publications and is coauthor of Active Noise Control Systems: Algorithms and DSP Implementations (New York: Wiley, 1996) and Advances in Network and Acoustic Echo Cancellation (New York: Springer-Verlag, 2001). Dr. Morgan served as Associate Editor for the IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING beginning in 1995, and he is currently serving as Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING.

Joseph L. Hall was born in Boston, MA, on January 22. He received the B.A. degree in physics in 1959 from Williams College, Williamstown, MA, and the S.B. and S.M. degrees in electrical engineering in 1959 and the Ph.D. degree in electrical engineering in 1963, all from the Massachusetts Institute of Technology (MIT), Cambridge. From 1964 through 1966, he was with the Department of Electrical Engineering and the Department of Biomedical Engineering at Johns Hopkins University, Baltimore, MD. In 1966, he joined the Acoustics and Speech Research Department at Bell Labs, Lucent Technologies (formerly AT&T), Murray Hill, NJ, where he is now a Distinguished Member of Technical Staff. His research interests are in the area of auditory psychophysics. He was Associate Editor of the Journal of the Acoustical Society of America. Dr. Hall is a Fellow of the Acoustical Society of America, where he has served on the society's executive council.

Jacob Benesty (M'98) was born in Marrakesh, Morocco, on April 8. He received the M.S. degree in microwaves from Pierre & Marie Curie University, France, in 1987, and the Ph.D. degree in control and signal processing from Orsay University, France, in April 1991. While pursuing the Ph.D. degree (from November 1989 to April 1991), he worked on adaptive filters and fast algorithms at the Centre National d'Etudes des Télécommunications (CNET), Paris, France. From January 1994 to July 1995, he worked at Telecom Paris on multichannel adaptive filters and acoustic echo cancellation. He joined Bell Labs, Lucent Technologies (formerly AT&T), Murray Hill, NJ, in October 1995, first as a Consultant and then as a Member of Technical Staff. Since that date, he has been working on stereophonic acoustic echo cancellation, adaptive filters, source localization, robust network echo cancellation, and blind deconvolution. He was the Co-Chair of the 1999 International Workshop on Acoustic Echo and Noise Control. He coauthored the book Advances in Network and Acoustic Echo Cancellation (New York: Springer-Verlag, 2001) and is co-editor/coauthor of the book Acoustic Signal Processing for Telecommunication (Norwell, MA: Kluwer, 2000).


More information

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction Human performance Reverberation

More information

The role of intrinsic masker fluctuations on the spectral spread of masking

The role of intrinsic masker fluctuations on the spectral spread of masking The role of intrinsic masker fluctuations on the spectral spread of masking Steven van de Par Philips Research, Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands, Steven.van.de.Par@philips.com, Armin

More information

6-channel recording/reproduction system for 3-dimensional auralization of sound fields

6-channel recording/reproduction system for 3-dimensional auralization of sound fields Acoust. Sci. & Tech. 23, 2 (2002) TECHNICAL REPORT 6-channel recording/reproduction system for 3-dimensional auralization of sound fields Sakae Yokoyama 1;*, Kanako Ueno 2;{, Shinichi Sakamoto 2;{ and

More information

AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES

AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-), Verona, Italy, December 7-9,2 AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES Tapio Lokki Telecommunications

More information

DESIGN OF ROOMS FOR MULTICHANNEL AUDIO MONITORING

DESIGN OF ROOMS FOR MULTICHANNEL AUDIO MONITORING DESIGN OF ROOMS FOR MULTICHANNEL AUDIO MONITORING A.VARLA, A. MÄKIVIRTA, I. MARTIKAINEN, M. PILCHNER 1, R. SCHOUSTAL 1, C. ANET Genelec OY, Finland genelec@genelec.com 1 Pilchner Schoustal Inc, Canada

More information

Nonuniform multi level crossing for signal reconstruction

Nonuniform multi level crossing for signal reconstruction 6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven

More information

Interference in stimuli employed to assess masking by substitution. Bernt Christian Skottun. Ullevaalsalleen 4C Oslo. Norway

Interference in stimuli employed to assess masking by substitution. Bernt Christian Skottun. Ullevaalsalleen 4C Oslo. Norway Interference in stimuli employed to assess masking by substitution Bernt Christian Skottun Ullevaalsalleen 4C 0852 Oslo Norway Short heading: Interference ABSTRACT Enns and Di Lollo (1997, Psychological

More information

Tone-in-noise detection: Observed discrepancies in spectral integration. Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O.

Tone-in-noise detection: Observed discrepancies in spectral integration. Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O. Tone-in-noise detection: Observed discrepancies in spectral integration Nicolas Le Goff a) Technische Universiteit Eindhoven, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands Armin Kohlrausch b) and

More information

The analysis of multi-channel sound reproduction algorithms using HRTF data

The analysis of multi-channel sound reproduction algorithms using HRTF data The analysis of multichannel sound reproduction algorithms using HRTF data B. Wiggins, I. PatersonStephens, P. Schillebeeckx Processing Applications Research Group University of Derby Derby, United Kingdom

More information

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory

More information

Spatial audio is a field that

Spatial audio is a field that [applications CORNER] Ville Pulkki and Matti Karjalainen Multichannel Audio Rendering Using Amplitude Panning Spatial audio is a field that investigates techniques to reproduce spatial attributes of sound

More information

WARPED FILTER DESIGN FOR THE BODY MODELING AND SOUND SYNTHESIS OF STRING INSTRUMENTS

WARPED FILTER DESIGN FOR THE BODY MODELING AND SOUND SYNTHESIS OF STRING INSTRUMENTS NORDIC ACOUSTICAL MEETING 12-14 JUNE 1996 HELSINKI WARPED FILTER DESIGN FOR THE BODY MODELING AND SOUND SYNTHESIS OF STRING INSTRUMENTS Helsinki University of Technology Laboratory of Acoustics and Audio

More information

Low frequency sound reproduction in irregular rooms using CABS (Control Acoustic Bass System) Celestinos, Adrian; Nielsen, Sofus Birkedal

Low frequency sound reproduction in irregular rooms using CABS (Control Acoustic Bass System) Celestinos, Adrian; Nielsen, Sofus Birkedal Aalborg Universitet Low frequency sound reproduction in irregular rooms using CABS (Control Acoustic Bass System) Celestinos, Adrian; Nielsen, Sofus Birkedal Published in: Acustica United with Acta Acustica

More information

SUPERVISED SIGNAL PROCESSING FOR SEPARATION AND INDEPENDENT GAIN CONTROL OF DIFFERENT PERCUSSION INSTRUMENTS USING A LIMITED NUMBER OF MICROPHONES

SUPERVISED SIGNAL PROCESSING FOR SEPARATION AND INDEPENDENT GAIN CONTROL OF DIFFERENT PERCUSSION INSTRUMENTS USING A LIMITED NUMBER OF MICROPHONES SUPERVISED SIGNAL PROCESSING FOR SEPARATION AND INDEPENDENT GAIN CONTROL OF DIFFERENT PERCUSSION INSTRUMENTS USING A LIMITED NUMBER OF MICROPHONES SF Minhas A Barton P Gaydecki School of Electrical and

More information

Pre- and Post Ringing Of Impulse Response

Pre- and Post Ringing Of Impulse Response Pre- and Post Ringing Of Impulse Response Source: http://zone.ni.com/reference/en-xx/help/373398b-01/svaconcepts/svtimemask/ Time (Temporal) Masking.Simultaneous masking describes the effect when the masked

More information

Exposure schedule for multiplexing holograms in photopolymer films

Exposure schedule for multiplexing holograms in photopolymer films Exposure schedule for multiplexing holograms in photopolymer films Allen Pu, MEMBER SPIE Kevin Curtis,* MEMBER SPIE Demetri Psaltis, MEMBER SPIE California Institute of Technology 136-93 Caltech Pasadena,

More information

A Fast Recursive Algorithm for Optimum Sequential Signal Detection in a BLAST System

A Fast Recursive Algorithm for Optimum Sequential Signal Detection in a BLAST System 1722 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 51, NO 7, JULY 2003 A Fast Recursive Algorithm for Optimum Sequential Signal Detection in a BLAST System Jacob Benesty, Member, IEEE, Yiteng (Arden) Huang,

More information

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Project Proposal Avner Halevy Department of Mathematics University of Maryland, College Park ahalevy at math.umd.edu

More information

Effect of the number of loudspeakers on sense of presence in 3D audio system based on multiple vertical panning

Effect of the number of loudspeakers on sense of presence in 3D audio system based on multiple vertical panning Effect of the number of loudspeakers on sense of presence in 3D audio system based on multiple vertical panning Toshiyuki Kimura and Hiroshi Ando Universal Communication Research Institute, National Institute

More information

2920 J. Acoust. Soc. Am. 102 (5), Pt. 1, November /97/102(5)/2920/5/$ Acoustical Society of America 2920

2920 J. Acoust. Soc. Am. 102 (5), Pt. 1, November /97/102(5)/2920/5/$ Acoustical Society of America 2920 Detection and discrimination of frequency glides as a function of direction, duration, frequency span, and center frequency John P. Madden and Kevin M. Fire Department of Communication Sciences and Disorders,

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

IS SII BETTER THAN STI AT RECOGNISING THE EFFECTS OF POOR TONAL BALANCE ON INTELLIGIBILITY?

IS SII BETTER THAN STI AT RECOGNISING THE EFFECTS OF POOR TONAL BALANCE ON INTELLIGIBILITY? IS SII BETTER THAN STI AT RECOGNISING THE EFFECTS OF POOR TONAL BALANCE ON INTELLIGIBILITY? G. Leembruggen Acoustic Directions, Sydney Australia 1 INTRODUCTION 1.1 Motivation for the Work With over fifteen

More information

Analysis of room transfer function and reverberant signal statistics

Analysis of room transfer function and reverberant signal statistics Analysis of room transfer function and reverberant signal statistics E. Georganti a, J. Mourjopoulos b and F. Jacobsen a a Acoustic Technology Department, Technical University of Denmark, Ørsted Plads,

More information

Sound Processing Technologies for Realistic Sensations in Teleworking

Sound Processing Technologies for Realistic Sensations in Teleworking Sound Processing Technologies for Realistic Sensations in Teleworking Takashi Yazu Makoto Morito In an office environment we usually acquire a large amount of information without any particular effort

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

NSMRL Report JULY 2001

NSMRL Report JULY 2001 Naval Submarine Medical Research Laboratory NSMRL Report 1221 02 JULY 2001 AN ALGORITHM FOR CALCULATING THE ESSENTIAL BANDWIDTH OF A DISCRETE SPECTRUM AND THE ESSENTIAL DURATION OF A DISCRETE TIME-SERIES

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

Automotive three-microphone voice activity detector and noise-canceller

Automotive three-microphone voice activity detector and noise-canceller Res. Lett. Inf. Math. Sci., 005, Vol. 7, pp 47-55 47 Available online at http://iims.massey.ac.nz/research/letters/ Automotive three-microphone voice activity detector and noise-canceller Z. QI and T.J.MOIR

More information

HARMONIC INSTABILITY OF DIGITAL SOFT CLIPPING ALGORITHMS

HARMONIC INSTABILITY OF DIGITAL SOFT CLIPPING ALGORITHMS HARMONIC INSTABILITY OF DIGITAL SOFT CLIPPING ALGORITHMS Sean Enderby and Zlatko Baracskai Department of Digital Media Technology Birmingham City University Birmingham, UK ABSTRACT In this paper several

More information

Drum Transcription Based on Independent Subspace Analysis

Drum Transcription Based on Independent Subspace Analysis Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

A spatial squeezing approach to ambisonic audio compression

A spatial squeezing approach to ambisonic audio compression University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2008 A spatial squeezing approach to ambisonic audio compression Bin Cheng

More information

Audio Engineering Society. Convention Paper. Presented at the 115th Convention 2003 October New York, New York

Audio Engineering Society. Convention Paper. Presented at the 115th Convention 2003 October New York, New York Audio Engineering Society Convention Paper Presented at the 115th Convention 2003 October 10 13 New York, New York This convention paper has been reproduced from the author's advance manuscript, without

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Laboratory Assignment 4. Fourier Sound Synthesis

Laboratory Assignment 4. Fourier Sound Synthesis Laboratory Assignment 4 Fourier Sound Synthesis PURPOSE This lab investigates how to use a computer to evaluate the Fourier series for periodic signals and to synthesize audio signals from Fourier series

More information

SEPARATION AND DEREVERBERATION PERFORMANCE OF FREQUENCY DOMAIN BLIND SOURCE SEPARATION. Ryo Mukai Shoko Araki Shoji Makino

SEPARATION AND DEREVERBERATION PERFORMANCE OF FREQUENCY DOMAIN BLIND SOURCE SEPARATION. Ryo Mukai Shoko Araki Shoji Makino % > SEPARATION AND DEREVERBERATION PERFORMANCE OF FREQUENCY DOMAIN BLIND SOURCE SEPARATION Ryo Mukai Shoko Araki Shoji Makino NTT Communication Science Laboratories 2-4 Hikaridai, Seika-cho, Soraku-gun,

More information

Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback

Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback PURPOSE This lab will introduce you to the laboratory equipment and the software that allows you to link your computer to the hardware.

More information

RECOMMENDATION ITU-R BS User requirements for audio coding systems for digital broadcasting

RECOMMENDATION ITU-R BS User requirements for audio coding systems for digital broadcasting Rec. ITU-R BS.1548-1 1 RECOMMENDATION ITU-R BS.1548-1 User requirements for audio coding systems for digital broadcasting (Question ITU-R 19/6) (2001-2002) The ITU Radiocommunication Assembly, considering

More information

Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications

Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal

More information

Design of Robust Differential Microphone Arrays

Design of Robust Differential Microphone Arrays IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 10, OCTOBER 2014 1455 Design of Robust Differential Microphone Arrays Liheng Zhao, Jacob Benesty, Jingdong Chen, Senior Member,

More information

DIGITAL Radio Mondiale (DRM) is a new

DIGITAL Radio Mondiale (DRM) is a new Synchronization Strategy for a PC-based DRM Receiver Volker Fischer and Alexander Kurpiers Institute for Communication Technology Darmstadt University of Technology Germany v.fischer, a.kurpiers @nt.tu-darmstadt.de

More information

Improving room acoustics at low frequencies with multiple loudspeakers and time based room correction

Improving room acoustics at low frequencies with multiple loudspeakers and time based room correction Improving room acoustics at low frequencies with multiple loudspeakers and time based room correction S.B. Nielsen a and A. Celestinos b a Aalborg University, Fredrik Bajers Vej 7 B, 9220 Aalborg Ø, Denmark

More information

EE 791 EEG-5 Measures of EEG Dynamic Properties

EE 791 EEG-5 Measures of EEG Dynamic Properties EE 791 EEG-5 Measures of EEG Dynamic Properties Computer analysis of EEG EEG scientists must be especially wary of mathematics in search of applications after all the number of ways to transform data is

More information