FP6 IST

Deliverable 3.1

Multi-channel Acoustic Echo Cancellation, Acoustic Source Localization, and Beamforming Algorithms for Distant-Talking ASR and Surveillance

Authors: Lutz Marquardt, Edwin Mabande, Alessio Brutti, Walter Kellermann
Affiliations: FAU, FBK-irst
Date: 30-Apr-2008
Document Type: R
Status/Version: 1.0
Dissemination Level: PU

Project Reference: FP6 IST
Project Acronym: DICIT
Project Full Title: Distant-talking Interfaces for Control of Interactive TV
Dissemination Level: PU
Contractual Date of Delivery: March 2008
Actual Date of Delivery: Preliminary version: 11-January-2008; Final version: 30-April-2008
Document Number: DICIT_D3.1_ _PU
Type: Deliverable
Status & Version: 1.0
Number of Pages: 4+33
WP Contributing to the Deliverable: WP3 (WP responsible: Walter Kellermann, FAU)
Task responsible: Lutz Marquardt (FAU)
Authors (Affiliation): Lutz Marquardt, Edwin Mabande and Walter Kellermann (FAU), Alessio Brutti (FBK-irst)
Other Contributors:
Reviewer:
EC Project Officers: Anne Bajart (till January 31st 2007), Erwin Valentini (from February 1st till October 31st 2007), Pierre Paul Sondag (from November 1st 2007)

Keywords: multi-channel acoustic echo cancellation, acoustic source localization, beamforming, multi-microphone devices, distant-talking speech recognition devices, voice-operated devices, Interactive TV, anti-intrusion, surveillance.

Abstract: The purpose of this document is to describe the acoustic pre-processing algorithms to be integrated into the first DICIT prototype. These algorithms will be used for the acquisition, extraction and enhancement of the desired speech signals which will be fed into the speech recognizer.

Contents

1. Introduction
2. Beamforming
   2.1 Array Geometry
   2.2 Data-independent Beamforming Designs for DICIT
       2.2.1 DSB with Dolph-Chebyshev Window Weighting
       2.2.2 FSB based on a Dolph-Chebyshev Design
       2.2.3 Least-Squares Frequency-Invariant Beamformer
   2.3 Beamforming Module Structure for DICIT
3. Multi-channel Acoustic Echo Cancellation (MC-AEC)
   3.1 Generalized Frequency Domain Adaptive Filtering (GFDAF)
   3.2 Channel Decorrelation for MC-AEC
4. Source Localization (SLoc)
   4.1 SLoc in DICIT
       4.1.1 Array Design
       4.1.2 Application Scenario
   4.2 Adopted SLoc Approach
       4.2.1 Global Coherence Field
       4.2.2 Sub-optimal Least Squares
       4.2.3 Tracking
       4.2.4 Experimental Results
       4.2.5 Multiple Sources
       4.2.6 Loudspeakers as Additional Sources
       4.2.7 Real-time Implementation
5. Multi-channel Acoustic Processing Subsystem
   5.1 FBK Hardware Setup
   5.2 FAU Hardware Setup
Bibliography

List of Figures

Figure 1: A linear uniformly-spaced microphone array
Figure 2: Frequency-dependent and frequency-independent beampatterns
Figure 3: Harmonically Nested Array
Figure 4: Nested sub-array structure
Figure 5: Beampattern and WNG for DSB design
Figure 6: Beampattern and WNG for FSB-DC
Figure 7: Beampattern and WNG for LS-FIB
Figure 8: Signal flow block diagram
Figure 9: MC-AEC in Human-Machine-Interface System
Figure 10: MC-AEC misalignment convergence comparison for NLMS and FDAF [8]
Figure 11: Phase modulation amplitude as a function of frequency subband [9]
Figure 12: Stereo-decorrelation employing frequency-dependent phase modulation [9]
Figure 13: Subjective audio quality for pre-processing methods [9]
Figure 14: Convergence comparison of pre-processing methods for stereo AEC [9]
Figure 15: Loci of points that satisfy a given TDOA at two microphones
Figure 16: TDOA given two microphones and a source in far field position
Figure 17: Effect of noisy time delay estimations in a double microphone pair set-up
Figure 18: SLoc module block diagram
Figure 19: Localization performance in terms of angular RMSE for different thresholds
Figure 20: GCF acoustic map in presence of two sources
Figure 21: Multiple speaker localization
Figure 22: GCF map of Figure 20 after the de-emphasis process
Figure 23: Configuration of the first DICIT prototype
Figure 24: Block structure of Multi-channel Acoustic Processing
Figure 25: PC 1 audio acquisition chain

1. Introduction

The general objective of WP3 is to find the most effective multi-channel pre-processing for the DICIT system in order to acquire, extract and enhance the signals uttered by the desired speakers. This pre-processing should help to maximize speech recognition performance in the noisy and reverberant DICIT scenarios, which require an abandonment of close-talking microphones.

The four main research challenges for reaching this goal are reflected by tasks T3.1 Multi-channel Acoustic Echo Cancellation (MC-AEC), T3.2 Adaptive Beamforming for Dereverberation and Noise Reduction, T3.3 Source Localization Algorithms for Supporting Beamforming, and T3.4 Blind Source Separation for Noisy and Reverberant Environments. While Blind Source Separation (BSS), as the fourth field of research, will be investigated for the second prototype only, the first three research topics were addressed during the first year of the project in order to meet the requirements for the first prototype. This document describes the respective work conducted for tasks T3.1 to T3.3. For task T3.2, fixed beamformers with steering capabilities are described, as these are relevant for the first prototype.

First, Section 2 reports on beamforming, which was investigated with respect to conceiving adequate solutions that allow for a straightforward implementation together with MC-AEC. Regarding Acoustic Echo Cancellation (AEC), Section 3 describes the work on MC-AEC, and in particular on two-channel AEC, with respect to its employment in and adaptation to the DICIT scenarios. The use of a new pre-processing scheme for channel decorrelation, an important feature for increasing the performance of an existing algorithm, is described in Section 3.2. Source Localization (SLoc), which provides the beamformer with steering information and also allows tracking of the active speakers' movements, is described in Section 4. It is indispensable as long as no BSS algorithms are foreseen for desired-signal acquisition. Besides the preparation of these DICIT-tailored mechanisms for the localization of a single active source, a novel approach to handle the localization of the active source in the presence of the stereo loudspeaker outputs is also described. Finally, the integration of the respective modules within the Multi-channel Acoustic Processing Subsystem is reported in Section 5.

2. Beamforming

Array signal processing makes use of an array, which consists of a group of transducers, to extract information from an environment. Microphone arrays have been successfully applied to spatial filtering problems where a desired acoustic source, usually obstructed by noise or interferers, needs to be extracted from an observed wavefield. The paradigm used for this task is called beamforming. By definition, a beamformer is a processor that is used in conjunction with an array of microphones in order to produce a versatile form of spatial filtering [1]. The beamformer exploits the spatial distribution of desired sources and interferers in order to attenuate the latter. The multiple microphone signals are jointly processed to form a beam, that is, a region of high acoustic sensitivity, which is steered towards the desired source.

For narrowband signals, the classical Delay-and-Sum Beamformer (DSB) [1] may be used. In the case of broadband signal acquisition, the goal of beamforming is to obtain a frequency-independent beam. This is necessary in order to avoid also extracting low-pass versions of the noise or interferers from the observed wavefield. The width of the main beam is directly related to the length of the array and the wavelength of the signal [2]. This goal may be accomplished by implementing a Filter-and-Sum Beamformer (FSB), by utilizing nested arrays, or by a combination of the two [3].

Figure 1: A linear uniformly-spaced microphone array

Figure 1 depicts a Filter-and-Sum Beamformer (FSB) operating on a linear, uniformly-spaced microphone array. In the following we consider an array that consists of N = 2M + 1 microphones with a uniform spacing d. The source signal is captured by the microphones and digitized by

analog-to-digital converters. The digitized signals are then fed into the beamforming filters before being combined to produce the output. The FSB response for a frequency ω and an angle ν relative to the array axis is given by [4]

B(ω, ν) = Σ_{n=−M}^{M} W_n(ω) e^{−jωτ_n(ν)}

where W_n(ω) = Σ_{k=0}^{L−1} w_n(k) e^{−jkω} and τ_n(ν) = n d cos(ν) f_s / c. Note that the w_n(k) are the coefficients of FIR filters of length L, f_s is the sampling frequency, and c is the speed of sound in air.

The squared magnitude of the beamformer response is known as the beampattern of the beamformer. The beampattern describes the beamformer's ability to capture acoustic energy as a function of the angle of arrival of the plane wave. It is defined as [4]

P(ω, ν) = 20 log_10 |B(ω, ν)|

The directivity of a linear array, which is the ratio of the beampattern in the desired direction to its average over all directions, is given by [2]

D(ω) = |B(ω, ν_0)|² / ( (1/2) ∫_0^π |B(ω, ν)|² sin(ν) dν )

where ν_0 is the steering direction.

Figure 2 depicts the beampatterns obtained from a 5-element linear uniformly-spaced array by utilizing a narrowband beamforming design (DSB) [2] and a broadband beamforming design [5], respectively. It can clearly be seen that the beamwidth of the main beam of the DSB design varies with frequency. This leads to a marked reduction in directivity as the frequency decreases. On the other hand, the main beam of the broadband beamforming design is approximately frequency-independent, and therefore the variation in directivity with frequency is limited.

Figure 2: Frequency-dependent and frequency-independent beampatterns
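As a reading aid, the beampattern defined above can be evaluated numerically in a few lines. The following Python/NumPy sketch is illustrative only and is not part of the DICIT software; the array size, spacing and frequency grid are arbitrary example values.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed value)

def beampattern_db(weights, d, freqs, angles):
    """Beampattern P(omega, nu) = 20*log10|B(omega, nu)| of a uniform
    linear array with N = 2M+1 sensors and spacing d (metres).

    weights : complex filter responses W_n(omega), shape (len(freqs), N)
    angles  : angles nu in radians, measured from the array axis
    """
    N = weights.shape[1]
    n = np.arange(N) - (N - 1) // 2                   # sensor indices -M..M
    # tau_n(nu) = n*d*cos(nu)/c, here in seconds (the text uses samples)
    tau = np.outer(np.cos(angles), n) * d / C_SOUND   # shape (n_angles, N)
    pattern = np.empty((len(freqs), len(angles)))
    for i, f in enumerate(freqs):
        B = np.exp(-2j * np.pi * f * tau) @ weights[i]    # B(omega, nu)
        pattern[i] = 20.0 * np.log10(np.abs(B) + 1e-12)
    return pattern

# Example: a plain DSB (uniform weights) on a 5-element array, d = 8 cm.
freqs = np.linspace(200.0, 8000.0, 50)
w_dsb = np.full((len(freqs), 5), 1.0 / 5.0, dtype=complex)
P = beampattern_db(w_dsb, 0.08, freqs, np.linspace(0.0, np.pi, 181))
```

Plotting P over frequency and angle reproduces the qualitative behavior of Figure 2: for uniform weights the main beam widens as the frequency decreases.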

The White Noise Gain (WNG) quantifies a beamformer's ability to suppress spatially white noise as a function of frequency. It is given by

WNG(ω) = |w_F^H d|² / (w_F^H w_F)

where d = [e^{−jωτ_{−M}(ν_0)}, ..., e^{−jωτ_M(ν_0)}]^T and w_F = [W_{−M}(ω), ..., W_M(ω)]^T denote the so-called steering vector and the vector of frequency responses of the beamforming filters, respectively. Note that a small WNG at a particular frequency corresponds to a low ability to suppress spatially white noise, resulting in an amplification of the noise at that frequency. Important errors, such as amplitude and phase errors in the microphone channels and microphone position errors, are nearly uncorrelated from sensor to sensor and affect the beamformer in a manner similar to spatially white noise [6]. Hence the WNG is a good measure of a beamformer's robustness to errors.
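The WNG expression translates directly into code as well. A minimal sketch, assuming the same ULA geometry as in the previous example:

```python
import numpy as np

def white_noise_gain(w_f, f, d, nu0, c=343.0):
    """WNG(omega) = |w_F^H d|^2 / (w_F^H w_F) for an N-element ULA.

    w_f : beamforming filter responses [W_-M(omega), ..., W_M(omega)]
    f   : frequency in Hz; nu0 : steering direction in radians
    """
    N = len(w_f)
    n = np.arange(N) - (N - 1) // 2
    tau0 = n * d * np.cos(nu0) / c                 # tau_n(nu0) in seconds
    d_vec = np.exp(-2j * np.pi * f * tau0)         # steering vector
    num = np.abs(np.vdot(w_f, d_vec)) ** 2         # vdot conjugates w_f
    return num / np.real(np.vdot(w_f, w_f))
```

As a sanity check, for a DSB whose weights are matched to the steering vector (w_F = d/N) this returns WNG = N, i.e. about 7 dB for N = 5, which is the well-known upper bound for a ULA.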

2.1 Array Geometry

The array can take on a variety of different geometries depending on the application of interest. As mentioned previously, a simple method of approximating a frequency-independent beampattern, and thus covering broadband signals, is to implement the array as a series of sub-arrays which are themselves linear arrays with uniform spacing. The nested array structure chosen for the DICIT scenario is depicted in Figure 3. It consists of four sub-arrays, three of which consist of five microphones each and one of which consists of seven microphones. The microphone spacings of the four sub-arrays are 0.04 m, 0.08 m, 0.16 m and 0.32 m, respectively. The array also includes two additional microphones which are mounted directly 32 cm above the left- and right-most microphones in the nested array. These will not be utilized for beamforming here, since only linear arrays are considered.

Figure 3: Harmonically Nested Array

Each sub-array operates in a different frequency range, realized by applying appropriate bandpass filtering to the sub-array outputs. The overall array output is obtained by combining the outputs of the bandlimited sub-arrays as depicted in Figure 4. For a general sub-array broadband beamformer, the beamforming filters are applied to the microphone signals before the bandpass filters. For the DICIT prototype, bandpass filters were chosen that cover the frequency bands 100-900 Hz, 901-1800 Hz, 1801-3600 Hz and 3601-8000 Hz. The sampling frequency is 48 kHz. The bandpass filters are FIR filters of length L = 256 and were designed according to the frequency-sampling-based finite impulse response filter design [7].

Figure 4: Nested sub-array structure
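For illustration, the band-splitting described above can be sketched with standard FIR design tools. SciPy's firwin2 stands in here for the frequency-sampling design of [7], and the transition-band widths are assumptions, not the DICIT design values.

```python
import numpy as np
from scipy.signal import firwin2, lfilter

FS = 48000          # sampling frequency in Hz
L_FIR = 256         # bandpass filter length, as above
BANDS = [(100, 900), (901, 1800), (1801, 3600), (3601, 8000)]

def bandpass_bank(bands=BANDS, numtaps=L_FIR, fs=FS):
    """One FIR bandpass per sub-array (frequency-sampling style design)."""
    filters = []
    for lo, hi in bands:
        # illustrative 10% transition regions around each band edge
        freq = [0.0, 0.9 * lo, lo, hi, min(1.1 * hi, fs / 2), fs / 2]
        gain = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
        filters.append(firwin2(numtaps, freq, gain, fs=fs))
    return filters

def combine_subarrays(subarray_outputs, filters):
    """Band-limit each sub-array output and sum, as in Figure 4."""
    return sum(lfilter(h, 1.0, y) for h, y in zip(filters, subarray_outputs))
```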

2.2 Data-independent Beamforming Designs for DICIT

The design choice will be made between two fixed non-superdirective beamforming designs and one fixed superdirective beamforming design. Simulation results (e.g. beampatterns, WNG, etc.) for each of these design configurations, as applied to the DICIT scenario, are shown in the following subsections.

2.2.1 DSB with Dolph-Chebyshev Window Weighting

For the DSB with Dolph-Chebyshev Window Weighting (DSB-DC) [2] design, the microphone signals are first weighted by applying a Dolph-Chebyshev window before being processed by a DSB. The DSB design is based on the idea that the desired output contribution of each of the array microphones will be the same, except that each one will be delayed by a different amount. Therefore, if the output of each of the sensors is delayed and weighted appropriately, the signal originating from the desired spatial region will be reinforced, while noise and interfering signals from other spatial regions will generally be attenuated. This is the most robust beamforming design considered for the DICIT project. The major disadvantage of the DSB-DC is that it produces a frequency-dependent beampattern, since it is a narrowband beamforming design.

Figure 5 depicts the beampattern and WNG for a DSB-DC design utilizing the whole nested array. It is clear that by using the nested array a relatively frequency-invariant beampattern is obtained, and this leads to improved spatial selectivity. Note that the sidelobes appearing at about 7 kHz are due to spatial aliasing. The WNG figure shows that this design is very robust to errors. The WNG at lower frequencies is higher than at higher frequencies. This is due to the fact that the sub-array covering the lowest frequencies consists of seven microphones, while the other three sub-arrays consist of five microphones each: the higher the number of microphones used in a sub-array, the higher the WNG for the DSB-DC design.

Figure 5: Beampattern and WNG for DSB design
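A minimal sketch of the DSB-DC idea for one sub-array follows, using SciPy's Dolph-Chebyshev window. The sidelobe attenuation and the integer-sample steering delays are simplifying assumptions; the prototype's steering unit uses fractional-delay filters instead (cf. Section 2.3).

```python
import numpy as np
from scipy.signal.windows import chebwin

def dsb_dc_weights(n_mics, atten_db=30.0):
    """Dolph-Chebyshev taper for one sub-array; atten_db is illustrative."""
    w = chebwin(n_mics, at=atten_db)
    return w / w.sum()              # unity gain in the look direction

def dsb_dc_output(mic_signals, fs, d, nu0, weights, c=343.0):
    """Weight, delay and sum (integer-sample delays for brevity).

    mic_signals : array of shape (n_mics, n_samples)
    """
    n_mics = mic_signals.shape[0]
    n = np.arange(n_mics) - (n_mics - 1) // 2
    delays = np.rint(n * d * np.cos(nu0) * fs / c).astype(int)
    out = np.zeros(mic_signals.shape[1])
    for i in range(n_mics):
        # shift each channel so the look-direction wavefront aligns
        out += weights[i] * np.roll(mic_signals[i], -delays[i])
    return out
```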

2.2.2 FSB based on a Dolph-Chebyshev Design

For an FSB based on a Dolph-Chebyshev design (FSB-DC), the FIR filters are obtained by applying Dolph-Chebyshev windows with a predefined frequency-independent peak-to-zero distance of the beampattern [4] at a set of discrete frequencies. These frequency-dependent Dolph-Chebyshev windows are then fed into the Fourier approximation filter design to determine the FIR filters. An FSB-DC is designed such that the first null is frequency-independent for frequencies greater than a given lower limit. This lower limit is determined by the microphone spacing [4]. For frequencies below this limit, a simple DSB is designed. This beamforming design is less robust than the DSB-DC but guarantees a frequency-independent beampattern above the given lower limit.

It suffers from the same problems as the DSB-DC below this limit.

Figure 6 depicts the beampattern and WNG obtained for an FSB-DC design using the nested array with each sub-array consisting of five microphones. In comparison to the beampattern in Figure 5, the main beam is narrower and shows an improved frequency-invariance, but the attenuation of off-axis signals is lower. The WNG is very similar to that of the DSB-DC design, and this design is therefore also robust.

Figure 6: Beampattern and WNG for FSB-DC

2.2.3 Least-Squares Frequency-Invariant Beamformer

As a novel and very general beamformer design method, the Least-Squares Frequency-Invariant Beamformer (LS-FIB) design [5] uses a linear basis which optimally approximates desired spatio-spectral array characteristics in the least-squares sense and inherently leads to superdirective beamformers at low frequencies if the aperture is small relative to the wavelengths [5].

Figure 7 depicts the beampattern and WNG obtained for an LS-FIB design using the nested array with each sub-array consisting of five microphones. In comparison to the beampatterns of the previous designs, the main lobe is the narrowest, and it compares favorably with the FSB-DC in terms of frequency-invariance. The major advantage of this design is its good spatial selectivity at very low frequencies. The WNG, however, is very small at very low frequencies, which means that this design is very sensitive to errors.

Figure 7: Beampattern and WNG for LS-FIB

Due to its superdirective nature, the LS-FIB design gives the best spatial selectivity when the number of sensors is small. It has no restrictions on sensor positioning, but it is very sensitive to random errors, and thus small random errors lead to a significant loss in spatial selectivity. This becomes more significant as the number of microphones increases. The sensitivity of the design may be reduced by adjusting some design parameters, but this leads to a loss in spatial selectivity.

The use of matched microphones with low self-noise and well-calibrated arrays is strongly recommended when using this design.

2.3 Beamforming Module Structure for DICIT

In the first prototype for the DICIT project, the adaptive beamforming module will be made up of two units, namely the Steering Unit (SU) and the Fixed Beamforming (FBF) unit, as depicted in Figure 8. The SU consists of a set of fractional-delay filters which facilitate the steering of the beam towards the desired look direction in order to track movements of the source. The desired look direction will be supplied by the source localization module. In the FBF unit the FSB-DC will be utilized due to its relatively frequency-invariant main beam and its robustness to errors.

Figure 8: Signal flow block diagram

3. Multi-channel Acoustic Echo Cancellation (MC-AEC)

Acoustic echoes appear due to the coupling between loudspeakers and microphones, i.e. due to the lack of acoustic barriers: apart from the speech uttered by the near-end speaker v(k) and a noise signal, the microphones in the receiving room also acquire the far-end signal that is played back via the loudspeakers. The term Acoustic Echo Cancellation denotes any signal processing technique that aims at reducing the reverberated loudspeaker signals within a microphone signal y(k). In the DICIT scenario the loudspeaker signals constitute the DICIT system audio output, which consists of the TV audio signals and the Text-to-Speech output driven by the system's dialogue manager. Instead of disturbing a communication between humans, in this case acoustic echoes impair machine-based speech recognition.

Thus AEC is a crucial means to improve the recognition rate of an Automatic Speech Recognizer (ASR), providing the ASR with the echo-compensated signal e(k) that should mainly contain the utterance v(k) of the desired speaker. Figure 9 depicts the employment of AEC in a human-machine interface system as implemented in the DICIT project.

Figure 9: MC-AEC in Human-Machine-Interface System

The relation between the original loudspeaker signals and their contribution to y(k) is established by the time-variant impulse responses h_1(k)...h_P(k) of the Loudspeaker-Enclosure-Microphone (LEM) system, with P being the number of channels; the time-variance is due to continuous changes of the acoustic environment, e.g. caused by temperature changes, door openings or user movements. An Acoustic Echo Canceller (AEC) as depicted in Figure 9 models these impulse responses by means of digital filters ĥ_1(k)...ĥ_P(k). The echo replicas ŷ_1(k)...ŷ_P(k), computed via convolution of the AEC filter responses with the known loudspeaker signals x_1(k)...x_P(k), are then subtracted from the microphone signal y(k), leading to the desired echo reduction.

As to the design of ĥ_1(k)...ĥ_P(k), adaptive filters are an adequate means to track the temporal variations of the LEM system. Among the different filter structures that have been studied, finite impulse response (FIR) filters are usually chosen, on the one hand for reasons of simplicity, but also because they guarantee stability during adaptation, which an infinite impulse response (IIR) structure does not. However, the employment of FIR filters necessarily implies a certain error due to the approximation of infinite LEM impulse responses by finite models. The related system mismatch (tail effect) is considered part of the noise contribution.

In the following, the Generalized Frequency Domain Adaptive Filtering (GFDAF) concept is outlined in Section 3.1 as an adequate algorithm for realizing MC-AEC in DICIT, and Stereo-AEC in particular for the first prototype [8]. Section 3.2 describes a new channel decorrelation approach according to [9].

3.1 Generalized Frequency Domain Adaptive Filtering (GFDAF)

As described in [8], the low-complexity algorithms used in conventional single-channel AEC, such as the Normalized Least Mean Square (NLMS) algorithm, do not achieve sufficient convergence when used for MC-AEC. This is due to the fact that these algorithms do not take the cross-correlations between the different channels into account. Consequently, not only is the convergence rate slowed down but, moreover, the solution for the adaptive filters may diverge.

Figure 10: MC-AEC misalignment convergence comparison for NLMS and FDAF [8]

The effect of taking the cross-correlations into account is depicted in Figure 10, which shows the misalignment convergence for FDAF and the basic NLMS in the multi-channel cases P=2 (respective lowest curve) to P=5 (respective uppermost curve) [8].

In the case of time-invariant environments and stationary, highly correlated signals, the optimal choice for the adaptive AEC algorithm in the time domain is the Recursive Least Squares (RLS) algorithm. However, its high computational complexity and its sensitivity to nonstationary signal statistics discourage its use in real-time applications. In the case of MC-AEC, the correlation matrix is worse conditioned than in single-channel AEC scenarios (i.e. the input channels are not only highly auto-correlated but also cross-correlated). This implies that the inversion of the autocorrelation matrix of the input channels becomes numerically highly sensitive, and the recursive computation of the inverse autocorrelation matrix for the RLS using the matrix inversion lemma leads almost surely to numerical instabilities. In conclusion, with respect to MC-AEC, the use of simple NLMS-like algorithms is discarded due to poor efficiency, and that of RLS-based algorithms due to high computational complexity and numerical sensitivity.

As an alternative solution, algorithms for frequency-domain adaptive filtering offer fast convergence combined with acceptable computational complexity.

These types of algorithms are based on a realization of all filtering operations as fast convolutions in the DFT domain and on the resulting applicability of the Fast Fourier Transform (FFT). The algorithm employed for the DICIT scenario is based on the Generalized Frequency Domain Adaptive Filtering (GFDAF) paradigm presented in [8]. Exploiting the computational efficiency of the FFT, it factors in the cross-correlations among the different channels of the input signal and thereby enables a faster convergence of the filters. This faster filter adaptation manifests itself in faster echo suppression. In DICIT this is especially important because user movements have to be expected in the interactive TV scenario, which in turn imply rapid changes of the impulse responses of the LEM system that has to be identified by the adaptive filters. The chosen algorithm thus constitutes an appropriate solution to the addressed problem. A preceding channel decorrelation, described in the following Section 3.2, allows a further speed-up of the filter convergence. For the first prototype a Stereo-AEC algorithm has been implemented, which is intended to be extended to a 5.1 version for the second prototype.
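The GFDAF algorithm itself is specified in [8]. As a reading aid only, the following minimal sketch shows the underlying principle for a single channel: an overlap-save frequency-domain adaptive filter with gradient constraint. It deliberately omits the multi-channel cross-correlation handling that distinguishes GFDAF, and all parameter values are illustrative.

```python
import numpy as np

def fdaf_single_channel(x, d, M=1024, mu=0.5, beta=0.9, eps=1e-8):
    """Overlap-save frequency-domain adaptive filter (single channel).

    x : loudspeaker (reference) signal; d : microphone signal;
    M : adaptive filter length. Returns the error signal e(k), i.e.
    the echo-compensated output that would be fed to the recognizer."""
    F = 2 * M                            # DFT length, 50% overlap
    W = np.zeros(F, dtype=complex)       # adaptive filter, DFT domain
    P = np.full(F, eps)                  # per-bin input power estimate
    e = np.zeros(len(d))
    for b in range(1, min(len(x), len(d)) // M):
        X = np.fft.fft(x[(b - 1) * M : (b + 1) * M])   # last 2M samples
        y_hat = np.real(np.fft.ifft(X * W))[M:]        # valid output block
        e_blk = d[b * M : (b + 1) * M] - y_hat         # echo-compensated
        e[b * M : (b + 1) * M] = e_blk
        E = np.fft.fft(np.concatenate([np.zeros(M), e_blk]))
        P = beta * P + (1.0 - beta) * np.abs(X) ** 2   # recursive power
        grad = np.real(np.fft.ifft(np.conj(X) * E / (P + eps)))
        grad[M:] = 0.0                                 # gradient constraint
        W += mu * np.fft.fft(grad)                     # filter update
    return e
```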

3.2 Channel Decorrelation for MC-AEC

As already mentioned above, the channels of the reference input signal are usually very similar and therefore not only highly auto-correlated but also often strongly cross-correlated. Without decorrelating the different channels prior to playback and echo path estimation, these strong cross-correlations lead to an ambiguity with respect to the solution that minimizes the underlying error criterion. The algorithm might therefore converge to a solution which minimizes the block error signal, and thus leads to an echo reduction, but without modelling the correct impulse responses of the LEM system. Consequently, a change of the acoustic environment might result in a total system mismatch and thus in a breakdown of the AEC performance until the filters have converged to a new solution.

Nonetheless, this is not the only requirement that has to be met by the employed channel decorrelation technique. In addition, the subjective audio quality of the TV output must not be impaired by the decorrelation, i.e. the introduced signal manipulations must not cause audible artifacts; this is especially important in a multimedia application like the DICIT interactive TV scenario. Furthermore, with respect to the real-time application and costs, simplicity is a crucial issue: the channel pre-processing should not require an excessive amount of computational resources, in order to minimize the total computational expense and, consequently, the equipment cost.

Summarizing, the decorrelation of the loudspeaker signals is decisive for robust AEC and a fast convergence of the filters. Compared to the nonlinear pre-processing method that has been applied so far, the phase-modulation-based approach according to [9], described in the following, enhances the convergence behavior of the adaptive filters without impairing the subjective audio quality, i.e. without destroying the spatial (stereo) image of the reproduced sound. Please note that the first running real-time implementation of this scheme has been developed for DICIT.

The decorrelation strategy to be employed in DICIT is based on psychoacoustics and the realization that applying a time-varying phase modulation to a monophonic audio signal is a relatively simple method which does not damage the perceptual sound quality of the signal. However, applying a phase modulation to a stereo pair might degrade the perceived sound image. Therefore, in order to obtain a maximum decorrelation of the signals without altering the stereo image, the interaural phase differences introduced by the time-varying phase modulation must not surpass the threshold of perception; however, to achieve the highest possible degree of channel decorrelation, the introduced signal manipulations should be chosen as close as possible to the limit given by that threshold. Moreover, phase differences are not perceived equally by human hearing in different frequency ranges [9]: as depicted in Figure 11, the sensitivity decreases gradually with increasing frequency until it vanishes for frequencies above 4 kHz. Therefore, a frequency-selective approach based on psychoacoustics appears appropriate to deliver the best possible channel decorrelation for the DICIT scenario.

Figure 11: Phase modulation amplitude as a function of frequency subband [9]

In practice, the scheme is implemented by employing pairs of analysis and synthesis filterbanks and applying a phase modulation to the transform-domain coefficients after analysis. The complete phase modulation block diagram for a stereophonic application is depicted in Figure 12.

Figure 12: Stereo-decorrelation employing frequency-dependent phase modulation [9]

As for the filterbank design, a Modulated Complex Lapped Transform (MCLT) is employed, which was first introduced in [10].

A window length chosen as L = 128 results in 64 MCLT-domain subbands. This complex-valued modulation allows for perfect reconstruction of the signal after overlapping the output blocks by 50%. Using a complex-valued filterbank allows easy manipulation of the phase in the individual frequency subbands.

The phase modulation is performed by a multiplication or division of the MCLT-domain coefficients by e^{jφ(t,s)}, where φ(t,s) is the time-varying phase shift. This phase shift is composed of a modulation function multiplied by a subband-dependent scaling factor. As explained in [9], the chosen modulation function must be smooth and the modulation frequency must be low, in order not to introduce a perceptible frequency shift (a frequency modulation, with a frequency shift proportional to the derivative of the phase modulation function, is introduced as a consequence of the phase modulation). Nevertheless, the modulation frequency has to be chosen carefully, since the decorrelation introduced by an extremely low modulation frequency will not lead to a sufficient enhancement of the echo cancellation performance. For reasons of simplicity, a sine wave with a relatively low modulation frequency f_m,stereo was chosen. The time-varying phase shift is given by

φ(t, s) = a(s) sin(2π f_m,stereo t)

The subband-dependent scaling factors a(s) were designed and optimized via a listening test procedure by the Fraunhofer Institute for Integrated Circuits (IIS, Erlangen, Germany), Audio Group [9]. The modulation scale factors for the first 12 subbands are chosen according to the curve depicted in Figure 11. Note that the amplitude of the phase modulation introduced in the first coefficients is very low, but it increases with increasing subband number and becomes maximal within the seventh subband; this corresponds to a frequency of 2.5 kHz (given M = 64).

Besides its simplicity, the chosen decorrelation scheme is also attractive because it can easily be extended to applications with more than two playback channels, such as the foreseen second DICIT prototype enabling 5.1 playback. In this case the channels are grouped into channel pairs, with each pair being treated like a separate stereo signal. Every pair will be pre-processed by employing the described modulation scheme, but with a different modulation frequency for the phase modulation. To provide an orthogonal modulation across all modulators, the chosen modulation frequencies must be non-commensurate.

It was mentioned before that the design of the algorithmic parameters was justified by listening tests. Figure 13 depicts the results of a listening test according to the MUSHRA standard (Multi Stimulus test with Hidden Reference and Anchor) that was conducted by the Fraunhofer Institute for Integrated Circuits (IIS, Erlangen, Germany) [9]. The outcome is based on the assessment of ten subjects, nine of them experienced listeners. It is evident that the phase modulation scheme (indicated in green) clearly outperforms all other investigated decorrelation methods, including the nonlinear pre-processing, which is represented by the blue bars.

Being the only method whose corresponding signal quality was consistently rated "excellent" except for the signal "glock" (for which it nevertheless delivered the best result, "good"), the phase modulation is the only approach that practically preserves subjective audio quality with respect to the state of the art.

Figure 13: Subjective audio quality for pre-processing methods [9]

Note that this result already reflects the quality of the pre-processing scheme employed in a 5.1 scenario. Since the processing of more than two channels is based on a pair-wise pre-processing, as noted before, the results illustrated in Figure 13 are valid not only for 5.1 but also for the stereo case.

Figure 14: Convergence comparison of pre-processing methods for stereo AEC [9]

Concluding, Figure 14, taken from [9], illustrates the acceleration of the convergence of the coefficient error norm due to the various pre-processing methods. Considering that the phase modulation approach practically does not impair the subjective quality of the signals, a significant improvement of the convergence behavior is observable, suggesting this approach as an adequate processing scheme for the DICIT prototype.
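To make the decorrelation scheme concrete, the sketch below applies the time-varying, subband-dependent phase modulation to complex subband frames. For simplicity it assumes a generic complex analysis/synthesis filterbank (an STFT would do for experimentation) rather than the MCLT of [10], and the amplitude profile a(s) at the end is a made-up placeholder, not the optimized values of [9].

```python
import numpy as np

def decorrelate_pair(left_sub, right_sub, frame_rate, a, f_mod):
    """Apply phi(t, s) = a(s) * sin(2*pi*f_mod*t) in the subband domain.

    left_sub, right_sub : complex subband frames, shape (n_frames, S)
    frame_rate          : subband frame rate in Hz
    a                   : per-subband modulation amplitudes a(s), radians
    One channel is multiplied and the other divided by exp(j*phi), so the
    pair is modulated symmetrically around the unprocessed centre."""
    t = np.arange(left_sub.shape[0]) / frame_rate
    phi = np.outer(np.sin(2.0 * np.pi * f_mod * t), a)   # (n_frames, S)
    rot = np.exp(1j * phi)
    return left_sub * rot, right_sub / rot

# Placeholder amplitude profile over S = 64 subbands: rises to its
# maximum over the first subbands and stays flat above (the tuned a(s)
# values are given in [9] and are not reproduced here).
S = 64
a = np.concatenate([np.linspace(0.0, 0.4, 8), np.full(S - 8, 0.4)])
```

For a 5.1 extension, the same function would be applied to each channel pair with a different, non-commensurate f_mod per pair, as described above.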

4. Source Localization (SLoc)

From a general point of view, the source localization (SLoc) problem consists of deriving the position of one or more emitting sources given the acoustic measurements provided by a set of sensors. This research field has been widely investigated over the years, and a wide range of different approaches has been proposed in the literature [14]. The most common technique relies on the estimation of the time difference that characterizes the different propagation paths between a source and two sensors. This difference is referred to as Time Difference of Arrival (TDOA). For a single microphone pair, the locus of points which pertains to a given TDOA is one of the sheets of a hyperboloid of two sheets, as depicted in Figure 15.

Figure 15: Loci of points that satisfy a given TDOA at two microphones m1 and m2

When a point-like source is in a far-field position, the wave fronts can be assumed to be planar and the hyperboloid can be approximated by a cone. If we restrict the analysis to a plane, each TDOA corresponds to a Direction of Arrival (DOA), as explained in Figure 16.

Figure 16: TDOA given two microphones and a source in far field position. ν identifies the direction of arrival.

When a set of TDOA estimates is available from a set of microphone pairs, one can derive the position of the source as the point in space that best fits all the measurements. From this point of view, distributed microphone networks, similar to those exploited in the CHIL project, guarantee a more uniform coverage of the space than compact arrays and permit the design of very accurate localization algorithms.

The most critical issue for a SLoc algorithm is reverberation, which is generated by the reflections that characterize acoustic wave propagation in an enclosure.

Reverberation is critical because a virtual sound source is generated wherever a reflection occurs [15]. Although reflections considerably weaken the signal, and hence the real source is always predominant over the virtual ones, in some cases constructive interference between reflected patterns may compromise the TDOA estimation. Environmental noise and the presence of coherent noise sources further concur to render the SLoc task more difficult.

4.1 SLoc in DICIT

Within the DICIT project, the goal of the SLoc module is to provide the beamformer with accurate information about the position of the active speaker. The localization information is made available also to other modules, for instance the Smart Speech Filtering (SSF) module, which may exploit the desired speaker's position as an extra feature in an attempt to improve its own performance. Besides the typical issues related to the SLoc problem, the goals and constraints of the DICIT project, for instance the application scenario and the sensor setup, introduce particular issues which influence the algorithm design. The next sections briefly describe these problems.

Finally, the DICIT scenario calls for the localization of several sources if several users are taken into account. For this situation there is an established method based on Blind Source Separation [11], [12], for which a real-time demonstrator is already available; it will be evaluated for the DICIT scenario in Task 3.4 and described in Deliverable D3.2. In Task 3.3, covered by this deliverable, a new coherence-based localization algorithm for multiple sources is described that provides the beamformer with accurate information about the position of the desired speaker.

4.1.1 Array Design

In recent years, the SLoc community has shown a growing interest in distributed microphone networks and circular arrays. Unfortunately, the characteristics of the DICIT application do not allow this kind of sensor deployment, and hence a linear nested array is adopted, as depicted in Figure 4. The nested array makes it possible to exploit different sub-arrays in order to meet the requirements of each technology in terms of inter-microphone distance. For SLoc purposes we exploit the 7 microphones at 32 cm spacing plus the two vertical ones, which permit estimation of the vertical position of the speaker. When using TDOA estimation for localization, the distance between sensors is a crucial aspect: a large inter-microphone distance guarantees a higher resolution, but at the cost of a reduced performance in terms of TDOA estimation, as the coherence of the desired signal between the microphones decreases. An inter-microphone distance of 32 cm is a reasonable trade-off for the first prototype.

It is worth underlining that the adoption of a compact array, where microphones are deployed along a single horizontal line, renders the estimation of distance difficult [16] and makes the estimation of elevation impossible. Figure 17 illustrates how the microphone arrangement influences the localization accuracy. Let us assume that two microphone pairs are deployed on the same wall, and let us consider a set of positions distributed on a 7x7 grid in a room.

From each of the TDOAs corresponding to those positions, a set of 1000 noisy TDOAs is derived by adding white Gaussian noise. The source positions are then computed from the noisy TDOAs through simple triangulation, i.e. by crossing DOAs, resulting in 1000 points distributed around the original position. Obviously, the shapes of the estimate distributions depend on the original position, and the noisy TDOA measurements introduce a higher uncertainty along the radial direction relative to the microphone pairs' center.

Figure 17: Effect of noisy time delay estimations in a double microphone pair set-up. Microphone pairs are identified by the two pairs of black points on the left.

As a final remark, a compact array clearly requires that users speak towards the sensors, i.e. frontally; otherwise, localization cannot be carried out due to the lack of direct propagation paths. However, inviting the user to look at the television he/she is trying to control is a reasonable constraint.

4.1.2 Application Scenario

Aside from the above-mentioned array design issues, the SLoc problem in DICIT is further complicated by the application scenario. The behavior of naïve users was monitored in a series of Wizard-of-Oz (WOZ) experiments. In particular, it was observed that users tend to pronounce very short utterances that correspond to single commands for the system. As a consequence, silence is predominant, and the overall length of the speech segments is only about 15-20% of the interaction with the system. It was also observed that users change their positions while being silent between two consecutive commands.

Finally, it is worth mentioning that the final prototype is expected to work in the presence of a home-theatre audio system where some loudspeakers will be located in a frontal position with respect to the microphone array. As a consequence, the loudspeaker signals must be dealt with correctly in order to ensure proper system operation.

4.2 Adopted SLoc Approach

Given the project goals and the specific issues described above, in this section we present the solution adopted in DICIT. Figure 18 presents the block diagram of the SLoc module.

Figure 18: SLoc module block diagram.

As already mentioned, the most common approach to the SLoc problem is based on TDOA estimation at a set of microphone pairs. A very efficient method to estimate the TDOA is provided by the CrossPower-Spectrum Phase analysis (CSP) [17], also known as Generalized Cross Correlation Phase Transform (GCC-PHAT) [18], which is in general used for a single source but shall now also form the basis for localizing several simultaneously active sources. Note that multiple simultaneously active sound sources will also be considered in Task 3.4 of the DICIT project, and TDOA estimation methods based on blind source separation [11], [12], [13] will be investigated in this context.

As for the CSP method for a single source, let us consider a microphone pair p and denote by x_p1(k) and x_p2(k) the signals acquired by the two microphones. The CSP is defined as:

C_p(τ) = DFT⁻¹ [ (X_p1 X*_p2) / (|X_p1| |X_p2|) ]

where X_p1 and X_p2 are the DFTs of the acquired signals, (·)* denotes complex conjugation, and τ denotes the time lag. The CSP provides a measure of the similarity between the two signals aligned according to the given time lag. It has been demonstrated that in noiseless free-field conditions the CSP presents a prominent peak in correspondence with the actual TDOA. Conversely, in highly reverberant environments reflections may give rise to spurious peaks. A large inter-microphone distance guarantees a better resolution but decreases the robustness of the approach. On the other hand, a higher sampling rate may be exploited to increase the resolution, at the cost of a heavier computational load.
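The CSP defined above is straightforward to implement; a minimal sketch, with all parameter values illustrative:

```python
import numpy as np

def csp(x1, x2, nfft=None):
    """CrossPower-Spectrum Phase (GCC-PHAT) of one microphone pair.

    Returns the CSP sequence with lag 0 at the centre index."""
    n = nfft or (len(x1) + len(x2))
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    G = X1 * np.conj(X2)
    G /= np.abs(G) + 1e-12                # phase transform, |G| = 1
    return np.fft.fftshift(np.fft.irfft(G, n))

def estimate_tdoa(x1, x2, fs, max_lag):
    """TDOA (seconds) as the lag of the CSP maximum within +-max_lag samples."""
    cc = csp(x1, x2)
    centre = len(cc) // 2
    win = cc[centre - max_lag : centre + max_lag + 1]
    return (int(np.argmax(win)) - max_lag) / fs

# In the far field (Figure 16) the DOA follows from the TDOA as
# nu = arccos(c * tdoa / d) for spacing d and speed of sound c.
```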

As indicated before, in a multi-microphone scenario such as the one envisioned in DICIT, a set of TDOA measurements, computed for a set of microphone pairs, can be combined in order to obtain a single accurate estimate of the source position. In general, the computed result is the point in space that best fits the set of measurements. Depending on the characteristics of the particular problem, several different approaches can be found in the literature.

In our implementation we chose a steered-beamformer-like approach, based on the CSP, which performs a full search over the space of possible source positions. Let us assume that a set of P microphone pairs is available, and let us define a grid Σ of points s that uniformly covers the spatial area of interest. The adopted approach computes a function, defined over this grid, that represents a measure of the plausibility that an active source is present at s. The resulting function, which can be evaluated in either three or two dimensions, is also referred to as an acoustic map, as it gives a representation of the acoustic activity in an enclosure. The speaker position estimate is obtained by maximizing this map. The plausibility can be evaluated in several ways; in the next sections we present the two methods we have implemented so far in DICIT.

This kind of approach is easy to implement, and it represents a straightforward way to exploit the redundancy provided by a microphone array. It guarantees a high level of flexibility, which allows us to quickly modify the implementation according to both the experimental results and the behaviour of the first prototype. It can easily be downscaled in order to meet potential computational power limitations. Moreover, this approach does not make any assumptions about the characteristics of the scenario, and it is therefore suitable to be employed as a prototype in a real environment. Finally, as we will see later, this method is well suited to a multi-speaker context and allows the TV loudspeakers to be handled efficiently.

4.2.1 Global Coherence Field

The first approach that we have implemented is based on the Global Coherence Field (GCF) theory [19]. For each point s on the grid Σ, the GCF function is defined as follows:

GCF(s) = (1/P) Σ_{p=0}^{P−1} C_p(δ_p(s))

where δ_p(s) is the geometrically determined TDOA at microphone pair p if the source is located at s. As mentioned above, the source position can be estimated as the point that maximizes the GCF. In our implementation, δ_p(s) is rounded to the closest integer delay (in samples).

In a distributed microphone network scenario, the GCF lends itself to an extension to the so-called Oriented Global Coherence Field (OGCF) [20], which is also capable of estimating the orientation of the source. However, the adopted linear array provides a limited angular coverage for a directional ("oriented") source, and as a consequence the potential of OGCF can be only partially utilized.
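A minimal sketch of the GCF map computation, reusing the csp() function above; grid construction and geometry are left to the caller:

```python
import numpy as np

def gcf_map(csp_list, delta, fs):
    """GCF(s) = (1/P) * sum_p C_p(delta_p(s)) over a grid of points.

    csp_list : P centred CSP vectors (one per microphone pair)
    delta    : geometric TDOAs delta_p(s) in seconds, shape (n_points, P)
    """
    P = len(csp_list)
    acc = np.zeros(delta.shape[0])
    for p, cc in enumerate(csp_list):
        centre = len(cc) // 2
        # delta_p(s) rounded to the closest integer lag, as in the text
        lags = np.rint(delta[:, p] * fs).astype(int)
        acc += cc[centre + lags]
    return acc / P

# Source estimate: s_hat = grid[np.argmax(gcf_map(csp_list, delta, fs))]
```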

4.2.2 Sub-optimal Least Squares

As an alternative approach, a sub-optimal Least Squares (LS) method has been implemented. The solution is sub-optimal because the search for the point that minimizes the LS criterion is restricted to a sampled version of the space of source coordinates, i.e. the grid Σ. The resulting acoustic map is computed in the following way [21]:

LS(s) = −(1/P) Σ_{p=0}^{P−1} (τ_p − δ_p(s))²

where τ_p is the time lag that maximizes C_p(τ). Again, the source position estimate is the point that maximizes the objective function.

This kind of approach is less critical from a computational-load point of view, since it does not require repeated access to the full CSP vectors. However, it is intrinsically weaker than the GCF, because the decision on τ_p is taken before combining the single contributions. Nevertheless, in a linear array scenario, where the speaker is always supposed to be facing the microphones, this approach delivers satisfactory performance in line with that obtained with the GCF method. Moreover, this solution allows the TDOA estimates to be refined through interpolation, which is not feasible in practice on the whole CSP functions as required in a GCF implementation.

4.2.3 Tracking

In order to guarantee smooth localization outputs, single-frame localization estimates are processed by a threshold-and-hold filter. As a matter of fact, in a real implementation some localization estimates are less reliable due to the spectral content of the corresponding speech frames as well as due to the level of noise. Such localizations are characterized by a low peak in the acoustic map and are randomly spread over the search area. The idea is to filter out those localizations whose corresponding acoustic map peaks are below a given threshold, as well as those isolated outliers that are too distant from the current tracking area. When a frame is skipped, the filter keeps the previous localization estimate. The post-processing works in the following three steps:

1. If the acoustic map peak is below the threshold, skip the frame; otherwise go to the next step.
2. If the current localization is close to the previous localizations, take it as a good localization; otherwise go to the next step.
3. If a sufficient number of localization estimates concentrates in a particular area, take the current localization as a good localization.

This simple and light computation is capable of guaranteeing sufficient accuracy, as reported in the next section. As it was observed that users prefer to move while not speaking, this implementation is ready to react quickly when the user moves to a different position.
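The three-step post-filter can be sketched as follows; the threshold, distance and confirmation count are placeholders, not the tuned DICIT settings.

```python
import numpy as np

class ThresholdAndHold:
    """Threshold-and-hold post-filter implementing the three steps above."""

    def __init__(self, peak_thr=0.3, dist_thr=500.0, n_confirm=3):
        self.peak_thr = peak_thr        # minimum acoustic-map peak value
        self.dist_thr = dist_thr        # "close to previous" radius, mm
        self.n_confirm = n_confirm      # estimates needed in a new area
        self.current = None             # held localization estimate
        self.pending = []               # candidates for a new area

    def update(self, pos, peak):
        if peak < self.peak_thr:        # step 1: weak peak -> skip frame
            return self.current         # hold the previous estimate
        if self.current is not None and \
                np.linalg.norm(pos - self.current) < self.dist_thr:
            self.current = pos          # step 2: consistent -> accept
            self.pending = []
            return self.current
        # step 3: accept once enough estimates cluster in the new area
        self.pending = [q for q in self.pending
                        if np.linalg.norm(pos - q) < self.dist_thr]
        self.pending.append(pos)
        if self.current is None or len(self.pending) >= self.n_confirm:
            self.current = pos
            self.pending = []
        return self.current
```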

It is worth mentioning that an activity investigating the implementation of a particle filtering approach in the DICIT scenario has recently been started.

4.2.4 Experimental Results

A series of localization experiments was run on the acoustic WOZ data collection (please refer to D6.2 for further details on the data collection and for a description of the annotation and labelling process). It is worth remarking that the addressed scenario is very challenging from a localization point of view because of its characteristics, such as very short spoken sentences and very long pauses.

The evaluation metrics are derived from those adopted in previous evaluations [22]. The basic metric to evaluate SLoc methods is the Euclidean distance between the coordinates delivered by the localization system and the reference coordinates. An error is classified as fine if it is lower than 50 cm; otherwise it is classified as gross. A fine error is due to a correct but noisy source position estimate; a gross error, on the other hand, occurs when a faulty localization is delivered by the tracking algorithm. The distinction between fine and gross errors guarantees a better understanding of the real performance of a localization system: as a matter of fact, a few large errors, due to particular environmental conditions, may considerably affect the overall performance. Given the above-mentioned metrics, the evaluation of SLoc algorithms is carried out in terms of:

- Localization rate: percentage of fine errors with respect to all the localization outputs;
- RMSE: overall root mean square error (mm);
- fine RMSE: root mean square error computed only on fine errors (mm);
- Angular RMSE: root mean square angular error with respect to the center of the array;
- Bias: single-coordinate average error (mm);
- Deletion rate: percentage of frames for which a localization is not available (due to post-processing) and the previous value is kept.

The localization algorithm executes a 2D search for a single source, and it does not perform any speech activity detection. Given the 2D position estimate, the speaker height is derived by exploiting the vertical TDOA, i.e. the TDOA computed at the vertical microphone pair with the highest CSP value.

As a matter of fact, the automatic extraction of speaker coordinates from the video recordings and the manual transcriptions are not completely reliable. For this reason we first compare our algorithms under more controlled conditions, where spatial and temporal labelings are easier to extract and control. The evaluation is hence restricted to the very beginning of each WOZ session, when users are asked to read some phonetically rich sentences while sitting in front of the array. Figure 19 shows the localization performance in terms of angular RMSE when different thresholds are applied, resulting in different deletion rates.

Figure 19: Localization performance in terms of angular RMSE when different thresholds, corresponding to different deletion rates, are applied
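As a reference for how the error figures listed above are computed, a minimal sketch (coordinates in mm, evaluated frames only; function and key names are illustrative):

```python
import numpy as np

def sloc_metrics(est, ref, fine_thr_mm=500.0):
    """Fine/gross split and the basic SLoc error figures.

    est, ref : arrays of shape (n_frames, 3) with estimated and
    reference coordinates for the evaluated (non-deleted) frames."""
    err = np.linalg.norm(est - ref, axis=1)       # Euclidean error per frame
    fine = err < fine_thr_mm                      # fine if below 50 cm
    return {
        "loc_rate_pct": 100.0 * float(np.mean(fine)),
        "rmse_mm": float(np.sqrt(np.mean(err ** 2))),
        "fine_rmse_mm": float(np.sqrt(np.mean(err[fine] ** 2))),
        "bias_mm": (est - ref).mean(axis=0),      # per-coordinate bias
    }
```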

As expected, there is no significant difference in the performance of the two methods. It is hence not possible to definitively assess the effectiveness of the two approaches under investigation on the basis of these evaluation results. It is worth underlining that LS seems to perform slightly better than GCF as soon as a few localizations are discarded. The explanation lies in the fact that LS does not require rounding of the time delays.

Table 1 reports the evaluation results on the whole WOZ data collection in terms of the measures introduced above. The reported results are obtained by setting the post-processing threshold to the value which delivers the best overall performance, and they are measured on speech frames only, by applying a manual segmentation, based on transcriptions, as speech activity detector.

Method   Loc Rate   Fine RMS (mm)   RMS (mm)   Angular RMS (deg)   Bias (mm)         Deletion rate
GCF      92%                                                       (-15, -180, -7)   13.5%
LS       93%                                                       (-7, -166, -5)    10%

Table 1: Evaluation results on the WOZ data collection.

As mentioned above, the references and transcriptions are prone to errors, and hence these results must be considered as tendencies rather than as absolute localization performance. As a general result, it is worth mentioning that, due to the limited vertical coverage, the speaker elevation estimate is less accurate, which is acceptable as it is not critical for the overall system performance. The evaluation results in terms of bias also confirm that the uncertainty is higher along the y-axis, which is orthogonal to the array. Concluding, although this analysis allowed us to understand the criticalities of the problem and gave us an idea of the potential of our algorithms, we think that the behaviour of the overall DICIT prototype is the best metric to evaluate each front-end component.

Frontal/Non-Frontal Speaker Detection

As a side effect of the localization process, one can obtain clues that help to understand whether the talker is facing the system, and hence whether he is talking to it or to somebody else.

The peak of the acoustic map, or of the CSP, turned out to be a feature related to the speaker's head orientation. However, some experimental results showed that this information alone is not enough, and alternative features should be identified and applied in combination. For instance, the SSF module could integrate the above-mentioned feature with any other information available at that level. Further research activities on this topic will be conducted during the next months.

4.2.5 Multiple Sources

The presented approaches have been widely adopted to tackle the SLoc problem when limited to a single source. When two or more sources are simultaneously active, it is reasonable to assume that an acoustic map presents two or more peaks in correspondence with the sources. However, searching for two local maxima may fail in the given context. In the presence of two speakers, depending on the spectral contents of the involved signals, the main peak jumps from one source to the other, while the second peak may be considerably lower than the main one and may be overtaken by spurious peaks. Figure 20 shows an example of the GCF function when two sources are simultaneously active. In this case the two sources, denoted by circles, are on the left and on the right of the nested microphone array, which is placed in the lower part of the picture. It can be observed that most of the coherence concentrates around the speaker on the left, while the peak on the right is quite smooth.

Figure 20: GCF acoustic map in presence of two sources. Dark colors mean small GCF values, while bright colors identify large values.

In order to deal with this problem, we devised an approach that attempts to de-emphasize the main peak in order to make the detection of the second one easier. Our proposed method works as follows [23]:

1. search for the main peak;
2. modify each single CSP by lowering those values at the time lags that generated the main peak;
3. compute a new map and search for its maximum.

If we define s_1 as the position of the main peak, the de-emphasized version of each single CSP contribution is obtained as follows:

C'_p(τ) = C_p(τ) · φ(τ, δ_p(s_1))

A suitable definition of the de-emphasis function φ is the following:

φ(τ, μ) = 1 − α_b e^{−|τ−μ|/b}

The parameter b determines the spatial selectivity of the de-emphasis function.

Figure 21: Multiple speaker localization. The de-emphasis process is highlighted in the dotted box.

Figure 21 graphically shows how the de-emphasis process is applied in order to identify the peak associated with the second speaker. Figure 22 shows the same GCF map as in Figure 20 after the CSP de-emphasis process has been applied.

Figure 22: GCF map of Figure 20 after the de-emphasis process.

Unfortunately, the de-emphasis process not only highlights the second source but also increases the relative level of the background amplitude in the map due to reverberation. However, in general the method is very effective in enhancing the peak related to the position of the second sound source. This approach can be combined with a spatio-temporal clustering that monitors the spatial and temporal persistency of the source positions.
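A sketch of the de-emphasis step follows, using the multiplicative notch form given above (whose exact shape is reconstructed here from the surrounding text; the full specification is in [23]). The values of α_b and b are placeholders.

```python
import numpy as np

def deemphasize_csp(csp_list, delta_s1, fs, alpha=0.8, b=5.0):
    """Lower each CSP around the lag produced by the main peak s1.

    csp_list : P centred CSP vectors; delta_s1 : TDOAs delta_p(s1) in
    seconds. The notch phi(tau, mu) = 1 - alpha*exp(-|tau - mu|/b) is
    applied per pair; b (here in samples) sets the spatial selectivity."""
    out = []
    for p, cc in enumerate(csp_list):
        lags = np.arange(len(cc)) - len(cc) // 2   # lag axis in samples
        mu = delta_s1[p] * fs                      # main-peak lag, pair p
        phi = 1.0 - alpha * np.exp(-np.abs(lags - mu) / b)
        out.append(cc * phi)
    return out

# Recomputing the acoustic map from the de-emphasized CSPs and taking
# its maximum yields the second source, as in Figures 21 and 22.
```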

It is worth underlining that the same method can be applied to both the GCF and the LS localization approaches presented in this document.

4.2.6 Loudspeakers as Additional Sources

In the DICIT project, the SLoc module is expected to work even in the presence of surrounding TV outputs. As far as the first prototype is concerned, the TV output is diffused by two loudspeakers located next to the TV and the array, as depicted in Figure 23.

Figure 23: Configuration of the first DICIT prototype.

This configuration may turn out to be more or less critical for SLoc depending on the reverberation level of the room. In particular, in a highly reverberant room, the small coherence contributions of the loudspeakers tend to disappear in the overall reverberation and do not affect the localization performance. Conversely, in a less reverberant room, such as the one where the first prototype will be running, even a small coherence contribution is evident. As a consequence, the TV outputs must be taken into account in order to guarantee an accurate estimate of the speaker position. Each loudspeaker can be handled as a further speaker whose position is known, and it can be deleted by adopting the same approach that is exploited to tackle the multiple-speaker scenario. Notice that, also in a reverberant environment, a human talker facing the microphones always prevails over the loudspeakers and is not weakened by the deletion process, even in the presence of high TV output levels. It is worth remarking that all the above statements are valid for the given configuration, and there is no claim of generalization. In particular, when a 5.1 TV output system is adopted, as envisioned for the final prototype, reverberation will no longer help, as some loudspeakers will be frontal to the array.

4.2.7 Real-time Implementation

A real-time implementation of the source localization module is available, based on a multi-threaded architecture. The SLoc module is run in a thread and reads input data from a ring buffer.

The implementation allows for on-line switching between the two map computation approaches (GCF and LS) and for on-line parameter tuning. The current implementation correctly handles the presence of the two loudspeakers by applying the multiple-source localization approach. Localization of multiple users is not yet implemented in real time, as it is not foreseen for the first prototype.

5. Multi-channel Acoustic Processing Subsystem

This section deals with the implementation issues concerning the Multi-channel Acoustic Processing Subsystem (MAPS). The audio processing of the MAPS is obtained by combining different software modules: Beamforming (BF), Source Localization (SLoc), Preprocessing (PreProc), Two-channel Acoustic Echo Cancellation (2C-AEC), and Smart Speech Filtering (SSF) including Speech Activity Detection (SAD). Within the MAPS a main program will be executed that takes care of the audio input/output and data processing. The various processing modules will be organized as libraries that will be exchanged between the partners. After an initial phase of setup and configuration, the main loop of the program is composed of the acquisition section and three modules that run sequentially. However, parallel processing will be investigated where possible, using threads and a multi-core CPU. Figure 24 shows the software structure of the MAPS. Each acquired input data frame is made available in a ring buffer accessible to the modules. In order to ensure that the system's response time is not affected, the BF module will not wait for the SLoc output but will use the previous result, in case the localization module runs slower than real time.

Figure 24: Block structure of Multi-channel Acoustic Processing.
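The following sketch shows one plausible shape of that main loop under the assumptions above: acquisition feeds the ring buffer shared with the SLoc thread, the three modules run sequentially on each frame, and the beamformer falls back to the last available direction estimate whenever SLoc lags behind real time. The module objects and their processing order are illustrative; the actual signal flow is the one defined in Figure 24.

    def maps_main_loop(audio_in, ring_buffer, sloc_result, bf, aec, ssf,
                       stop_event):
        last_doa = None
        while not stop_event.is_set():
            frame = audio_in.read_frame()      # acquisition section
            ring_buffer.put(frame)             # shared with the SLoc thread
            # BF never waits for SLoc: take the latest published estimate
            # if there is one, otherwise keep the previous steering direction
            last_doa = sloc_result.get('position', last_doa)
            enhanced = bf.process(frame, last_doa)
            echo_free = aec.process(enhanced)  # 2C-AEC with the TV reference
            ssf.process(echo_free)             # SSF incl. SAD feeds the ASR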
