IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 24, NO. 7, JULY 2016


Spotforming: Spatial Filtering With Distributed Arrays for Position-Selective Sound Acquisition

Maja Taseska, Student Member, IEEE, and Emanuël A. P. Habets, Senior Member, IEEE

Abstract—Hands-free capture of speech often requires extraction of sources from a certain spot of interest (SOI), while reducing interferers and background noise. Although state-of-the-art spatial filters are fully data-dependent and computed using the power spectral density (PSD) matrices of the desired and the undesired signals, the existing solutions to extract sources from a SOI are only partially data-dependent. Estimating the time-varying PSD matrices from the data is a challenging problem, especially in dynamic and quickly time-varying acoustic scenes. Hence, the spot signal statistics are often pre-computed based on a near-field propagation model, resulting in suboptimal filters. In this work, we propose a fully data-dependent spatial filtering framework for the extraction of speech signals that originate from a SOI. To achieve position-based spatial selectivity, distributed arrays are used, which offer larger spatial diversity than arrays of closely spaced microphones. The PSD matrices of the desired and the undesired signals are updated at each time-frequency bin by using a minimum Bayes risk detector that is based on a probabilistic model of narrowband position estimates. The proposed framework is applicable in challenging multi-talk situations, without requiring any prior information except the geometry, location, and orientation of the arrays.

Index Terms—Source extraction, distributed arrays, spatial filtering, PSD matrix estimation, signal detection.

I. INTRODUCTION

IN HANDS-FREE applications involving human-to-human or human-to-machine interaction, the desired speech signal is often contaminated by background noise and interferers.
Therefore, to ensure high-quality speech acquisition, enhancement of the desired speech is necessary. The objectives of multi-microphone speech enhancement systems can be coarsely classified into one of the following categories: i) extraction of a subset of sources from a mixture [1], ii) source separation, where a separate filter is computed to extract each source [2]-[4], and iii) extraction of sources that originate from a user-defined SOI [5]-[10]. In this work, we focus on the last problem, hereafter referred to as acoustic spotforming, to emphasize that, in contrast to traditional beamforming, which extracts sources from desired directions [11], [12], spotforming extracts sources from a desired SOI. Directional signals originating from the SOI will be referred to as spot signals, while the background noise and directional signals from outside the SOI represent undesired signals. The term spotforming has been used earlier in ultra-wideband (UWB) array processing to emphasize that UWB waveforms can focus on spots, as opposed to directions in narrowband processing [13]. Although not referred to as acoustic spotformers, beamformers which can achieve this task are known as soft-constrained, space-constrained, or region-based beamformers [7], [8], [10]. Alternatively, spotforming can be achieved using ideas from robust adaptive beamforming.

Manuscript received August 28, 2015; revised January 17, 2016 and March 01, 2016; accepted March 01, 2016. Date of publication March 10, 2016; date of current version May 09, 2016. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Richard Christian Hendriks. The authors are with the International Audio Laboratories Erlangen, University of Erlangen-Nuremberg and Fraunhofer IIS, Erlangen 91058, Germany (e-mail: maja.taseska@audiolabs-erlangen.de). Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /TASLP
The source location uncertainty region used in the design of robust beamformers can be interpreted as a SOI in the spotforming context. For instance, a spotformer can be realized by a linearly constrained minimum variance (LCMV) robust beamformer with eigenvector constraints that impose low distortion across the SOI [5], [9]. The approaches in [5], [7]-[10] are based on a near-field model, where the spot signal statistics are estimated by integrating the near-field steering vectors across the SOI. The statistics are then used to compute the maximum signal-to-noise ratio (max-SNR) filter [8], the minimum mean squared error (MMSE) (Wiener) filter [7], [10], or to design eigenvector constraints [5], [9]. However, as the statistics are data-independent, the resulting filters are sub-optimal. To include the room acoustics, the near-field propagation model can be substituted by measured (or estimated) acoustic transfer functions (ATFs), which have been shown to improve the speech quality and the spatial selectivity of the beamformers [14], [15]. Sound extraction from a volume using a filter-and-sum beamformer has been proposed in [15], where the filters are matched to the ATFs of multiple distributed microphones. This approach is data-independent (and hence sub-optimal), requires a large number of microphones to achieve good performance, and requires knowledge of the ATFs. A data-dependent approach based on estimated ATFs has been proposed in [14]: ATF characteristics in a given SOI are first extracted and then used during processing to identify and track the ATFs. This framework is applicable only for small-scale movements within the SOI, provided that a-priori knowledge of the desired signal subspace is available. None of the above-mentioned approaches [5], [7]-[10], [14], [15] considers scenarios with non-stationary and moving interferers, as often encountered in practice.
The estimation of time-varying statistics is often done by using full-band voice activity detectors (VADs) [10], [14] that identify periods when the desired signal is active. Such VADs work well if the undesired signal is relatively stationary compared to the desired signal, which is not the case for speech interferers.

© 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

A fully data-dependent spotformer applicable to highly dynamic scenarios was recently proposed by the present authors in [6]. In contrast to the state-of-the-art discussed previously, the data-dependent approach assumes neither a propagation model nor stationarity of the undesired signals. The PSD matrices of both the spot signal and the undesired signal are estimated online and used to compute the time-varying spotformer coefficients. The current paper is based on the ideas developed in [6], with the following additional contributions: (i) a more complete discussion of the optimal design and the computation of the spotformer using estimated PSD matrices, (ii) the formulation of a minimum Bayes risk signal detector used to estimate the PSD matrices, and (iii) an evaluation in different dynamic scenarios with measured and simulated data. Assuming relatively small spot sizes, a crucial observation underlying our work is that, due to the speech sparsity in the short-time Fourier transform (STFT) domain [16] and the online estimation of the PSD matrices, the spot signal PSD matrix at each time-frequency (TF) bin can be approximated by a rank-one matrix, even if there are multiple sources in the SOI. Therefore, the spot signal can be extracted by a minimum variance distortionless response (MVDR) filter with a time-varying constraint. In contrast to the state-of-the-art LCMV filters with multiple eigenvector constraints, the MVDR filter has a single constraint and offers more degrees of freedom to reduce undesired signals. While different formulations of the MVDR filter given the PSD matrices have been studied in the literature [17], the application of an MVDR filter with a time-varying constraint to solve the spotforming problem in dynamic scenarios is a novel contribution of this work. The time-varying PSD matrices need to be estimated from the microphone signals.
In recent research, spatial cues such as phase differences, direction of arrival (DOA) estimates, or position estimates have been used to detect the dominant source and update its PSD matrix [2]-[4], [18], [19]. We use this idea to estimate the PSD matrices in the context of spotforming [6]. At each TF bin, a position estimate is obtained by triangulation of the DOAs at the distributed arrays. By defining an appropriate model for the distribution of the position estimates, the probability that the spot signal is dominant, referred to as the spot probability, can be evaluated and used to classify each TF bin to the spot signal or the undesired signal. The classification can be done using a minimum Bayes risk rule, similarly to [20], where the goal was to detect a source using DOA estimates. While different ideas from our previous work are used to develop the spot signal detection system detailed in Section IV, the underlying probabilistic models are unique to the current work on spotforming. It is important to note that although distributed arrays are required to obtain narrowband position estimates for the spot signal detector, the spotformer can be computed using an arbitrary subset of arrays or microphones. We will experimentally show that, due to the spatial diversity of distributed arrays, multi-array spotforming improves the spatial selectivity compared to single-array spotforming, at the cost of a larger spot signal distortion. Determining the optimal subset of microphones is outside the scope of this paper. Furthermore, we assume that all signals are synchronized and available at a centralized processor. For details on signal synchronization, the reader is referred to [21] and references therein. The rest of the paper is organized as follows: in Section II, the signal model is described. In Section III, the state-of-the-art and the proposed spotforming methods are discussed.
In Section IV, the position-based minimum Bayes risk detector required for the PSD matrix estimation is proposed. A comprehensive performance evaluation is presented in Section V, and Section VI concludes the paper.

II. PROBLEM FORMULATION

Consider a setup of at least two distributed arrays, with at least two microphones each, where the total number of microphones is M. Let S denote a SOI, where signals originating from S are desired, while background noise and signals from outside S are undesired. Assuming point sources, the signal at the m-th microphone at time t is given by

y_m(t) = x_m(t) + u_m(t) + v_m(t) = \int_{r \in S} h_{r,m}(t) * s_r(t) \, dr + u_m(t) + v_m(t),   (1)

where * denotes the convolution operator, h_{r,m} is the room impulse response (RIR) between position r and the m-th microphone, s_r is the signal from a source at position r \in S, u_m is the sum of all signals from outside S, and v_m is the sum of background and sensor noise. If there is no source at position r, then s_r(t) = 0. For sufficiently long STFT frames, the multiplicative transfer function approximation [22] holds and y_m(t) is given in the STFT domain as follows

Y_m(n,k) = X_m(n,k) + U_m(n,k) + V_m(n,k) = \int_{r \in S} H_{r,m}(k) S_r(n,k) \, dr + U_m(n,k) + V_m(n,k),   (2)

where capital letters denote the TF-domain signals of the respective time-domain counterparts. In the rest of the paper, all processing is performed in the TF domain, and lower-case bold letters are used to denote vectors in the TF domain. Given the ATFs H_{r,m} and assuming H_{r,1} \neq 0, the relative transfer function (RTF) vector with respect to the first microphone, which describes the coupling between the microphones in response to a source located at r, is defined as

g_r(k) = [1, H_{r,2}(k)/H_{r,1}(k), ..., H_{r,M}(k)/H_{r,1}(k)]^T.   (3)

Note that the first microphone is chosen as a reference without loss of generality.
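As a concrete illustration of the RTF definition in (3), the following sketch normalizes a hypothetical ATF vector by its reference entry; the numbers are arbitrary stand-ins, not measured ATFs:

```python
import numpy as np

# Hypothetical ATF vector H_r for M = 4 microphones at one frequency bin
H_r = np.array([0.9 + 0.1j, 0.5 - 0.4j, -0.3 + 0.6j, 0.2 + 0.2j])

def rtf_vector(H):
    """RTF with respect to the first microphone, Eq. (3): g_r = H / H_1."""
    assert abs(H[0]) > 0, "the reference ATF must be non-zero"
    return H / H[0]

g_r = rtf_vector(H_r)  # first element is 1 by construction
```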
Stacking all microphone signals in a vector y, the signal model in vector notation is given by

y(n,k) = \int_{r \in S} g_r(k) X_{r,1}(n,k) \, dr + u(n,k) + v(n,k),   (4)

where X_{r,1} denotes the signal from a source located at position r, as captured at the first microphone. The PSD matrix of the microphone signals is given by \Phi_y(n,k) = E[y(n,k) y^H(n,k)]. If the different signals are modeled as mutually uncorrelated random processes, the following holds

\Phi_y(n,k) = \Phi_x(n,k) + \Phi_u(n,k) + \Phi_v(n,k),   (5)

where

\Phi_x(n,k) = \int_{r \in S} \phi_{r,1}(n,k) g_r(k) g_r^H(k) \, dr,   (6)

and \phi_{r,1} = E[|X_{r,1}|^2] is the PSD of X_{r,1}. In addition, we denote by \Phi_{u+v} and \Phi_{x+v} the PSD matrices of u + v and x + v, respectively. As the processes are mutually uncorrelated, it holds that \Phi_{u+v} = \Phi_u + \Phi_v and \Phi_{x+v} = \Phi_x + \Phi_v. The objective in this work is to compute an estimate of the signal X_1(n,k) = \int_{r \in S} X_{r,1}(n,k) \, dr, which represents the sum of all signals that originate from S, as captured at the first microphone. At each TF bin, X_1 is estimated by linearly combining the signals of all microphones, using a data-dependent, time-varying spotformer w as follows

\hat{X}_1(n,k) = w^H(n,k) y(n,k).   (7)

The spotformer should be able to extract the spot signal with low distortion while sufficiently reducing the undesired signals in non-stationary scenarios where the speech sources move, new sources appear, or existing sources disappear.

III. ACOUSTIC SPOTFORMING

In this part, we first review two state-of-the-art spotformers in Section III-A and discuss their limitations. The proposed data-dependent spotformer is described in Section III-B. In Section III-C, the recursive time-averaging approach for PSD matrix estimation is provided for completeness.

A. State-of-the-art Approaches to Spotforming

As mentioned in the introduction, spotforming can be realized using ideas from robust adaptive beamforming. Inspired by [5], [9], we consider the robust LCMV filter with eigenvector constraints. The low-distortion requirement across S is expressed by the M × S constraint matrix G_S (with S > M) as

G_S^H(k) w(n,k) = 1_{S \times 1},   (8)

where the columns of G_S are the near-field steering vectors for S sampled positions from the SOI.
If the ATF or RTF vectors are known for each position, they can substitute the near-field steering vectors to take the room acoustics into account. Note that although the near-field design has been used in [5], [9], we proceed with RTFs in order to avoid violations of the near-field model and to ensure a fair comparison to the proposed data-dependent spotformer. Eigenvector constraints are computed by substituting G_S in the overdetermined system (8) by its rank-r approximation, where r < M. The singular value decomposition (SVD) and the rank-r approximation of G_S are given by

G_S = U \Sigma V^H and G_{S,r} = U_r \Sigma_r V_r^H,   (9)

where U_r and V_r contain the first r columns of U and V, corresponding to the r largest singular values, and \Sigma_r is an r × r diagonal matrix containing these singular values. Using the rank-r matrix G_{S,r}, the new constraint is given by

V_r \Sigma_r U_r^H \hat{w} = 1_{S \times 1},   (10)

which can be rearranged into a similar form as (8), namely,

U_r^H \hat{w} = \Sigma_r^{-1} V_r^H 1_{S \times 1}.   (11)

Finally, the LCMV filter with eigenvector constraints is obtained by solving the following optimization problem

\arg\min_w w^H \Phi_{u+v} w, subject to (11).   (12)

Denoting the r × 1 constraint vector on the right-hand side of (11) by c, the LCMV filter is given by

w_{opt} = \Phi_{u+v}^{-1} U_r (U_r^H \Phi_{u+v}^{-1} U_r)^{-1} c.   (13)

The rank r required to ensure that the distortion across the SOI is lower than a given threshold can be determined from the eigenstructure of the matrix G_S [23]. An alternative way to realize a spotformer based on existing techniques is to use the ATF vector for the centroid of S as a constraint in an MVDR beamformer. This design was inspired by [15], where sounds from a SOI were extracted by a matched filter. The authors in [15] argue that due to the correlation between the ATFs of neighboring positions, a matched filter extracts sounds from a wider region. However, note that in this case, there is no explicit control of the size of S.
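The rank-r eigenvector-constraint construction of (9)-(13) can be sketched numerically as follows; the constraint matrix and the undesired-signal PSD matrix below are random synthetic stand-ins, not measured quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
M, S, r = 6, 10, 3  # microphones, sampled SOI positions, constraint rank (toy values)

# Synthetic stand-in for the M x S constraint matrix G_S of Eq. (8)
G_S = rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S))

# Synthetic Hermitian positive-definite undesired-signal PSD matrix
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_uv = A @ A.conj().T + np.eye(M)

# SVD and rank-r truncation, Eq. (9)
U, s, Vh = np.linalg.svd(G_S)
U_r = U[:, :r]                                        # first r left singular vectors
Sigma_r = np.diag(s[:r])                              # r x r singular values
c = np.linalg.solve(Sigma_r, Vh[:r, :] @ np.ones(S))  # constraint vector of Eq. (11)

# LCMV filter with eigenvector constraints, Eq. (13)
Pi = np.linalg.solve(Phi_uv, U_r)                     # Phi_{u+v}^{-1} U_r
w_opt = Pi @ np.linalg.solve(U_r.conj().T @ Pi, c)
```

By construction, the filter satisfies the reduced constraint U_r^H w = c of (11) exactly.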
To ensure a fair comparison to our proposed spotformer, we will use the RTF vector instead of the ATF vector in the implementation. The existing spotformers do not take the statistics of the spot signal into account. The constraints that ensure low distortion are computed using only prior information about the SOI and the propagation vectors. Furthermore, the undesired signal PSD matrix \Phi_{u+v} is usually estimated using full-band VADs, which is applicable if \Phi_{u+v} varies slowly compared to \Phi_x and if there are periods where the spot signal is absent.

B. Proposed Data-Dependent Approach to Spotforming

There are two key observations which underlie the data-dependent spotformer: (1) SOIs of relatively small sizes with only a few sources are common in most applications, and (2) the spot PSD matrix at each TF bin has a low rank due to the speech sparsity in the STFT domain [16]. Sparsity implies that at each TF bin the energy of only one source is dominant, and hence, \Phi_x(n,k) can be approximated by a rank-one matrix as

\Phi_x(n,k) \approx \phi_{x_1}(n,k) g(n,k) g^H(n,k),   (14)

where \phi_{x_1} is the spot signal PSD at the first microphone and g represents the RTF vector between the position of the dominant source and the microphones. Therefore, the spot signal at the first microphone can be extracted with an MVDR filter [24] obtained as the solution to

\arg\min_w w^H \Phi_{u+v}(n,k) w, subject to w^H g(n,k) = 1.   (15)

Solving (15) using Lagrange multipliers [17], we obtain

w_{opt}(n,k) = \Phi_{u+v}^{-1}(n,k) g(n,k) / (g^H(n,k) \Phi_{u+v}^{-1}(n,k) g(n,k)).   (16)

The realization of the data-dependent MVDR spotformer involves two main tasks: (1) finding a rank-one approximation of \Phi_x(n,k) that provides an optimal (in a sense that will be discussed next) constraint vector g(n,k) at each TF bin, and (2) estimating the PSD matrices \Phi_x(n,k) and \Phi_{u+v}(n,k). Under the single dominant source assumption, the first task is equivalent to RTF estimation [1], [25], [26], and state-of-the-art methods are described in the remainder of this section. The second task is extremely challenging in scenarios with non-stationary, moving, and appearing/disappearing sources and is detailed in Section IV.

1) MMSE-Based Rank-One Approximation: Under the rank-one approximation, for all realizations of the random process x(n,k) the following relation holds between the spot signal at the reference and the spot signal across all microphones

x(n,k) \approx g(n,k) X_1(n,k).   (17)

The optimal g(n,k) in the MMSE sense is obtained by solving

\arg\min_g E[(x - g X_1)^H (x - g X_1)].   (18)

If e_1 = [1, 0, ..., 0]^T is an M × 1 vector, the solution to (18) is

g(n,k) = \Phi_x(n,k) e_1 / (e_1^H \Phi_x(n,k) e_1).   (19)

Hence, the MMSE-optimal g(n,k) is given by the first column of \Phi_x, normalized by the PSD \phi_{x_1}. An estimate of \Phi_x required to evaluate (19) can be obtained as \Phi_x = \Phi_{x+v} - \Phi_v, where \Phi_{x+v} and \Phi_v are estimated by recursive temporal averaging (see Section III-C). In practice, due to estimation errors, \Phi_x might not be positive semi-definite at some TF bins. To avoid an erroneous look direction of the spotformer, the constraint g(n,k) is not updated at such TF bins and g(n-1,k) from the previous time frame is used.
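A minimal sketch of the covariance-subtraction constraint (19) followed by the MVDR spotformer (16); the spot and undesired PSD matrices below are synthetic toy examples (an exactly rank-one Φ_x), not estimates from real signals:

```python
import numpy as np

M = 4
g_true = np.array([1.0, 0.8 - 0.2j, -0.5 + 0.3j, 0.4j])

# Toy PSD matrices: an exactly rank-one spot PSD as in Eq. (14), plus a
# synthetic Hermitian positive-definite undesired-signal PSD (assumed)
Phi_x = 2.0 * np.outer(g_true, g_true.conj())
rng = np.random.default_rng(1)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_uv = B @ B.conj().T + np.eye(M)

# Covariance-subtraction RTF estimate, Eq. (19): first column of Phi_x / phi_x1
g = Phi_x[:, 0] / Phi_x[0, 0]

# MVDR spotformer, Eq. (16)
num = np.linalg.solve(Phi_uv, g)   # Phi_{u+v}^{-1} g
w_opt = num / (g.conj() @ num)     # normalize by g^H Phi_{u+v}^{-1} g
```

For a rank-one Φ_x the estimate recovers the RTF exactly, and the resulting filter satisfies the distortionless constraint w^H g = 1.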
Under the single dominant source assumption, the expression (19) is also known as the covariance subtraction-based RTF estimator [26]. Although based on a single-source model, it was experimentally shown [27] that multiple sources can be extracted with reasonably low distortion, which corroborates its applicability in the spotforming context, where multiple sources in the spot might be present.

2) Least-Squares-Based Rank-One Approximation: A different way to approximate \Phi_x by a rank-one matrix is by minimizing the Frobenius norm of the matrix difference

\arg\min_g \| \Phi_x - \phi_{x_1} g g^H \|_F.   (20)

According to the matrix approximation lemma [28], the optimal solution for g is the principal eigenvector of \Phi_x. Note that due to the presence of background noise in the microphone signals, only an estimate of \Phi_{x+v} rather than \Phi_x can be obtained (see Section III-C). Assuming that an estimate of \Phi_v is available, g is obtained by first computing the principal eigenvector v_{max} of the whitened PSD matrix \Phi_v^{-1} \Phi_{x+v}, and performing de-whitening, i.e., g \propto \Phi_v v_{max}. The scaling is determined by definition, as the first element of g is equal to 1. To avoid the explicit inversion in \Phi_v^{-1} \Phi_{x+v}, the vector v_{max} can be computed from the generalized eigenvalue decomposition (GEVD) of the matrix pencil (\Phi_{x+v}, \Phi_v) [28]. Under the single dominant source assumption, this approach of computing g is known as the covariance whitening-based RTF estimator [26], [29], and it has been shown to outperform covariance subtraction in low signal-to-noise ratio (SNR) conditions [26]. Note that the complexity of performing the GEVD at each TF bin can be reduced by employing an adaptive estimation of the principal eigenvector [30]. Similarly to the covariance subtraction-based estimator, covariance whitening can also be applied to extract multiple sources due to the speech sparsity in the STFT domain.
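The covariance-whitening estimator can be sketched with a Cholesky-based GEVD; the pencil below is built from a known RTF so that the de-whitened principal generalized eigenvector recovers it exactly (a noise-free toy construction, not estimated data):

```python
import numpy as np

M = 4
g_true = np.array([1.0, 0.6 + 0.3j, -0.2 - 0.5j, 0.7 + 0j])

rng = np.random.default_rng(2)
C = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_v = 0.1 * (C @ C.conj().T) + 0.5 * np.eye(M)        # toy noise PSD
Phi_xv = 3.0 * np.outer(g_true, g_true.conj()) + Phi_v  # rank-one Phi_x plus Phi_v

# GEVD of the pencil (Phi_xv, Phi_v) via Cholesky whitening:
# Phi_xv v = lambda Phi_v v  <=>  (L^{-1} Phi_xv L^{-H}) u = lambda u, u = L^H v
L = np.linalg.cholesky(Phi_v)
A = np.linalg.solve(L, Phi_xv)                 # L^{-1} Phi_xv
Mw = np.linalg.solve(L, A.conj().T).conj().T   # L^{-1} Phi_xv L^{-H}, Hermitian
lam, U = np.linalg.eigh(Mw)                    # eigenvalues in ascending order
u_max = U[:, -1]                               # whitened principal eigenvector

# De-whitening: Phi_v v_max = L u_max is proportional to the RTF vector;
# the scaling fixes the first element to one
g = L @ u_max
g = g / g[0]
```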
3) Projection-Based Rank-One Approximation: When there are multiple sources in the SOI and the rank of \Phi_x increases, distortion of the spot signal is unavoidable when using an MVDR spotformer. In order to improve the performance of the MVDR filter in such scenarios, we proposed an RTF estimator in [31] that does not explicitly use a rank-one model for \Phi_x. The RTF of the dominant source at each TF bin is computed using the instantaneous signal y and an estimate of the higher-dimensional signal subspace. This approach can also be applied to obtain g in the spotformer, and we briefly review it here for completeness. The performance of this method is compared to the explicit rank-one model-based ones in Experiment 6 of Section V-C. Consider the GEVD of the matrix pencil (\Phi_{x+v}, \Phi_v) for a two-source scenario (rank-two model)

(\phi_{r_1} g_{r_1} g_{r_1}^H + \phi_{r_2} g_{r_2} g_{r_2}^H + \Phi_v) v = \lambda \Phi_v v.   (21)

Equation (21) can be rearranged as follows

c_1 g_{r_1} + c_2 g_{r_2} = (\lambda - 1) \Phi_v v, where c_1 = \phi_{r_1} g_{r_1}^H v, c_2 = \phi_{r_2} g_{r_2}^H v.   (22)

The generalized eigenvectors v provide two linear combinations of the RTF vectors and hence a basis for the signal subspace. As discussed in [31], due to the speech sparsity, two eigenvectors per frequency bin suffice to approximate the signal subspace for up to four concurrent sources. A basis U_x for the subspace can be computed by orthonormalization of the two principal generalized eigenvectors of (\Phi_{x+v}, \Phi_v). Let us denote the projection matrix onto the signal subspace by P_x = U_x U_x^H. The key idea of the RTF estimator in [31] is to enforce the instantaneous RTF estimate

g_{inst}(n,k) = y(n,k) Y_1^*(n,k) / |Y_1(n,k)|^2,   (23)

to lie in the estimated signal subspace, by performing the following subspace projection at each TF bin

g_{proj}(n,k) = P_x(n,k) g_{inst}(n,k) / (e_1^H P_x(n,k) g_{inst}(n,k)),   (24)

where the denominator normalizes the first element to one. The vector g_{inst} captures the spatial information of the dominant source, whereas the projection denoises g_{inst} by confining it to the signal subspace. Denoting the output of a binary signal detector by H_x, the final RTF estimate to be used as a time-varying constraint in the spotformer (16) is obtained as

g(n,k) = H_x(n,k) g_{proj}(n,k) + [1 - H_x(n,k)] g(n-1,k).   (25)

Hence, when the spot signal is dominant (H_x = 1), g is obtained by (24), whereas when the spot signal is absent, g(n-1,k) from the previous frame is used. Note that although the general framework for the projection-based RTF estimator has been developed in [31], the computation of the detector H_x required to apply it in the spotforming context is specific to this work and is proposed in Section IV.

C. Estimating the PSD Matrices

In practice, the PSD matrices \Phi_{u+v} and \Phi_x need to be estimated from the microphone signals. This is commonly done by recursive averaging, where an estimate at the current time frame is obtained using the PSD matrix estimate from the previous frame and the current signal y(n,k) in the following manner (the frequency index is omitted for brevity)

\Phi_{u+v}(n) = \alpha_{uv}(n) \Phi_{u+v}(n-1) + [1 - \alpha_{uv}(n)] y(n) y^H(n)
\Phi_{x+v}(n) = \alpha_x(n) \Phi_{x+v}(n-1) + [1 - \alpha_x(n)] y(n) y^H(n)
\Phi_v(n) = \alpha_v(n) \Phi_v(n-1) + [1 - \alpha_v(n)] y(n) y^H(n).   (26)

Since the background noise is always present, the recursive averaging yields an estimate of \Phi_{x+v} rather than \Phi_x. The averaging parameters \alpha_{uv}, \alpha_x, and \alpha_v should allow for quick adaptation of the PSD matrices in case of changes in the acoustic scene such as moving or emerging sources.
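One recursion step of (26) for a single TF bin can be sketched as follows; the driving signal is synthetic complex white noise (an assumed toy input), so the estimate should settle near the true PSD matrix 2I:

```python
import numpy as np

def update_psd(Phi_prev, y, alpha):
    """One recursive-averaging step of Eq. (26) for a single TF bin."""
    return alpha * Phi_prev + (1.0 - alpha) * np.outer(y, y.conj())

rng = np.random.default_rng(4)
M = 3
Phi = np.zeros((M, M), dtype=complex)
for _ in range(500):
    # Synthetic complex white-noise frame; its true PSD matrix is 2*I
    y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    Phi = update_psd(Phi, y, alpha=0.95)
```

The estimate stays Hermitian by construction, and its diagonal fluctuates around the true per-channel power of 2.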
Moreover, to avoid leakage of the undesired signal into the spot signal's PSD matrix and vice versa, \alpha_{uv}, \alpha_x, and \alpha_v should ensure that only the PSD matrix of the dominant signal is updated, requiring a spot signal detection mechanism at each TF bin.

IV. BIN-WISE SIGNAL DETECTION

For the purpose of signal detection, we can relax the sparsity assumption and only assume that there exist TF bins where either sources from S, or sources outside S and/or background noise, are dominant. We define the following hypotheses:

H_v: speech is absent, i.e., y \approx v,   (27a)
H_x: speech from S is dominant, i.e., y \approx x + v,   (27b)
H_u: speech from outside S is dominant, i.e., y \approx u + v,   (27c)
H_{xu} = H_x \cup H_u: speech is present,   (27d)
H_{uv} = H_u \cup H_v: the undesired signal is dominant.   (27e)

As illustrated in Fig. 1, the probability of hypothesis H_{xu} represents a speech presence probability (SPP), denoted by p_{xu}. Given that speech is present, meaning that H_{xu} is true (upper branch of the figure), the probability that the speech originates from the SOI (hypothesis H_x) is p_x = p(H_x | H_{xu}) and will be referred to as the conditional spot probability.

Fig. 1. Graphical model illustrating the bin-wise hypotheses and the associated probabilities.

The framework for estimating the matrices \Phi_x, \Phi_{u+v}, and \Phi_v consists of building probabilistic models, computing the probabilities in Fig. 1, and using the probabilities to control the parameters \alpha_x, \alpha_{uv}, and \alpha_v in (26). The estimation of the SPP and of \Phi_v is well studied, and the state-of-the-art approach used in this work is reviewed in Section IV-A. The hierarchical model in Fig. 1 and the computation of the conditional spot probability, however, are proposed within our spotforming framework and are discussed in Section IV-B.

A. Speech Presence Probability and Estimation of \Phi_v

In SPP-based noise PSD matrix estimation (see [32] and references therein), given the a-posteriori SPP p_{xu} = p(H_{xu} | y), the parameter \tilde{\alpha}_v(n,k) used to recursively estimate \Phi_v(n,k) in (26) is computed as

\tilde{\alpha}_v(n,k) = 1 - [1 - p_{xu}(n,k)] (1 - \alpha_v), \alpha_v \in (0,1).   (28)

At TF bins with high SPP, p_{xu} \approx 1, it follows that \tilde{\alpha}_v \approx 1 and hence the current signals y(n,k) have little influence on the updated noise PSD matrix, i.e., \Phi_v(n,k) \approx \Phi_v(n-1,k). At TF bins with low SPP, the signals y(n,k) have a weight factor 1 - \tilde{\alpha}_v \approx 1 - \alpha_v in the recursion (26), hence updating the noise PSD matrix. Appropriate values of \alpha_v are given in Section V. To compute the SPP, a Gaussian signal model for the STFT coefficients is used [33], where the data likelihoods under speech absence and speech presence are given by

p(y | H_v) = (1 / (\pi^M \det[\Phi_v])) e^{-y^H \Phi_v^{-1} y},
p(y | H_{xu}) = (1 / (\pi^M \det[\Phi_y])) e^{-y^H \Phi_y^{-1} y}.   (29)

The SPP can be obtained by applying the Bayes theorem

p_{xu} = p(y | H_{xu}) q / [p(y | H_{xu}) q + p(y | H_v)(1 - q)],   (30)

where q denotes the a-priori SPP. Note that at TF bin (n,k), before the likelihoods in (29) and the SPP are computed, only an estimate of \Phi_v(n-1) from the previous frame is available. As the noise is assumed to be relatively stationary, the SPP and noise PSD matrix estimation loop can be implemented by using in (29) the PSD matrix estimate \Phi_v(n-1) from the previous time frame.
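The SPP loop of (28)-(30) can be sketched for one TF bin; the PSD matrices, the prior q = 0.5, and the value α_v = 0.9 below are illustrative assumptions, not the values used in the experiments:

```python
import numpy as np

def log_gaussian(y, Phi):
    """Log of the complex Gaussian likelihood in Eq. (29)."""
    M = len(y)
    _, logdet = np.linalg.slogdet(Phi)
    return -M * np.log(np.pi) - logdet - np.real(y.conj() @ np.linalg.solve(Phi, y))

def spp(y, Phi_v, Phi_y, q):
    """A-posteriori speech presence probability, Eq. (30)."""
    l_xu = np.exp(log_gaussian(y, Phi_y))
    l_v = np.exp(log_gaussian(y, Phi_v))
    return l_xu * q / (l_xu * q + l_v * (1.0 - q))

# Toy single-bin example (assumed matrices): noise PSD I, noisy-speech PSD 10*I
M = 3
Phi_v = np.eye(M, dtype=complex)
Phi_y = 10.0 * np.eye(M, dtype=complex)
y_loud = np.array([3.0 + 0j, 2.0, -2.5])    # energetic frame -> SPP close to 1
y_quiet = np.array([0.1 + 0j, 0.05, 0.02])  # weak frame -> SPP close to 0

p_loud = spp(y_loud, Phi_v, Phi_y, q=0.5)
p_quiet = spp(y_quiet, Phi_v, Phi_y, q=0.5)

# Eq. (28): the noise PSD recursion freezes when speech is likely (alpha_v = 0.9)
alpha_v = 0.9
a_tilde = 1.0 - (1.0 - p_loud) * (1.0 - alpha_v)
```

With a high SPP the effective smoothing factor approaches one, so the noise PSD matrix is left essentially unchanged during speech activity.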

Although the a-priori SPP q can be computed in advance [34], changes in the noise PSD tend to be erroneously detected as speech onsets, unless a signal-dependent q is used. To this end, we proposed a direct-to-diffuse ratio (DDR)-based a-priori SPP estimator in [32]. The DDR \Gamma(n,k) was estimated at each array based on the coherence between two microphones [35]; however, any DDR estimator suitable for the array geometry can be used. Based on the observation that low values of \hat{\Gamma}(n,k) indicate the absence of coherent speech, whereas high values of \hat{\Gamma}(n,k) indicate its presence, we proposed in [32] to compute the a-priori SPP as q = 1 - f[\hat{\Gamma}], where f[\hat{\Gamma}] is a sigmoid-like mapping defined as

f[\hat{\Gamma}(n,k)] = l_{min} + (l_{max} - l_{min}) \, 10^{c\rho/10} / (10^{c\rho/10} + \hat{\Gamma}^{\rho}(n,k)).   (31)

The parameters l_{min} and l_{max} determine the minimum and maximum values that the function can attain, c (in dB) controls the offset along the \Gamma axis, and \rho defines the steepness of the transition region between speech and non-speech detection. The choice of appropriate parameters is discussed in Section V.

B. Spot Probability and Estimation of \Phi_{x+v} and \Phi_{u+v}

When the undesired signals contain speech, an implementation of the spot probability and undesired signal PSD matrix estimation loop similar to the one used for SPP estimation does not provide reliable results, due to the speech non-stationarity. We have shown in [6] that a likelihood model for the bin-wise position estimates is suitable for computing the spot probability. In the current contribution, we provide a more detailed discussion on the choice of a likelihood model and its estimation from training data. We subsequently formulate a minimum Bayes risk spot signal detector and, based on the detector's decision, update either \Phi_{x+v} or \Phi_{u+v}.
1) Position-Based Minimum Bayes Risk Detector: Let \hat{r}(n,k) denote a position estimate obtained by triangulating narrowband DOA estimates from two distributed arrays. To estimate the narrowband DOAs, we used the method proposed in [36]. If multiple arrays are available, we choose for triangulation the two arrays with the largest instantaneous signal amplitude at the given TF bin. If the accuracy of the DOA estimates at the different arrays can be quantified, the arrays with the most accurate DOA estimates can be chosen instead. Let d_1, d_2 denote the locations of the two chosen arrays and e_{d_1}, e_{d_2} denote the corresponding DOA unit vectors. The position estimate \hat{r} is found as the intersection of the lines defined by the array centers and the DOA vectors, by first solving

d_1 + e_{d_1} \xi_1 = d_2 + e_{d_2} \xi_2   (32)

for \xi_1 and \xi_2, and substituting to find \hat{r} in either

\hat{r} = d_1 + e_{d_1} \xi_1, or \hat{r} = d_2 + e_{d_2} \xi_2.   (33)

Given a position estimate \hat{r}, an optimal minimum Bayes risk detector for a false alarm cost C_{du} > 0 (deciding that the desired signal is dominant when the undesired signal is dominant) and a miss cost C_{ud} > 0 (deciding that the undesired signal is dominant when the desired signal is dominant) is given by the following decision rule [37]:

decide H_x = 1, H_{uv} = 0, if p(H_x | \hat{r}) / [1 - p(H_x | \hat{r})] > C_{du} / C_{ud};
decide H_x = 0, H_{uv} = 1, otherwise.   (34)

The averaging parameters \tilde{\alpha}_x and \tilde{\alpha}_{uv} are then computed as

\tilde{\alpha}_x = 1 - H_x (1 - \alpha_x), \tilde{\alpha}_{uv} = 1 - H_{uv} (1 - \alpha_{uv}).   (35)

When the spot signal is absent, i.e., H_x = 0, then \tilde{\alpha}_x = 1 and \Phi_{x+v} is not updated. The values of the constants \alpha_x and \alpha_{uv} are given in Section V. To obtain the detector's decision, the spot probability p(H_x | \hat{r}) is required, which can be decomposed as

p[H_x | \hat{r}] = p[H_x, H_{xu} | \hat{r}] = p[H_x | H_{xu}, \hat{r}] \, p[H_{xu}].   (36)

Recalling the graphical model in Fig. 1, we recognize p[H_x | H_{xu}, \hat{r}] and p[H_{xu}] to be the conditional spot probability p_x and the SPP p_{xu}, respectively.
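The triangulation of (32)-(33) reduces to a 2×2 linear system, and the decision rule (34) to a posterior-odds test; the array geometry, the error-free DOA vectors, and the equal costs below are illustrative assumptions:

```python
import numpy as np

def triangulate(d1, e1, d2, e2):
    """Intersect the two DOA rays of Eqs. (32)-(33): d1 + e1*xi1 = d2 + e2*xi2."""
    A = np.stack([e1, -e2], axis=1)  # 2x2 system in (xi1, xi2)
    xi = np.linalg.solve(A, d2 - d1)
    return d1 + e1 * xi[0]           # Eq. (33)

def detect(p_x, C_du=1.0, C_ud=1.0):
    """Minimum Bayes risk rule of Eq. (34): decide H_x = 1 when the posterior
    odds exceed the cost ratio C_du / C_ud (equal costs assumed here)."""
    return 1 if p_x / (1.0 - p_x) > C_du / C_ud else 0

# Toy geometry (assumed): two arrays at (0, 0) and (4, 0), source at (2, 3)
d1, d2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
src = np.array([2.0, 3.0])
e1 = (src - d1) / np.linalg.norm(src - d1)  # error-free DOA unit vectors
e2 = (src - d2) / np.linalg.norm(src - d2)
r_hat = triangulate(d1, e1, d2, e2)
```

With noise-free DOAs the two rays intersect exactly at the source position; in practice the narrowband DOA errors make r̂ a random variable, which motivates the likelihood models of Section IV-B2.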
The SPP can be computed using the framework described in Section IV-A, and the remaining task is to evaluate p[H_x | H_xu, r̂].

2) Conditional Spot Probability p[H_x | H_xu, r̂]: If r denotes a two-dimensional random variable modeling the true source position, the conditional spot probability can be computed by evaluating the probability that r belongs to S. The latter is obtained by integrating the corresponding probability density function (PDF) over S as follows:

p[H_x | H_xu, r̂] = ∫_{r ∈ S} f(r | H_xu, r̂) dr. (37)

Let the room be uniformly sampled at N positions r_i with i ∈ I = {1, 2, ..., N}, and define a subset I_S ⊆ I with cardinality N_S such that if i ∈ I_S then r_i ∈ S. The integral (37) can then be numerically approximated as follows:

∫_{r ∈ S} f(r | H_xu, r̂) dr ≈ (1/N_S) Σ_{i ∈ I_S} f(r_i | H_xu, r̂). (38)

Next, each of the N_S terms in the sum needs to be evaluated. By applying Bayes' theorem, we obtain for each term

f(r_i | H_xu, r̂) = f(r̂ | H_xu, r_i) f(r_i) / f(r̂ | H_xu). (39)

The denominator f(r̂ | H_xu) is obtained by marginalization over the true source position r, namely

f(r̂ | H_xu) = ∫ f(r̂ | H_xu, r) f(r) dr, (40)

which can be numerically approximated as follows:

f(r̂ | H_xu) ≈ (1/N) Σ_{i ∈ I} f(r̂ | H_xu, r_i) f(r_i). (41)
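Combining the approximations above, the conditional spot probability reduces to a ratio of weighted likelihood sums over the grid; for a uniform grid, the constant cell-area factors of the numerical integration cancel. A minimal sketch (function and argument names are ours):

```python
import numpy as np

def spot_probability(likelihood, prior, in_spot):
    """Discretized conditional spot probability, cf. Eqs. (38)-(41).

    likelihood : f(r_hat | H_xu, r_i) evaluated at each of the N grid points r_i.
    prior      : f(r_i) at each grid point; uniform (1/N) if no prior
                 knowledge about the source locations is available.
    in_spot    : boolean mask, True where r_i lies inside the spot S.
    """
    w = likelihood * prior          # posterior weight of each grid point
    return w[in_spot].sum() / w.sum()
```

With a uniform likelihood the result is simply the fraction of grid points inside S, and with all likelihood mass concentrated inside S it approaches one.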

Finally, combining (38), (39), and (41), the conditional spot probability is computed as

p[H_x | H_xu, r̂] = [(1/N_S) Σ_{i ∈ I_S} f(r̂ | H_xu, r_i) f(r_i)] / [(1/N) Σ_{i ∈ I} f(r̂ | H_xu, r_i) f(r_i)]. (42)

Hence, to compute the spot probability we only need to know the PDFs f(r_i) and f(r̂ | H_xu, r_i) for all i ∈ I. The PDF f(r_i) represents the prior knowledge of where speech sources are located in the room. If no information about the scenario and the source locations is provided, f(r_i) is assumed to be uniform, i.e., f(r_i) = 1/N. In the next section, we discuss the computation of the likelihood PDFs f(r̂ | H_xu, r_i).

3) Likelihood Models f(r̂ | H_xu, r_i): For each position r_i in the room, i ∈ I, the PDF f(r̂ | H_xu, r_i) can be estimated in a training stage. To avoid training, in the initial publication [6] we modeled f(r̂ | H_xu, r_i) by a symmetric Gaussian distribution with mean r_i and a fixed covariance σ² I. In this work, we include a training stage for each position r_i. Training was performed only once, in a simulated shoebox room with 200 ms reverberation time and a low ambient noise level, where the test signal was 10 seconds of white noise. The white noise source was placed at r_i, and the mean and covariance matrix of the Gaussian distribution f(r̂ | H_xu, r_i) were estimated in the maximum-likelihood sense from the estimated positions r̂. The process was repeated for each i ∈ I. The estimated PDFs were applied in all experiments in Section V, which encompass rooms with different reverberation times, different noise levels, and different source constellations, and no significant performance loss directly caused by a mismatch between training and test conditions was observed. Hence, the experiments indicated that PDFs obtained by training in typical office conditions generalize well to different source constellations, reverberation, and noise levels, provided that the array geometry and the DOA estimators are fixed.
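The maximum-likelihood fit of the Gaussian likelihood f(r̂ | H_xu, r_i) from the training position estimates can be sketched as follows (our own minimal implementation; the paper does not prescribe one):

```python
import numpy as np

def fit_gaussian(r_hats):
    """ML estimate of the mean and covariance of a 2-D Gaussian likelihood
    from position estimates r_hats (shape (M, 2)) collected during training."""
    mu = r_hats.mean(axis=0)
    diff = r_hats - mu
    cov = diff.T @ diff / len(r_hats)   # ML (not bias-corrected) covariance
    return mu, cov

def gaussian_pdf(x, mu, cov):
    """Evaluate the fitted 2-D Gaussian density at position x."""
    d = x - mu
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * d @ inv @ d)
```

In contrast to the fixed σ²I model of [6], the fitted covariance captures the orientation-dependent spread of the triangulated positions around each r_i.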
The good generalization can be intuitively explained as follows: reverberation and noise affect the variance of f(r̂ | H_xu, r_i), but not the directions of the principal axes, which depend mostly on the array geometry and orientation. As the increase in variance occurs for all r_i, i ∈ I, the detector is only affected by a minor shift in the false alarm and miss rates (which, if necessary, can be adjusted by modifying the Bayes costs).

C. Discussion

To illustrate the detector operation, a scenario with one source in the spot and one interferer is shown in Figure 2. The dots represent position estimates observed during 5 seconds of multi-talk. The lightest shade denotes positions from TF bins with H_x = 0, whereas the darker shades denote positions with H_x = 1. Each shade illustrates results with different Bayes costs: C_ud was fixed to 1, and C_du was varied (2, 4, and 8, lightest to darkest shade). As indicated in Figure 2, increasing the false alarm cost reduces the region of spot signal detections. The difference between the detector without training and the one with training is visible from the shape of the shaded regions.

Fig. 2. Visualization of the detector for different Bayes costs without training (left) and with training (right).

Fig. 3. A block diagram of the proposed data-dependent spotforming framework. The shaded blocks are required for the signal detector described in Section IV, whereas the white blocks use the detector to estimate the PSD matrices and the distortionless constraint, as described in Section III.

The training takes into account the true variance of the position estimates associated with the source and reduces the false alarm rate compared to the case where no training is performed, especially when an interferer is located near the spot.
False alarms trigger updates of the spot signal PSD matrix while the undesired signal is present, resulting in the spotformer not focusing on the SOI and introducing audible distortion of the spot signal. The false alarm rate can, to some extent, be controlled by appropriately adjusting the Bayes costs. Nevertheless, in extremely adverse acoustic conditions where the spot signal-to-undesired signal ratio is low (< 0 dB), the state-of-the-art fixed spotformers are likely to provide preferable speech quality, even though their ability to reduce the undesired signal is limited. Finally, a detailed diagram of the proposed spotformer summarizing all described processing blocks is given in Figure 3.

V. PERFORMANCE EVALUATION

The spotformer performance was evaluated in different acoustic conditions with both measured and simulated data. In Section V-A, the experimental setups are described; in Section V-B, the performance measures are overviewed; and in Section V-C, the experimental results are discussed. Related audio examples are available online.

A. Experimental Setup and Overview

Measurements were carried out in a room with T_60 of ms and dimensions m³. Three circular arrays with

B. Objective Performance Measures

The performance measures are computed for non-overlapping segments of length T = 30 ms. The final values, shown in the tables and the plots in Section V-C, are computed by averaging the segment-wise values. For a segment i, the input SNR and signal-to-interference ratio (SIR) at the m-th microphone are computed as given in (43) below.

Fig. 4. Experimental setups for the measurement (left), simulation for moving sources (middle), and simulation with multiple sources in the spot (right).

diameter 2.9 cm and three DPA miniature microphones per array were arranged as shown in Figure 4 (left). The RIRs between positions 1-5 and the microphones were measured, where five GENELEC loudspeakers were used as sources. To generate diffuse sound, the RIRs from ten loudspeakers facing the walls to the microphones were measured. The sampling rate was set to 16 kHz. Note that as spatial aliasing in the DOA estimates occurs around 7 kHz for this setup, the signals were band-limited to 7 kHz before processing. The speech signals at the microphones were obtained by convolving clean speech with the measured RIRs from positions 1-5. The clean speech samples consisted of male and female speech in English, German, and French, recorded by a close-talking microphone. Babble noise signals were convolved with the RIRs of the ten loudspeakers facing the walls, which, added together, result in approximately diffuse sound. Finally, the microphone signals were obtained by adding the speech signals, the diffuse signal, and a measured sensor noise signal. The SNR with respect to the sensor noise was 35 dB, and only the diffuse noise level was varied in the experiments to test different input SNRs (given in Section V-C). The SNR was measured as the ratio between the spot signal power and the noise power at the reference microphone.
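The segment-wise levels just introduced can be sketched as follows (hypothetical helper names; assumes time-domain signals at a reference microphone and a segment length of n samples):

```python
import numpy as np

def seg_power(s, n):
    """Mean power of non-overlapping length-n segments of signal s."""
    n_seg = len(s) // n
    return np.mean(s[:n_seg * n].reshape(n_seg, n) ** 2, axis=1)

def segmental_snr_sir(x, v, u, n):
    """Segment-wise input SNR and SIR in dB, cf. Eq. (43):
    desired-to-noise and desired-to-interference power ratios per segment."""
    px, pv, pu = seg_power(x, n), seg_power(v, n), seg_power(u, n)
    return 10 * np.log10(px / pv), 10 * np.log10(px / pu)
```

For the 16 kHz signals and T = 30 ms used here, n = 480 samples per segment.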
The effect of reverberation on the performance was investigated using simulated RIRs. The simulated room geometry was the same as in the measurements, with the freedom to vary T_60 using an implementation of the image source model [38]. Diffuse sound was generated according to [39], and uncorrelated Gaussian noise with an SNR of 35 dB was added as sensor noise. In addition, simulations were used for moving source scenarios, as detailed in Section V-C. The remaining processing parameters are as follows: the STFT frame size was 64 ms with 50% overlap; the Bayes detector costs were C_du = 7 and C_ud = 1 (discussed in Section V-C, Experiment 3); and the averaging constants α_x, α_uv, and α_v were 0.75, 0.94, and 0.98, respectively, corresponding to time constants of 0.1, 0.5, and 1.6 seconds. The room was sampled at 10 positions per meter to obtain the positions r_i required in the spot probability computation. The a-priori SPP parameters from Section IV-A were ρ = 1.2, c = 5, l_min = 0.05, and l_max = 0.95. Note that the framework does not impose restrictions on the geometry of the arrays, except that the DOA estimators cover an angular range of 360 degrees so that triangulation can be performed. Nevertheless, in applications where the front-back ambiguity is not problematic (such as for arrays mounted on a wall), linear arrays can also be used.

isnr(i) = 10 log_10 ⟨x_m(t)²⟩ / ⟨v_m(t)²⟩,
isir(i) = 10 log_10 ⟨x_m(t)²⟩ / ⟨u_m(t)²⟩, t ∈ ((i−1)T, iT], (43)

where ⟨·⟩ denotes the average over t. The overall segmental isnr and isir are computed by averaging over all segments i. The output values osnr and osir are computed similarly, using the filtered versions of the signals. The performance measures used in the evaluation can be summarized as follows:

i) SNR improvement Δ_SNR and SIR improvement Δ_SIR (also known as the array gains with respect to the noise and the interference, respectively) are computed as

Δ_SNR = osnr − isnr, Δ_SIR = osir − isir
(44)

To compute the average array gains Δ̄_SNR and Δ̄_SIR over the whole signal, only segments i with isnr(i) and isir(i) in the range [−15, 30] dB were considered, so that both the desired and the undesired signals contribute significant energy.

ii) The speech distortion (SD) index ν_sd attains values in [0, 1]; values close to zero indicate low distortion. The reference signal is the desired source signal at the m-th microphone. For the i-th segment, the SD index is given by

ν_sd(i) = ⟨[x_m(t) − x̂_m(t)]²⟩ / ⟨x_m(t)²⟩, t ∈ ((i−1)T, iT], (45)

where the hat indicates filtering by the spotformer.

iii) Interference reduction Δ_IR and noise reduction Δ_NR. The IR for the i-th segment is computed as

Δ_IR(i) = 10 log_10 ⟨u(t)²⟩ / ⟨û(t)²⟩, t ∈ ((i−1)T, iT]. (46)

Similarly, Δ_NR is computed from v(t) and v̂(t).

iv) To evaluate the detector, we consider the false alarm rate (FR) and the miss rate (MR), given by

FR = Σ_{n,k} [H_x = 1 ∧ H_ideal = 0] / Σ_{n,k} [H_ideal = 0],
MR = Σ_{n,k} [H_x = 0 ∧ H_ideal = 1] / Σ_{n,k} [H_ideal = 1], (47)

where Σ_{n,k} [·] denotes summation over all TF bins of the value of the logical expression inside the brackets. A ground-truth detector H_ideal is created by considering the instantaneous desired-to-undesired signal power ratio at each TF bin: H_ideal is set to 1 if the desired signal is dominant, and to 0 otherwise. The FR and MR can be controlled by the Bayes costs, as discussed in Experiment 3.

C. Results and Discussion

In this section, we discuss six experiments related to the following aspects: In Experiment 1, we present the spotformer

TABLE I EXPERIMENT 1, OBJECTIVE PERFORMANCE EVALUATION. THE RESULTS ARE AVERAGED OVER ALL SCENARIOS WITH ONE INTERFERER

performance in scenarios with different numbers of interferers, different numbers of arrays, and different noise levels. In Experiment 2, we focus on the comparison between the proposed and a state-of-the-art spotforming approach. Experiment 3 investigates the influence of the detector, Experiment 4 the effect of reverberation, Experiment 5 the performance in scenarios with moving sources, and Experiment 6 a scenario with multiple sources inside the SOI.

Experiment 1: This experiment assumes a single source in the SOI and investigates the undesired signal reduction. As there is only one desired source, we evaluate the spotformer with the rank-one model-based RTF estimators described in Section III-B1 (denoted by MMSE in the results) and Section III-B2 (denoted by LS in the results). Various aspects are evaluated using the measured data: i) different numbers of interferers, ii) extraction with one, two, and three arrays, and iii) different SNRs. As a state-of-the-art (SoA) baseline, we use an MVDR spotformer with a fixed constraint computed from the RTF vector at the spot center, where the desired source was located. The undesired signal PSD matrix Φ_u+v for the SoA spotformer is estimated from a segment where the spot signal is inactive. In practice, such a segment has to be detected, and in some cases might not exist, which poses a limitation to approaches that estimate Φ_u+v in this manner. However, as oracle information on when to estimate Φ_u+v is used in this case, this experiment provides ideal settings for the fixed spotformer. Moreover, the steering vector of the SoA spotformer is perfectly matched to the true RTF of the source.
Under these conditions, the SoA spotformer is expected to have superior performance, and the goal is to evaluate the degradation when using our proposed framework, which estimates all quantities blindly from the data. The spot was a circle with radius 0.4 m. For a given number of sources, the results were averaged over all source combinations from Figure 4 (left). The sources in each scenario were active simultaneously for 20 seconds, and the input SIR for one, two, and three interferers was 2 dB, -2 dB, and -3.5 dB, respectively. Each scenario was evaluated for a moderate and a low SNR (16 dB and 6 dB), and the signal detector with training was used. The SNR was computed with respect to the sum of the diffuse and sensor noise, and the diffuse-to-sensor noise ratio was 40 dB and 30 dB, respectively. The results are given in Table I for the case of one interferer, and in Table II for two and three interferers. The conclusions are summarized as follows:

a) The oracle spotformer is almost distortionless in all scenarios, as the constraint is based on the true RTFs at the source location. The SD index ν_sd for the proposed data-dependent spotformer reaches up to 0.15 when using one array and up to 0.25 when using three arrays. Using multiple arrays increases the SD index, as the PSD matrices are more sensitive to detection errors and the RTFs are longer filters, which are more difficult to estimate accurately. Note, however, that the reference signal for computing the SD index was the spot signal as received at the microphone. Hence, the increase in ν_sd is partially attributable to dereverberation. This finding is further discussed in Experiment 4.

TABLE II EXPERIMENT 1, OBJECTIVE PERFORMANCE EVALUATION FOR THE SCENARIOS WITH TWO AND THREE INTERFERERS. RESULTS WITH THREE-ARRAY SPOTFORMER AND ISNR = 6 dB

b) The fact that detection errors are particularly critical when using multiple arrays is reflected in the interference reduction Δ_IR as well.
The multi-array oracle spotformer outperforms the single-array oracle spotformer by 10 dB, whereas in the data-dependent case, only a gain of 3 dB is obtained. Hence, the spatial selectivity of multiple arrays is not fully utilized, due to the detection errors. Our experiments indicated that the advantage of spatial selectivity is mainly manifested when increasing from one to two arrays, whereas the improvement when adding further arrays is less significant.

c) There was no significant performance loss when the input SNR was decreased from 16 dB to 6 dB. The degradation in interference reduction is less than 1 dB, whereas the SD index is improved in the noisier case. This can be explained as follows: for high SNR, the position estimates are accurate and concentrated around the true source locations, which leads to false alarms in the detector when an interferer is near the SOI. For low SNR, however, the density of position estimates around the true source locations decreases, which in turn reduces the false alarms arising due to the nearby interferer.

d) In Table II, the results for two and three interferers are shown only for the three-array spotformer. The remaining results follow a similar trend as discussed above. Note that a larger number of interferers does not worsen the distortion, and the loss in interference reduction is less than 1 dB.

e) The undesired signal reduction with the MMSE rank-one approximation (i.e., covariance subtraction-based RTF estimation) deteriorates at lower input SIR and SNR, compared to the LS approach (i.e., covariance whitening-based RTF estimation). A similar finding was presented in [26], where the

Fig. 5. Experiment 2, interference power at the input and at the output of a three-array spotformer.

TABLE III EXPERIMENT 2, AVERAGE INTERFERENCE AND NOISE REDUCTION OF THE STATE-OF-THE-ART AND THE PROPOSED SPOTFORMERS

covariance subtraction was shown to be less accurate than covariance whitening in low SNR conditions. Therefore, in the following, unless stated otherwise, we only discuss the spotformer with the LS-based rank-one approximation.

Experiment 2: The goal of this experiment is to compare the proposed data-dependent spotformer with the state-of-the-art spotformer discussed in Experiment 1, when applied in dynamic scenarios. Such a scenario is obtained from the measured data as follows: a circular spot with radius 0.4 m is centered at position 1, shown in Figure 4 (left). During the first ten seconds, the desired source and an interferer at position 3 are active (average input SIR 0 dB), while during the next ten seconds, the desired source and two interferers at positions 4 and 5 are active (input SIR -2 dB). The experiment was repeated for 12 different combinations of speakers in various languages, and the average input SNR was 6 dB. The interference power at the input and at the spotformer outputs is plotted across time in Figure 5 for one set of signals. In the first 10 seconds, the SoA spotformer offers better interference reduction, as Φ_u+v is estimated using the signal of the currently active interferer. Clearly, when the new interferers become active, such a framework cannot track the change in Φ_u+v, resulting in inferior interference reduction compared to the proposed spotformer, which is adapted using the data. The averaged results across the 12 experiments are given in Table III, whereby the results for each individual experiment followed a similar trend as shown in Figure 5.
Experiment 3: In this experiment, the effect of the detector on the spotformer performance is examined, and the advantage of incorporating training is demonstrated. Measured data was used, with the desired source and the spot center at position 2 in Figure 4 (left). Two multi-talk scenarios were considered: in the first, an interferer at position 3 was active (relatively near the spot) with an input SIR of 0 dB, and in the second, an interferer at position 4 was active (relatively far from the spot) with an input SIR of 1.5 dB. The input SNR was 6 dB. The training is particularly advantageous when the interferer is near the spot (< 1 m). This is visible when comparing the array gains Δ_SIR in Figure 6(a) (interferer at position 4) and in Figure 6(b) (interferer at position 3). When the interferer is far, both detectors lead to similar Δ_SIR, constant for all radii, whereas when the interferer is near, the training improves Δ_SIR by up to 6 dB for moderately large radii. As the spot radius increases, Δ_SIR for both detectors deteriorates due to the increasing false alarm rate. The SNR improvement Δ_SNR is constant regardless of the interferer location. When there is no interferer [Figure 6(f)], Δ_SNR increases, as all degrees of freedom are used for noise reduction. The previous discussion can be corroborated by analyzing the performance of the detectors in terms of FR and MR, plotted in Figure 7. The FR remains low when the interferer is far from the spot, while rapidly increasing for larger spot radii when the interferer is near. The advantage of using training is particularly noticeable in terms of the FR when the interferer is near the spot borders. The MR is, however, not significantly influenced by the training. Hence, the Bayes costs C_du and C_ud need to prioritize a low FR.
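The FR and MR used throughout this discussion follow (47); a minimal sketch, assuming boolean detection masks over all TF bins:

```python
import numpy as np

def detector_rates(h_det, h_ideal):
    """False alarm rate and miss rate over all TF bins, cf. Eq. (47).

    h_det   : boolean array with the detector decisions H_x.
    h_ideal : boolean array with the oracle decisions H_ideal.
    """
    fr = np.sum(h_det & ~h_ideal) / np.sum(~h_ideal)   # false alarms
    mr = np.sum(~h_det & h_ideal) / np.sum(h_ideal)    # misses
    return fr, mr
```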
With this in mind, the costs were determined a-priori to achieve FR < 0.1%, while maintaining an MR no larger than 0.9%, so that Φ_x+v updates sufficiently frequently in the case of moving sources (in our implementation, C_ud = 1 and C_du = 7). Note that the costs might need to be revised if different parameter estimators (SPP, DOA, DDR) are used. However, for a given implementation of the estimators, chosen costs that satisfy the FR/MR trade-off generalize well to a very wide range of acoustic conditions.

Experiment 4: Due to multi-path propagation in reverberant environments, the accuracy of the position estimates decreases, resulting in a larger FR of the detectors. To examine the effect of reverberation on the signal quality at the spotformer output, we simulated shoebox rooms with reverberation times T_60 from 200 to 700 ms. The SNR was fixed to 9 dB, which represents a significant level of babble noise. Scenarios where an interferer is far from the spot (> 2 m) and where an interferer is near the spot (0.5-1 m) were simulated. For both cases, the results are averaged over 10 random source locations. The findings from this experiment are summarized as follows:

a) Due to the increased FR, the SD index increases, as visible in Figure 8. Nevertheless, note that the high SD is partially attributable to dereverberation. To confirm this, we computed the signal-to-reverberation-modulation ratio (SRMR) [40] of the desired signal at the reference microphone and of the desired signal after spotforming. The difference of the two SRMR values, shown in Figure 9, indicates the amount of dereverberation (larger values indicate more dereverberation).

b) Reverberation does not severely affect the noise and interference reduction. The noise reduction Δ_NR is independent of T_60 and equal to 7 dB for one array, 9.2 dB for two arrays, and 10.2 dB for three arrays. The interference reduction Δ_IR is illustrated in Figure 8 (right).

Fig. 6. Experiment 3, comparison of the detector with and without training for varying spot size, in terms of objective performance measures at the output of the proposed spotformer with LS-based rank-one approximation (Section III-B2).

Fig. 7. Experiment 3: Detection performance in terms of FR and MR.

Fig. 9. Experiment 4: SRMR improvement after spotforming.

Fig. 8. Experiment 4, effect of reverberation on the spotformer performance (LS rank-one approximation).

TABLE IV EXPERIMENT 5: OBJECTIVE PERFORMANCE RESULTS FOR A SCENARIO WITH MOVING SOURCES

c) Similarly to what was observed in Experiment 3, when the interferer is far from the spot, the performance of the detector with and without training is identical, whereas when the interferer is near the spot, training improves the interference reduction, as shown in Figure 8 (right).

Experiment 5: To examine the spotformer performance with moving sources, we simulated a moderately reverberant room (T_60 = 300 ms) with the setup shown in Figure 4 (middle). The desired source traverses the trajectory A-B-A-B-A (solid line), whereas the interferer traverses A-B-A (dotted line), during 20 seconds of double-talk. The average input SNR and SIR were 6 dB and 0 dB, respectively. The experiment was repeated for 12 different sets of signals, and the averaged results are summarized in Table IV. In terms of the objective measures, the spotformer achieves similar performance as in a static scenario with comparable acoustic conditions (see Figure 8 for T_60 = 300 ms), with less than 0.5 dB difference in Δ_IR and less than 0.03 difference in ν_sd. Although not reflected in the objective measures, it should be noted that in the case of moving sources, the perceptual quality of the multi-array spotformer was degraded.

Fig. 10. Experiment 6: Average objective performance results in a scenario with two sources inside the spot.

However, when using a single-array spotformer, the signal quality was comparable to that in static scenarios, confirming the potential of the spotformer in highly dynamic scenarios. A comparable stand-alone framework to track the time-varying PSD matrices is not known to the authors at this point; hence, a comparison to a state-of-the-art method is not provided.

Experiment 6: In applications requiring a larger spot size, multiple sources in S might be active. In the spotformer framework, the goal is to extract the sources using a single constraint, without having to estimate each source's RTFs. The constraint described in Section III-B3 (denoted as Proj) is suitable in this scenario and is compared to the LS constraint from Section III-B2. Additionally, we compared two fixed constraints from the state-of-the-art: 1) the RTF at the spot centroid (denoted as Fix_centre), and 2) the eigenvector constraint corresponding to the principal eigenvector of G_S described in Section III-A (denoted as Fix_eig). To make a fair comparison with the proposed MVDR spotformers in terms of undesired signal reduction, we did not compute an LCMV filter with multiple constraints, but rather an MVDR filter with a single eigenvector constraint. Furthermore, to focus only on the effect of the constraint, both the proposed and the state-of-the-art designs used the PSD matrix estimate Φ_u+v obtained by the framework proposed in this work. The scenario shown in Figure 4 (right) was simulated, where two sources inside S and an interferer are simultaneously active. The experiment was repeated for 10 different combinations of speakers in various languages. The results are shown in Figure 10 for a single-array and a three-array spotformer, for SNRs of 17 dB and 9 dB, and T_60 = 200 ms.
It can be observed that the projection method improves Δ_NR by 1.5 dB and the array gain Δ_SIR by 1-2 dB compared to the LS method, while also slightly reducing the distortion ν_sd of the spot signal, which in this case is the sum of the two source signals. The data-independent constraints result in notably worse performance, both in terms of spot signal distortion and undesired signal reduction. Finally, to illustrate an example of the spatial pattern, the spotformer coefficients from all frequencies at a given frame were applied to source signals located at different positions on a square grid with 10 positions per meter. For each position, the ratio of the source power at the output to the source power at the input of the spotformer is coded in the color in Figure 11. The largest attenuation is visible at the locations of the interferers, showing the spotformer's ability to blindly create spatial notches to reduce the interferers. The images also illustrate that while multiple arrays increase the spatial selectivity, they also lead to increased spot signal distortion.

Fig. 11. Spatial selectivity pattern of the spotformer. The plus signs denote the arrays and the squares denote the sources. The colormap is [-11, -1] dB (darkest to brightest).

VI. CONCLUSIONS

A fully data-dependent acoustic spotformer was proposed to extract signals originating from a spot of interest (SOI), while reducing noise and interference. It relies on a low-rank approximation of the spot signal PSD matrix, which is valid due to the sparsity of speech in the TF domain. An important contribution that enables PSD matrix estimation in practice is the underlying probabilistic model, which exploits spatial information and a minimum Bayes risk decision rule to determine the TF bins where the spot signal is dominant.
The main advantage of the proposed framework over existing approaches is that the spot PSD matrix computed from the data tends to be low-rank, allowing for an MVDR spotformer design which uses the maximum degrees of freedom for undesired signal reduction. Different methods to design the MVDR spotformer constraint based on state-of-the-art RTF estimators were discussed and evaluated. Thanks to the proposed signal detection framework for PSD matrix estimation, the spotformer adapts almost instantaneously to changing acoustic conditions and to appearing or disappearing sources.

REFERENCES

[1] S. Markovich, S. Gannot, and I. Cohen, "Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 6, Aug.
[2] M. Taseska and E. A. P. Habets, "Informed spatial filtering with distributed arrays," IEEE Trans. Audio, Speech, Lang. Process., vol. 22, no. 7, Jul.
[3] D. H. Tran Vu and R. Haeb-Umbach, "An EM approach to integrated multichannel speech separation and noise suppression," in Proc. Int. Workshop Acoust. Signal Enhancement (IWAENC), 2010, pp. 1-4.

[4] M. Souden, S. Araki, K. Kinoshita, T. Nakatani, and H. Sawada, "A multichannel MMSE-based framework for speech source separation and noise reduction," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 9, Sep.
[5] Y. Grenier, "A microphone array for car environments," Speech Commun., vol. 12, Mar.
[6] M. Taseska and E. A. P. Habets, "Spotforming using distributed microphone arrays," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), New Paltz, NY, USA, Oct. 2013.
[7] N. Grbic and S. Nordholm, "Soft constrained subband beamforming for hands-free speech enhancement," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2002, vol. 1, pp. I-885-I-888.
[8] J. Martinez, N. Gaubitch, and W. B. Kleijn, "A robust region-based near-field beamformer," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Apr. 2015.
[9] Y. Zheng, R. Goubran, and M. El-Tanany, "Robust near-field adaptive beamforming with distance discrimination," IEEE Trans. Speech Audio Process., vol. 12, no. 5, Sep.
[10] A. Davis, S. Y. Low, S. Nordholm, and N. Grbic, "A subband space constrained beamformer incorporating voice activity detection," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2005, pp. iii/65-iii/68.
[11] O. Thiergart, M. Taseska, and E. A. P. Habets, "An informed parametric spatial filter based on instantaneous direction-of-arrival estimates," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, Dec.
[12] C. A. Anderson, P. D. Teal, and M. A. Poletti, "Spatially robust far-field beamforming using the von Mises(-Fisher) distribution," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 12, Dec.
[13] F. Dowla and A. Spiridon, "Spotforming with an array of ultra-wideband radio transmitters," in Proc. IEEE Conf. Ultra Wideband Syst. Technol., Nov. 2003.
[14] S. Affès and Y.
Grenier, "A signal subspace tracking algorithm for microphone array processing of speech," IEEE Trans. Speech Audio Process., vol. 5, no. 5, Sep.
[15] E. E. Jan and J. Flanagan, "Sound capture from spatial volumes: Matched-filter processing of microphone arrays having randomly-distributed sensors," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Atlanta, GA, USA, May 1996.
[16] Ö. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Trans. Signal Process., vol. 52, no. 7, Jul.
[17] J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing. Berlin, Germany: Springer-Verlag.
[18] S. Araki, H. Sawada, and S. Makino, "Blind speech separation in a meeting situation with maximum SNR beamformers," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2007, pp. I-41-I-44.
[19] D. P. Jarrett, M. Taseska, E. A. P. Habets, and P. Naylor, "Noise reduction in the spherical harmonic domain using a tradeoff beamformer and narrowband DOA estimates," IEEE Trans. Audio, Speech, Lang. Process., vol. 22, no. 5, May.
[20] M. Taseska and E. A. P. Habets, "Minimum Bayes risk signal detection for speech enhancement based on a narrowband DOA model," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Brisbane, QLD, Australia, Apr. 2015.
[21] D. Cherkassky and S. Gannot, "Blind synchronization in wireless sensor networks with application to speech enhancement," in Proc. Int. Workshop Acoust. Signal Enhancement (IWAENC), Juan-les-Pins, France, Sep. 2014.
[22] Y. Avargel and I. Cohen, "On multiplicative transfer function approximation in the short-time Fourier transform domain," IEEE Signal Process. Lett., vol. 14, no. 5, May.
[23] K. M. Buckley, "Spatial/spectral filtering with linearly constrained minimum variance beamformers," IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 3, Mar.
[24] J. Capon, "High resolution frequency-wavenumber spectrum analysis," Proc. IEEE, vol.
57, no. 8, pp , Aug [25] S. Gannot, D. Burshtein, and E. Weinstein, Signal enhancement using beamforming and nonstationarity with applications to speech, IEEE Trans. Signal Process., vol. 49, no. 8, pp , Aug [26] S. Markovich-Golan and S. Gannot, Performance analysis of the covariance subtraction method for relative transfer function estimation and comparison to the covariance whitening method, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2015, pp [27] B. Cornelis, S. Doclo, T. Van dan Bogaert, M. Moonen, and J. Wouters, Performance analysis of multichannel Wiener filter-based noise reduction in hearing aids under second order statistics estimation errors, IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp , Jul [28] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed. Balimore, MD, USA: The Johns Hopkins Univ. Press, [29] A. Krueger, E. Warsitz, and R. Haeb-Umbach, Speech enhancement with a GSC-like structure employing eigenvector-based transfer function ratios estimation, IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 1, pp , Jan [30] E. Warsitz and R. Haeb-Umbach, Blind acoustic beamforming based on generalized eigenvalue decomposition, IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 5, pp , Jul [31] M. Taseska and E. A. P. Habets, Relative transfer function estimation exploiting instantaneous signals and the signal subspace, in Proc. Eur. Signal Process. Conf. (EUSIPCO), Nice, France, Sep. 2015, pp [32] M. Taseska and E. A. P. Habets, MMSE-based blind source extraction in diffuse noise fields using a complex coherence-based a priori SAP estimator, in Proc. Int. Workshop Acoust. Signal Enhancement (IWAENC), Sep. 2012, pp [33] M. Souden, J. Chen, J. Benesty, and S. Affès, Gaussian model-based multichannel speech presence probability, IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp , Jul [34] T. Gerkmann, C. Breithaupt, and R. 
Maja Taseska (S'13) was born in Ohrid, Macedonia. She received the B.Sc. degree in electrical engineering from Jacobs University, Bremen, Germany, in 2010, and the M.Sc. degree from the Friedrich-Alexander-University, Erlangen, Germany. She is currently pursuing the Ph.D. degree in informed spatial filtering at the International Audio Laboratories Erlangen, Erlangen, Germany. Her research interests include informed spatial filtering, source localization and tracking, blind source separation, and noise reduction.

Emanuël A. P. Habets (S'02-M'07-SM'11) received the B.Sc. degree in electrical engineering from the Hogeschool Limburg, Heerlen, The Netherlands, in 1999, and the M.Sc. and Ph.D. degrees in electrical engineering from the Technische Universiteit Eindhoven, Eindhoven, The Netherlands, in 2002 and 2007, respectively. He is an Associate Professor with the International Audio Laboratories Erlangen (a joint institution of the Friedrich-Alexander-University Erlangen-Nürnberg and Fraunhofer IIS) and the Head of the Spatial Audio Research Group at Fraunhofer IIS, Germany. From 2007 to 2009, he was a Postdoctoral Fellow with the Technion-Israel Institute of Technology, Haifa, Israel, and Bar-Ilan University, Ramat Gan, Israel. From 2009 to 2010, he was a Research Fellow with the Communication and Signal Processing Group, Imperial College London, London, U.K. His research interests include audio and acoustic signal processing, spatial audio signal processing, spatial sound recording and reproduction, speech enhancement (dereverberation, noise reduction, echo reduction), and sound localization and tracking. Dr. Habets was a member of the organization committee of the 2005 International Workshop on Acoustic Echo and Noise Control (IWAENC), Eindhoven, The Netherlands, a General Co-Chair of the 2013 International Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, and a General Co-Chair of the 2014 International Conference on Spatial Audio (ICSA), Erlangen, Germany. He was a member of the IEEE Signal Processing Society Standing Committee on Industry Digital Signal Processing Technology, and a Guest Editor for the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING. He is currently an Associate Editor of the IEEE SIGNAL PROCESSING LETTERS and the Editor-in-Chief of the EURASIP Journal on Audio, Speech, and Music Processing.


More information

SUB-BAND INDEPENDENT SUBSPACE ANALYSIS FOR DRUM TRANSCRIPTION. Derry FitzGerald, Eugene Coyle

SUB-BAND INDEPENDENT SUBSPACE ANALYSIS FOR DRUM TRANSCRIPTION. Derry FitzGerald, Eugene Coyle SUB-BAND INDEPENDEN SUBSPACE ANALYSIS FOR DRUM RANSCRIPION Derry FitzGerald, Eugene Coyle D.I.., Rathmines Rd, Dublin, Ireland derryfitzgerald@dit.ie eugene.coyle@dit.ie Bob Lawlor Department of Electronic

More information

Joint dereverberation and residual echo suppression of speech signals in noisy environments Habets, E.A.P.; Gannot, S.; Cohen, I.; Sommen, P.C.W.

Joint dereverberation and residual echo suppression of speech signals in noisy environments Habets, E.A.P.; Gannot, S.; Cohen, I.; Sommen, P.C.W. Joint dereverberation and residual echo suppression of speech signals in noisy environments Habets, E.A.P.; Gannot, S.; Cohen, I.; Sommen, P.C.W. Published in: IEEE Transactions on Audio, Speech, and Language

More information

A Closed Form for False Location Injection under Time Difference of Arrival

A Closed Form for False Location Injection under Time Difference of Arrival A Closed Form for False Location Injection under Time Difference of Arrival Lauren M. Huie Mark L. Fowler lauren.huie@rl.af.mil mfowler@binghamton.edu Air Force Research Laboratory, Rome, N Department

More information

A BROADBAND BEAMFORMER USING CONTROLLABLE CONSTRAINTS AND MINIMUM VARIANCE

A BROADBAND BEAMFORMER USING CONTROLLABLE CONSTRAINTS AND MINIMUM VARIANCE A BROADBAND BEAMFORMER USING CONTROLLABLE CONSTRAINTS AND MINIMUM VARIANCE Sam Karimian-Azari, Jacob Benesty,, Jesper Rindom Jensen, and Mads Græsbøll Christensen Audio Analysis Lab, AD:MT, Aalborg University,

More information

Approaches for Angle of Arrival Estimation. Wenguang Mao

Approaches for Angle of Arrival Estimation. Wenguang Mao Approaches for Angle of Arrival Estimation Wenguang Mao Angle of Arrival (AoA) Definition: the elevation and azimuth angle of incoming signals Also called direction of arrival (DoA) AoA Estimation Applications:

More information

Image analysis. CS/CME/BIOPHYS/BMI 279 Fall 2015 Ron Dror

Image analysis. CS/CME/BIOPHYS/BMI 279 Fall 2015 Ron Dror Image analysis CS/CME/BIOPHYS/BMI 279 Fall 2015 Ron Dror A two- dimensional image can be described as a function of two variables f(x,y). For a grayscale image, the value of f(x,y) specifies the brightness

More information

WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY

WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY INTER-NOISE 216 WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY Shumpei SAKAI 1 ; Tetsuro MURAKAMI 2 ; Naoto SAKATA 3 ; Hirohumi NAKAJIMA 4 ; Kazuhiro NAKADAI

More information

Acentral problem in the design of wireless networks is how

Acentral problem in the design of wireless networks is how 1968 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 6, SEPTEMBER 1999 Optimal Sequences, Power Control, and User Capacity of Synchronous CDMA Systems with Linear MMSE Multiuser Receivers Pramod

More information

TIIVISTELMÄRAPORTTI (SUMMARY REPORT)

TIIVISTELMÄRAPORTTI (SUMMARY REPORT) 2014/2500M-0015 ISSN 1797-3457 (verkkojulkaisu) ISBN (PDF) 978-951-25-2640-6 TIIVISTELMÄRAPORTTI (SUMMARY REPORT) Modern Signal Processing Methods in Passive Acoustic Surveillance Jaakko Astola*, Bogdan

More information

Indoor Localization based on Multipath Fingerprinting. Presented by: Evgeny Kupershtein Instructed by: Assoc. Prof. Israel Cohen and Dr.

Indoor Localization based on Multipath Fingerprinting. Presented by: Evgeny Kupershtein Instructed by: Assoc. Prof. Israel Cohen and Dr. Indoor Localization based on Multipath Fingerprinting Presented by: Evgeny Kupershtein Instructed by: Assoc. Prof. Israel Cohen and Dr. Mati Wax Research Background This research is based on the work that

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events

Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events INTERSPEECH 2013 Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events Rupayan Chakraborty and Climent Nadeu TALP Research Centre, Department of Signal Theory

More information

1.Discuss the frequency domain techniques of image enhancement in detail.

1.Discuss the frequency domain techniques of image enhancement in detail. 1.Discuss the frequency domain techniques of image enhancement in detail. Enhancement In Frequency Domain: The frequency domain methods of image enhancement are based on convolution theorem. This is represented

More information

A Novel Technique or Blind Bandwidth Estimation of the Radio Communication Signal

A Novel Technique or Blind Bandwidth Estimation of the Radio Communication Signal International Journal of ISSN 0974-2107 Systems and Technologies IJST Vol.3, No.1, pp 11-16 KLEF 2010 A Novel Technique or Blind Bandwidth Estimation of the Radio Communication Signal Gaurav Lohiya 1,

More information

Dual Transfer Function GSC and Application to Joint Noise Reduction and Acoustic Echo Cancellation

Dual Transfer Function GSC and Application to Joint Noise Reduction and Acoustic Echo Cancellation Dual Transfer Function GSC and Application to Joint Noise Reduction and Acoustic Echo Cancellation Gal Reuven Under supervision of Sharon Gannot 1 and Israel Cohen 2 1 School of Engineering, Bar-Ilan University,

More information

A New Subspace Identification Algorithm for High-Resolution DOA Estimation

A New Subspace Identification Algorithm for High-Resolution DOA Estimation 1382 IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. 50, NO. 10, OCTOBER 2002 A New Subspace Identification Algorithm for High-Resolution DOA Estimation Michael L. McCloud, Member, IEEE, and Louis

More information

Performance Analysis of Maximum Likelihood Detection in a MIMO Antenna System

Performance Analysis of Maximum Likelihood Detection in a MIMO Antenna System IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 50, NO. 2, FEBRUARY 2002 187 Performance Analysis of Maximum Likelihood Detection in a MIMO Antenna System Xu Zhu Ross D. Murch, Senior Member, IEEE Abstract In

More information

A Novel Approach for the Characterization of FSK Low Probability of Intercept Radar Signals Via Application of the Reassignment Method

A Novel Approach for the Characterization of FSK Low Probability of Intercept Radar Signals Via Application of the Reassignment Method A Novel Approach for the Characterization of FSK Low Probability of Intercept Radar Signals Via Application of the Reassignment Method Daniel Stevens, Member, IEEE Sensor Data Exploitation Branch Air Force

More information

Communications Overhead as the Cost of Constraints

Communications Overhead as the Cost of Constraints Communications Overhead as the Cost of Constraints J. Nicholas Laneman and Brian. Dunn Department of Electrical Engineering University of Notre Dame Email: {jnl,bdunn}@nd.edu Abstract This paper speculates

More information

Robust Near-Field Adaptive Beamforming with Distance Discrimination

Robust Near-Field Adaptive Beamforming with Distance Discrimination Missouri University of Science and Technology Scholars' Mine Electrical and Computer Engineering Faculty Research & Creative Works Electrical and Computer Engineering 1-1-2004 Robust Near-Field Adaptive

More information

HIGH ORDER MODULATION SHAPED TO WORK WITH RADIO IMPERFECTIONS

HIGH ORDER MODULATION SHAPED TO WORK WITH RADIO IMPERFECTIONS HIGH ORDER MODULATION SHAPED TO WORK WITH RADIO IMPERFECTIONS Karl Martin Gjertsen 1 Nera Networks AS, P.O. Box 79 N-52 Bergen, Norway ABSTRACT A novel layout of constellations has been conceived, promising

More information

STAP approach for DOA estimation using microphone arrays

STAP approach for DOA estimation using microphone arrays STAP approach for DOA estimation using microphone arrays Vera Behar a, Christo Kabakchiev b, Vladimir Kyovtorov c a Institute for Parallel Processing (IPP) Bulgarian Academy of Sciences (BAS), behar@bas.bg;

More information

Analysis of LMS and NLMS Adaptive Beamforming Algorithms

Analysis of LMS and NLMS Adaptive Beamforming Algorithms Analysis of LMS and NLMS Adaptive Beamforming Algorithms PG Student.Minal. A. Nemade Dept. of Electronics Engg. Asst. Professor D. G. Ganage Dept. of E&TC Engg. Professor & Head M. B. Mali Dept. of E&TC

More information