REAL-TIME SRP-PHAT SOURCE LOCATION IMPLEMENTATIONS ON A LARGE-APERTURE MICROPHONE ARRAY


by Hoang Tran Huy Do

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN THE DIVISION OF ENGINEERING AT BROWN UNIVERSITY

September 2009

© Copyright by Hoang Tran Huy Do, 2010. All Rights Reserved.

This thesis by Hoang Tran Huy Do is accepted in its present form by the Division of Engineering as satisfying the dissertation requirement for the degree of Master of Science.

(Dr. Harvey Fox Silverman) Principal Adviser

Approved for the University Committee on Graduate Studies.

Contents

1 Introduction
    1.1 Methods of Pairwise Time-Delay Estimation
    1.2 Methods of Steered Beamforming
    1.3 Scope of this work
2 Acoustic Model
    2.1 Sound Sources
    2.2 Multi-Path Propagation of Sound Waves
    2.3 Near Field and Far Field
    2.4 Direction of Arrival
3 GCC-PHAT
    3.1 Generalized Cross-Correlation (GCC)
    3.2 Derivation of the GCC
    3.3 The Phase Transform (PHAT)
    3.4 GCC-PHAT
    3.5 GCC-PHAT in our system
4 SRP-PHAT
    4.1 Beamformers
    4.2 Steered Response Power (SRP)
    4.3 SRP-PHAT
    4.4 SRP-PHAT as a Source Localization Method
5 Stochastic Region Contraction (SRC)
    5.1 Overview
    5.2 SRC in SRP-PHAT
6 Coarse-to-Fine Region Contraction (CFRC)
7 Computational Cost
    Signal-Processing Cost
    Cost of interpolation
    Cost per functional evaluation, fe
    Cost of a full grid-search
    Cost of SRC and CFRC
8 Experiments and Results
    The Huge Microphone Array (HMA)
    Experimental System
    Preliminary Processing for SRP-PHAT data
    Interpolation
    Simple Energy Discriminator
    Results
9 Conclusions and Future Work
    Conclusions
    Future Work
Bibliography

List of Tables

5.1 Probabilistic relation to determine number of fe's required
Performance for LEMSAlg, SRP-PHAT using grid-search, CFRC and SRC

List of Figures

2.1 DOAs of 4 microphones in an array for the near field case
2.2 DOA in the far field case with azimuth angle θ and elevation angle φ
3.1 TDOA between two microphones
5.1 2D example of SRC
Performance of SRC-I as a function of parameter N for four different source locations
D example of CFRC
Performance of CFRC as a function of parameter N for four different source locations
Top View of HMA
SRP-PHAT surface without interpolation
SRP-PHAT surface with filter interpolation
SRP-PHAT surface with cubic interpolation
The simple energy discriminator for Source
The simple energy discriminator for Source
The simple energy discriminator for Source
The simple energy discriminator for Source
Performance of CFRC and SRC
Cost of CFRC and SRC
D illustration of two talker-case (n=2)

Chapter 1 Introduction

Microphone arrays have been used in many applications, such as teleconferencing [8, 14, 21, 22, 40], speech recognition [1, 6, 5, 15, 16, 17, 31], talker characterization [39] and voice capture in reverberant environments [13, 19, 43]. Many array designs have been implemented and studied, including spherical arrays [11], superdirective arrays [9] and linear arrays [33]. For these arrays, work has been done on hardware design and construction as well as on real-time software. Thanks to the rapid development of inexpensive DSP microprocessors, microphone arrays have been commercialized in many everyday products, for example cellular phones and Bluetooth headsets for hands-free operation, and automobile speech enhancement and noise cancellation for audio communication. Companies such as PictureTel and Polycom have applied microphone-array technology in their voice-capture products for small environments, as well as in automatic camera-steering products that frame active talkers using a 4-element microphone array [40].

In most of these applications, it is essential to estimate the sound-source locations. These estimates must be produced at a high rate and with very low latency, yet still be accurate. Extensive research has been carried out on the source localization problem. In general, two categories of algorithms have been explored for real systems: two-stage and one-stage algorithms. The performance and computational cost of the two categories vary with the conditions of the environment, the data being processed, the way the algorithms are implemented, and the talkers themselves.

A two-stage algorithm works in two steps. The first step, time-delay estimation (TDE), produces the time-differences of arrival (TDOAs) of speech sounds between pairs of microphones. In the second step, hyperbolic curves are generated from the available time-delay estimates and the microphone positions, and the source location is estimated from the intersection of these curves in some optimal sense. Many methods have been proposed for this task, such as maximum likelihood estimation, the least-squares error method, a linear intersection method and spherical interpolation [18, 37, 29, 28]. However, the pairwise technique suffers considerable degradation from acoustic reverberation, as will be seen later in this thesis.

In contrast, a one-stage algorithm does not process the data pairwise but exploits the multitude of microphones to overcome the limitations imposed by early decisions and by reverberation. A common example of this approach is beamforming. Beamforming is the process of delaying the outputs of the microphones in an array's aperture and adding them together, to reinforce the signal with respect to noise or to waves propagating in other directions. A beamformer can be used to scan, or steer, over all candidate positions in a predefined region that contains the sound source. The point that gives the maximum beamformer output power is taken as the source location. This method is known as Steered Response Power (SRP) [10]. It is generally computationally intensive. In return, it combines multiple microphone signals at the same time rather than reducing the data from each pair to a single time-delay parameter. It also allows efficient processing of short data segments (frames) by integrating the data from many, or all, of the microphones before making a decision, whereas the two-stage method has to make an early decision based on the data from each pair of microphones. Another notable advantage of the one-stage method is that it can localize multiple simultaneous talkers. In that case, the steered beamformer peaks at several points, corresponding to the locations of the talkers.

Besides these two major categories of location-estimation techniques, another class of methods has gained considerable attention recently. It is based on high-resolution spectral estimation, such as the multiple signal classification (MUSIC) technique and estimation of signal parameters via rotational invariance techniques (ESPRIT) [30, 27]. Among these, the MUSIC-based method is widely used and probably the most popular. It has been developed into many variants, such as RAP-MUSIC, ROOT-MUSIC and IES-MUSIC [25, 24, 38]. It was originally applied in spectral estimation to estimate the frequencies and characteristics of wavefronts, and was then developed into a popular direction-of-arrival (DOA) estimation technique for narrowband signals. First, it decomposes the cross-correlation matrix of the microphone signals into signal and noise subspaces using eigenvalue decomposition. Then a search is performed, using either the noise subspace or the signal subspace, over all possible DOAs to determine the most likely one. Although subspace techniques applicable to wideband signals exist [41], they do not seem to be considered a viable alternative to the first two categories of algorithms (TDE-based and beamformer-based) in applications dealing with speech signals, especially correlated signals [7]. Therefore, this thesis does not investigate this class of location methods and focuses on the former two, namely TDE-based and beamformer-based methods.

1.1 Methods of Pairwise Time-Delay Estimation

Sound-source localization techniques relying on time-delay estimation (TDE) use the TDOA between the two spatially separated microphones of a pair to parameterize the source. Reliable estimates are obtained when long segments of data are used. However, in many applications, such as tracking a moving talker, source location estimates must be produced at a high rate. This requires relatively short segments of data as the input.
Unfortunately, short data segments are severely affected by reverberation. Therefore, the performance of pairwise TDE-based techniques degrades greatly under high noise conditions. The generalized cross-correlation (GCC) [23] has been the most commonly used pairwise TDE-based method. Incorporating a weighting function is necessary to improve the performance of GCC in noisy and reverberant environments. Several weighting functions have been studied, such as maximum likelihood (ML), the smoothed coherence transform (SCOT), the phase transform (PHAT), the Eckart filter and the Roth processor [23]. Among them, ML and PHAT outperform the rest in the noise-only case and the reverberant case, respectively. The ML weighting is theoretically optimal for single-path propagation with uncorrelated noise. However, the presence of reverberation quickly degrades its performance. The PHAT weighting, on the other hand, has been shown to be more robust than ML under high reverberation [3]. Therefore, in a highly reverberant environment such as our testing room, PHAT has been chosen to improve the performance of the GCC method. This GCC-PHAT method, also known as Cross-power Spectrum Phase (CSP), has been shown to perform well in realistic environments with relatively low noise and reverberation [26].

Another approach to overcoming reverberation is to average the TDE over a long segment of data (multiple frames). However, this long-data approach does not suit our goal of making estimates at a high rate with low latency. Another short-time TDE-based method was presented in [4]. Its basic idea is to minimize a weighted least-squares function of the phase data. This method was shown to outperform both GCC-ML and GCC-PHAT. The trade-off is the need for a complicated search for the global minimum over the discontinuous phase surface, which is computationally expensive and may not justify the performance improvement over GCC-PHAT.

1.2 Methods of Steered Beamforming

A microphone array has the capability of focusing on signals generated from a specific location or direction.
A system with this capability is referred to as a beamformer. The beamformer can be steered over a region containing the sound source. Its output is known as the steered response. When the point (or direction) of focus matches the true source location, the steered response power (SRP) peaks. The simplest beamformer is the delay-and-sum, or conventional, beamformer [20]. Since the microphones in the array are spatially separated, the source signal arrives at each microphone at a different time. The delay-and-sum beamformer applies appropriate time shifts to the microphone signals to compensate for the propagation delays. Once the signals are time-aligned, they are summed to create a single, enhanced output signal. A more sophisticated beamformer is the filter-and-sum beamformer, in which adaptive filters are applied to the microphone signals before they are summed. The type of filter determines the type of beamformer.

The steered response beamformer, when using the phase transform filter, defines a one-stage method called steered response power using the phase transform, or SRP-PHAT. This method has been shown to be more robust under high noise and reverberation than the two-stage methods [10]. However, it is limited by its heavy computational load, because the SRP surface to be searched has many local maxima, as will be seen later in the thesis. The work in this thesis focuses on improving the SRP-PHAT method by tackling the computational cost of one-stage algorithms, so that it works efficiently in real time in a room with a reverberation time of 450 milliseconds, using 25-millisecond data frames.

1.3 Scope of this work

This thesis can be divided into three parts. The first part reviews the two-stage and one-stage localization techniques, i.e. GCC-PHAT and SRP-PHAT. The second part describes novel approaches to reducing the computational cost of SRP-PHAT. The last part presents the experimental set-up and the performance and cost results for GCC-PHAT and for SRP-PHAT with and without the improvements introduced in this work. In detail, the content of each chapter is as follows. The acoustic model used in our analysis is derived in Chapter 2. Chapter 3 studies the two-stage method GCC-PHAT.
The one-stage method SRP-PHAT is investigated in Chapter 4. Chapter 5 studies a global optimization technique, stochastic region contraction (SRC), which considerably reduces the computational cost of SRP-PHAT. Chapter 6 investigates another method based on the same idea of region contraction, namely coarse-to-fine region contraction (CFRC). The computational cost of SRP-PHAT is derived in Chapter 7. Experimental conditions and results are shown in Chapter 8. Finally, conclusions and ideas for future work are given in Chapter 9.

Chapter 2 Acoustic Model

2.1 Sound Sources

A speech source, whether a human talker or a mechanical transducer, is not an ideal spherical radiator. In realistic environments, it exhibits directionality and spatial attenuation. Microphones facing the talker receive stronger signals than microphones off to the side of or behind the source. For simplicity, we assume that sound sources can be effectively modeled as point sources. Some further assumptions are needed to keep the model simple enough to solve:

1. The source emits spherical sound waves. We do not incorporate the complex radiation patterns of human head models.
2. The medium is homogeneous. This guarantees that the speed of sound, c, is constant everywhere. In other words, the acoustic propagation is non-refractive.
3. The medium is lossless. This ensures that the medium does not absorb energy from the propagating waves.

Note that the speed of sound, c, can change from one experiment to another as the temperature changes, but it does not change during the course of a single experiment.
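The temperature dependence of c mentioned in the note above can be illustrated with a minimal sketch. It assumes the standard ideal-gas approximation for dry air; the formula, the constant 331.3 m/s and the helper name `speed_of_sound` are not taken from the thesis and serve only as an illustration.

```python
import math

def speed_of_sound(temp_celsius: float) -> float:
    """Approximate speed of sound in dry air, in m/s (ideal-gas model, assumed)."""
    # 331.3 m/s is the commonly quoted value at 0 degrees Celsius.
    return 331.3 * math.sqrt(1.0 + temp_celsius / 273.15)

# The same 1 m path-length difference maps to slightly different delays
# on a cool day and on a warm day, so c is fixed once per experiment.
delay_at_15 = 1.0 / speed_of_sound(15.0)
delay_at_25 = 1.0 / speed_of_sound(25.0)
```

Under this approximation a change of a few degrees shifts every propagation delay by a fraction of a percent, which is why c is treated as a per-experiment constant rather than a universal one.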

2.2 Multi-Path Propagation of Sound Waves

Our environment is an acoustic enclosure (a room), so the propagation of sound waves is disturbed by objects such as walls, furniture and people. This interference creates reverberation, or multi-path propagation of the waves. Reverberation can severely affect the performance of many processes carried out on the microphone array. Therefore, it must be incorporated into the acoustic model to best cope with realistic conditions.

Let $h(\vec{d}_m, \vec{d}_s, t)$ denote the room impulse response, covering both the direct path and the reflected paths, from the sound source at $\vec{d}_s$ to microphone $m$ at location $\vec{d}_m$. Let $v(\vec{d}_s, t)$ be the response describing the characteristics of microphone $m$. Since the position and orientation of microphone $m$ are known and fixed, this response depends only on the source location $\vec{d}_s$. The signal at microphone $m$ can be modeled as follows:

$$x_m(t) = s(t) * h(\vec{d}_m, \vec{d}_s, t) * v(\vec{d}_s, t) + n_m(t) \tag{2.1}$$

where $s(t)$ is the source signal, $n_m(t)$ is the noise on the $m$th channel, and $*$ denotes linear convolution. It is common to assume that $n_m(t)$ is uncorrelated with $s(t)$. The impulse response from the source output to the microphone output is the convolution of $h(\vec{d}_m, \vec{d}_s, t)$ and $v(\vec{d}_s, t)$. Because microphone $m$ is located at a fixed, known point, this impulse response depends only on the source location. Denoting this response by $h(\vec{d}_s, t)$, Equation 2.1 becomes

$$x_m(t) = s(t) * h(\vec{d}_s, t) + n_m(t) \tag{2.2}$$

This equation completely describes the signal received at microphone $m$, taking into account the reverberant channel's impulse response and uncorrelated noise.
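A discrete-time sketch of the model in Equation 2.2 may help fix ideas. The impulse response below (one direct path and one weaker reflection), the sample values and the helper names `convolve` and `microphone_signal` are hypothetical; they are chosen only to show how a reflected path adds a delayed, attenuated copy of the source.

```python
import random

def convolve(a, b):
    """Linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def microphone_signal(source, impulse_response, noise_std, rng):
    """x_m = s * h + n_m, with white Gaussian noise standing in for n_m."""
    clean = convolve(source, impulse_response)
    return [c + rng.gauss(0.0, noise_std) for c in clean]

rng = random.Random(0)
source = [1.0, 0.5, -0.25]
# Hypothetical channel: direct path after 2 samples, reflection after 5 samples.
h = [0.0, 0.0, 1.0, 0.0, 0.0, 0.4]
x = microphone_signal(source, h, 0.0, rng)   # noise switched off for clarity
```

With the noise switched off, `x` contains the source starting at sample 2 followed by a copy scaled by 0.4 starting at sample 5, which is the multi-path effect the model is meant to capture.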

2.3 Near Field and Far Field

When the distance from the sound source to the array is much larger than the array's size, the source is said to be in the far field. In this case, the sound waves appear planar when they reach the array. When the distance from the source to the array is about the same as or smaller than the array's size, the source is in the near field.

2.4 Direction of Arrival

In the near field, an M-element microphone array has M directions of arrival (DOAs). Each is the direct path from a microphone to the source. Mathematically, each can be defined by a point on the unit sphere,

$$\vec{D}_m = \frac{\vec{d}_m - \vec{d}_s}{\lVert \vec{d}_m - \vec{d}_s \rVert} \tag{2.3}$$

where $m = 1, 2, \ldots, M$.

Figure 2.1: DOAs of 4 microphones in an array for the near field case

In the far field case, all microphones in the array have the same DOA, which is commonly chosen as the path from the origin of the array to the source. Denoting $O$ as the origin of the array in the coordinate system, the DOA is defined as

$$\vec{D}_O = \frac{\vec{d}_O - \vec{d}_s}{\lVert \vec{d}_O - \vec{d}_s \rVert} \tag{2.4}$$

The orientation of the DOA can be defined by the standard azimuth angle $\theta$ and the elevation angle $\phi$:

$$\vec{D}_O = \begin{bmatrix} \cos\phi \sin\theta \\ \cos\phi \cos\theta \\ \sin\phi \end{bmatrix} \tag{2.5}$$

Figure 2.2: DOA in the far field case with azimuth angle θ and elevation angle φ

In talker localization problems in the far field, the distance from the source to the array, or range, cannot be determined. The DOA is the only spatial information about the source.

In summary, we would like to locate a single talker in a room. Hence, the acoustic model for our work has the following properties:

1. A sound source is effectively modeled as a point source.
2. The near field case applies, and we would like to estimate the source location in 3D (x, y, z coordinates).
3. The speed of sound, c, is constant during the course of an experiment.
4. Reverberation effects and noise are taken into account to best cope with realistic conditions.
5. Location techniques are tested with real data from human talkers.
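The two DOA definitions of Section 2.4 can be sketched numerically as follows. The function names and the example coordinates are hypothetical. The far-field vector takes its third component as the sine of the elevation angle, an assumption made here so that the vector has unit length for every pair of angles.

```python
import math

def near_field_doa(mic, source):
    """Unit vector along d_m - d_s for one microphone (near-field DOA)."""
    diff = [m - s for m, s in zip(mic, source)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def far_field_doa(azimuth, elevation):
    """Unit DOA vector from azimuth and elevation angles, both in radians."""
    return [math.cos(elevation) * math.sin(azimuth),
            math.cos(elevation) * math.cos(azimuth),
            math.sin(elevation)]

# Near field: each microphone has its own direction to the same source.
source = (0.0, 4.0, 0.0)
doas = [near_field_doa(mic, source) for mic in [(3.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]

# Far field: a single direction shared by the whole array.
shared = far_field_doa(0.3, 0.7)
```

In the near-field case the list `doas` holds one direction per microphone, whereas the far-field case reduces the spatial information to the two angles alone, which is why range cannot be recovered there.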

Chapter 3 Generalized Cross-Correlation (GCC) using the Phase Transform (GCC-PHAT)

3.1 Generalized Cross-Correlation (GCC)

GCC has been a popular method for determining the time-difference of arrival (TDOA) between the two microphones of a pair [23, 26, 36]. From multiple TDOA values, one can then estimate the source location. Take a 4-element microphone array as an example.

Figure 3.1: TDOA between two microphones

If the distance from microphone $m$ to the source is $r_m$ ($m = 1, 2, 3, 4$), the time delay (travel time) of the signal from the source to that microphone is

$$\tau_m = \frac{r_m}{c} \tag{3.1}$$

The time-difference of arrival (TDOA) between two microphones $m$ and $n$ can then be defined as

$$\tau_{mn} = \tau_m - \tau_n = \frac{r_m - r_n}{c} \tag{3.2}$$

From this relation between the TDOA and the distances $r_m$ from the source to the microphones, one can estimate the source location from multiple TDOAs using several techniques, such as linear intersection and spherical interpolation [37, 28, 18]. We now know how the source location can be estimated from the TDOAs. The next question is: what is the GCC, and how do we obtain the TDOA from it? This is explained in the following section.

3.2 Derivation of the GCC

Recall Equation 2.2 from Chapter 2 for the signal at microphone $k$:

$$x_k(t) = s(t) * h(\vec{d}_s, t) + n_k(t) \tag{3.3}$$

Consider the signal at another microphone $l$:

$$x_l(t) = s(t - \tau_{kl}) * h(\vec{d}_s, t) + n_l(t) \tag{3.4}$$

Note that, to be accurate, we would have to include the time delay $\tau_k$ in the source signal, i.e. write $s(t - \tau_k)$ in Equation 3.3, to show that the signal received at microphone $k$ is a delayed version of the source signal. For simplicity, however, we normalize so that the time delay $\tau_k$ from the source to microphone $k$ is 0. In other words, we are only concerned with the relative time-difference of arrival $\tau_{kl}$ between the two microphones $k$ and $l$.

The cross-correlation of these two microphone signals shows a peak at the time lag where the two shifted signals are aligned, corresponding to the TDOA $\tau_{kl}$. The cross-correlation of $x_k(t)$ and $x_l(t)$ is defined as

$$c_{kl}(\tau) \equiv \int_{-\infty}^{\infty} x_k(t)\, x_l(t + \tau)\, dt \tag{3.5}$$

Taking the Fourier Transform of the cross-correlation results in the cross power spectrum,

$$C_{kl}(\omega) = \int_{-\infty}^{\infty} c_{kl}(\tau)\, e^{-j\omega\tau}\, d\tau \tag{3.6}$$

Applying the convolution properties of the Fourier Transform to 3.5 when substituting it into 3.6, we have

$$C_{kl}(\omega) = X_k(\omega)\, X_l^*(\omega) \tag{3.7}$$

where $X_i(\omega)$ is the Fourier Transform of the signal $x_i(t)$, and $*$ denotes the complex conjugate. The inverse Fourier Transform of 3.7 gives the cross-correlation function in terms of the Fourier Transforms of the microphone signals:

$$c_{kl}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X_k(\omega)\, X_l^*(\omega)\, e^{j\omega\tau}\, d\omega \tag{3.8}$$

The generalized cross-correlation (GCC) of $x_k(t)$ and $x_l(t)$ is the cross-correlation of their two filtered versions. Denoting the Fourier Transforms of the two filters as $W_k(\omega)$ and $W_l(\omega)$, the GCC $R_{kl}(\tau)$ is defined as

$$R_{kl}(\tau) \equiv \frac{1}{2\pi} \int_{-\infty}^{\infty} \big(W_k(\omega) X_k(\omega)\big)\big(W_l(\omega) X_l(\omega)\big)^* e^{j\omega\tau}\, d\omega \tag{3.9}$$

We define a combined weighting function $\Psi_{kl}(\omega)$ as

$$\Psi_{kl}(\omega) \equiv W_k(\omega)\, W_l^*(\omega). \tag{3.10}$$

Substituting 3.10 into 3.9, the GCC becomes

$$R_{kl}(\tau) \equiv \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi_{kl}(\omega)\, X_k(\omega)\, X_l^*(\omega)\, e^{j\omega\tau}\, d\omega \tag{3.11}$$

The TDOA between two microphones $k$ and $l$ is the time lag $\tau$ that maximizes the GCC $R_{kl}(\tau)$ over the physically realizable range of lags, which is limited by the distance between the microphones:

$$\hat{\tau}_{kl} = \arg\max_{\tau}\, R_{kl}(\tau) \tag{3.12}$$

In reality, $R_{kl}(\tau)$ has many local maxima, which makes the global maximum harder to detect. The choice of the weighting function $\Psi_{kl}(\omega)$ affects the performance of the GCC.

3.3 The Phase Transform (PHAT)

It has been shown that the phase transform (PHAT) weighting function is robust in realistic environments [36, 10], even though it is sub-optimal [4] compared with the maximum likelihood (ML) weighting function, which was studied in [23, 3], under reverberation-free conditions. PHAT is defined as follows:

$$\Psi_{kl}(\omega) \equiv \frac{1}{\left| X_k(\omega)\, X_l^*(\omega) \right|} \tag{3.13}$$

3.4 GCC-PHAT

Applying the PHAT weighting function from Equation 3.13 to the expression for the GCC in Equation 3.11, the Generalized Cross-Correlation using the Phase Transform (GCC-PHAT) for two microphones $k$ and $l$ is defined as

$$R_{kl}(\tau) \equiv \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1}{\left| X_k(\omega)\, X_l^*(\omega) \right|}\, X_k(\omega)\, X_l^*(\omega)\, e^{j\omega\tau}\, d\omega \tag{3.14}$$
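A discrete-time sketch of a GCC-PHAT delay estimator is given below. It is an illustration under stated assumptions, not the thesis implementation: it uses a naive DFT on short circular frames, and it forms the cross-spectrum as conj(X_k) X_l so that a positive lag means the signal reaches microphone l later than microphone k. That sign convention, the helper names and the synthetic test signal are all choices made for this sketch.

```python
import cmath
import random

def dft(x):
    """Naive discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def gcc_phat(xk, xl):
    """Circular GCC-PHAT sequence r[lag] for lag = 0 .. N-1."""
    n = len(xk)
    cross = []
    for a, b in zip(dft(xk), dft(xl)):
        c = a.conjugate() * b
        mag = abs(c)
        # PHAT weighting: keep only the phase of the cross-spectrum.
        cross.append(c / mag if mag > 1e-12 else 0j)
    return [sum(cross[f] * cmath.exp(2j * cmath.pi * f * lag / n)
                for f in range(n)).real / n
            for lag in range(n)]

def estimate_delay(xk, xl, max_lag):
    """Lag in [-max_lag, max_lag] that maximizes the GCC-PHAT."""
    r = gcc_phat(xk, xl)
    n = len(r)
    return max(range(-max_lag, max_lag + 1), key=lambda lag: r[lag % n])

rng = random.Random(1)
xk = [rng.uniform(-1.0, 1.0) for _ in range(32)]
true_delay = 3
xl = [xk[(t - true_delay) % 32] for t in range(32)]   # circularly delayed copy
estimated = estimate_delay(xk, xl, max_lag=8)
```

The `max_lag` argument plays the role of the physical limit on the lag: the largest possible TDOA for a pair is the microphone spacing divided by c, so lags outside that range need not be searched.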

In an M-microphone array system, there are $\binom{M}{2} = \frac{M(M-1)}{2}$ pairs of microphones. Using GCC-PHAT on any subset $Q$ of these pairings yields one TDOA estimate per pair in $Q$. For each hypothesized point $\vec{x}$ in the 3D space of the room containing the sound source, the true TDOAs can be calculated for the same pairs of microphones. From the estimated TDOAs $\hat{\tau}_Q(\vec{x})$ and the true TDOAs $\tau_Q(\vec{x})$, one can establish the root mean square (RMS) error as follows:

$$E_{RMS}(\vec{x}) = \sqrt{\left( \hat{\tau}_Q(\vec{x}) - \tau_Q(\vec{x}) \right)^2} \tag{3.15}$$

The source location estimate $\vec{x}_s$ is then

$$\vec{x}_s = \arg\min_{\vec{x}}\, E_{RMS}(\vec{x}) \tag{3.16}$$

3.5 GCC-PHAT in our system

A GCC-PHAT based location algorithm, namely LEMSAlg, has been used in our real-time system [36]. Our Huge Microphone Array (HMA) system [33] has 512 microphones and implements 8 simultaneous LEMSAlg locators in real time. In LEMSAlg, we manually select 16 pairs of microphones per locator. From the 24 microphones of each locator, we select microphones in groups of three, taking two independent pairs from each group. Microphones are also selected from orthogonal sections of the array, that is, from panels near a corner of the array. Microphone pairs on orthogonal planes have different and complementary sensitivities of their TDOAs to the source direction, and exploiting this effect improves directional discrimination. Details of the GCC-PHAT-based method LEMSAlg have been studied extensively in [36, 10] and are not described again in this thesis.

The performance of LEMSAlg is good when reverberation and noise are relatively low. However, the real-time LEMSAlg implementation uses over 200 ms of data, which gives it long latency. It also degrades quickly under high noise and reverberant conditions, as will be shown in Chapter 8.
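The second stage described by Equations 3.15 and 3.16 can be sketched as a brute-force search over candidate points. Everything concrete below is hypothetical: the five microphone positions, the source position, the grid and the value of c. The measured TDOAs are synthesized from the true source position instead of being estimated with GCC-PHAT, and the error is taken as the root of the mean squared difference over the pairs, which is one reading of Equation 3.15. This is not the LEMSAlg implementation.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def tdoa(point, mic_a, mic_b, c=SPEED_OF_SOUND):
    """TDOA (r_a - r_b) / c for a hypothesized source position."""
    return (math.dist(point, mic_a) - math.dist(point, mic_b)) / c

def rms_error(point, pairs, measured):
    """RMS difference between measured TDOAs and those implied by the point."""
    squares = [(tau - tdoa(point, a, b)) ** 2
               for tau, (a, b) in zip(measured, pairs)]
    return math.sqrt(sum(squares) / len(squares))

def locate(pairs, measured, grid):
    """Grid point that minimizes the RMS error."""
    return min(grid, key=lambda p: rms_error(p, pairs, measured))

mics = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.5), (0.0, 4.0, 1.0),
        (4.0, 4.0, 0.0), (2.0, 1.0, 2.5)]
pairs = list(itertools.combinations(mics, 2))      # M(M-1)/2 = 10 pairs
source = (1.0, 3.0, 1.0)
measured = [tdoa(source, a, b) for a, b in pairs]  # noise-free stand-in
grid = [(0.5 * i, 0.5 * j, 0.5 * k)
        for i in range(9) for j in range(9) for k in range(5)]
estimate = locate(pairs, measured, grid)
```

With noise-free TDOAs the error vanishes at the true position. With real estimates, a single badly wrong pairwise TDOA shifts the minimum, which is the early-decision weakness of the two-stage approach noted in Chapter 1.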

The need for reliable location estimates in the presence of high noise and reverberation can be met by using a one-stage localization method, namely SRP-PHAT. SRP-PHAT is studied in the next chapter.

Chapter 4 Steered Response Power (SRP) using the Phase Transform (SRP-PHAT)

4.1 Beamformers

As derived in Equation 2.2, the signal $x_m(t)$ at microphone $m$ is

$$x_m(t) = s(t) * h(\vec{d}_s, t) + n_m(t)$$

In an M-microphone array system, the unitarily weighted delay-and-sum beamformer, briefly introduced in Chapter 1, is created by delaying the microphone signals $x_m(t)$ by appropriate steering delays $\delta_m$, $m = 1, 2, \ldots, M$, to align them in time, and then summing the time-aligned signals. Mathematically, it is defined as

$$y(t, \delta_1, \delta_2, \ldots, \delta_M) \equiv \sum_{m=1}^{M} x_m(t - \delta_m) \tag{4.1}$$
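Equation 4.1 can be sketched in discrete time with integer sample delays. The pulse shape, the arrival times and the helper names are hypothetical. In the usage lines each channel is delayed by the time remaining until the latest arrival, an assumption of this sketch that keeps every steering delay non-negative while aligning the channels.

```python
def delay_and_sum(signals, steering_delays):
    """Discrete-time version of y[t] = sum over m of x_m[t - delta_m]."""
    n = len(signals[0])
    out = [0.0] * n
    for x, delta in zip(signals, steering_delays):
        for t in range(delta, n):
            out[t] += x[t - delta]
    return out

def place_pulse(pulse, arrival, n):
    """Frame of n samples holding the pulse from sample index `arrival` on."""
    x = [0.0] * n
    for i, p in enumerate(pulse):
        x[arrival + i] = p
    return x

pulse = [1.0, 2.0, 1.0]
arrivals = [2, 5, 3]                      # hypothetical arrival samples
signals = [place_pulse(pulse, a, 12) for a in arrivals]
latest = max(arrivals)
aligned = delay_and_sum(signals, [latest - a for a in arrivals])
unaligned = delay_and_sum(signals, [0, 0, 0])
```

When the delays match the arrivals, the three copies of the pulse add coherently and the output peak is three times the single-channel peak. With zero delays the copies smear over time and the peak is lower.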

To make the microphone signals time-aligned, the steering delays $\delta_m$ can be set to

$$\delta_m = \tau_m - \tau_0 \tag{4.2}$$

where $\tau_m$ is the time delay from the source to microphone $m$, and $\tau_0$ is set to the minimum of all the time delays $\tau_i$, $i = 1, 2, \ldots, M$, so that every $\delta_m$ is non-negative and the system is therefore causal. We can now express the output of a delay-and-sum beamformer in terms of the source signal, the channel's impulse response and the noise:

$$y(t, \delta_1, \delta_2, \ldots, \delta_M) = s(t) * \sum_{m=1}^{M} h(\vec{d}_s, t - \tau_m + \tau_0) + \sum_{m=1}^{M} n_m(t - \tau_m + \tau_0) \tag{4.3}$$

When an adaptive filter is applied in the delay-and-sum beamformer, a filter-and-sum beamformer is obtained. In the frequency domain, the filter-and-sum beamformer output is

$$Y(\omega, \delta_1, \delta_2, \ldots, \delta_M) = \sum_{m=1}^{M} G_m(\omega)\, X_m(\omega)\, e^{-j\omega\delta_m} \tag{4.4}$$

where $X_m(\omega)$ is the Fourier Transform of the microphone signal $x_m(t)$, and $G_m(\omega)$ is the Fourier Transform of the filter.

4.2 Steered Response Power (SRP)

In general, the steered response power (SRP) is the output power of a filter-and-sum beamformer when the beamformer is steered over all points $\vec{x}$ in a predefined region. For each point $\vec{x}$, it is a function of the steering delays, and in the frequency domain it is defined as

$$P(\delta_1, \ldots, \delta_M) \equiv \int_{-\infty}^{\infty} Y(\omega, \delta_1, \ldots, \delta_M)\, Y^*(\omega, \delta_1, \ldots, \delta_M)\, d\omega \tag{4.5}$$
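A discrete sketch of Equations 4.4 and 4.5 follows, with the integral over ω replaced by a sum over DFT bins and unit filters unless gains are supplied. The frame length, the pulse, the arrival times and the function names are hypothetical. As an assumption of this sketch, each channel is delayed by the time remaining until the latest arrival so that the channels line up.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def steered_power(spectra, delays, gains=None):
    """Sum over bins of |Y|^2, with Y = sum_m G_m X_m exp(-j w delta_m)."""
    m, n = len(spectra), len(spectra[0])
    if gains is None:
        gains = [[1.0] * n for _ in range(m)]   # unit filters
    power = 0.0
    for f in range(n):
        y = sum(gains[i][f] * spectra[i][f]
                * cmath.exp(-2j * cmath.pi * f * delays[i] / n)
                for i in range(m))
        power += abs(y) ** 2
    return power

n = 12
pulse = [1.0, 2.0, 1.0]
arrivals = [2, 5, 3]                      # hypothetical arrival samples
signals = []
for a in arrivals:
    x = [0.0] * n
    x[a:a + len(pulse)] = pulse
    signals.append(x)
spectra = [dft(x) for x in signals]
latest = max(arrivals)
matched = steered_power(spectra, [latest - a for a in arrivals])
unsteered = steered_power(spectra, [0, 0, 0])
```

The power is largest when the steering delays line the channels up, which is the property that allows a search over steering delays to be used as a source locator.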

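Substituting Equation 4.4 into Equation 4.5, we have

$$P(\delta_1, \ldots, \delta_M) \equiv \int_{-\infty}^{\infty} \left( \sum_{k=1}^{M} G_k(\omega)\, X_k(\omega)\, e^{-j\omega\delta_k} \right) \left( \sum_{l=1}^{M} G_l^*(\omega)\, X_l^*(\omega)\, e^{j\omega\delta_l} \right) d\omega \tag{4.6}$$

Rearranging the expression yields

$$P(\delta_1, \ldots, \delta_M) \equiv \int_{-\infty}^{\infty} \sum_{k=1}^{M} \sum_{l=1}^{M} \big( G_k(\omega)\, G_l^*(\omega) \big) \big( X_k(\omega)\, X_l^*(\omega) \big)\, e^{j\omega(\delta_l - \delta_k)}\, d\omega \tag{4.7}$$

From Equation 4.2, it is easy to see that

$$\delta_l - \delta_k = \tau_l - \tau_k \tag{4.8}$$

Inserting 4.8 back into Equation 4.7, we obtain

$$P(\delta_1, \ldots, \delta_M) \equiv \int_{-\infty}^{\infty} \sum_{k=1}^{M} \sum_{l=1}^{M} \big( G_k(\omega)\, G_l^*(\omega) \big) \big( X_k(\omega)\, X_l^*(\omega) \big)\, e^{j\omega(\tau_l - \tau_k)}\, d\omega \tag{4.9}$$

Note that the integral converges because, in practice, the microphone signals and the filters have finite energy. Hence, the summations can be interchanged with the integral and moved outside it:

$$P(\delta_1, \ldots, \delta_M) \equiv \sum_{k=1}^{M} \sum_{l=1}^{M} \int_{-\infty}^{\infty} \big( G_k(\omega)\, G_l^*(\omega) \big) \big( X_k(\omega)\, X_l^*(\omega) \big)\, e^{j\omega(\tau_l - \tau_k)}\, d\omega \tag{4.10}$$

Define the combined weighting function

$$\Psi_{kl}(\omega) \equiv G_k(\omega)\, G_l^*(\omega). \tag{4.11}$$

Recall that Equation 3.2 gives us

$$\tau_l - \tau_k = \tau_{lk} \tag{4.12}$$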

Substituting the expressions in Equations 4.11 and 4.12 back into 4.10 gives the expression for the SRP:

$$P(\delta_1, \ldots, \delta_M) \equiv \sum_{k=1}^{M} \sum_{l=1}^{M} \int_{-\infty}^{\infty} \Psi_{kl}(\omega)\, X_k(\omega)\, X_l^*(\omega)\, e^{j\omega\tau_{lk}}\, d\omega \tag{4.13}$$

Now recall the GCC from Equation 3.11:

$$R_{kl}(\tau) \equiv \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi_{kl}(\omega)\, X_k(\omega)\, X_l^*(\omega)\, e^{j\omega\tau}\, d\omega$$

The SRP and the GCC have almost identical expressions, except that the SRP is summed over all pairs of microphones and the two differ by a constant factor of $2\pi$. This provides a means of calculating the steered response power (SRP) of a microphone array by summing the generalized cross-correlations (GCC) of all pairs of microphones in the array (the constant factor is ignored since it is just a scalar).

4.3 SRP-PHAT

Similar to the idea of GCC-PHAT, when the phase transform (PHAT) weighting function is applied to the steered response power (SRP), we obtain the steered response power using the phase transform (SRP-PHAT). The SRP-PHAT for each point $\vec{x}$ in the space is defined as follows:

$$P(\delta_1, \ldots, \delta_M) \equiv \sum_{k=1}^{M} \sum_{l=1}^{M} \int_{-\infty}^{\infty} \frac{1}{\left| X_k(\omega)\, X_l^*(\omega) \right|}\, X_k(\omega)\, X_l^*(\omega)\, e^{j\omega\tau_{lk}}\, d\omega \tag{4.14}$$

4.4 SRP-PHAT as a Source Localization Method

Since the GCC between microphone $k$ and microphone $l$ is the same as the GCC between microphone $l$ and microphone $k$, the elements summed to form the SRP-PHAT functional above form a symmetric matrix with fixed energy terms on the diagonal. Therefore, the part of the SRP-PHAT that changes with $\vec{x}$ is either the upper part or the lower part of the matrix. In other words, for a particular point $\vec{x}$ in the space, the part of the SRP-PHAT in Equation 4.14 that changes with $\vec{x}$ can be computed by summing the GCC not of all pairs of the M-microphone array, but only of a subset $Q$ of the pairs, where $Q = \{(k, l) : k \in [1, \ldots, M-1],\ M \ge l > k\}$:

$$P(\delta_1, \ldots, \delta_M) \equiv \sum_{k=1}^{M} \sum_{l=k+1}^{M} \int_{-\infty}^{\infty} \frac{1}{\left| X_k(\omega)\, X_l^*(\omega) \right|}\, X_k(\omega)\, X_l^*(\omega)\, e^{j\omega\tau_{lk}}\, d\omega \tag{4.15}$$

As briefly described in Chapter 1, to find the source locations, we steer the beamformer over all possible points in a focal volume containing the source. The points that give the maximum weighted output power (SRP-PHAT) of the beamformer are taken as the source locations. For a single source, the location estimate $\vec{x}_s$ is

$$\vec{x}_s = \arg\max_{\vec{x}}\, P(\vec{x}) \tag{4.16}$$

where $P(\vec{x})$ is the SRP-PHAT at point $\vec{x}$, defined in the equation above. Note that the calculation of $P(\vec{x})$ at any particular point is called a functional evaluation (fe).

The hypothesis is that the SRP-PHAT will peak at the actual source location even under very noisy and highly reverberant conditions. However, the problem with SRP-PHAT is its computational cost: the search space has many local maxima, and thus computationally intensive grid-search methods have been required to find the global maximum. This thesis provides solutions to this problem that cut the computational cost to less than one percent of that of the full grid search and hence make SRP-PHAT practical in real time. The solutions presented in this work are stochastic region contraction (SRC) and coarse-to-fine region contraction (CFRC). The next chapter is devoted to the first solution, SRC.
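The full grid search over the SRP-PHAT functional can be sketched as follows. This is an illustration under simplifying assumptions, not the thesis implementation: propagation is free-field (a pure delay per microphone, with no reverberation or noise), the integral over ω is replaced by a sum over 64 signed frequency bins, and the cross-spectrum is formed as conj(X_k) X_l so that steering with the TDOA of the pair cancels the phase at the true source. The geometry, sampling rate, source spectrum and all function names are hypothetical.

```python
import cmath
import itertools
import math

C = 343.0      # speed of sound in m/s (assumed value)
FS = 4000.0    # sampling rate in Hz (assumed value)
N = 64         # number of frequency bins used by the sketch
BINS = range(-N // 2 + 1, N // 2 + 1)   # signed bin indices

def delay_in_samples(point, mic):
    return math.dist(point, mic) / C * FS

def mic_spectrum(source_spectrum, delay):
    """Free-field model: each microphone sees a delayed copy of the source."""
    return [s * cmath.exp(-2j * math.pi * f * delay / N)
            for f, s in zip(BINS, source_spectrum)]

def phat_cross_spectra(spectra):
    """PHAT-normalized cross-spectrum for every pair k < l."""
    cross = {}
    for k, l in itertools.combinations(range(len(spectra)), 2):
        terms = []
        for a, b in zip(spectra[k], spectra[l]):
            c = a.conjugate() * b
            terms.append(c / abs(c) if abs(c) > 1e-12 else 0j)
        cross[(k, l)] = terms
    return cross

def srp_phat(point, mics, cross):
    """One functional evaluation (fe): pairwise GCC-PHAT values summed at a point."""
    total = 0.0
    for (k, l), terms in cross.items():
        tau_lk = delay_in_samples(point, mics[l]) - delay_in_samples(point, mics[k])
        total += sum(t * cmath.exp(2j * math.pi * f * tau_lk / N)
                     for f, t in zip(BINS, terms)).real / N
    return total

mics = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.5), (0.0, 4.0, 1.0),
        (4.0, 4.0, 0.0), (2.0, 1.0, 2.5)]
source = (1.0, 3.0, 1.0)
source_spectrum = [complex(1.0 + (f % 3), 0.5 * f) for f in BINS]
spectra = [mic_spectrum(source_spectrum, delay_in_samples(source, m)) for m in mics]
cross = phat_cross_spectra(spectra)
grid = [(0.5 * i, 0.5 * j, 0.5 * k)
        for i in range(9) for j in range(9) for k in range(5)]
estimate = max(grid, key=lambda p: srp_phat(p, mics, cross))   # one fe per grid point
```

Even this coarse 0.5 m grid costs 405 functional evaluations, each summing over 10 pairs. A finer grid in a larger room multiplies that count quickly, which is the cost the region-contraction methods aim to avoid.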

Chapter 5 Stochastic Region Contraction (SRC)

5.1 Overview

Stochastic region contraction (SRC) was first introduced by Berger and Silverman [2] as a technique for optimizing a microphone array's configuration. The optimization of the placements and gains of the microphones in an array can be posed as minimizing the worst-case power spectral dispersion function (PSD) over some given noise field. However, the PSD surface exhibits many (hundreds of thousands of) local maxima and minima. Hence, common optimization techniques such as gradient descent or simplex search cannot be used. The nonlinear optimization technique SRC has been shown to be robust in finding the global optimum in this case. Observing the similarity between this problem and the problem of finding the global maximum in SRP-PHAT, we propose using SRC to optimize the search in SRP-PHAT, making SRP-PHAT more efficient and practical for real-time use.

5.2 SRC in SRP-PHAT

The basic idea of the SRC algorithm is as follows: given an initial rectangular search volume containing the desired global optimum and perhaps many local maxima or minima, gradually contract the original volume in an iterative process until a sufficiently small subvolume is reached in which the global optimum is trapped (the uncertainty voxel, of volume $V_u$). The contraction operation on iteration $i$ is based on a stochastic exploration of the SRP-PHAT functional $P(\vec{x})$ in the current subvolume.

Figure 5.1: 2D example of SRC. The surface is $P(\vec{x})$; $j$ is the iteration index. The rectangular regions show the contracting search regions.

The first step is to determine the number of random points, $J_0$, that need to be evaluated to ensure that one or more is likely to fall in the volume $V_{peak}$ of higher values (than in the rest of the focal volume) surrounding the global maximum of $P(\vec{x})$; see, e.g., Figure 5.1. Unfortunately, $V_{peak}$ is not easy to determine, and in our data it changes substantially as the source moves farther from the microphones. However, if $V_{room}$ is the original search volume, we can estimate the number of fe's needed to ensure that the probability of missing $V_{peak}$ altogether is less than a given percentage.

The probability of a random point hitting $V_{peak}$ in the initial search volume $V_{room}$ is

$$P(\text{hit } V_{peak}) = \frac{V_{peak}}{V_{room}} \tag{5.1}$$

Hence, the probability of a random point missing $V_{peak}$ is

$$P(\text{miss } V_{peak}) = 1 - \frac{V_{peak}}{V_{room}} \tag{5.2}$$

The random points are thrown independently of one another. Therefore, the probability that all of $n$ random points miss $V_{peak}$ is

$$P(n\text{-misses}) = \left(1 - \frac{V_{peak}}{V_{room}}\right)^{n} \qquad (5.3)$$

Taking the logarithm of both sides and solving for $n$, we have

$$n = \frac{\log P(n\text{-misses})}{\log\left(1 - \frac{V_{peak}}{V_{room}}\right)} \qquad (5.4)$$

From this relationship between the probability of $n$ random points all missing $V_{peak}$ and the ratio $V_{peak}/V_{room}$, we can determine how many random points, $n$, must be thrown to ensure that the missing probability, $P(n\text{-misses})$, is negligible in a realistic sense. We calculate $n$ for different values of the missing probability and of the ratio $V_{peak}/V_{room}$, as shown in Table 5.1.

Table 5.1: Number of fe's required for three probabilities of missing $V_{peak}$ and four values of the ratio $V_{peak}/V_{room}$. (The table entries did not survive transcription; each follows from Equation 5.4.)

In our case, $V_{room}$ = 400 cm × 100 cm × 600 cm = 24,000,000 cm³, and $V_{peak}$ was estimated from preliminary experimental results for a low-SNR situation. This makes $V_{peak}/V_{room} \approx 0.005$, which by Table 5.1 implies that a value of $J_0 = 3000$ will err by missing the peak volume less than 0.1% of the time.

We define $J_i$ as the number of random points evaluated for iteration $i$; $N_i$ as the number of points used to define the new source volume $V_{i+1}$, which has the rectangular boundary vector $B_{i+1} = [x_{max}(i+1),\ x_{min}(i+1),\ y_{max}(i+1),\ y_{min}(i+1),\ z_{max}(i+1),\ z_{min}(i+1)]$; $I$ as the number of iterations; $FE_i$ as the total number of fe's evaluated as of iteration $i$; and $\Phi$ as the maximum number of fe's allowed to be computed.

We have found it very effective for our problem to set a fixed value for $N_i$ based on experimentation, as shown in Figure 5.2. Here we see that a value of $N = 100$ gives the best results with the lowest cost for all tested SNRs. That is, let $N_i = N = 100$.

Figure 5.2: Performance of SRC-I as a function of parameter N for four different source locations (accuracy in percent versus N, one curve each for Sources 1 to 4).

With $J_0$ and $N_i$ defined as above, the SRC algorithm for finding the global maximum is:

1. Initialize: $i = 0$, $J_0 = 3000$, $N_i = 100$ and $V_0 = V_{room}$.
2. Evaluate: $P(\mathbf{x})$ for $J_i$ points.
3. Sort: select the best $N_i$ of the $J_i$ points.
4. Contract: the search region to the smaller region $V_{i+1}$, $B_{i+1}$ that contains these $N_i$ points.
5. Test:
   IF $V_{i+1} < V_u$, or $FE_i > \Phi$ and $V_{i+1} < T_1 V_u$, where $T_1$ is a parameter (about 10): determine $\mathbf{x}_s$, set $I = i$, STOP, KEEP RESULT.
   ELSE IF $FE_i > \Phi$: STOP, DISCARD RESULT.
   ELSE: among the $N_i$ points, keep the subset of $G_i$ points whose values are greater than the mean, $\mu_i$, of the $N_i$ points.
6. Evaluate: $J_{i+1}$ new random points in $V_{i+1}$.
7. Form: the set of $N_{i+1}$ points as the union of the $G_i$ points and the best $N_{i+1} - G_i$ points from the $J_{i+1}$ just evaluated. This gives $N_{i+1}$ high points for iteration $i+1$.
8. Iterate: $i = i + 1$. GO TO STEP 4.
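The value of $J_0$ used in step 1 rests on Equation 5.4. As a small check of that relation, the sketch below computes the number of random points needed for a given ratio and missing probability. The function names are hypothetical, and the result is rounded up to a whole number of points, which the text does not state explicitly.

```python
import math

def points_needed(ratio, p_miss):
    """Smallest whole n with (1 - ratio)**n <= p_miss (Equation 5.4, rounded up).

    ratio  : V_peak / V_room
    p_miss : acceptable probability that all n random points miss V_peak
    """
    return math.ceil(math.log(p_miss) / math.log(1.0 - ratio))

def miss_probability(ratio, n):
    """Probability that n independent uniform points all miss V_peak (Equation 5.3)."""
    return (1.0 - ratio) ** n

# Values quoted in the text: ratio of about 0.005, target below 0.1 %, J0 = 3000.
n_min = points_needed(0.005, 0.001)
p_with_j0 = miss_probability(0.005, 3000)
```

With the ratio quoted in the text, `n_min` comes out well below 3000, so the choice $J_0 = 3000$ leaves a comfortable margin under the 0.1% target.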

Depending on how $J_i$ is defined for $i \ge 1$, we have three variants of the SRC algorithm:

SRC-I. Let $J_i$ be the number of random fe's needed to find $N_i - G_i$ points greater than $\mu_i$. Guarantees monotonically increasing $\mu_i$. Use a finite value of $\Phi$.

SRC-II. Let $J_i$ be the number of random fe's needed to find $N_i - G_i$ points higher than the minimum of the full set of $N_i$ points. $\mu_i$ increases for almost all iterations. Use a finite value of $\Phi$.

SRC-III. Fix $J_i = J$. Keep the highest $N_i - G_i$ points for each iteration. Does not guarantee monotonically increasing $\mu_i$. Set $\Phi$.

In the next chapter, another method, coarse-to-fine region contraction (CFRC), is presented. It uses the same idea of region contraction, but instead of evaluating random points it evaluates a coarse grid and iterates to successively finer grids to find the global maximum.
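The steps of SRC-I can be sketched in a few lines. This is an illustrative simplification, not the thesis code: it assumes NumPy, works in metres (so $V_u$ = 1 cm³ becomes 1e-6 m³), omits the $T_1 V_u$ clause of the test step, and replaces $P(\mathbf{x})$ with a smooth single-peak toy surface, which is far easier than a real SRP-PHAT surface. The name `src_search` and its parameters are hypothetical.

```python
import numpy as np

def src_search(f, lo, hi, j0=3000, n_keep=100, v_u=1e-6, max_fe=100_000, seed=0):
    """SRC-I style search for the maximum of f inside the box [lo, hi].

    Returns (best_point, fe_count); best_point is None if the fe budget
    max_fe (Phi in the text) is exhausted before the box shrinks below v_u.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    # Steps 1-3: evaluate J0 random points and keep the best N.
    pts = rng.uniform(lo, hi, size=(j0, lo.size))
    vals = np.array([f(p) for p in pts])
    fe = j0
    best = np.argsort(vals)[-n_keep:]
    pts, vals = pts[best], vals[best]
    while True:
        # Step 4: contract to the bounding box of the retained points.
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        # Step 5: stop once the box is smaller than the uncertainty volume.
        if np.prod(hi - lo) < v_u:
            return pts[np.argmax(vals)], fe
        mu = vals.mean()
        keep = vals > mu                      # the G_i points above the mean
        new_pts, new_vals = [], []
        # Steps 6-7 (SRC-I): draw until N - G_i new points exceed mu.
        while len(new_pts) < n_keep - keep.sum():
            if fe >= max_fe:
                return None, fe               # budget exhausted: discard result
            p = rng.uniform(lo, hi)
            v = f(p)
            fe += 1
            if v > mu:
                new_pts.append(p)
                new_vals.append(v)
        pts = np.vstack([pts[keep], np.array(new_pts)])
        vals = np.concatenate([vals[keep], np.array(new_vals)])

# Toy stand-in for P(x): a single smooth peak inside a 4 m x 1 m x 6 m room.
peak = np.array([1.5, 0.5, 2.5])
toy_surface = lambda p: -np.sum((p - peak) ** 2)
estimate, n_fe = src_search(toy_surface, [0.0, 0.0, 0.0], [4.0, 1.0, 6.0])
```

Because every new point must exceed the current mean, the mean of the retained set can only rise from one iteration to the next, which is the monotonicity property claimed for SRC-I.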

Chapter 6 Coarse-to-Fine Region Contraction (CFRC)

Like stochastic region contraction (SRC), the coarse-to-fine region contraction (CFRC) method uses the idea of region contraction: the search volume is made smaller and smaller until the global maximum is captured. However, in CFRC, the contraction operation of iteration $i$ is based on a sub-grid search of the $P(\mathbf{x})$ functional in the current sub-volume.

Figure 6.1: 2D example of CFRC: The surface $P(\mathbf{x})$ has many local maxima and minima (SNR ≈ 1.9 dB). j is the iteration index. The rectangular regions show the contracting search regions.

Analogous to SRC, the first step of CFRC is to determine the number of initial grid points, $J_0$, that need to be evaluated to guarantee that at least one grid point lies in $V_{peak}$. Again, from our preliminary experimental data for a low-SNR case, $V_{peak}$ is determined to be 28 cm × 10 cm × 30 cm. Recall that our $V_{room}$ is 400 cm × 100 cm × 600 cm. Hence, to have at least one grid point in $V_{peak}$, we have to evaluate 400/28 ≈ 15 grid points in x, 100/10 = 10 grid points in y, and 600/30 = 20 grid points in z, implying $J_0$ = 15 × 10 × 20 = 3000 equally spaced grid points in 3D.

We define $J_i$ as the number of grid points evaluated for iteration $i$. $N_i$, $V_{i+1}$, $B_{i+1}$, $FE_i$ and $\Phi$ are defined in the same way as in SRC. CFRC can be implemented in many ways; the variants differ mainly in the methods used to update $J_i$ and $V_i$ for each iteration. The general algorithm is:

1. Initialize: $i = 0$, $J_0$, $N_0$, $V_0 = V_{room}$.
2. Evaluate: $P(\mathbf{x})$ for $J_0$ points.
3. Sort: select the best $N_0$ of the $J_0$ points.
4. Contract: the search region to the smaller region $V_{i+1}$, $B_{i+1}$ that contains these $N_i$ points.
5. Test:
   IF $V_{i+1} < V_u$ and $FE_i < \Phi$: determine $\mathbf{x}_s$, STOP, KEEP RESULT.
   ELSE IF $FE_i > \Phi$: STOP, DISCARD RESULT.
   ELSE: from the $N_i$ points, keep the subset of $G_i$ points whose values are greater than the mean, $\mu_i$, of the $N_i$ points.
6. Evaluate: $J_{i+1}$ new grid points in $V_{i+1}$.
7. Sort: to obtain the best $N_{i+1}$ points from the union of the $G_i$ points and the $J_{i+1}$ points just evaluated.
8. Iterate: $i = i + 1$. GO TO STEP 4.

For our data, we chose the simplest algorithm to select $J_i$ and $N_i$. We made each of these constant, although for iterations $i \ge 1$ we used a value $J_i = J$ that differs from $J_0$. $J$ was selected to give perfect performance with the lowest cost: $J$ = 750 points. We ran the algorithm for various values of $N$, as shown in Figure 6.2. A value of $N = 100$ turned out to preserve perfect performance at a low cost.
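The general CFRC algorithm can be sketched in the same style. The following is only an illustration under stated assumptions: NumPy, units of metres, grids that include the box boundaries, a hypothetical 9 × 9 × 9 grid per iteration in place of the 750-point grid used in the text, and a smooth single-peak toy surface standing in for $P(\mathbf{x})$. The names `grid_points` and `cfrc_search` are not from the thesis.

```python
import itertools
import numpy as np

def grid_points(lo, hi, shape):
    """Equally spaced grid over the box [lo, hi], boundaries included."""
    axes = [np.linspace(a, b, n) for a, b, n in zip(lo, hi, shape)]
    return np.array(list(itertools.product(*axes)))

def cfrc_search(f, lo, hi, shape0=(15, 10, 20), shape=(9, 9, 9),
                n_keep=100, v_u=1e-6, max_fe=100_000):
    """Coarse-to-fine region contraction; returns (best_point, fe_count)."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    # Steps 1-2: evaluate the initial coarse grid of J0 points.
    pts = grid_points(lo, hi, shape0)
    vals = np.array([f(p) for p in pts])
    fe = len(pts)
    while True:
        # Steps 3/7: keep the best N points found so far.
        best = np.argsort(vals)[-n_keep:]
        pts, vals = pts[best], vals[best]
        # Step 4: contract to their bounding box.
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        # Step 5: test for convergence or an exhausted fe budget.
        if np.prod(hi - lo) < v_u:
            return pts[np.argmax(vals)], fe
        if fe >= max_fe:
            return None, fe
        keep = vals > vals.mean()             # the G_i points above the mean
        # Step 6: evaluate a finer grid inside the contracted box.
        new_pts = grid_points(lo, hi, shape)
        new_vals = np.array([f(p) for p in new_pts])
        fe += len(new_pts)
        pts = np.vstack([pts[keep], new_pts])
        vals = np.concatenate([vals[keep], new_vals])

# Toy stand-in for P(x): a single smooth peak inside a 4 m x 1 m x 6 m room.
peak = np.array([1.5, 0.5, 2.5])
toy_surface = lambda p: -np.sum((p - peak) ** 2)
estimate, n_fe = cfrc_search(toy_surface, [0.0, 0.0, 0.0], [4.0, 1.0, 6.0])
```

The only structural difference from the stochastic variant is where the new points come from: a deterministic grid over the contracted box instead of random draws.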
Figure 6.2: Performance of CFRC as a function of parameter N for four different source locations (accuracy in percent versus N, one curve each for Sources 1 to 4).

To see how the two newly introduced algorithms, SRC and CFRC, improve SRP-PHAT, we need to examine both their reduced computational cost and their accuracy relative to a full grid-search. In the next chapter, we calculate the computational cost of SRP-PHAT using a full grid-search as well as SRC and CFRC.

Chapter 7 Computational Cost

7.1 Signal-Processing Cost

SRP-PHAT requires frequency-domain processing to do the phase transforms. With $M$ denoting the number of microphones used in a locator, the computation of $P(\mathbf{x})$ requires $Q = M(M-1)/2$ phase transforms. For a DFT size of $L$, counting additions (ADDs) and multiplications (MULTs) as separate arithmetic operations, an FFT (radix 2) takes $(L/2)\log_2 L$ complex MULTs and $L \log_2 L$ complex ADDs. Note that a complex MULT takes 4 real MULTs and 2 real ADDs, while a complex ADD takes 2 real ADDs. Hence, a real FFT costs $5L\log_2 L$ operations.

1. DFT: We do an FFT for each of the $M$ microphones, hence the total DFT cost is $M \cdot 5L\log_2 L$.
2. Spectral Processing: For each pair of microphones, we do the following:
   Cross-Power Spectrum: a complex MULT for each point of the $L$-point cross-power spectrum, or $(4 + 2)L = 6L$ operations.
   Phase Transform: an $L$-point division by the magnitude of the cross-power spectrum, which costs $L$ operations.
   Therefore, the total spectral processing cost for $Q$ pairs of microphones is $7QL$.
3. IDFT: $Q$ pairs of microphones require $Q$ real IFFTs, or $5QL\log_2 L$ operations.
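The three cost terms above are easy to tabulate. The sketch below simply adds them up; the function name is hypothetical, and the DFT size $L = 2048$ is an assumption chosen because, with $M = 24$, it reproduces the 37.7 million operations per frame reported for the HMA system.

```python
import math

def signal_processing_ops(num_mics, dft_size):
    """Operation count per frame for the DFT, spectral processing and IDFT steps."""
    pairs = num_mics * (num_mics - 1) // 2                     # Q = M(M-1)/2
    fft_ops = 5 * dft_size * int(math.log2(dft_size))          # one real FFT: 5 L log2 L
    dft = num_mics * fft_ops                                   # step 1: M FFTs
    spectral = 7 * pairs * dft_size                            # step 2: 6L + L per pair
    idft = pairs * fft_ops                                     # step 3: Q inverse FFTs
    return pairs, dft + spectral + idft

# M = 24 as in the text; L = 2048 is an assumed DFT size (see lead-in).
num_pairs, total_ops = signal_processing_ops(24, 2048)
mega_ops_per_frame = total_ops / 1e6
```

The inverse transforms dominate this total, since there is one per microphone pair rather than one per microphone.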

In the current HMA system we use $M = 24$ microphones for a locator, implying $Q = 276$. A reasonable compromise among sufficient data, worst-case TDOA, and potential movement of the source is $L = 2048$. This totals 37,748,736 ops/frame, or 37.7 mo/f (million operations per frame).

7.2 Cost of interpolation

It has been shown that interpolating the spectral samples makes the SRP-PHAT surface much smoother, especially near the important peaks (see Chapter 8). Two interpolation techniques have been tested: filter interpolation and cubic splines. Cubic splines give the same accuracy as filter interpolation, yet are 4 times cheaper. Hence, we chose cubic interpolation in our work. The cost of an 800-to-8000 cubic interpolation is about 800 × (71 MULTs + 38 ADDs) × Q = 24,067,200 ops/f.

7.3 Cost per functional evaluation, fe

For each point $\mathbf{x}$ (fe), the following steps are required:

1. Obtain the $M$ (= 24) Euclidean distances, $d_i^n(\mathbf{x})$, from $\mathbf{x}$ to each microphone. This takes 3 mults, 5 adds and 1 square root (≈ 12 ops). Cost: 20 ops/mic or 480 ops/fe.
2. Determine the $Q$ (= 276) TDOAs, $\tau_{ij}^{n}(\mathbf{x}) = (d_i^n(\mathbf{x}) - d_j^n(\mathbf{x}))\,C$, where $C$ is the inverse of the speed of sound. Cost: 3 ops/pair or 728 ops/fe.
3. Sum up the PHAT values. This requires a multiplication, an addition and a truncation to discretize each TDOA value, a memory access, and one more addition for the sum itself. Cost: 5 ops/pair or 1380 ops/fe.

Thus we need 480 + 728 + 1380 = 2588 ops/fe.

7.4 Cost of a full grid-search

The search volume in our case is 400 cm × 100 cm × 600 cm. Searching the entire focal volume at a 1 cm resolution, implying $V_u$ = 1 cm³, requires 24,000,000 fe's. Therefore, the cost of the full grid-search is

Grid-Search Cost = 24,000,000 × 2588 ops = 62,112 mo/frame (7.1)

Note that the signal-processing and interpolation costs are tiny in comparison to that of the grid-search. In our real-time system, we would ideally like to make a decision every 25.6 ms, implying a 2.43 TF machine! Obviously, the grid-search method is not a practical way of doing SRP-PHAT in real time at a reasonable hardware cost.

7.5 Cost of SRC and CFRC

SRP-PHAT using SRC or CFRC requires the same signal processing and interpolation, but significantly reduces the number of fe's needed to find the global maximum. As will be shown experimentally, the number of fe's varies with SNR in a typical room. SRC and CFRC do require a few small additional computations. These are 1) determining each random point (in the case of SRC), which costs about 21 ops/fe, or each grid point (CFRC), which costs 12 ops/fe, and 2) sorting to get the best $N_{i+1} - G_i$ points, which has negligible cost. Neither of these additions affects the computational load appreciably.

As one will see in the next chapter, SRC averaged 0.073% and CFRC averaged 0.057% of the full grid-search cost, which is more than three orders of magnitude cheaper. This is a large speed-up for SRP-PHAT: implementing SRP-PHAT in real time then requires only about 2 GF on average.
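The grid-search figures can be reproduced with a few lines of arithmetic. The sketch below takes the room dimensions, the 2588 ops/fe figure, the 25.6 ms decision interval and the average SRC and CFRC percentages directly from the text; the function and variable names are hypothetical.

```python
ROOM_CM = (400, 100, 600)      # search volume in cm, from the text
OPS_PER_FE = 2588              # cost of one functional evaluation, from the text
DECISION_INTERVAL_S = 0.0256   # one decision every 25.6 ms

def grid_search_cost(room_cm, resolution_cm=1, ops_per_fe=OPS_PER_FE):
    """Number of fe's and total operations for a full grid search of the room."""
    n_fe = 1
    for side in room_cm:
        n_fe *= side // resolution_cm
    return n_fe, n_fe * ops_per_fe

grid_fe, grid_ops = grid_search_cost(ROOM_CM)
required_tf = grid_ops / DECISION_INTERVAL_S / 1e12   # sustained tera-ops per second

# Average number of fe's implied by the quoted percentages of the grid-search cost.
src_avg_fe = 0.073 / 100 * grid_fe
cfrc_avg_fe = 0.057 / 100 * grid_fe
```

Seen this way, the contraction methods replace tens of millions of fe's per frame with a number in the low tens of thousands.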

Chapter 8 Experiments and Results

8.1 The Huge Microphone Array (HMA)

The Huge Microphone Array (HMA) is a product of a collaborative project between the Laboratory for Engineering Man/Machine Systems (LEMS) at Brown University and the Center for Computer Aids for Industrial Productivity (CAIP) at Rutgers University, which started in February. It is one of the largest microphone arrays ever built and used in real time. Other well-known microphone array projects include LOUD at MIT [42] and the Beamed Microphone Array of the Computational Video Group and the Acoustic and Signal Processing Group at the National Research Council of Canada [12]. The HMA can support up to 512 microphones in real time. The hardware has been fully reported in [32, 33], and the software and some of the algorithms in [34, 35]. The HMA has 96 floating-point DSPs that together deliver a computation rate of approximately 4 GF.

8.2 Experimental System

The HMA system and the room, with $T_{60}$ = 0.45 s, that we used in our experiments have been described in [36]. A human talker, approximately facing the locator microphones (24 microphones selected from panels H, I, J, K), repeated the first four seconds of the rainbow passage from four locations, as shown in Figure 8.1 with the distances and SNRs indicated.

Figure 8.1: Top View of the Array, Showing Source Locations and Panels (Locator uses Microphones on Panels H, I, J, K). Labels in the figure: Source 1, avg. dist. = 2.14 m, SNR ≈ 7.9 dB; Source 2, avg. dist. = 2.76 m, SNR ≈ 5.7 dB; Source 3, avg. dist. = 3.47 m, SNR ≈ 3.1 dB; Source 4, avg. dist. = 4.25 m, SNR ≈ 1.9 dB.

The arrows indicate the orientation of the talkers, and the SNRs are for background noise only.

8.3 Preliminary Processing for SRP-PHAT data

8.3.1 Interpolation

The problem with the SRP-PHAT surface is that it is very discontinuous near the peak (see Figure 8.2), and applying a global maximum search directly to it, without any pre-processing to smooth the surface, would result in many close variants. Therefore, some interpolation is needed to make the surface smoother in this region.

Figure 8.2: SRP-PHAT surface without interpolation

We implemented two 10-to-1 interpolation techniques on the spectral samples: lowpass FIR filter interpolation and cubic splines. The 131-long lowpass FIR filter interpolates 824 original samples to get 8001 interpolated points. The interpolated SRP-PHAT surface turned out to be much smoother, as one can see in Figure 8.3.

Figure 8.3: SRP-PHAT surface with filter interpolation

The cubic spline interpolation offers the same smoothing effect as the filter interpolation (see Figure 8.4). However, its computational cost is 4 times lower than that of the 131-long FIR filter, hence we chose cubic splines in our work.

Figure 8.4: SRP-PHAT surface with cubic interpolation

8.3.2 Simple Energy Discriminator

SRP-PHAT using a 1 cm full grid-search is virtually guaranteed to give a peak estimate within 1 cm of the true global peak whenever SRP-PHAT works correctly. Therefore, when SRP-PHAT gives reliable estimates, we would like to compare the performance of SRP-PHAT using a full grid-search with its performance using SRC and CFRC. This comparison shows the effectiveness of SRC and CFRC. The question is, how does one know when SRP-PHAT gives reliable estimates?

From our observations, SRP-PHAT only fails on frames containing a large portion of stop or silence data. In other words, whenever sufficient speech information is present in the frame, SRP-PHAT seems always to make good estimates. Therefore, if we can determine the good frames, where speech is present, we can confidently compare the performance of SRP-PHAT using the full grid-search with its performance using SRC and CFRC.

Speech frames normally tend to have higher energy than non-speech frames, although low-energy events are difficult to detect in highly noisy conditions. Hence, a discriminator based on energy can be used to separate speech frames from non-speech frames. In our work, we created a simple energy discriminator as follows:

1. Initialize: energy threshold $\theta$ to the background energy (in our case −30 dB).
2. Determine: the energy of all frames in the processed data, normalizing the highest frame to 0 dB.
3. Select: a set of frames, $G$, that have energy greater than $\theta$.
4. Calculate: the x, y, z standard deviations, $\sigma_x$, $\sigma_y$, $\sigma_z$, of the source location estimates given by SRP-PHAT using a full grid-search on the $G$ frames.
5. Test: IF $\sigma_x, \sigma_z \le 0.05$ m and $\sigma_y \le 0.1$ m, STOP. Set $\hat{G} = G$ and $\hat{\theta} = \theta$. KEEP $\hat{G}$, $\hat{\theta}$.
6. Increase: threshold $\theta$ to a higher value.
7. Iterate: Go to Step 2.

After running this procedure over the 4 positions with different SNRs shown in Figure 8.1, we obtained $\hat{\theta}$ = −7 dB, which gives sufficiently small values of $\sigma_x$, $\sigma_y$, $\sigma_z$ (see Figures 8.5, 8.6, 8.7, 8.8). Hence, we use $\theta$ = −7 dB as the threshold in our energy discriminator. In other words, all frames $G$ that have energy $E_G \ge -7$ dB are marked as good frames for SRP-PHAT. We compare the performance of SRC and CFRC with the full grid-search only on those good frames. Also, since we cannot do better than a full grid-search, performance is listed as 100% if the CFRC and SRC implementations reach the global maximum everywhere the grid-search did.
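The discriminator loop can be sketched as follows. This is a minimal illustration, assuming NumPy, a 1 dB threshold step, thresholds expressed as negative dB values relative to the loudest frame, and synthetic frame data; the function name `pick_threshold` and its parameters are hypothetical, and the per-frame location estimates would in practice come from the full grid-search.

```python
import numpy as np

def pick_threshold(energy_db, estimates, start_db=-30.0, step_db=1.0,
                   limits=(0.05, 0.10, 0.05)):
    """Raise the energy threshold until the location estimates are consistent.

    energy_db : per-frame energy in dB
    estimates : per-frame (x, y, z) location estimates in metres
    limits    : allowed standard deviations (sigma_x, sigma_y, sigma_z) in metres
    Returns (threshold, good_frame_mask), or (None, empty mask) if none qualifies.
    """
    energy_db = energy_db - energy_db.max()        # loudest frame becomes 0 dB
    theta = start_db
    while theta <= 0.0:
        good = energy_db > theta                   # candidate set G
        if good.sum() >= 2 and np.all(estimates[good].std(axis=0) <= limits):
            return theta, good
        theta += step_db                           # raise the threshold and retry
    return None, np.zeros(len(energy_db), dtype=bool)

# Synthetic example: frames below -10 dB give scattered estimates,
# louder frames agree on a single location.
energy = np.arange(-40.0, 1.0)                     # 41 frames, -40 dB ... 0 dB
estimates = np.tile([1.0, 0.5, 2.0], (len(energy), 1))
quiet = energy < -10.0
estimates[quiet, 0] += np.where(np.arange(quiet.sum()) % 2 == 0, 1.0, -1.0)
theta_hat, good_frames = pick_threshold(energy, estimates)
```

A single inconsistent frame is enough to push the standard deviation past the limits, so the loop keeps raising the threshold until every remaining frame agrees.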

Figure 8.5: The simple energy discriminator for Source 4 (SNR = 1.9 dB; curves for the x, y and z values)

Figure 8.6: The simple energy discriminator for Source 3 (SNR = 3.47 dB; curves for the x, y and z values)

Figure 8.7: The simple energy discriminator for Source 2 (SNR = 5.7 dB; curves for the x, y and z values)

Figure 8.8: The simple energy discriminator for Source 1 (SNR = 7.9 dB; curves for the x, y and z values)

8.4 Results

Results are given in Table 8.1 for accuracy and the average number of fe's used for LEMSalg and grid search on all frames, as well as for CFRC and SRC on the good frames. Performances for LEMSalg and the grid search are absolute and show overall correctness (over all frames). Frames of 102.4 ms, each advancing 25.6 ms within the speech, were used for testing, and an estimate was considered an error if it was off by more than 5 cm in x or z, or by more than 10 cm in y, the vertical dimension.

Algorithm     Source 1 (7.9 dB)    Source 2 (5.7 dB)    Source 3 (3.47 dB)   Source 4 (1.9 dB)
              % Corr.  # fe's      % Corr.  # fe's      % Corr.  # fe's      % Corr.  # fe's
LEMSalg       ...      ...         ...      ...         ...      ...         ...      ...
Grid Search   ...      ...         ...      ...         ...      ...         ...      ...
CFRC          100      9,...       ...      ...         ...      ...         ...      ...,723
SRC-I         100      7,...       ...      ...         ...      ...         ...      ...,601
SRC-II        100      7,...       ...      ...         ...      ...         ...      ...,304
SRC-III       ...      ...         ...      ...         ...      ...         ...      ...,219

Table 8.1: Performance of LEMSalg and SRP-PHAT using full grid search over all frames; CFRC and three SRC parameterizations over good frames for four different locations.

The performance of SRP-PHAT using CFRC and the three variants of SRC, as well as their costs relative to the full grid search, are shown in Figures 8.9 and 8.10.

Figure 8.9: Performance of SRP-PHAT using CFRC and three SRC parameterizations relative to grid-search (% correct for Sources 1 to 4; bars for CFRC, SRC-I, SRC-II, SRC-III)

Figure 8.10: Cost of SRP-PHAT using CFRC and three SRC parameterizations relative to grid-search (percent of the 1 cm SRP-PHAT grid-search cost for Sources 1 to 4; bars for CFRC, SRC-I, SRC-II, SRC-III; maximal cost: 0.22%)

Chapter 9 Conclusions and Future Work

9.1 Conclusions

We have verified here that SRP-PHAT is superior, especially under higher noise conditions, to a less costly, real-time, two-stage location-estimation algorithm, LEMSalg. As Table 8.1 shows, when the SNR is low (SNR = 3.47 dB, 1.9 dB), the performance of LEMSalg degraded severely and averaged 39.5% over those high-noise conditions. Meanwhile, SRP-PHAT using a full grid-search has an average percent correct of 77.5%, nearly twice that of LEMSalg.

From Table 8.1, it is clear that SRC and CFRC have perfect performance relative to a full grid-search. Also, by using SRC and CFRC, we can reduce SRP-PHAT's large computational cost by more than three orders of magnitude. Under our worst-case conditions, SNR ≈ 1.9 dB, CFRC gives the full accuracy of SRP-PHAT with a computational advantage of 1217:1 over the full grid search. If the conditions are less noisy, such as our best case of SNR ≈ 7.9 dB, then CFRC gives full accuracy with a computational advantage of 2631:1, needing only about a 1.85 GF machine for 40 frames/second in real time. SRC-I, on the other hand, gives a computational advantage of 714:1 over the full grid search under SNR ≈ 1.9 dB, but up to 3206:1 under SNR ≈ 7.9 dB.

It is interesting to see that CFRC is less costly, at 63% of the cost of SRC-I (the lowest-cost SRC variant), under noisy conditions (SNR ≈ 1.9 dB, 3.47 dB), but 36% more expensive in the less noisy cases (SNR ≈ 7.9 dB, 5.7 dB). Hence, when the noise is relatively high, CFRC has an advantage over SRC, whereas SRC costs less when the noise is low. Intuitively, this can be explained as follows. When the noise is high, the volume $V_{peak}$ is reduced and there are more nearby local maxima. Thus the probability of getting a point in the true $V_{peak}$ is lower for SRC, so more fe's need to be evaluated for correct convergence. When the noise is lower, the stochastic method has a higher probability of hitting the enlarged $V_{peak}$, and fewer fe's need to be evaluated. If the range of the noise is in some sense bounded and known a priori, then selecting correct parameters for CFRC seems to be the preferred choice. We base this on our particular data, in which, over the four different SNRs, CFRC averaged 0.057% of the full grid-search cost while SRC-I averaged 0.073%.

In summary, this thesis has presented two global optimization techniques, stochastic region contraction (SRC) and coarse-to-fine region contraction (CFRC), that reduce the computational cost of SRP-PHAT by three orders of magnitude relative to a full grid-search, thus making SRP-PHAT practical in real time. Both techniques have been shown to fully preserve the superior performance of SRP-PHAT over the fast, two-stage method, LEMSalg, that is implemented in real time in our system.

9.2 Future Work

This thesis deals only with the single-talker case. It is possible to extend the two newly introduced techniques, SRC and CFRC, to the case of multiple talkers, i.e. $n$ talkers. This can be done by dividing the search volume into a suitable number of regions, $V_i^n$, during the process and doing SRC or CFRC over those local search volumes $V_i^n$. The main task is to set the boundaries for those local regions within the process.
The magnitudes of the $n$ global maxima in the space must be higher than the noise level, i.e. than the local maxima. If we can define a threshold that discriminates the heights of the global maxima from those of the local ones in some optimal sense, it becomes more feasible to define the boundaries.

Figure 9.1: 2D illustration of the two-talker case (n = 2)

One may also ask whether any other weighting function, more robust than the phase transform (PHAT), can be applied to the steered response power (SRP). Brandstein [3] has shown that pitch-based weighting, when applied to the generalized cross-correlation (GCC), is more robust than the phase transform (PHAT) under noise-only conditions and offers the same efficiency as the PHAT under reverberation. In addition, it is more robust than the maximum likelihood (ML) weighting under reverberant conditions. It would be interesting to see whether this weighting function, when applied to SRP, brings any improvement.


More information

Reducing comb filtering on different musical instruments using time delay estimation

Reducing comb filtering on different musical instruments using time delay estimation Reducing comb filtering on different musical instruments using time delay estimation Alice Clifford and Josh Reiss Queen Mary, University of London alice.clifford@eecs.qmul.ac.uk Abstract Comb filtering

More information

A FAST CUMULATIVE STEERED RESPONSE POWER FOR MULTIPLE SPEAKER DETECTION AND LOCALIZATION. Youssef Oualil, Friedrich Faubel, Dietrich Klakow

A FAST CUMULATIVE STEERED RESPONSE POWER FOR MULTIPLE SPEAKER DETECTION AND LOCALIZATION. Youssef Oualil, Friedrich Faubel, Dietrich Klakow A FAST CUMULATIVE STEERED RESPONSE POWER FOR MULTIPLE SPEAKER DETECTION AND LOCALIZATION Youssef Oualil, Friedrich Faubel, Dietrich Klaow Spoen Language Systems, Saarland University, Saarbrücen, Germany

More information

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE Copyright SFA - InterNoise 2000 1 inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering 27-30 August 2000, Nice, FRANCE I-INCE Classification: 7.2 MICROPHONE ARRAY

More information

Chapter 4 DOA Estimation Using Adaptive Array Antenna in the 2-GHz Band

Chapter 4 DOA Estimation Using Adaptive Array Antenna in the 2-GHz Band Chapter 4 DOA Estimation Using Adaptive Array Antenna in the 2-GHz Band 4.1. Introduction The demands for wireless mobile communication are increasing rapidly, and they have become an indispensable part

More information

8 Robust Localization in Reverberant Rooms

8 Robust Localization in Reverberant Rooms 8 Robust Localization in Reverberant Rooms Joseph H. DiBiase!, Harvey F. Silverman!, and Michael S. Brandstein 2 1 Brown University, Providence Rl, USA 2 Harvard University, Cambridge MA, USA Abstract.

More information

Speaker Localization in Noisy Environments Using Steered Response Voice Power

Speaker Localization in Noisy Environments Using Steered Response Voice Power 112 IEEE Transactions on Consumer Electronics, Vol. 61, No. 1, February 2015 Speaker Localization in Noisy Environments Using Steered Response Voice Power Hyeontaek Lim, In-Chul Yoo, Youngkyu Cho, and

More information

Painting with Music. Weijian Zhou

Painting with Music. Weijian Zhou Painting with Music by Weijian Zhou A thesis submitted in conformity with the requirements for the degree of Master of Applied Science and Engineering Graduate Department of Electrical and Computer Engineering

More information

Subband Analysis of Time Delay Estimation in STFT Domain

Subband Analysis of Time Delay Estimation in STFT Domain PAGE 211 Subband Analysis of Time Delay Estimation in STFT Domain S. Wang, D. Sen and W. Lu School of Electrical Engineering & Telecommunications University of ew South Wales, Sydney, Australia sh.wang@student.unsw.edu.au,

More information

Frugal Sensing Spectral Analysis from Power Inequalities

Frugal Sensing Spectral Analysis from Power Inequalities Frugal Sensing Spectral Analysis from Power Inequalities Nikos Sidiropoulos Joint work with Omar Mehanna IEEE SPAWC 2013 Plenary, June 17, 2013, Darmstadt, Germany Wideband Spectrum Sensing (for CR/DSM)

More information

THE problem of acoustic echo cancellation (AEC) was

THE problem of acoustic echo cancellation (AEC) was IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 6, NOVEMBER 2005 1231 Acoustic Echo Cancellation and Doubletalk Detection Using Estimated Loudspeaker Impulse Responses Per Åhgren Abstract

More information

BEAMFORMING WITHIN THE MODAL SOUND FIELD OF A VEHICLE INTERIOR

BEAMFORMING WITHIN THE MODAL SOUND FIELD OF A VEHICLE INTERIOR BeBeC-2016-S9 BEAMFORMING WITHIN THE MODAL SOUND FIELD OF A VEHICLE INTERIOR Clemens Nau Daimler AG Béla-Barényi-Straße 1, 71063 Sindelfingen, Germany ABSTRACT Physically the conventional beamforming method

More information

A Simple Adaptive First-Order Differential Microphone

A Simple Adaptive First-Order Differential Microphone A Simple Adaptive First-Order Differential Microphone Gary W. Elko Acoustics and Speech Research Department Bell Labs, Lucent Technologies Murray Hill, NJ gwe@research.bell-labs.com 1 Report Documentation

More information

An improved direction of arrival (DOA) estimation algorithm and beam formation algorithm for smart antenna system in multipath environment

An improved direction of arrival (DOA) estimation algorithm and beam formation algorithm for smart antenna system in multipath environment ISSN:2348-2079 Volume-6 Issue-1 International Journal of Intellectual Advancements and Research in Engineering Computations An improved direction of arrival (DOA) estimation algorithm and beam formation

More information

Theory of Telecommunications Networks

Theory of Telecommunications Networks Theory of Telecommunications Networks Anton Čižmár Ján Papaj Department of electronics and multimedia telecommunications CONTENTS Preface... 5 1 Introduction... 6 1.1 Mathematical models for communication

More information

ECE 476/ECE 501C/CS Wireless Communication Systems Winter Lecture 6: Fading

ECE 476/ECE 501C/CS Wireless Communication Systems Winter Lecture 6: Fading ECE 476/ECE 501C/CS 513 - Wireless Communication Systems Winter 2005 Lecture 6: Fading Last lecture: Large scale propagation properties of wireless systems - slowly varying properties that depend primarily

More information

Nonlinear postprocessing for blind speech separation

Nonlinear postprocessing for blind speech separation Nonlinear postprocessing for blind speech separation Dorothea Kolossa and Reinhold Orglmeister 1 TU Berlin, Berlin, Germany, D.Kolossa@ee.tu-berlin.de, WWW home page: http://ntife.ee.tu-berlin.de/personen/kolossa/home.html

More information

Measuring impulse responses containing complete spatial information ABSTRACT

Measuring impulse responses containing complete spatial information ABSTRACT Measuring impulse responses containing complete spatial information Angelo Farina, Paolo Martignon, Andrea Capra, Simone Fontana University of Parma, Industrial Eng. Dept., via delle Scienze 181/A, 43100

More information

Adaptive Systems Homework Assignment 3

Adaptive Systems Homework Assignment 3 Signal Processing and Speech Communication Lab Graz University of Technology Adaptive Systems Homework Assignment 3 The analytical part of your homework (your calculation sheets) as well as the MATLAB

More information

Digital Signal Processing

Digital Signal Processing Digital Signal Processing Fourth Edition John G. Proakis Department of Electrical and Computer Engineering Northeastern University Boston, Massachusetts Dimitris G. Manolakis MIT Lincoln Laboratory Lexington,

More information

Direction-of-Arrival Estimation Using a Microphone Array with the Multichannel Cross-Correlation Method

Direction-of-Arrival Estimation Using a Microphone Array with the Multichannel Cross-Correlation Method Direction-of-Arrival Estimation Using a Microphone Array with the Multichannel Cross-Correlation Method Udo Klein, Member, IEEE, and TrInh Qu6c VO School of Electrical Engineering, International University,

More information

OFDM and FFT. Cairo University Faculty of Engineering Department of Electronics and Electrical Communications Dr. Karim Ossama Abbas Fall 2010

OFDM and FFT. Cairo University Faculty of Engineering Department of Electronics and Electrical Communications Dr. Karim Ossama Abbas Fall 2010 OFDM and FFT Cairo University Faculty of Engineering Department of Electronics and Electrical Communications Dr. Karim Ossama Abbas Fall 2010 Contents OFDM and wideband communication in time and frequency

More information

Smart antenna technology

Smart antenna technology Smart antenna technology In mobile communication systems, capacity and performance are usually limited by two major impairments. They are multipath and co-channel interference [5]. Multipath is a condition

More information

ACOUSTIC SOURCE LOCALIZATION IN HOME ENVIRONMENTS - THE EFFECT OF MICROPHONE ARRAY GEOMETRY

ACOUSTIC SOURCE LOCALIZATION IN HOME ENVIRONMENTS - THE EFFECT OF MICROPHONE ARRAY GEOMETRY 28. Konferenz Elektronische Sprachsignalverarbeitung 2017, Saarbrücken ACOUSTIC SOURCE LOCALIZATION IN HOME ENVIRONMENTS - THE EFFECT OF MICROPHONE ARRAY GEOMETRY Timon Zietlow 1, Hussein Hussein 2 and

More information

BORIS KASHENTSEV ESTIMATION OF DOMINANT SOUND SOURCE WITH THREE MICROPHONE ARRAY. Master of Science thesis

BORIS KASHENTSEV ESTIMATION OF DOMINANT SOUND SOURCE WITH THREE MICROPHONE ARRAY. Master of Science thesis BORIS KASHENTSEV ESTIMATION OF DOMINANT SOUND SOURCE WITH THREE MICROPHONE ARRAY Master of Science thesis Examiner: prof. Moncef Gabbouj Examiner and topic approved by the Faculty Council of the Faculty

More information

The Discrete Fourier Transform. Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido

The Discrete Fourier Transform. Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido The Discrete Fourier Transform Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido CCC-INAOE Autumn 2015 The Discrete Fourier Transform Fourier analysis is a family of mathematical

More information

Measurement System for Acoustic Absorption Using the Cepstrum Technique. Abstract. 1. Introduction

Measurement System for Acoustic Absorption Using the Cepstrum Technique. Abstract. 1. Introduction The 00 International Congress and Exposition on Noise Control Engineering Dearborn, MI, USA. August 9-, 00 Measurement System for Acoustic Absorption Using the Cepstrum Technique E.R. Green Roush Industries

More information

Comparison of LMS Adaptive Beamforming Techniques in Microphone Arrays

Comparison of LMS Adaptive Beamforming Techniques in Microphone Arrays SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol. 12, No. 1, February 2015, 1-16 UDC: 621.395.61/.616:621.3.072.9 DOI: 10.2298/SJEE1501001B Comparison of LMS Adaptive Beamforming Techniques in Microphone

More information

WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY

WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY INTER-NOISE 216 WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY Shumpei SAKAI 1 ; Tetsuro MURAKAMI 2 ; Naoto SAKATA 3 ; Hirohumi NAKAJIMA 4 ; Kazuhiro NAKADAI

More information

Design of FIR Filters

Design of FIR Filters Design of FIR Filters Elena Punskaya www-sigproc.eng.cam.ac.uk/~op205 Some material adapted from courses by Prof. Simon Godsill, Dr. Arnaud Doucet, Dr. Malcolm Macleod and Prof. Peter Rayner 1 FIR as a

More information

Lab S-3: Beamforming with Phasors. N r k. is the time shift applied to r k

Lab S-3: Beamforming with Phasors. N r k. is the time shift applied to r k DSP First, 2e Signal Processing First Lab S-3: Beamforming with Phasors Pre-Lab: Read the Pre-Lab and do all the exercises in the Pre-Lab section prior to attending lab. Verification: The Exercise section

More information

MDPI AG, Kandererstrasse 25, CH-4057 Basel, Switzerland;

MDPI AG, Kandererstrasse 25, CH-4057 Basel, Switzerland; Sensors 2013, 13, 1151-1157; doi:10.3390/s130101151 New Book Received * OPEN ACCESS sensors ISSN 1424-8220 www.mdpi.com/journal/sensors Electronic Warfare Target Location Methods, Second Edition. Edited

More information

CG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003

CG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003 CG40 Advanced Dr Stuart Lawson Room A330 Tel: 23780 e-mail: ssl@eng.warwick.ac.uk 03 January 2003 Lecture : Overview INTRODUCTION What is a signal? An information-bearing quantity. Examples of -D and 2-D

More information

6 Uplink is from the mobile to the base station.

6 Uplink is from the mobile to the base station. It is well known that by using the directional properties of adaptive arrays, the interference from multiple users operating on the same channel as the desired user in a time division multiple access (TDMA)

More information

The Role of High Frequencies in Convolutive Blind Source Separation of Speech Signals

The Role of High Frequencies in Convolutive Blind Source Separation of Speech Signals The Role of High Frequencies in Convolutive Blind Source Separation of Speech Signals Maria G. Jafari and Mark D. Plumbley Centre for Digital Music, Queen Mary University of London, UK maria.jafari@elec.qmul.ac.uk,

More information

Sound Processing Technologies for Realistic Sensations in Teleworking

Sound Processing Technologies for Realistic Sensations in Teleworking Sound Processing Technologies for Realistic Sensations in Teleworking Takashi Yazu Makoto Morito In an office environment we usually acquire a large amount of information without any particular effort

More information

Speech Enhancement Using Microphone Arrays

Speech Enhancement Using Microphone Arrays Friedrich-Alexander-Universität Erlangen-Nürnberg Lab Course Speech Enhancement Using Microphone Arrays International Audio Laboratories Erlangen Prof. Dr. ir. Emanuël A. P. Habets Friedrich-Alexander

More information

Speech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya 2, B. Yamuna 2, H. Divya 2, B. Shiva Kumar 2, B.

Speech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya 2, B. Yamuna 2, H. Divya 2, B. Shiva Kumar 2, B. www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 4 April 2015, Page No. 11143-11147 Speech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya

More information

Performance Analysis of MUSIC and MVDR DOA Estimation Algorithm

Performance Analysis of MUSIC and MVDR DOA Estimation Algorithm Volume-8, Issue-2, April 2018 International Journal of Engineering and Management Research Page Number: 50-55 Performance Analysis of MUSIC and MVDR DOA Estimation Algorithm Bhupenmewada 1, Prof. Kamal

More information

Multiple Antenna Processing for WiMAX

Multiple Antenna Processing for WiMAX Multiple Antenna Processing for WiMAX Overview Wireless operators face a myriad of obstacles, but fundamental to the performance of any system are the propagation characteristics that restrict delivery

More information

ECE 476/ECE 501C/CS Wireless Communication Systems Winter Lecture 6: Fading

ECE 476/ECE 501C/CS Wireless Communication Systems Winter Lecture 6: Fading ECE 476/ECE 501C/CS 513 - Wireless Communication Systems Winter 2004 Lecture 6: Fading Last lecture: Large scale propagation properties of wireless systems - slowly varying properties that depend primarily

More information

TDE-ILD-HRTF-Based 2D Whole-Plane Sound Source Localization Using Only Two Microphones and Source Counting

TDE-ILD-HRTF-Based 2D Whole-Plane Sound Source Localization Using Only Two Microphones and Source Counting TDE-ILD-HRTF-Based 2D Whole-Plane Sound Source Localization Using Only Two Microphones Source Counting Ali Pourmohammad, Member, IACSIT Seyed Mohammad Ahadi Abstract In outdoor cases, TDOA-based methods

More information

Proceedings of the 5th WSEAS Int. Conf. on SIGNAL, SPEECH and IMAGE PROCESSING, Corfu, Greece, August 17-19, 2005 (pp17-21)

Proceedings of the 5th WSEAS Int. Conf. on SIGNAL, SPEECH and IMAGE PROCESSING, Corfu, Greece, August 17-19, 2005 (pp17-21) Ambiguity Function Computation Using Over-Sampled DFT Filter Banks ENNETH P. BENTZ The Aerospace Corporation 5049 Conference Center Dr. Chantilly, VA, USA 90245-469 Abstract: - This paper will demonstrate

More information

Robust direction of arrival estimation

Robust direction of arrival estimation Tuomo Pirinen e-mail: tuomo.pirinen@tut.fi 26th February 2004 ICSI Speech Group Lunch Talk Outline Motivation, background and applications Basics Robustness Misc. results 2 Motivation Page1 3 Motivation

More information

Analysis of LMS and NLMS Adaptive Beamforming Algorithms

Analysis of LMS and NLMS Adaptive Beamforming Algorithms Analysis of LMS and NLMS Adaptive Beamforming Algorithms PG Student.Minal. A. Nemade Dept. of Electronics Engg. Asst. Professor D. G. Ganage Dept. of E&TC Engg. Professor & Head M. B. Mali Dept. of E&TC

More information

Ultrasound Bioinstrumentation. Topic 2 (lecture 3) Beamforming

Ultrasound Bioinstrumentation. Topic 2 (lecture 3) Beamforming Ultrasound Bioinstrumentation Topic 2 (lecture 3) Beamforming Angular Spectrum 2D Fourier transform of aperture Angular spectrum Propagation of Angular Spectrum Propagation as a Linear Spatial Filter Free

More information

Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events

Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events INTERSPEECH 2013 Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events Rupayan Chakraborty and Climent Nadeu TALP Research Centre, Department of Signal Theory

More information

Accuracy Estimation of Microwave Holography from Planar Near-Field Measurements

Accuracy Estimation of Microwave Holography from Planar Near-Field Measurements Accuracy Estimation of Microwave Holography from Planar Near-Field Measurements Christopher A. Rose Microwave Instrumentation Technologies River Green Parkway, Suite Duluth, GA 9 Abstract Microwave holography

More information

Effects of Fading Channels on OFDM

Effects of Fading Channels on OFDM IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719, Volume 2, Issue 9 (September 2012), PP 116-121 Effects of Fading Channels on OFDM Ahmed Alshammari, Saleh Albdran, and Dr. Mohammad

More information

Convention Paper Presented at the 131st Convention 2011 October New York, USA

Convention Paper Presented at the 131st Convention 2011 October New York, USA Audio Engineering Society Convention Paper Presented at the 131st Convention 211 October 2 23 New York, USA This paper was peer-reviewed as a complete manuscript for presentation at this Convention. Additional

More information

Evaluating Real-time Audio Localization Algorithms for Artificial Audition in Robotics

Evaluating Real-time Audio Localization Algorithms for Artificial Audition in Robotics Evaluating Real-time Audio Localization Algorithms for Artificial Audition in Robotics Anthony Badali, Jean-Marc Valin,François Michaud, and Parham Aarabi University of Toronto Dept. of Electrical & Computer

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

Applying the Filtered Back-Projection Method to Extract Signal at Specific Position

Applying the Filtered Back-Projection Method to Extract Signal at Specific Position Applying the Filtered Back-Projection Method to Extract Signal at Specific Position 1 Chia-Ming Chang and Chun-Hao Peng Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan

More information

Time-of-arrival estimation for blind beamforming

Time-of-arrival estimation for blind beamforming Time-of-arrival estimation for blind beamforming Pasi Pertilä, pasi.pertila (at) tut.fi www.cs.tut.fi/~pertila/ Aki Tinakari, aki.tinakari (at) tut.fi Tampere University of Technology Tampere, Finland

More information

Exploiting a Geometrically Sampled Grid in the SRP-PHAT for Localization Improvement and Power Response Sensitivity Analysis

Exploiting a Geometrically Sampled Grid in the SRP-PHAT for Localization Improvement and Power Response Sensitivity Analysis Exploiting a Geometrically Sampled Grid in the SRP-PHAT for Localization Improvement and Power Response Sensitivity Analysis Daniele Salvati, Carlo Drioli, and Gian Luca Foresti, arxiv:6v4 [cs.sd] 7 Mar

More information

Acoustic Beamforming for Hearing Aids Using Multi Microphone Array by Designing Graphical User Interface

Acoustic Beamforming for Hearing Aids Using Multi Microphone Array by Designing Graphical User Interface MEE-2010-2012 Acoustic Beamforming for Hearing Aids Using Multi Microphone Array by Designing Graphical User Interface Master s Thesis S S V SUMANTH KOTTA BULLI KOTESWARARAO KOMMINENI This thesis is presented

More information

K.NARSING RAO(08R31A0425) DEPT OF ELECTRONICS & COMMUNICATION ENGINEERING (NOVH).

K.NARSING RAO(08R31A0425) DEPT OF ELECTRONICS & COMMUNICATION ENGINEERING (NOVH). Smart Antenna K.NARSING RAO(08R31A0425) DEPT OF ELECTRONICS & COMMUNICATION ENGINEERING (NOVH). ABSTRACT:- One of the most rapidly developing areas of communications is Smart Antenna systems. This paper

More information

S. Ejaz and M. A. Shafiq Faculty of Electronic Engineering Ghulam Ishaq Khan Institute of Engineering Sciences and Technology Topi, N.W.F.

S. Ejaz and M. A. Shafiq Faculty of Electronic Engineering Ghulam Ishaq Khan Institute of Engineering Sciences and Technology Topi, N.W.F. Progress In Electromagnetics Research C, Vol. 14, 11 21, 2010 COMPARISON OF SPECTRAL AND SUBSPACE ALGORITHMS FOR FM SOURCE ESTIMATION S. Ejaz and M. A. Shafiq Faculty of Electronic Engineering Ghulam Ishaq

More information

University Ibn Tofail, B.P. 133, Kenitra, Morocco. University Moulay Ismail, B.P Meknes, Morocco

University Ibn Tofail, B.P. 133, Kenitra, Morocco. University Moulay Ismail, B.P Meknes, Morocco Research Journal of Applied Sciences, Engineering and Technology 8(9): 1132-1138, 2014 DOI:10.19026/raset.8.1077 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitted:

More information

Recent Advances in Acoustic Signal Extraction and Dereverberation

Recent Advances in Acoustic Signal Extraction and Dereverberation Recent Advances in Acoustic Signal Extraction and Dereverberation Emanuël Habets Erlangen Colloquium 2016 Scenario Spatial Filtering Estimated Desired Signal Undesired sound components: Sensor noise Competing

More information

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure

More information

Adaptive Wireless. Communications. gl CAMBRIDGE UNIVERSITY PRESS. MIMO Channels and Networks SIDDHARTAN GOVJNDASAMY DANIEL W.

Adaptive Wireless. Communications. gl CAMBRIDGE UNIVERSITY PRESS. MIMO Channels and Networks SIDDHARTAN GOVJNDASAMY DANIEL W. Adaptive Wireless Communications MIMO Channels and Networks DANIEL W. BLISS Arizona State University SIDDHARTAN GOVJNDASAMY Franklin W. Olin College of Engineering, Massachusetts gl CAMBRIDGE UNIVERSITY

More information

Improving Meetings with Microphone Array Algorithms. Ivan Tashev Microsoft Research

Improving Meetings with Microphone Array Algorithms. Ivan Tashev Microsoft Research Improving Meetings with Microphone Array Algorithms Ivan Tashev Microsoft Research Why microphone arrays? They ensure better sound quality: less noises and reverberation Provide speaker position using

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION doi:0.038/nature727 Table of Contents S. Power and Phase Management in the Nanophotonic Phased Array 3 S.2 Nanoantenna Design 6 S.3 Synthesis of Large-Scale Nanophotonic Phased

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

Performance Analysis of a 1-bit Feedback Beamforming Algorithm

Performance Analysis of a 1-bit Feedback Beamforming Algorithm Performance Analysis of a 1-bit Feedback Beamforming Algorithm Sherman Ng Mark Johnson Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2009-161

More information

System Identification and CDMA Communication

System Identification and CDMA Communication System Identification and CDMA Communication A (partial) sample report by Nathan A. Goodman Abstract This (sample) report describes theory and simulations associated with a class project on system identification

More information

Performance of Wideband Mobile Channel with Perfect Synchronism BPSK vs QPSK DS-CDMA

Performance of Wideband Mobile Channel with Perfect Synchronism BPSK vs QPSK DS-CDMA Performance of Wideband Mobile Channel with Perfect Synchronism BPSK vs QPSK DS-CDMA By Hamed D. AlSharari College of Engineering, Aljouf University, Sakaka, Aljouf 2014, Kingdom of Saudi Arabia, hamed_100@hotmail.com

More information

2.1 BASIC CONCEPTS Basic Operations on Signals Time Shifting. Figure 2.2 Time shifting of a signal. Time Reversal.

2.1 BASIC CONCEPTS Basic Operations on Signals Time Shifting. Figure 2.2 Time shifting of a signal. Time Reversal. 1 2.1 BASIC CONCEPTS 2.1.1 Basic Operations on Signals Time Shifting. Figure 2.2 Time shifting of a signal. Time Reversal. 2 Time Scaling. Figure 2.4 Time scaling of a signal. 2.1.2 Classification of Signals

More information

Bayesian Estimation of Tumours in Breasts Using Microwave Imaging

Bayesian Estimation of Tumours in Breasts Using Microwave Imaging Bayesian Estimation of Tumours in Breasts Using Microwave Imaging Aleksandar Jeremic 1, Elham Khosrowshahli 2 1 Department of Electrical & Computer Engineering McMaster University, Hamilton, ON, Canada

More information

Beamforming Techniques for Smart Antenna using Rectangular Array Structure

Beamforming Techniques for Smart Antenna using Rectangular Array Structure International Journal of Electrical and Computer Engineering (IJECE) Vol. 4, No. 2, April 2014, pp. 257~264 ISSN: 2088-8708 257 Beamforming Techniques for Smart Antenna using Rectangular Array Structure

More information

Adaptive Waveforms for Target Class Discrimination

Adaptive Waveforms for Target Class Discrimination Adaptive Waveforms for Target Class Discrimination Jun Hyeong Bae and Nathan A. Goodman Department of Electrical and Computer Engineering University of Arizona 3 E. Speedway Blvd, Tucson, Arizona 857 dolbit@email.arizona.edu;

More information

Convention Paper Presented at the 116th Convention 2004 May 8 11 Berlin, Germany

Convention Paper Presented at the 116th Convention 2004 May 8 11 Berlin, Germany Audio Engineering Society Convention Paper Presented at the 6th Convention 2004 May 8 Berlin, Germany This convention paper has been reproduced from the author's advance manuscript, without editing, corrections,

More information

Underwater Wideband Source Localization Using the Interference Pattern Matching

Underwater Wideband Source Localization Using the Interference Pattern Matching Underwater Wideband Source Localization Using the Interference Pattern Matching Seung-Yong Chun, Se-Young Kim, Ki-Man Kim Agency for Defense Development, # Hyun-dong, 645-06 Jinhae, Korea Dept. of Radio

More information