A Frequency-Invariant Fixed Beamformer for Speech Enhancement


Rohith Mars, V. G. Reju and Andy W. H. Khong
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

Abstract

Fixed beamformers maintain a response that is independent of the signal and interference statistics. Frequency-invariant fixed beamformers can achieve a constant beamwidth across all frequencies, which results in lower signal distortion, and they have lower computational complexity than their adaptive counterparts. However, unlike data-dependent beamformers, their sidelobe attenuation is poor with respect to the direction of the interferences. In this paper, we propose a method to improve the sidelobe attenuation while retaining the advantages of lower computational complexity and frequency invariance. This is achieved by introducing frequency-invariant nulls in the interference directions. We also show how the weights for each null direction can be combined with the fixed beamformer to form the effective weights of the proposed beamformer.

I. INTRODUCTION

Microphone arrays are widely used in teleconferencing systems and hands-free telephony for source localization, speech enhancement and speaker recognition applications [1]-[5]. Beamforming algorithms employing multi-channel microphone arrays have been widely used for speech enhancement. These algorithms can broadly be divided into adaptive and fixed beamforming methods. Among the adaptive methods, the minimum variance distortionless response (MVDR) beamformer (also known as the Capon beamformer [6]), the linearly constrained minimum variance (LCMV) beamformer [7] and the generalized sidelobe canceler (GSC) [8] are the most popular. Since adaptive beamformers are data dependent, the beamformer weights have to be continuously updated based on the received signals.
Fixed beamformers, on the other hand, represent a class of beamformers which maintain a constant response for a particular look direction independent of the signal and noise/interference statistics. Although they achieve a lower signal-to-noise ratio (SNR) than adaptive beamformers, fixed beamformers are attractive in terms of computational complexity and ease of real-time implementation since the filter weights can be pre-calculated and saved in a lookup table.

For a fixed array aperture, the beamwidth of a beamformer is inversely proportional to the signal frequency. As such, direct implementation of a fixed beamformer can introduce low-pass filter distortion due to non-uniform gain, particularly for large-bandwidth signals such as speech. Frequency-invariant fixed beamformers (FIBs) provide a constant frequency response with uniform beamwidth across all frequencies.

The design of FIBs using different techniques for different microphone array configurations has been studied in great detail. In [9], the frequency-invariance property was obtained using the technique of harmonic nesting. While this approach ensures an octave-independent beampattern, the spatial resolution within each octave is frequency dependent. To achieve frequency invariance within each octave, a combination of harmonic nesting and filter-and-sum beamforming has been proposed in [10]. Both of these methods focus on linear arrays. In [11] and [12], FIB design was studied for circular arrays and further extended to concentric circular and spherical arrays. A general FIB design for multi-dimensional arrays using separate primary filters for each array element and a common secondary filter has been proposed in [13]. The design is complex when applied to higher-dimensional arrays since the dilation property of the primary filters is not guaranteed. A relatively simpler design using multidimensional inverse Fourier transforms has been discussed in [14], [15], [16].
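The inverse relation between beamwidth and frequency stated above can be illustrated with a uniformly weighted delay-and-sum beamformer steered to broadside. This is only an illustration and not one of the designs studied in this paper. For M microphones with spacing d, the first nulls of that beamformer lie at cos θ = ± c/(M d f), so the null-to-null beamwidth is 2 arcsin(c/(M d f)). The sketch below assumes a sound speed of c = 343 m/s (not stated in the text) and uses twenty microphones with 2.14 cm spacing; the function name and the test frequencies are hypothetical.

```python
import math

def null_to_null_beamwidth_deg(f_hz, num_mics, spacing_m, c=343.0):
    """Null-to-null mainlobe width (degrees) of a broadside delay-and-sum ULA."""
    x = c / (num_mics * spacing_m * f_hz)  # |cos(theta)| at the first nulls
    if x > 1.0:
        return 180.0  # no null in the visible region: the mainlobe fills it
    return 2.0 * math.degrees(math.asin(x))

# Assumed illustration values: 20 microphones, 2.14 cm spacing, three test frequencies.
widths = {f: null_to_null_beamwidth_deg(f, 20, 0.0214) for f in (1000.0, 2000.0, 4000.0)}
for f, bw in widths.items():
    print(f"{f:6.0f} Hz -> {bw:6.1f} deg")
```

The mainlobe narrows as the frequency increases, which is the non-uniform gain that a frequency-invariant design is meant to avoid.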
Recently, a direct optimization approach for the FIB design using convex optimization has been proposed in [17], [18]. To achieve higher computational efficiency, closed-form solutions for the FIB design using the least squares and eigenfilter methods have been discussed in [19], [20] and [21]. The inherent disadvantage of the FIB is its inadequate sidelobe attenuation, which limits its application to interference cancellation. In this paper, we improve the sidelobe attenuation in the interference directions by combining multiple frequency-invariant null (FIN) beamformers with the fixed beamformer. We show how the filter coefficients corresponding to the frequency-invariant fixed and null beamformers are combined to obtain the effective filter coefficients.

The outline of this paper is as follows: Section II gives a brief review of the general wideband beamformer structure. Section III discusses the least squares solution for the design of the FIB, while the proposed method is discussed in Section IV. Section V presents the simulation results and conclusions are drawn in Section VI.

APSIPA 2014

II. REVIEW OF WIDEBAND BEAMFORMING

The general structure of a wideband beamformer using tapped-delay lines (TDL) is shown in Fig. 1, where $J$ is the number of filter taps associated with each of the $M$ microphones.

Fig. 1. Wideband beamformer structure with M microphones and filters of J taps for each microphone.

Such a beamformer samples the signals both spatially and temporally, and its response can be expressed as a function of the frequency $f$ and the azimuth angle $\theta$ given by

$$P(f, \theta) = \sum_{m=0}^{M-1} \sum_{j=0}^{J-1} w_{m,j}^{*} \, e^{-\jmath 2\pi f (\tau_m(\theta) + j T_s)}, \tag{1}$$

where $(\cdot)^{*}$ denotes the conjugate operation, $\tau_m(\theta)$ is the spatial propagation delay between the $m$th microphone and the reference microphone, and $T_s$ is the time delay between adjacent taps of the TDL. The vector form of (1) can be written as

$$P(f, \theta) = \mathbf{w}^{H} \mathbf{s}(f, \theta), \tag{2}$$

where

$$\mathbf{w} = [\, w_{0,0}, \ldots, w_{M-1,0}, \ldots, w_{0,J-1}, \ldots, w_{M-1,J-1} \,]^{T} \tag{3}$$

is the (complex) filter coefficient vector. The variable $\mathbf{s}(f, \theta)$ is the $MJ \times 1$ steering vector defined as

$$\mathbf{s}(f, \theta) = \mathbf{s}_{T_s}(f) \otimes \mathbf{s}_{\tau}(f, \theta), \tag{4}$$

such that $\otimes$ denotes the Kronecker product,

$$\mathbf{s}_{T_s}(f) = [\, 1, e^{-\jmath 2\pi f T_s}, \ldots, e^{-\jmath (J-1) 2\pi f T_s} \,]^{T}, \tag{5}$$

and

$$\mathbf{s}_{\tau}(f, \theta) = [\, e^{-\jmath 2\pi f \tau_0(\theta)}, e^{-\jmath 2\pi f \tau_1(\theta)}, \ldots, e^{-\jmath 2\pi f \tau_{M-1}(\theta)} \,]^{T}. \tag{6}$$

For the case of a uniformly spaced linear microphone array with spacing $d$ and wave propagation velocity $c$, the time delay is given by $\tau_m(\theta) = m \frac{d}{c} \cos\theta$.

III. FIB BASED ON LEAST SQUARES

The general weighted least squares cost function for the design of a wideband beamformer is given by [22]

$$J_{LS} = \int_{\mathcal{F}} \int_{\Theta} F(f, \theta) \, \left| P(f, \theta) - P_d(f, \theta) \right|^{2} \, df \, d\theta, \tag{7}$$

where $\mathcal{F}$ and $\Theta$ denote the frequency range of interest and the range of azimuth angles, respectively. The variable $F(f, \theta)$ is a positive real-valued weighting function corresponding to the mainlobe and sidelobes, while $P(f, \theta)$ and $P_d(f, \theta)$ define the actual and desired directional responses, respectively. By selecting a uniform grid of frequencies and azimuth angles, we assume $P_d(f, \theta) = 1$ and $F(f, \theta) = 1$ for the mainlobe, and $P_d(f, \theta) = 0$ and $F(f, \theta) = \alpha$, where $0 < \alpha < 1$, for the sidelobes.
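The steering vector in (4)-(6) and the response in (2) can be sketched as follows. This is a minimal illustration for a uniform linear array, not the authors' code: the function names, the toy sizes (four microphones, three taps), the sound speed c = 343 m/s and the negative sign convention in the exponent are assumptions made for the example.

```python
import cmath
import math

def steering_vector(f, theta_deg, M, J, d, c, Ts):
    """MJ x 1 steering vector s(f, theta) = s_Ts(f) (Kronecker) s_tau(f, theta)."""
    cos_t = math.cos(math.radians(theta_deg))
    # Spatial part (6): tau_m = m * (d / c) * cos(theta) for a uniform linear array.
    s_tau = [cmath.exp(-2j * math.pi * f * m * d / c * cos_t) for m in range(M)]
    # Temporal part (5): one phase term per tap of the tapped-delay line.
    s_ts = [cmath.exp(-2j * math.pi * f * j * Ts) for j in range(J)]
    # Kronecker product (4): the microphone index runs fastest, matching w in (3).
    return [a * b for a in s_ts for b in s_tau]

def response(w, s):
    """Beamformer response P(f, theta) = w^H s, as in (2)."""
    return sum(wk.conjugate() * sk for wk, sk in zip(w, s))

# Assumed toy configuration: 4 microphones, 3 taps, 2.14 cm spacing, 16 kHz sampling.
M, J, d, c, Ts = 4, 3, 0.0214, 343.0, 1.0 / 16000.0
s = steering_vector(2000.0, 60.0, M, J, d, c, Ts)
# Uniform weights on the first tap only (a plain delay-and-sum beamformer at broadside).
w = [complex(1.0 / M) if k < M else 0j for k in range(M * J)]
print(abs(response(w, s)))
```

With this ordering, the entry for microphone m and tap j sits at index j·M + m, so the same indexing can be reused for the weight vector.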
Defining $f_i$ as the frequency for the $i$th frequency index and $\theta_k$ as the azimuth for the $k$th azimuth index, the cost function in (7) can be rewritten, using (2), as

$$J_{LS} = \sum_{f_i \in \mathcal{F}} \sum_{\theta_k \in \Theta_m} \left| \mathbf{w}^{H} \mathbf{s}(f_i, \theta_k) - 1 \right|^{2} + \alpha \sum_{f_i \in \mathcal{F}} \sum_{\theta_k \in \Theta_s} \left| \mathbf{w}^{H} \mathbf{s}(f_i, \theta_k) \right|^{2}, \tag{8}$$

where $\Theta_m$ and $\Theta_s$ denote the azimuth ranges corresponding to the mainlobe and sidelobes, respectively. To achieve the frequency-invariant property, the response variation criterion has been proposed in [17]. The criterion

$$J_{rv} = \sum_{f_i \in \mathcal{F}} \sum_{\theta_k \in \Theta_{FI}} \left| \mathbf{w}^{H} \mathbf{s}(f_i, \theta_k) - \mathbf{w}^{H} \mathbf{s}(f_r, \theta_k) \right|^{2} \tag{9}$$

defines the Euclidean distance between the response at the reference frequency $f_r$ and that at all frequencies over the range of azimuth angles $\Theta_{FI}$. This cost function, together with (8), offers a tradeoff between sidelobe attenuation and the frequency-invariance property. This tradeoff can be varied through the use of a control parameter $\beta$ in the following FIB cost function:

$$J_{FIB} = J_{LS} + \beta J_{rv}. \tag{10}$$

It is useful to note that the frequency-invariant property can be imposed either in the mainlobe direction only or over the full azimuth range. If it is imposed over the whole azimuth range, i.e., including the sidelobe region, the spectrum energy of the beamformer needs to be minimized only at the reference frequency $f_r$ instead of over the entire frequency range. Substituting $J_{LS}$ from (8) into (10), we obtain the FIB cost function

$$J_{FIB} = \sum_{\theta_k \in \Theta_m} \left| \mathbf{w}^{H} \mathbf{s}(f_r, \theta_k) - 1 \right|^{2} + \alpha \sum_{\theta_k \in \Theta_s} \left| \mathbf{w}^{H} \mathbf{s}(f_r, \theta_k) \right|^{2} + \beta J_{rv}. \tag{11}$$

The above cost function is then solved using the Lagrange multiplier method to obtain the optimum solution [20]

$$\mathbf{w} = \mathbf{Q}_{FIB}^{-1} \mathbf{a}_{FIB}, \tag{12}$$

where

$$\mathbf{Q}_{FIB} = \sum_{\theta_k \in \Theta_m} \mathbf{s}(f_r, \theta_k) \mathbf{s}^{H}(f_r, \theta_k) + \alpha \sum_{\theta_k \in \Theta_s} \mathbf{s}(f_r, \theta_k) \mathbf{s}^{H}(f_r, \theta_k) + \beta \sum_{f_i \in \mathcal{F}} \sum_{\theta_k \in \Theta} \left( \mathbf{s}(f_i, \theta_k) - \mathbf{s}(f_r, \theta_k) \right) \left( \mathbf{s}(f_i, \theta_k) - \mathbf{s}(f_r, \theta_k) \right)^{H} \tag{13}$$

and

$$\mathbf{a}_{FIB} = \sum_{\theta_k \in \Theta_m} \mathbf{s}(f_r, \theta_k). \tag{14}$$

Fig. 2. Beampattern of FIB for a look direction of 90° using a uniform linear array of twenty microphones.

Fig. 3. Polar pattern of FIB for a look direction of 90° at frequencies 875 Hz, 18 Hz and 275 Hz.

The drawback of direct implementation of the FIB for speech signals is that it is challenging to achieve a frequency-invariant response at low frequencies, where most of the speech energy is present. This drawback is particularly pronounced when the array aperture is small [23]. When the frequency-invariant property is imposed over the whole speech band, the mainlobe beamwidth will be large and the sidelobe attenuation may not be sufficient, as shown in Fig. 2. In addition, the above FIB design approach does not take into account the control of null directions. This implies that the nulls formed are intrinsic to the FIB design and may not lie in the direction of the interferences. An FIB with additional constraints for nulls in the interference directions can be designed using convex optimization and the constrained least-squares technique [24]. However, this approach increases the computational complexity of online weight estimation.

IV. THE PROPOSED CFIBN APPROACH

As described in Section III, the main disadvantage of the FIB is that the sidelobe attenuation may not be sufficient to suppress the interferences. On the other hand, data-dependent beamformers such as the MVDR beamformer form nulls in the direction of the interferences, which results in the suppression of interfering signals. To overcome the above disadvantage of the FIB, we propose to design a combined frequency-invariant fixed beamformer with nulls (CFIBN) in the direction of interferences.
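The closed-form FIB solution (12)-(14) of Section III can be sketched numerically as follows. This is a toy illustration under stated assumptions rather than the authors' implementation: the array size (four microphones, three taps), the angle and frequency grids, the mainlobe set {80°, 90°, 100°}, the reference frequency, the values of α and β, the sound speed c = 343 m/s and all function names are hypothetical. The response-variation term is summed over the whole azimuth grid, and a small Gaussian-elimination routine stands in for a library linear solver.

```python
import cmath
import math

def steering(f, theta_deg, M, J, d, c, Ts):
    """s(f, theta) as in (4)-(6); the microphone index runs fastest."""
    cos_t = math.cos(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * f * (m * d / c * cos_t + j * Ts))
            for j in range(J) for m in range(M)]

def add_outer(Q, v, scale):
    """In place: Q += scale * v v^H."""
    for r, vr in enumerate(v):
        for k, vk in enumerate(v):
            Q[r][k] += scale * vr * vk.conjugate()

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def design_fib(geom, freqs, f_ref, angles, mainlobe, alpha, beta):
    """Return w = Q_FIB^{-1} a_FIB, together with Q_FIB and a_FIB."""
    n = geom[0] * geom[1]
    Q = [[0j] * n for _ in range(n)]
    a = [0j] * n
    for th in angles:
        s_ref = steering(f_ref, th, *geom)
        if th in mainlobe:
            add_outer(Q, s_ref, 1.0)                 # mainlobe term of (13)
            a = [x + y for x, y in zip(a, s_ref)]    # a_FIB in (14)
        else:
            add_outer(Q, s_ref, alpha)               # sidelobe term of (13)
        for f in freqs:                              # response-variation term of (13)
            diff = [x - y for x, y in zip(steering(f, th, *geom), s_ref)]
            add_outer(Q, diff, beta)
    return solve(Q, a), Q, a

def cost(w, Q, a, n_main):
    """J_FIB(w) = w^H Q w - 2 Re(w^H a) + number of mainlobe grid points."""
    Qw = [sum(Q[r][k] * w[k] for k in range(len(w))) for r in range(len(w))]
    quad = sum(wr.conjugate() * q for wr, q in zip(w, Qw)).real
    lin = sum(wr.conjugate() * ar for wr, ar in zip(w, a)).real
    return quad - 2.0 * lin + n_main

geom = (4, 3, 0.0214, 343.0, 1.0 / 16000.0)   # M, J, d, c, Ts (assumed toy sizes)
angles = list(range(0, 181, 10))               # azimuth grid in degrees
mainlobe = {80, 90, 100}                       # assumed mainlobe grid points
freqs = [500.0 * i for i in range(1, 16)]      # 500 Hz to 7500 Hz
w_fib, Q, a = design_fib(geom, freqs, 4000.0, angles, mainlobe, alpha=0.1, beta=0.95)
print(cost(w_fib, Q, a, len(mainlobe)), cost([0j] * len(w_fib), Q, a, len(mainlobe)))
```

Because (11) is quadratic in the weights, the solved weights should give a lower cost than any other candidate, such as the all-zero vector printed for comparison.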
In our proposed approach, frequency-invariant nulls (FINs) are combined with the FIB so that the effective beamformer achieves unity gain in the direction of the desired source and nulls in the direction of the interferences. For the estimation of the interfering source directions, one may use any of the existing methods [25], [26]. For example, by applying a complex-valued independent component analysis (ICA) algorithm such as FastICA [27] in any one of the frequency bins, we can estimate the unmixing matrix, the rows of which can then be used to estimate the direction(s) of the source(s) [28], [29]. In this work, however, we assume that the directions of the interferers are known. In the following sections, we describe the design of the broadband nulls and the estimation of the effective filter coefficients by combining the filter coefficients corresponding to the FINs with those of the FIB.

A. Design of frequency-invariant broadband null

As in the case of the FIB, the nulls are made frequency invariant. These nulls can be achieved by minimizing the spectrum energy from the direction of the interference subject to the constraint that the response from all the other directions is unity. Similar to the FIB design, if the frequency-invariance property is imposed over the whole azimuth range, the minimization needs to be done only at the reference frequency $f_r$. The null design is then given by minimizing the cost function

$$J_{NULL} = \sum_{\theta_k \in \Theta_{null}} \left| \mathbf{w}^{H} \mathbf{s}(f_r, \theta_k) \right|^{2} + \beta J_{rv}, \tag{15}$$

subject to

$$\sum_{\theta_k \in \Theta} \mathbf{w}^{H} \mathbf{s}(f_r, \theta_k) = 1, \quad \Theta_{null} \not\subset \Theta, \tag{16}$$

where $\Theta_{null}$ is the desired null direction, which corresponds to the direction of the interference. The above cost function can be solved using the Lagrange multiplier method, which results in

$$\mathbf{w} = \frac{\mathbf{R}^{-1} \mathbf{p}}{\mathbf{p}^{H} \mathbf{R}^{-1} \mathbf{p}}, \tag{17}$$

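where

$$\mathbf{R} = \sum_{\theta_k \in \Theta_{null}} \mathbf{s}(f_r, \theta_k) \mathbf{s}^{H}(f_r, \theta_k) + \beta \sum_{f_i \in \mathcal{F}} \sum_{\theta_k \in \Theta} \left( \mathbf{s}(f_i, \theta_k) - \mathbf{s}(f_r, \theta_k) \right) \left( \mathbf{s}(f_i, \theta_k) - \mathbf{s}(f_r, \theta_k) \right)^{H} \tag{18}$$

and

$$\mathbf{p} = \sum_{\theta_k \in \Theta} \mathbf{s}(f_r, \theta_k), \quad \Theta_{null} \not\subset \Theta. \tag{19}$$

Fig. 4 shows an example of a frequency-invariant broadband null for an azimuth angle of 30° based on the above design, using a uniform linear microphone array of twenty microphones with an inter-microphone spacing of 2.14 cm.

Fig. 4. Frequency-invariant broadband null using a uniform linear array of twenty microphones.

B. Effective filter coefficients

After obtaining the beamformer filter coefficients corresponding to the look direction and the null directions, they have to be combined to obtain the effective filter coefficients. Denoting the weight vector corresponding to the FIB by $\mathbf{w}_{FIB}$ and those of the $N$ FINs by $\mathbf{w}_{FIN_1}, \mathbf{w}_{FIN_2}, \ldots, \mathbf{w}_{FIN_N}$, the response of the equivalent beamformer having unity response in the desired direction and nulls in the directions of the $N$ interferences can be written as

$$\mathbf{w}_{CFIBN}^{H} \mathbf{s}(f, \theta) = \mathbf{w}_{FIB}^{H} \mathbf{s}(f, \theta) \odot \mathbf{w}_{FIN_1}^{H} \mathbf{s}(f, \theta) \odot \mathbf{w}_{FIN_2}^{H} \mathbf{s}(f, \theta) \odot \cdots \odot \mathbf{w}_{FIN_N}^{H} \mathbf{s}(f, \theta), \tag{20}$$

where $\odot$ denotes the Hadamard product and $\mathbf{w}_{CFIBN}$ is the vector of effective filter coefficients of the equivalent beamformer. It is important to note that, before computing the equivalent weights, the beamformer responses of the FIB and FINs have to be normalized using the corresponding filter coefficients. This normalization ensures that the individual beampatterns are weighted equally. Following (20), the effective filter coefficients can then be calculated as

$$\mathbf{w}_{CFIBN}^{H} = \left[ \mathbf{w}_{FIB}^{H} \mathbf{s}(f, \theta) \odot \mathbf{w}_{FIN_1}^{H} \mathbf{s}(f, \theta) \odot \mathbf{w}_{FIN_2}^{H} \mathbf{s}(f, \theta) \odot \cdots \odot \mathbf{w}_{FIN_N}^{H} \mathbf{s}(f, \theta) \right] \mathbf{s}^{H}(f, \theta) \left[ \mathbf{s}(f, \theta) \mathbf{s}^{H}(f, \theta) \right]^{-1}. \tag{21}$$

Applying the above constraint to all the frequencies and directions, (21) can be expressed as

$$\mathbf{w}_{CFIBN}^{H} = \left[ \mathbf{w}_{FIB}^{H} \mathbf{S}(f, \theta) \odot \mathbf{w}_{FIN_1}^{H} \mathbf{S}(f, \theta) \odot \mathbf{w}_{FIN_2}^{H} \mathbf{S}(f, \theta) \odot \cdots \odot \mathbf{w}_{FIN_N}^{H} \mathbf{S}(f, \theta) \right] \mathbf{S}^{H}(f, \theta) \left[ \mathbf{S}(f, \theta) \mathbf{S}^{H}(f, \theta) \right]^{-1}, \tag{22}$$

where

$$\mathbf{S}(f, \theta) = [\, \mathbf{s}(f_1, \theta_1), \mathbf{s}(f_1, \theta_2), \ldots, \mathbf{s}(f_1, \theta_K), \mathbf{s}(f_2, \theta_1), \mathbf{s}(f_2, \theta_2), \ldots, \mathbf{s}(f_2, \theta_K), \ldots, \mathbf{s}(f_I, \theta_K) \,] \tag{23}$$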
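The null design in (17)-(19) and the combination step in (22) can be sketched together on a toy problem. The sketch rests on several assumptions that are not taken from the paper: a broadside delay-and-sum weight vector stands in for the FIB, each response is normalized by its peak magnitude over the grid (one possible reading of the normalization step), the response-variation term is summed over the whole azimuth grid, and the sizes, grids, sound speed, β value and function names are hypothetical. Equation (22) is evaluated as the least-squares solution of the normal equations (S S^H) w = S r^H, where r is the row vector of multiplied responses.

```python
import cmath
import math

def steering(f, theta_deg, M, J, d, c, Ts):
    """s(f, theta) as in (4)-(6); the microphone index runs fastest."""
    cos_t = math.cos(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * f * (m * d / c * cos_t + j * Ts))
            for j in range(J) for m in range(M)]

def add_outer(A, v, scale):
    """In place: A += scale * v v^H."""
    for r, vr in enumerate(v):
        for k, vk in enumerate(v):
            A[r][k] += scale * vr * vk.conjugate()

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def respond(w, s):
    """Scalar response w^H s."""
    return sum(wk.conjugate() * sk for wk, sk in zip(w, s))

def design_null(geom, freqs, f_ref, angles, null_angle, beta):
    """FIN weights w = R^{-1} p / (p^H R^{-1} p), cf. (17)-(19)."""
    n = geom[0] * geom[1]
    R = [[0j] * n for _ in range(n)]
    p = [0j] * n
    for th in angles:
        s_ref = steering(f_ref, th, *geom)
        if th == null_angle:
            add_outer(R, s_ref, 1.0)                 # energy from the null direction
        else:
            p = [x + y for x, y in zip(p, s_ref)]    # constraint vector p in (19)
        for f in freqs:                              # response-variation term of (18)
            diff = [x - y for x, y in zip(steering(f, th, *geom), s_ref)]
            add_outer(R, diff, beta)
    x = solve(R, p)
    denom = respond(p, x)                            # p^H R^{-1} p
    return [xk / denom for xk in x], p

def combine(weights, S):
    """Least-squares fit of w^H s_k to the product of peak-normalized responses, cf. (22)."""
    target = [1 + 0j] * len(S)
    for w in weights:
        resp = [respond(w, s) for s in S]
        peak = max(abs(r) for r in resp)             # assumed normalization
        target = [t * r / peak for t, r in zip(target, resp)]
    n = len(S[0])
    G = [[0j] * n for _ in range(n)]                 # S S^H
    rhs = [0j] * n                                   # S r^H
    for s, t in zip(S, target):
        add_outer(G, s, 1.0)
        rhs = [x + sk * t.conjugate() for x, sk in zip(rhs, s)]
    return solve(G, rhs), target

geom = (4, 3, 0.0214, 343.0, 1.0 / 16000.0)          # M, J, d, c, Ts (assumed toy sizes)
angles = list(range(0, 181, 10))
freqs = [500.0 * i for i in range(1, 16)]
S = [steering(f, th, *geom) for f in freqs for th in angles]
w_look = [complex(0.25) if k < 4 else 0j for k in range(12)]  # delay-and-sum stand-in for the FIB
w_null, p = design_null(geom, freqs, 4000.0, angles, 30, beta=0.95)
w_cfibn, target = combine([w_look, w_null], S)
```

Since the product of two responses generally cannot be reproduced exactly with the same number of microphones and taps, the combined weights are a least-squares approximation of (20) rather than an exact factorization.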
In (23), $I$ is the number of frequency bins and $K$ is the number of directions. The advantage of the above design compared to data-dependent beamformers such as the MVDR lies in its practical implementation. We can pre-calculate the filter coefficients $\mathbf{w}_{FIB}$ and $\mathbf{w}_{FIN}$ for all possible directions, together with the matrix $\left( \mathbf{S}(f, \theta) \mathbf{S}^{H}(f, \theta) \right)^{-1}$, so that they can be combined for any combination of source and interference directions without a significant increase in computational complexity. In addition, unlike data-dependent methods, the proposed method does not require estimation of the noise covariance matrix, which can often be challenging in low-SNR environments.

V. SIMULATION RESULTS

To verify the performance of the proposed approach, a uniform linear microphone array with M = 20 microphones, each employing a J = 3 tap finite impulse response (FIR) filter, is considered for the design of both the FIB and the FIN in our simulations. We have also used a sampling frequency of 16 kHz. The spacing between microphones d is set to half the wavelength corresponding to the maximum signal frequency, which corresponds to 2.14 cm. The look direction is kept at 90° and the directions of the interferers are set to 30° and 150°. The sidelobe weighting parameter α and the tradeoff parameter β are set to .1 and .95, respectively. The frequency of interest is assumed to range from 1 Hz to 32 Hz, with the reference frequency f_r set at 17 Hz.

The beampattern and the polar pattern obtained using the proposed method are shown in Figs. 5 and 6, respectively. Compared with the beampattern and polar pattern of the conventional FIB shown in Figs. 2 and 3, it can be seen that the interferences from 30° and 150° are attenuated significantly, while a sufficiently frequency-invariant response is maintained in the desired direction. It is worth noting that, unlike the MVDR in Figs. 7 and 8, the FIB in Fig. 2 has nulls in the sidelobes, but they are not in the direction of the interferences. This is because the interference direction information was not used in the design of the FIB. By incorporating this information, one can significantly improve the attenuation in the direction of the interferences. In addition, the beampattern of the MVDR is non-uniform across frequencies, which results in distortion when applied to speech enhancement. It is also important to note that the MVDR assumes that the interference covariance matrix is known. Obtaining this covariance matrix is challenging in practice, particularly in environments where the SNR is low.

Fig. 5. Beampattern of CFIBN when the interferers are from 30° and 150° and the desired source is from 90°.

Fig. 6. Polar pattern of CFIBN when the interferers are from 30° and 150° and the desired source is from 90° at frequencies 875 Hz, 18 Hz and 275 Hz.

Fig. 7. Beampattern of the MVDR beamformer when the interferers are from 30° and 150° and the desired source is from 90°.

Fig. 8. Polar pattern of the MVDR beamformer when the interferers are from 30° and 150° and the desired source is from 90° at frequencies 875 Hz, 18 Hz and 275 Hz.

VI. CONCLUSIONS

We proposed a method for improving the sidelobe attenuation of a frequency-invariant fixed beamformer by incorporating nulls in the interference directions. The proposed method involves the design of frequency-invariant fixed beamformers and broadband nulls as well as the pre-calculation of the beamformer filter coefficients for all the possible directions. Frequency-invariant fixed beamformers with nulls in the direction of interferences and unity response in the direction of the desired source can then be formed by combining the above pre-calculated beamformer filter coefficients. This approach does not require significant computational complexity and hence the method is suitable for real-time applications.

REFERENCES

[1] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Prentice-Hall.
[2] M. M. Goulding and J. S. Bird, "Speech enhancement for mobile telephony," IEEE Trans. Veh. Technol., vol. 39, Nov.
[3] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, Dec.
[4] R. O. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas and Propagation, vol. 34, no. 3, Mar.
[5] H. Beigi, Fundamentals of Speaker Recognition, Springer, 2011.

[6] J. Capon, "High resolution frequency-wavenumber spectrum analysis," Proc. IEEE, vol. 57, no. 8, Aug.
[7] O. L. Frost, "An algorithm for linearly constrained adaptive array processing," Proc. IEEE, vol. 60, no. 8, Aug.
[8] L. J. Griffiths and C. W. Jim, "An alternative approach to the linearly constrained adaptive beamforming," IEEE Trans. Antennas and Propagation, vol. 30, no. 8, Jan.
[9] R. Smith, "Constant beamwidth receiving arrays for broadband sonar systems," Acustica, vol. 48, no. 1.
[10] T. Chou, "Frequency-independent beamformer with low response error," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Detroit, USA, vol. 5, May.
[11] S. C. Chan and H. H. Chen, "Theory and design of uniform concentric circular arrays with frequency invariant characteristics," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Philadelphia, USA, vol. 4, Mar.
[12] S. C. Chan and H. H. Chen, "Theory and design of uniform concentric spherical arrays with frequency invariant characteristics," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Toulouse, France, vol. 4, May.
[13] D. B. Ward, R. A. Kennedy and R. C. Williamson, "Theory and design of broadband sensor arrays with frequency invariant far-field beam patterns," J. Acoust. Soc. Amer., vol. 97, no. 2, Feb.
[14] W. Liu, S. Weiss, J. G. McWhirter and I. K. Proudler, "Frequency invariant beamforming for two-dimensional and three-dimensional arrays," Signal Process., vol. 87, Nov.
[15] W. Liu and S. Weiss, "Design of frequency invariant beamformers for broadband arrays," IEEE Trans. Signal Process., vol. 56, no. 2, Feb.
[16] W. Liu, D. McLernon and M. Ghogho, "Design of frequency invariant beamformer without temporal filtering," IEEE Trans. Signal Process., vol. 57, no. 2, Feb.
[17] H. Duan, B. P. Ng, C. M. See and J. Fang, "Applications of the SRV constraint in broadband pattern synthesis," Signal Process., vol. 88, Apr.
[18] Y. Zhao, W. Liu and R. J. Langley, "Efficient design of frequency invariant beamformers with sensor delay-lines," in Proc. IEEE Workshop on Sensor Array and Multichannel Signal Process., Darmstadt, Germany, Jul.
[19] Y. Zhao, W. Liu and R. J. Langley, "A least squares approach to the design of frequency invariant beamformers," in Proc. European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, Aug.
[20] Y. Zhao, W. Liu and R. J. Langley, "Subband design of fixed wideband beamformers based on the least squares approach," Signal Process., vol. 91, Apr.
[21] Y. Zhao, W. Liu and R. J. Langley, "An eigenfilter approach to the design of frequency invariant beamformers," in Proc. ITG/IEEE Int. Workshop on Smart Antennas, Berlin, Germany, Feb.
[22] S. Doclo and M. Moonen, "Design of broadband beamformers robust against gain and phase errors in the microphone array characteristics," IEEE Trans. Signal Process., vol. 51, no. 1, Oct.
[23] V. Reddy, A. W. H. Khong and B. P. Ng, "Unambiguous speech DOA estimation under spatial aliasing conditions," IEEE Trans. Audio, Speech and Language Process., Jul.
[24] Y. Zhao, W. Liu and R. J. Langley, "Application of the least squares approach to fixed beamformer design with frequency-invariant constraints," IET Signal Process., vol. 5, no. 3, Jun.
[25] J. H. DiBiase, H. F. Silverman and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays: Techniques and Applications, Springer-Verlag, 2001.
[26] V. Reddy, B. P. Ng and A. W. H. Khong, "Insights into MUSIC-like algorithm," IEEE Trans. Signal Process., vol. 61, no. 1, May.
[27] A. Hyvarinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley & Sons, vol. 46.
[28] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda and F. Itakura, "Evaluation of blind signal separation method using directivity pattern under reverberant conditions," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Istanbul, vol. 5, Jun.
[29] V. G. Reju, S. N. Koh and I. Y. Soon, "Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals," Neurocomputing, vol. 71, Jun. 2008.
