Performance Evaluation of Nonlinear Speech Enhancement Based on Virtual Increase of Channels in Reverberant Environments

Kouei Yamaoka, Shoji Makino, Nobutaka Ono, and Takeshi Yamada
University of Tsukuba: yamaoka@mmlab.cs.tsukuba.ac.jp, maki@tara.tsukuba.ac.jp, takeshi@cs.tsukuba.ac.jp
National Institute of Informatics (NII) / SOKENDAI (The Graduate University for Advanced Studies): onono@nii.ac.jp

Abstract—In this paper, we evaluate the performance of a maximum signal-to-noise ratio beamformer based on a virtual increase of channels. We previously proposed a microphone array signal processing technique that virtually increases the number of microphones by generating extra signal channels from two real microphone signals. The technique generates a virtual observation on the assumption that the sources are W-disjoint orthogonal, i.e., that only one source is dominant in each time-frequency bin. However, mixed signals with long reverberation tend to violate this assumption. In this study, we conducted experiments in a variety of reverberant environments, as well as computer simulations using the image method. The results confirm that our technique improves the performance in reverberant environments, although the longer the reverberation time, the smaller the improvement it provides. We also present directivity patterns to illustrate the behavior of the virtual increase of channels.

I. INTRODUCTION

Microphone array signal processing is used in various techniques such as speech enhancement with beamformers and blind source separation (BSS) [1]. The speech enhancement performance of these techniques depends on the number of microphones and may degrade when the number of microphones is smaller than the number of sound sources (underdetermined conditions). Recording devices such as IC recorders and smartphones have recently become common, and such devices carry only a small number of microphones (usually two). Speech enhancement with them therefore often has to operate under underdetermined conditions. Several methods, such as time-frequency masking [2], multichannel Wiener filtering [3], and the statistical modeling of observations using latent variables [4], [5], work well in underdetermined conditions, but better performance should be attainable because their outputs tend to contain artifacts such as musical noise.

As a technique for realizing high performance in underdetermined conditions, we proposed a virtual increase of channels based on virtual microphone signals [6]–[8]. In this technique, we create an arbitrary number of channels of virtual microphone signals from two channels of real microphones. Virtual microphone signals are generated as estimates of the signals at virtual microphones placed at points where there is no real microphone, and microphone array signal processing is then performed using both the real and virtual microphone signals. The technique is applicable to various types of microphone array signal processing, since the virtual signals are generated in the audio signal domain, in contrast to techniques in which signals are generated in the power domain [9]–[11] or a higher-order statistical domain [12], [13].
As an approach to virtual microphone signal generation, we previously proposed nonlinear interpolation using the complex logarithmic spectra of the real microphone signals, which we call complex logarithmic interpolation [6], and later β-divergence-based nonlinear interpolation [7], [8] as its generalization. These methods assume W-disjoint orthogonality (W-DO) [2], [14]; under this assumption, even when multiple sounds arrive, each time-frequency bin can be treated as containing a single sound. However, long reverberation may cause the breakdown of W-DO, and the performance of our technique in such a situation had not yet been investigated. Therefore, we study the effect of reverberation on the virtual increase of channels. In this paper, we compare the speech enhancement performance of a maximum signal-to-noise ratio (SNR) beamformer [15], [16] with and without the virtual increase of channels in a variety of reverberant environments.

II. INCREASING CHANNELS BY VIRTUAL MICROPHONE FOR MAXIMUM SNR BEAMFORMER

A. Increasing channels by nonlinear interpolation with β-divergence

We proposed the virtual increase of channels as a technique for creating an arbitrary number of channels of virtual microphone signals from two channels of real microphones [6]–[8]. By using the real and generated virtual microphones together, we can use a microphone array whose number of channels has been virtually increased, as shown in Fig. 1. In this technique, a microphone signal is modeled in the short-time Fourier transform (STFT) domain. Let x_i(ω, t) be the ith real microphone signal (i = 1, 2) at angular frequency ω in the tth frame. The amplitude of this signal is denoted by A_i = |x_i(ω, t)| and the phase by φ_i = ∠x_i(ω, t). A virtual microphone signal v(ω, t, α) is defined as the observation estimated at the point that internally divides the line joining the two real microphones in the ratio α : (1 − α).

[Fig. 1: Microphone array signal processing with a virtual increase of channels: the two real microphone signals are interpolated into virtual microphone signals, and all channels are passed to the signal processing stage.]

Hereafter, when there is no need to distinguish ω, t, and α, the signal is simply denoted by v. The virtual microphone signal is obtained by nonlinear interpolation in each time-frequency bin as follows. We derive the amplitude A_v that minimizes the sum σ_{D_β} of the β-divergences between the amplitudes of the real microphone signals and that of the virtual microphone signal, weighted by the interpolation parameter α:

$$\sigma_{D_\beta} = (1 - \alpha)\, d_\beta(A_v, A_1) + \alpha\, d_\beta(A_v, A_2), \tag{1}$$

$$A_{v\beta} = \operatorname*{arg\,min}_{A_v} \sigma_{D_\beta}, \tag{2}$$

where d_β(A_v, A_i) is defined as

$$d_\beta(A_v, A_i) = \begin{cases} A_v(\log A_v - \log A_i) + (A_i - A_v) & (\beta = 1), \\[4pt] \dfrac{A_v}{A_i} - \log\dfrac{A_v}{A_i} - 1 & (\beta = 0), \\[4pt] \dfrac{A_v^{\beta}}{\beta(\beta - 1)} + \dfrac{A_i^{\beta}}{\beta} - \dfrac{A_v A_i^{\beta - 1}}{\beta - 1} & (\text{otherwise}). \end{cases} \tag{3}$$

By differentiating σ_{D_β} with respect to A_v and setting the derivative to 0, the interpolated amplitude extended using the β-divergence is obtained as

$$A_{v\beta} = \begin{cases} \exp\!\big((1 - \alpha)\log A_1 + \alpha \log A_2\big) & (\beta = 1), \\[4pt] \big((1 - \alpha) A_1^{\beta - 1} + \alpha A_2^{\beta - 1}\big)^{\frac{1}{\beta - 1}} & (\text{otherwise}). \end{cases} \tag{4}$$

Note that A_{vβ} is continuous at β = 1, where this interpolation is equivalent to complex logarithmic interpolation [6]. The phase φ_v of the virtual microphone signal is interpolated linearly as

$$\varphi_v = (1 - \alpha)\varphi_1 + \alpha\varphi_2, \tag{5}$$

which requires the absence of spatial aliasing. From the above, the virtual microphone signal v is represented as

$$v = A_{v\beta} \exp(j\varphi_v). \tag{6}$$
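For concreteness, Eqs. (4)–(6) can be realized in a few lines of NumPy. The sketch below is ours, not the authors' code; the eps guard and the wrapped-phase caveat are our additions.

```python
import numpy as np

def virtual_mic(x1, x2, alpha=0.5, beta=0.0):
    """Generate a virtual microphone STFT from two real microphone
    STFTs x1, x2 (complex arrays, frequency x frames), following
    Eqs. (4)-(6): beta-divergence amplitude interpolation plus
    linear phase interpolation."""
    eps = 1e-12                       # our guard against log(0) and 0**p
    A1, A2 = np.abs(x1) + eps, np.abs(x2) + eps
    if beta == 1.0:
        # complex logarithmic interpolation (beta = 1), Eq. (4), top case
        Av = np.exp((1 - alpha) * np.log(A1) + alpha * np.log(A2))
    else:
        p = beta - 1.0                # beta = 2 reduces to linear interpolation
        Av = ((1 - alpha) * A1**p + alpha * A2**p) ** (1.0 / p)
    # Eq. (5): linear phase interpolation; np.angle returns wrapped phases,
    # so this relies on the absence of spatial aliasing between the mics
    phi_v = (1 - alpha) * np.angle(x1) + alpha * np.angle(x2)
    return Av * np.exp(1j * phi_v)    # Eq. (6)
```

With alpha = 0.5 and beta in {0, 2, 20}, this reproduces the interpolation settings evaluated in Section III.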
B. Maximum SNR beamformer

We apply the virtual microphone technique to a maximum SNR beamformer [15], [16], one of the standard speech enhancement techniques, to evaluate its performance. A maximum SNR beamformer requires a target-active period and a target-inactive period as prior information for speech enhancement. Since it requires no information about the directions of the sound sources, it is advantageous for applications in which the source directions are unknown.
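The maximum SNR beamformer is commonly obtained by solving, in each frequency bin, a generalized eigenvalue problem between the spatial covariance matrices of the target-active and target-inactive periods. Below is a minimal NumPy/SciPy sketch of that formulation; the names are ours, and output scaling (e.g., minimal-distortion post-processing) is omitted.

```python
import numpy as np
from scipy.linalg import eigh

def max_snr_filter(X_act, X_inact):
    """One frequency bin. X_act, X_inact: (channels, frames) STFT
    coefficients taken from the target-active and target-inactive
    periods (real and virtual channels alike). Returns the filter w
    that maximizes the output SNR."""
    R_act = X_act @ X_act.conj().T / X_act.shape[1]          # target-active covariance
    R_inact = X_inact @ X_inact.conj().T / X_inact.shape[1]  # target-inactive covariance
    # maximize (w^H R_act w) / (w^H R_inact w): generalized eigenvalue
    # problem R_act w = lambda R_inact w; take the largest eigenvalue
    _, vecs = eigh(R_act, R_inact)
    return vecs[:, -1]               # eigh returns eigenvalues in ascending order

# enhancement: y(omega, t) = w(omega)^H x(omega, t), applied bin by bin
```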

III. EXPERIMENTAL EVALUATION OF SPEECH ENHANCEMENT IN REVERBERANT ENVIRONMENTS

To evaluate the speech enhancement performance in reverberant environments, we conducted experiments using observed signals generated as convolutive mixtures of speech with impulse responses measured in a variety of reverberant environments. We also used the Room Impulse Response (RIR) generator [17] to simulate impulse responses. The performance was evaluated by comparing two methods: speech enhancement using the maximum SNR beamformer with virtual microphone signals, and speech enhancement without virtual microphone signals (an underdetermined condition).

A. Experimental conditions

We used impulse responses from the RWCP Sound Scene Database in Real Acoustic Environments (RWCP-SSD) [18], a common database for evaluating speech and acoustic signal processing research in real acoustic environments. In this experiment, we used impulse responses with reverberation times of 0, 130, and 780 ms, measured in an anechoic chamber and in variable reverberation chambers. The layout of the sound sources and real microphones is shown in Fig. 2, and the other experimental conditions are listed in Table I. We used two real microphones and generated one virtual microphone signal at their midpoint, so the microphone array was composed of three microphones: two real and one virtual. Because of the long interval between the real microphones, spatial aliasing was present.

[Fig. 2: Layout of sound sources and microphones in the experiment: target speech at 90° and non-target speech at 70° and 130°, at a distance of 1.85 m from the two real microphones spaced 11.48 cm apart.]

TABLE I: Experimental conditions
Number of real microphones: 2
Number of virtual microphones: 1 (α = 0.5)
Interval between real microphones: 11.48 cm
Reverberation time: 0, 130, 780 ms
Input SNR: 0 dB
Sampling rate: 8 kHz
FFT frame length: 1024 samples
FFT frame shift: 256 samples
Target-active period Θ_T: 10 s
Target-inactive period Θ_I: 10 s
Speech-enhanced period: 20 s

We used eight samples of Japanese or English speech as the target signals, whose direction of arrival (DOA) was 90°, and two combinations of Japanese or English speech as the non-target signals, whose DOAs were 70° and 130°. The input SNR was set to 0 dB. As objective criteria, we used the signal-to-distortion ratio (SDR) and the signal-to-interference ratio (SIR) [19], for which higher values indicate higher performance. Below, we report the SDR and SIR averaged over the eight target samples and the two non-target combinations.
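As an illustration of how such observed signals can be formed, the sketch below convolves dry sources with measured impulse responses and scales the summed interference to a desired input SNR. All names are ours, and equal signal and impulse response lengths are assumed.

```python
import numpy as np
from scipy.signal import fftconvolve

def make_mixture(target, interferers, h_target, h_interf, snr_db=0.0):
    """target: dry target signal; interferers: list of dry non-target
    signals of the same length; h_target: list of per-channel impulse
    responses for the target; h_interf: one such list per interferer.
    Returns (channels, samples) microphone observations at snr_db."""
    img_t = np.stack([fftconvolve(target, h) for h in h_target])
    img_n = sum(np.stack([fftconvolve(s, h) for h in hs])
                for s, hs in zip(interferers, h_interf))
    # scale the summed interference so that the input SNR equals snr_db
    g = np.sqrt(np.mean(img_t**2) / np.mean(img_n**2) / 10**(snr_db / 10))
    return img_t + g * img_n
```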
B. Results and discussion

Figure 3 shows the relationship between the speech enhancement performance and the reverberation time for different values of the interpolation parameter β. "W/O virtual mic" denotes the maximum SNR beamformer without the virtual microphone, and "W/ β = 0, 2, 20" denote the beamformer with the virtual microphone for β = 0, 2, and 20, respectively. The results for W/ β = 0, 2, 20 show higher SDR and SIR than those for W/O virtual mic regardless of the reverberation time, which confirms that our technique contributes to improving the performance in reverberant environments. On the other hand, the longer the reverberation time, the smaller the improvement brought by the virtual increase of channels. From these results, we conclude that violation of the W-DO assumption directly degrades the performance.

[Fig. 3: Relationship between reverberation time (0, 130, 780 ms) and speech enhancement performance: (a) SDR [dB] and (b) SIR [dB] for W/O virtual mic and W/ β = 0, 2, 20.]

In a reverberant environment, the performance improves as β, which controls the nonlinearity of the interpolation, increases. In contrast, in the non-reverberant environment the highest performance occurs at β = 2, for which the amplitude of the virtual microphone is interpolated linearly (see Eq. (4)). In principle, linear interpolation adds no helpful information; nevertheless, this value of β gives the highest performance.

Regarding W-DO, Fig. 4 shows histograms of the proportion of sources that are simultaneously active at each frequency, computed for a male-male-female combination recorded at each reverberation time. We consider source x_i (i = 1, 2, 3) to be active in a time-frequency bin when its amplitude exceeds max(|x_i|)/10, the maximum being taken at each frequency [20]. If two or three sources are simultaneously active, W-DO is not satisfied. Approximately 5% of the time-frequency bins do not satisfy W-DO in Fig. 4(a), and the percentage of such bins increases with the reverberation time, as shown in Figs. 4(b) and (c). For this reason, the improvement obtained with the virtual increase of channels decreases in the case of long reverberation.

[Fig. 4: Histograms of the proportion of simultaneously active sources at each frequency: (a) reverberation time 0 ms, (b) 130 ms, (c) 780 ms.]
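The activity count behind Fig. 4 can be reproduced along the following lines; this is our reading of the max(|x_i|)/10 rule [20], and the names are illustrative.

```python
import numpy as np

def count_active(S, ratio=0.1):
    """S: (sources, freqs, frames) complex STFTs of the source images.
    A source counts as active in a time-frequency bin when its magnitude
    exceeds ratio times its own maximum at that frequency. Returns the
    number of simultaneously active sources per bin."""
    A = np.abs(S)
    thresh = ratio * A.max(axis=2, keepdims=True)  # per source and per frequency
    return (A > thresh).sum(axis=0)                # 0..3 active sources per bin

# W-DO is violated in every bin where count_active(S) >= 2
```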

C. Directivity patterns

Figure 5 shows directivity patterns produced by the maximum SNR beamformer in the experiment. In Fig. 5(a), the maximum SNR beamformer produced one null using the two real microphones. In the frequency range from 1.5 to 4 kHz there is spatial aliasing, so that part of the figure is truncated. Interestingly, in Fig. 5(d), two nulls are created by using the two real microphones together with the virtual microphone. This is the contribution of the virtual increase of channels. However, as shown in Figs. 5(b) and (e), the nulls become indistinct in the case of long reverberation.

In addition to the impulse responses measured in RWCP-SSD, we used impulse responses with reverberation times of 0, 120, 130, and 780 ms produced by the RIR generator. In this simulation, almost all the conditions were the same as those in Fig. 2 and Table I; only the interval between the real microphones differed, which we set to 4 cm to avoid spatial aliasing. Figures 5(c) and (f) show directivity patterns obtained by the simulation, in which the contribution of the virtual increase of channels can be confirmed more clearly than in the experiment using measured impulse responses.

[Fig. 5: Directivity patterns: (a) W/O virtual mic (T60 = 0 ms); (b) W/O virtual mic (T60 = 130 ms); (c) W/O virtual mic (T60 = 120 ms, simulation); (d) W/ virtual mic, β = 2 (T60 = 0 ms); (e) W/ virtual mic, β = 20 (T60 = 130 ms); (f) W/ virtual mic, β = 2 (T60 = 120 ms, simulation). Panels (a), (d) and (b), (e) used measured impulse responses with reverberation times of 0 and 130 ms, respectively; (c), (f) used simulated impulse responses with a reverberation time of 120 ms.]
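Directivity patterns such as those in Fig. 5 can be computed by scanning the per-bin beamformer filters over steering vectors. The sketch below assumes a free-field plane-wave model and a linear array; the names and the dB floor are ours.

```python
import numpy as np

def directivity(w, mic_pos, freqs, angles_deg, c=343.0):
    """Evaluate |w(f)^H d(f, theta)| for per-bin beamformer filters.
    w: (bins, channels) complex filters; mic_pos: (channels,) positions
    along the array axis in meters (the virtual channel sits at the
    midpoint); freqs: (bins,) bin center frequencies in Hz."""
    theta = np.deg2rad(np.asarray(angles_deg))
    tau = np.outer(np.cos(theta), mic_pos) / c     # plane-wave delays; 90 deg = broadside
    gain = np.empty((len(freqs), len(theta)))
    for k, f in enumerate(freqs):
        d = np.exp(-2j * np.pi * f * tau)          # free-field steering vectors
        gain[k] = np.abs(d @ w[k].conj())
    return 20 * np.log10(gain + 1e-12)             # pattern in dB
```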

IV. CONCLUSIONS

In this paper, we evaluated the speech enhancement performance of a maximum SNR beamformer based on a virtual increase of channels, which relies on the W-DO assumption. Because mixed signals tend not to satisfy this assumption when the reverberation time is long, we conducted experiments in a variety of reverberant environments. We confirmed that a consistent improvement in SDR and SIR is obtained by the virtual increase of channels even in a long-reverberation environment, although the improvement decreases as the reverberation time increases. By observing the number of simultaneously active sources, we confirmed that this decrease occurs because W-DO is no longer satisfied. Moreover, we presented directivity patterns to illustrate the behavior of the virtual increase of channels. We expect that this decrease in performance can be avoided to obtain better performance.

REFERENCES

[1] S. Makino, T.-W. Lee, and H. Sawada, Blind Speech Separation, Springer, 2007.
[2] O. Yılmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Trans. on Signal Processing, vol. 52, no. 7, pp. 1830–1847, 2004.
[3] N. Q. K. Duong, E. Vincent, and R. Gribonval, "Under-determined reverberant audio source separation using a full-rank spatial covariance model," IEEE Trans. on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1830–1840, 2010.
[4] Y. Izumi, N. Ono, and S. Sagayama, "Sparseness-based 2ch BSS using the EM algorithm in reverberant environment," Proc. WASPAA, pp. 147–150, 2007.
[5] H. Sawada, S. Araki, and S. Makino, "Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment," IEEE Trans. on Audio, Speech, and Language Processing, vol. 19, no. 3, pp. 516–527, 2011.
[6] H. Katahira, N. Ono, S. Miyabe, T. Yamada, and S. Makino, "Virtually increasing microphone array elements by interpolation in complex-logarithmic domain," Proc. EUSIPCO, pp. 1–5, Sept. 2013.
[7] H. Katahira, N. Ono, S. Miyabe, T. Yamada, and S. Makino, "Generalized amplitude interpolation by β-divergence for virtual microphone array," Proc. IWAENC, pp. 150–154, Sept. 2014.
[8] H. Katahira, N. Ono, S. Miyabe, T. Yamada, and S. Makino, "Nonlinear speech enhancement by virtual increase of channels and maximum SNR beamformer," EURASIP Journal on Advances in Signal Processing, vol. 2016, no. 1, pp. 1–8, Jan. 2016.
[9] H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, "Speech enhancement using nonlinear microphone array based on complementary beamforming," IEICE Trans. on Fundamentals, vol. E82-A, no. 8, pp. 1501–1510, 1999.
[10] S. Miyabe, B. H. (Fred) Juang, H. Saruwatari, and K. Shikano, "Analytical solution of nonlinear microphone array based on complementary beamforming," Proc. IWAENC, pp. 1–4, 2008.
[11] Y. Hioka and T. Betlehem, "Under-determined source separation based on power spectral density estimated using cylindrical mode beamforming," Proc. WASPAA, pp. 1–4, 2013.
[12] P. Chevalier, A. Ferréol, and L. Albera, "High-resolution direction finding from higher order statistics: The 2q-MUSIC algorithm," IEEE Trans. on Signal Processing, vol. 53, no. 4, pp. 2986–2997, 2006.
[13] Y. Sugimoto, S. Miyabe, T. Yamada, S. Makino, and B. H. (Fred) Juang, "Employing moments of multiple high orders for high-resolution underdetermined DOA estimation based on MUSIC," Proc. WASPAA, pp. 1–4, 2013.
[14] A. Jourjine, S. Rickard, and O. Yılmaz, "Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures," Proc. ICASSP, pp. 2985–2988, 2000.
[15] H. L. Van Trees, Optimum Array Processing, John Wiley & Sons, 2002.
[16] S. Araki, H. Sawada, and S. Makino, "Blind speech separation in a meeting situation with maximum SNR beamformers," Proc. ICASSP, vol. I, pp. 41–45, 2007.
[17] E. A. P. Habets, "Room impulse response (RIR) generator," available at https://www.audiolabs-erlangen.de/fau/professor/habets/software/rir-generator, Oct. 2008.
[18] S. Nakamura, K. Hiyane, F. Asano, Y. Kaneda, T. Yamada, T. Nishiura, T. Kobayashi, S. Ise, and H. Saruwatari, "Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding," Proc. ICME 2002, vol. 2, pp. 161–164, Aug. 2002.
[19] E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.
[20] A. Blin, S. Araki, and S. Makino, "Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation," IEICE Trans. on Fundamentals, vol. E88-A, no. 7, pp. 1693–1700, 2005.