PAPER Adaptive Microphone Array System with Two-Stage Adaptation Mode Controller


IEICE TRANS. FUNDAMENTALS, VOL.E88-A, NO.4 APRIL 2005

Yang-Won JUNG a), Student Member, Hong-Goo KANG, Chungyong LEE, Nonmembers, Dae-Hee YOUN, Member, Changkyu CHOI, and Jaywoo KIM, Nonmembers

Manuscript received May 19, 2004. Manuscript revised October 27, 2004. Final manuscript received December 27, 2004.
The authors are with MCSP Lab., Dept. of Electrical and Electronic Eng., Yonsei University, Seoul, Korea.
The authors are with Human & Computer Interaction Lab., Samsung Advanced Institute of Technology, Kyonggi-Do, Korea.
This work was supported by the Research Fund of Samsung Advanced Institute of Technology under Project PR00002417RR.
a) E-mail: ywjung@mcsp.yonsei.ac.kr
DOI: 10.1093/ietfec/e88-a.4.972

SUMMARY In this paper, an adaptive microphone array system with a two-stage adaptation mode controller (AMC) is proposed for high-quality speech acquisition in real environments. The proposed system includes an adaptive array algorithm, a time-delay estimator and a newly proposed AMC. To ensure proper adaptation of the adaptive array algorithm, the proposed AMC uses not only temporal information but also spatial information. The proposed AMC is constructed with two processing stages: an initialization stage and a running stage. A sound source localization technique is adopted in the initialization stage, and a signal correlation characteristic is used in the running stage. For the adaptive array algorithm, a generalized sidelobe canceller with an adaptive blocking matrix is used. The proposed algorithm is implemented as a real-time man-machine interface module of a home-agent robot. Simulation results show a 13 dB SINR improvement with the speaker sitting 2 m from the home-agent robot. The speech recognition rate is also enhanced by 32% when compared to the single channel acquisition system.

key words: speech enhancement, microphone array, generalized sidelobe canceller, adaptation mode controller

1. Introduction

Recently, much attention has been paid to microphone array (MA) systems, which can offer a more comfortable speech interface. MA systems can provide accurate acquisition of a distant speaker's speech even in noisy environments. Current speech recognition systems perform poorly with distant speakers and in strong interference, but with MA systems the range of applications of speech recognition can be greatly expanded [1]-[3].

The generalized sidelobe canceller (GSC) is considered the most feasible algorithm for the MA system because of its simplicity and its capability of interference reduction [4], [5]. In a real situation, however, reverberation causes incomplete blocking of the target signal in the noise reference signal path, and thus the system performance degrades significantly [4], [6]. To overcome this problem, methods that control the blocking matrix (BM) adaptively were proposed [4], [6]. In these methods, an adaptation mode controller (AMC) that differentiates between the existence of the target and the interfering signals is introduced to ensure proper adaptation in the adaptive part of the GSC. It is well known that the performance of a GSC-based MA system depends mainly on the accuracy of the AMC. When the AMC fails to determine the proper adaptation mode, the target signal can be cancelled out of the system output entirely. Despite its importance to the overall performance of the GSC, few studies have concentrated on AMC methods. Hoshuyama et al. proposed a power ratio method that compares the power of the output of a fixed beamformer (FBF) with that of the output of the BM [7]. Since it utilizes the output of the BM, this method is applicable only when the adaptive blocking matrix (ABM) of the system has already converged.
Moreover, the decision parameters of this method must be controlled according to the signal to interference and noise ratio (SINR), which makes it difficult to set an appropriate threshold. Much earlier, Greenberg et al. proposed an AMC for an adaptive beamformer for hearing aids [8]. Their method relies on the cross correlation between input sensors and is applicable only to uncorrelated background noise environments.

In this paper, we propose a new AMC that can be used even when the system has not been initialized. The proposed AMC operates in two stages: an initialization stage and a running stage. In the initialization stage, a sound source localization (SSL) technique is adopted. Using the SSL estimate of the direction of the incoming signal, the ABM can be trained only when the signal comes from the target direction. The running stage is controlled by the cross correlation coefficient of the FBF output and the GSC output; because this coefficient is normalized, an appropriate threshold is easy to set. The performance of the proposed system is evaluated as a real-time pre-processor of the man-machine-interface for a Home-Agent Robot (HAR) system. Experimental results verify that the proposed AMC method outperforms the power ratio method, and that high-quality speech acquisition of distant speakers can be achieved in very low SINR environments.

Copyright © 2005 The Institute of Electronics, Information and Communication Engineers

2. Adaptive Microphone Array Algorithm

2.1 Generalized Sidelobe Canceller

The GSC consists of three functional blocks: an FBF, a BM, and a multiple input canceller (MIC). Generally, the FBF is realized by a delay-and-sum beamformer, the BM by a fixed transform such as a delay-and-subtract or a Walsh transform, and the MIC by a traditional multi-channel adaptive noise canceller [9], [10]. In enclosures such as rooms, there always exist multiple propagation paths, i.e., reverberation. Due to reverberation, the conventional BM fails to block the target signal in the noise reference signal path, and the system performance degrades significantly. Constructing the BM as an adaptive filter has been suggested to prevent target signal leakage. The filter coefficients of the ABM and the MIC are updated by an LMS algorithm [4], [10]. Figure 1 shows a block diagram of the GSC with an ABM [11].

Fig. 1 Generalized sidelobe canceller with an ABM.

2.2 Adaptation Mode Controller

Since the filter characteristics of the ABM and the MIC are totally inverse, adaptation of the ABM and the MIC should be performed alternately [4], [7]. The filter coefficients of the ABM are updated when only the target signal exists, and those of the MIC are updated when only the interference signals exist. This kind of adaptation mode control is well known as the double-talk problem in the field of adaptive echo cancellation [12]. However, if the adaptation mode is wrongly controlled, i.e., if the filters of the ABM are updated when the interference signal exists, the output of the overall system will not be enhanced at all or will be totally cancelled out [4]. To prevent this, an efficient AMC that correctly detects signal existence should be developed. The adequate adaptation mode is summarized in Table 1.

Table 1 Adaptation mode of the GSC.
       Target only               Interference only         Target + Interference
ABM    Adaptation + Filtering    Filtering only            Filtering only
MIC    Filtering only            Adaptation + Filtering    Filtering only

Fig. 2 Example of the power ratio method.
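As a rough illustration of the fixed blocks of Sect. 2.1 and the mode selection of Table 1, the following minimal sketch may help. It assumes the channels have already been time-aligned toward the target, and the function names `delay_and_sum`, `delay_and_subtract` and `adaptation_mode` are hypothetical names chosen for this example, not taken from the paper. The case in which neither signal is active is not covered by Table 1; the sketch simply freezes both filters there.

```python
from typing import Dict, List

Signal = List[float]


def delay_and_sum(channels: List[Signal]) -> Signal:
    """FBF: average the microphone signals sample by sample.

    Assumes the channels are already time-aligned toward the target.
    """
    n_mics = len(channels)
    return [sum(frame) / n_mics for frame in zip(*channels)]


def delay_and_subtract(channels: List[Signal]) -> List[Signal]:
    """Fixed BM: subtract adjacent time-aligned channels.

    A target that is identical on all aligned channels cancels, so the
    M - 1 outputs ideally contain only interference and noise.
    """
    return [
        [a - b for a, b in zip(upper, lower)]
        for upper, lower in zip(channels, channels[1:])
    ]


def adaptation_mode(target_active: bool, interference_active: bool) -> Dict[str, bool]:
    """Table 1: which adaptive block may update its coefficients.

    Both blocks keep filtering in every case; True means "adapt as well".
    """
    return {
        "ABM": target_active and not interference_active,
        "MIC": interference_active and not target_active,
    }


if __name__ == "__main__":
    target = [1.0, -2.0, 3.0]
    aligned = [list(target) for _ in range(4)]  # ideal, reverberation-free case
    print(delay_and_sum(aligned))        # the FBF passes the target
    print(delay_and_subtract(aligned))   # the fixed BM blocks it completely
    print(adaptation_mode(True, False))  # target only: adapt the ABM
```

In a reverberant room the aligned channels are no longer identical, so the differences computed by `delay_and_subtract` retain part of the target, which is the leakage that motivates the ABM.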
An AMC based on a power ratio was proposed by Hoshuyama et al. [7]. In their method, the ratio of the power of the FBF output to that of the blocked output is used as a criterion to distinguish the target from interference, as follows:

\[
\begin{aligned}
P_{\mathrm{ratio}}(n) &= \frac{P_{y_{\mathrm{FBF}}}(n)}{P_{b_i}(n)} \\
P_{y_{\mathrm{FBF}}}(n) &= (1-\lambda)\,P_{y_{\mathrm{FBF}}}(n-1) + \lambda\, y_{\mathrm{FBF}}^{2}(n) \\
P_{b_i}(n) &= (1-\lambda)\,P_{b_i}(n-1) + \lambda\, b_i^{2}(n)
\end{aligned}
\tag{1}
\]

where $P_{y_{\mathrm{FBF}}}(n)$ is the estimated power of the FBF output, $P_{b_i}(n)$ is the estimated power of the $i$-th blocked output, and $\lambda$ is a forgetting factor for averaging. The power ratio $P_{\mathrm{ratio}}(n)$ is then compared with a predetermined threshold $T_{\mathrm{pwr}}$; $P_{\mathrm{ratio}}(n)$ tends to be large when the target signal exists.

This method has several weak points. First, it cannot be used for training an ABM. When the ABM is not trained at all, i.e., in the initial state, the $i$-th output of the ABM, $b_i(n)$, is the same as the $(i+1)$-th input signal $x_{i+1}(n)$. As a result, $P_{\mathrm{ratio}}(n)$ tends to be close to 1 whether a target or an interference signal exists. Therefore, $P_{\mathrm{ratio}}(n)$ cannot serve as a measure of signal existence. Second, it is hard to set a threshold that suits both high and low SINR environments, especially when the interference signal is non-stationary, such as speech or music. Moreover, leakage from the ABM forces $P_{\mathrm{ratio}}(n)$ to be small even in target periods. Note that perfect blocking is not achievable in real environments, and some amount of the target signal always leaks into the blocked signal paths.

Figure 2 shows an example of $P_{\mathrm{ratio}}(n)$ and the target signal when the power ratio method is used. In this case, we pre-trained the ABM because the power ratio method cannot be applied to train the ABM. The difficulty of setting the threshold and the false detection of the target signal in target-free periods are well illustrated in Fig. 2. As shown in area (b), false alarms occur when the target signal is absent.
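The recursive estimates in (1) can be sketched in a few lines. This is only an illustrative sketch: the function names, the zero initial powers, the default forgetting factor and the small `eps` guard against division by zero are assumptions made for the example and do not come from the paper.

```python
from typing import Iterable, List


def power_ratio(y_fbf: Iterable[float], b_i: Iterable[float],
                lam: float = 0.01, eps: float = 1e-12) -> List[float]:
    """Return P_ratio(n) of (1) for every sample n.

    lam is the forgetting factor; eps is an added guard against division
    by zero and is not part of (1).
    """
    p_y = 0.0  # P_yFBF(n - 1), started from zero (assumption)
    p_b = 0.0  # P_bi(n - 1), started from zero (assumption)
    ratios = []
    for y, b in zip(y_fbf, b_i):
        p_y = (1.0 - lam) * p_y + lam * y * y
        p_b = (1.0 - lam) * p_b + lam * b * b
        ratios.append(p_y / (p_b + eps))
    return ratios


def target_detected(ratios: Iterable[float], t_pwr: float) -> List[bool]:
    """Declare the target present wherever P_ratio(n) exceeds T_pwr."""
    return [r > t_pwr for r in ratios]


if __name__ == "__main__":
    signal = [1.0, -1.0] * 50
    # Blocked output with the same power as the FBF output, a stand-in for
    # the untrained-ABM case: the ratio stays near 1.
    print(power_ratio(signal, signal)[-1])
    # Well-blocked target (blocked output 20 dB lower): the ratio is large.
    print(power_ratio(signal, [0.1 * s for s in signal])[-1])
```

The first call illustrates why the ratio carries no information before the ABM has converged: when the blocked output has about the same power as the FBF output, the ratio sits near 1 regardless of which source is active.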
False alarms cause the adaptive filters of the MIC to converge more slowly, but their effect on performance is not critical. More serious problems are caused by misdetection of the target signal, as in area (a). In this case, since $P_{\mathrm{ratio}}(n)$ is smaller than $T_{\mathrm{pwr}}$, the MIC is trained even though the target signal is present, which leads the MIC to a wrong solution; the output of the GSC is then seriously distorted and is sometimes cancelled entirely.

3. Proposed System

In this paper, we propose a new AMC that can be used even when the system has not yet been initialized. Unlike previous attempts, the proposed AMC utilizes not only temporal information but also spatial information of the given input signals. In general, a target signal is assumed to be spatially separated from the interference signal, so the spatial information of the signals is very helpful in deciding the proper adaptation modes. The goal of this paper is to design and implement an AMC that is applicable in real environments using both spatial and temporal information.

In the GSC, the ABM and the MIC should be trained sequentially. Since a change of the ABM also affects the relationship between the reference signal and the desired signal of the MIC, the filter coefficients of the MIC should be readjusted when the ABM changes. For this reason, the proposed AMC controls the adaptation modes of the ABM and the MIC in a sequential manner. The proposed AMC operates in two stages, an initialization stage and a running stage, and its operation in each stage is described as follows.

3.1 Initialization Stage

To develop the initialization stage of our AMC, we use two assumptions:
1. The rough region of the target speaker is known in advance: signals coming from the target region are treated as target signals, while signals coming from outside of the target region are treated as interfering signals.
2. When signals appear from the target region for the first time, no directional interferences are coming from outside of the target region.

When the characteristics of the interference signals differ from those of the target signal, a few methods exist that can classify each signal. But when the interference signals have statistical characteristics similar to those of the target (i.e., both target and interference signals are speech), it is extremely hard to detect the desired signal without prior knowledge, even if a blind separation technique is applied [13]. Using the first assumption, we set the target region to differentiate the target from the other signals. The range of the target region can be set arbitrarily depending on the application. The second assumption is made for easier realization of real-time man-machine-interface systems; in other words, a single-source SSL technique is used for our application. It is well known that the performance of single-source SSL methods degrades when multiple sound sources exist [4]. If multiple-source SSL techniques are used, the second assumption is not mandatory.

In the initialization stage, the incoming direction of the input signal is estimated by the SSL. Under the first assumption, we can detect the target-only duration when the signals come from the target region, and thus the ABM can be trained properly. The time difference of arrival information of each microphone input is used for localizing the sound source [4]. With knowledge of the microphone positions, the estimated time differences between sensors are used to generate hyperbolic curves, which are then intersected to arrive at a source location estimate [4]. Due to its computational efficiency and statistically optimum property for noise environments, the generalized cross correlation phase transform (GCC-PHAT) [14] is employed in this work. No signal enhancement can be achieved in this stage because the MIC is not yet adapted and remains in its initial state. Therefore, the system output is the same as the FBF output.
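As a rough illustration of the GCC-PHAT time-delay estimation [14] used in this stage, a minimal sketch for one microphone pair is given here. Several things in it are assumptions for the example rather than details from the paper: the naive DFT (chosen only to keep the sketch dependency-free), the circular correlation over a single frame, the far-field conversion from lag to angle (the paper instead describes intersecting hyperbolic curves), the 16 kHz sampling rate in the usage lines, and all function names.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform, adequate for a short frame."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def gcc_phat_lag(x_i: Sequence[float], x_j: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) that maximizes the PHAT-weighted cross correlation.

    Both frames must have the same length. A positive result means x_i is
    a delayed copy of x_j. The correlation is circular over the frame.
    """
    n = len(x_i)
    cross = [a * b.conjugate() for a, b in zip(dft(x_i), dft(x_j))]
    # PHAT weighting: keep only the phase of the cross spectrum.
    phase = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]

    def correlation(lag: int) -> float:
        return sum(
            p * cmath.exp(2j * cmath.pi * k * lag / n) for k, p in enumerate(phase)
        ).real / n

    return max(range(-max_lag, max_lag + 1), key=correlation)


def lag_to_angle_deg(lag: int, fs: float, mic_distance: float,
                     c: float = 343.0) -> float:
    """Far-field angle of arrival for one microphone pair (assumed model)."""
    s = max(-1.0, min(1.0, lag * c / (fs * mic_distance)))
    return math.degrees(math.asin(s))


def in_target_region(angle_deg: float, low_deg: float, high_deg: float) -> bool:
    """True when the estimated angle falls inside the preset target region."""
    return low_deg <= angle_deg <= high_deg


if __name__ == "__main__":
    x_j = [0.0] * 16
    x_j[5] = 1.0
    x_i = [0.0] * 16
    x_i[8] = 1.0  # same impulse, three samples later
    lag = gcc_phat_lag(x_i, x_j, max_lag=5)
    angle = lag_to_angle_deg(lag, fs=16000.0, mic_distance=0.09)
    print(lag, angle, in_target_region(angle, -30.0, 30.0))
```

A practical implementation would normally use an FFT and restrict the lag search to the physically possible range for the microphone spacing; the exhaustive search over `max_lag` here plays that role.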
The initialization stage is performed as follows:

1. Calculate $R_{ij}(n)$, the cross correlation of two filtered versions of $x_i(n)$ and $x_j(n)$:

\[
R_{ij}(n) = \frac{1}{2\pi}\int \psi_{ij}(\omega)\,X_i(\omega)\,X_j^{*}(\omega)\,e^{j\omega n}\,d\omega
\tag{2}
\]

where $X_i(\omega)$ and $X_j(\omega)$ denote the Fourier transforms of $x_i(n)$ and $x_j(n)$, and the frequency-dependent weighting function $\psi_{ij}(\omega)$ is given as [14]

\[
\psi_{ij}(\omega) = \frac{1}{\left|X_i(\omega)\,X_j^{*}(\omega)\right|}.
\tag{3}
\]

2. Find the argument $\hat{n}_{ij}$ of the maximum of $R_{ij}(n)$:

\[
\hat{n}_{ij} = \arg\max_{n \in D} R_{ij}(n).
\tag{4}
\]

3. Find the angle corresponding to $\hat{n}_{ij}$.
4. If the estimated angle is within the target region, the ABM is adapted and the AMC procedure jumps to the running stage. If the estimated angle is outside of the target region, the overall step is repeated.

3.2 Running Stage

After the initial training step of the ABM is finished, the proposed AMC controls the adaptation mode of the MIC. In the running stage, we adopt a double talk detection (DTD) technique. Based on the principle of orthogonality [10], the cross correlation between the FBF output and the system output becomes zero when only interference exists. Conversely, the cross correlation tends to have a very large value when the target signal exists. With our previously developed cross correlation based DTD method [15], we can detect the existence of the target and the interference signals efficiently. Moreover, because the measure is normalized, the threshold is easy to set [15]. The cross correlation coefficient, $\rho_{\mathrm{FBF\text{-}GSC}}$, can be estimated recursively as follows:

\[
\rho_{\mathrm{FBF\text{-}GSC}}(n) = \frac{P_{\mathrm{FBF\text{-}GSC}}(n)}{\sqrt{P_{y_{\mathrm{FBF}}}(n)\,P_{y_{\mathrm{GSC}}}(n)}}
\tag{5}
\]

\[
\begin{aligned}
P_{\mathrm{FBF\text{-}GSC}}(n) &= \lambda\,P_{\mathrm{FBF\text{-}GSC}}(n-1) + (1-\lambda)\,y_{\mathrm{FBF}}(n)\,y_{\mathrm{GSC}}(n) \\
P_{y_{\mathrm{GSC}}}(n) &= (1-\lambda)\,P_{y_{\mathrm{GSC}}}(n-1) + \lambda\,y_{\mathrm{GSC}}^{2}(n)
\end{aligned}
\]

where $\lambda$ is a forgetting factor for averaging and $P_{y_{\mathrm{FBF}}}(n)$ is defined in (1). Note that the value of $\rho_{\mathrm{FBF\text{-}GSC}}$ is close to 1 when the target signal exists, and $\rho_{\mathrm{FBF\text{-}GSC}}$ becomes zero when only interference exists. In a real situation, $\rho_{\mathrm{FBF\text{-}GSC}}$ tends to be a small value due to estimation errors and other real-world problems. Using this property of $\rho_{\mathrm{FBF\text{-}GSC}}$, we update the MIC when $\rho_{\mathrm{FBF\text{-}GSC}}$ is less than the threshold $T_{\mathrm{corr}}$, i.e., when the interference signal is active. The existence of the interference signal can be detected by the signal energy level. A flowchart of the proposed two-stage AMC procedure is summarized in Fig. 3.

Fig. 3 Flowchart of the proposed AMC.

4. Experiment Results

We implemented a PC-based real-time system to evaluate the performance of the proposed system as a pre-processor of the man-machine-interface for a HAR system. The target HAR, developed by Samsung Advanced Institute of Technology, is 62 cm tall, and its body is nearly circular with a radius of approximately 24 cm [16]. For speech processing, we use 8 omni-directional microphones placed around the robot's body. The distance between adjacent microphones is approximately 9 cm. An automatic speech recognition module is implemented and combined with the MA system. The recognized words are displayed on an LCD display. A living-room-like experimental room was constructed to develop the proposed speech processing system and verify its efficiency. The experimental room measures 6.1 m × 4.4 m × 2.6 m, and the reflection time is about 300 msec. To mimic a general living room environment, a sofa, a TV and a stereo system are placed in the room. The HAR is located approximately 2 m from the sofa. During experiments, human speakers are located on the sofa and speak 2-5 syllable words. Rock music is played from the loudspeaker as interference, and the SINR is set to about 0-10 dB. The HAR and the experimental room are depicted in Fig. 4. We evaluate the performance of our proposed system in terms of the mode detection in the AMC, the amount of speech enhancement, and the recognition score.

Fig. 4 Experimental environments. (a) Target HAR. (b) Experimental room.

4.1 AMC Results

In this section, only the second stage of the proposed AMC is evaluated. The performance of the proposed AMC at the first stage depends entirely on the accuracy of the SSL. Since we are mainly concerned with the performance of the AMC, we do not analyze the performance of the SSL module in this paper. Note, however, that the GCC-PHAT method provides sufficient accuracy to perform the first stage of the AMC in our experiment. The performance of the second stage is compared with that of the power ratio method. As mentioned in Sect. 2.2, the power ratio method cannot be used to train the ABM. Therefore, we also apply the same processing step of the first stage to the power ratio method. The AMC parameters of the second stage are given in Fig. 5. As discussed earlier, the power ratio method fails to detect target existence, and it is difficult to set its decision threshold. On the contrary, no detection failure occurs with the proposed AMC, and its threshold is easy to determine. One can notice that false alarms appear in both the power ratio method and the proposed method. Though false alarms force the MIC to freeze in target-absent periods and slow the convergence of the MIC, a small number of false alarms can be tolerated for algorithm stability. However, misdetection of the target existence degrades the MIC. In the case of false alarms, the proposed AMC is much more reliable than the power ratio method.

Fig. 5 AMC parameters (running stage).

4.2 System Outputs and Recognition Results

To verify the performance of the proposed system when it is combined with the HAR, we measure the SINR of the system input and output as well as the speech recognition rate. The time domain waveforms and spectrograms of the 1st microphone input and the system output are depicted in Fig. 6. About 13 dB SINR improvement is achieved in this experiment. We do not include the results of the power ratio method because its performance is very poor in these situations. For the recognition system, a command set consisting of 40 words for control of home appliances is used. Table 2 shows the recognition score and the SINR enhancement of the single channel system and the proposed system. The recognition rate of the proposed method is 32 percentage points higher than that of the single channel method (93% versus 61%).

Fig. 6 Waveforms and spectrograms of: (a) 1st microphone input and (b) system output.

Table 2 SINR and recognition results.
                     Single channel    Proposed
SINR                 5 dB              18 dB (13 dB improved)
Recognition score    61%               93%

5. Conclusion

In this paper, we proposed a new adaptive MA system with a two-stage AMC and implemented it as a real-time system for the man-machine-interface of the HAR.
The proposed AMC utilized both temporal and spatial information of the given signals for more proper control of the adaptation modes, which will eventually improve the performance of MA systems where the proper information is available. The proposed AMC outperformed the conventional power ratio method, and the implemented system acquired high quality speech from a distant speaker in noisy environments while requiring only PC computing power. The recognition score of the implemented system was enhanced by more than 30% compared to the single channel system. The implemented MA system could be used by any kind of man-machine-interface system for speech acquisition and recognition in real environments.

References

[1] L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[2] M. Omologo, P. Svaizer, and M. Matassoni, Environmental conditions and acoustic transduction in hands-free speech recognition, Speech Commun., vol.25, pp.75-95, 1998.
[3] J. Bitzer, K.U. Simmer, and K.D. Kammeyer, Multi-microphone noise reduction techniques for hands-free speech recognition - A comparative study, Proc. Workshop on Robust Methods for Speech Recognition in Adverse Conditions, pp.171-174, Tampere, Finland, 1999.
[4] M. Brandstein and D. Ward, Microphone Arrays, Springer, 2001.
[5] S.Y. Low, N. Grbic, and S. Nordholm, Robust microphone array using subband adaptive beamformer and spectral subtraction, Proc. 8th International Conference on Communication Systems, vol.2, pp.1020-1024, 2002.
[6] O. Hoshuyama, A. Sugiyama, and A. Hirano, A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters, IEEE Trans. Signal Process., vol.47, no.10, pp.2677-2684, Oct. 1999.
[7] O. Hoshuyama, B. Begasse, A. Sugiyama, and A. Hirano, A realtime robust adaptive microphone array controlled by an SNR estimate, Proc. IEEE International Conference on Acoustics, Speech, Signal Processing, pp.3605-3678, 1998.
[8] J.E. Greenberg and P.M. Zurek, Evaluation of an adaptive beamforming method for hearing aids, J. Acoust. Soc. Am., vol.91, no.3, pp.1662-1676, March 1992.
[9] L.J. Griffiths and C.W. Jim, An alternative approach to linearly constrained adaptive beamforming, IEEE Trans. Antennas Propag., vol.AP-30, no.1, pp.27-34, Jan. 1982.
[10] S. Haykin, Adaptive Filter Theory, Prentice Hall, 1991.
[11] S. Gannot, D. Burshtein, and E. Weinstein, Signal enhancement using beamforming and nonstationarity with applications to speech, IEEE Trans. Signal Process., vol.49, no.8, pp.1614-1626, Aug. 2001.
[12] M.M. Sondhi, An adaptive echo canceler, Bell Syst. Tech. J., vol.46, pp.497-510, March 1967.
[13] A. Hyvarinen and E. Oja, Independent component analysis: Algorithms and applications, Neural Netw., vol.13, no.4-5, pp.411-430, 2000.
[14] C.H. Knapp and G.C. Carter, The generalized correlation method for estimation of time delay, IEEE Trans. Acoust. Speech Signal Process., vol.24, no.4, pp.320-327, Aug. 1976.
[15] S.J. Park, C.G. Cho, C. Lee, and D.H. Youn, Integrated echo and noise canceler for hands-free applications, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol.49, no.3, pp.188-195, March 2002.
[16] Y.W. Jung, J. Lee, D. Kong, J. Kim, and C. Lee, High-quality speech acquisition and recognition system for home-agent robot, Proc. IEEE International Conference on Consumer Electronics, pp.354-355, 2003.

Hong-Goo Kang received the B.S., M.S., and Ph.D. degrees in electronic engineering from Yonsei University, Seoul, Korea, in 1989, 1991 and 1995, respectively. He was a Senior Member of Technical Staff of AT&T Labs-Research from 1996 to 2002. In 2002, he joined the Department of Electrical and Electronic Engineering, Yonsei University, where he is currently an Assistant Professor. His research interests include speech signal processing, array signal processing and communication signal processing.

Yang-Won Jung received his B.S. and M.S. degrees in Electronic Engineering from Yonsei University, Seoul, Korea, in 1998 and 2000, respectively. He is currently a Ph.D. candidate in the Department of Electrical and Electronic Engineering at Yonsei University, Seoul, Korea. His research interests include adaptive signal processing, echo and noise cancellation, 3D audio signal processing, and adaptive microphone array algorithms.

Chungyong Lee received the B.S. and M.S. degrees in electronic engineering from Yonsei University, Seoul, Korea, in 1987 and 1989, respectively, and the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, in 1995. He was a senior engineer of Samsung Electronics Co., Ltd., Kiheung, Korea, from 1996 to 1997. In 1997, he joined the faculty of the Department of Electrical and Electronic Engineering, Yonsei University, where he is currently an Associate Professor. His research interests include array signal processing and communication signal processing.

Changkyu Choi received the B.E., M.E., and Ph.D. degrees in Electrical Engineering from Korea Advanced Institute of Science and Technology in 1991, 1994, and 1999, respectively. He joined the Human and Computer Interaction Laboratory at Samsung Advanced Institute of Technology, Kyonggi-Do, Korea, in 1999, where he is currently a senior engineer. His research interests are in the areas of speech enhancement, speech feature extraction, sound source localization, blind source separation, and microphone array signal processing for robotic systems.

Jaywoo Kim received the B.S. degree in electronics and computer engineering from Korea University, Seoul, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from the Ohio State University, Columbus, in 1992 and 1995, respectively. From 1996 to 1999, he was a research engineer at Renault-Samsung Motors Technology Center, Kyonggi-Do, Korea. He joined the Human and Computer Interaction Laboratory at Samsung Advanced Institute of Technology, Kyungki-Do, Korea, in 1999, where he is currently a senior researcher. His research interests include user interface, robotics, computer vision, speech recognition, and sound source localization/separation.

Dae-Hee Youn received his B.S. degree in Electronic Engineering from Yonsei University, Seoul, Korea, in 1977, and the M.S. and Ph.D. degrees in Electrical Engineering from Kansas State University, Manhattan, Kansas, in 1979 and 1982, respectively. From 1982 to 1985, he was an assistant professor at the University of Iowa, Iowa City, Iowa. Since 1985, he has been with the Department of Electrical and Electronic Engineering at Yonsei University, Seoul, Korea, where he is currently a professor. His research interests include adaptive digital filters and their application, speech and audio signal processing, and real-time implementation of DSP algorithms.