Invest in people!
EUROPEAN SOCIAL FUND
Sectoral Operational Programme Human Resources Development 2007-2013
Priority axis 1: Education and professional training in support of economic growth and the development of a knowledge-based society
Major area of intervention 1.5: Doctoral and postdoctoral programmes in support of research
Project title: Proiect de dezvoltare a studiilor de doctorat în tehnologii avansate - PRODOC (Project for the development of doctoral studies in advanced technologies)
Contract identification number: POSDRU 6/1.5/S/5
Beneficiary: Universitatea Tehnică din Cluj-Napoca (Technical University of Cluj-Napoca)

FACULTY OF ELECTRONICS, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY

Ioana Homănă, Eng.

PhD THESIS
ACOUSTIC ECHO CANCELLATION USING ADAPTIVE FILTERS

Scientific Coordinator: Prof. dr. ing. Marina ŢOPA

Ph.D. committee (coordinator and members):
- Prof. dr. ing. Gabriel Oltean, Faculty of Electronics, Telecommunications and Information Technology, Technical University of Cluj-Napoca;
- Prof. dr. ing. Marina Ţopa, Faculty of Electronics, Telecommunications and Information Technology, Technical University of Cluj-Napoca;
- Prof. dr. ing. Alexandru Isar, Faculty of Electronics and Telecommunications, Politehnica University of Timisoara;
- Prof. dr. ing. Florin Sandu, Faculty of Electrical Engineering and Computer Science, Transilvania University of Brasov (reviewer);
- Prof. dr. ing. Sorin Hintea, Faculty of Electronics, Telecommunications and Information Technology, Technical University of Cluj-Napoca.
Abstract

"Echo is a phenomenon in which a delayed and distorted version of an original sound or signal is reflected back to the source." (Jacob Benesty)

Motivation

Acoustic echo results from a feedback path set up between the loudspeaker and the microphone in a mobile phone, hands-free phone, teleconference or hearing aid system. Acoustic echo may be reflected from a multitude of different surfaces, such as walls, ceilings and floors, and travels through different paths. If the time delay is not too long, the acoustic echo may be perceived as a soft reverberation and may add to the artistic quality of the sound. Concert halls and church halls with desirable reverberation characteristics can enhance the quality of a musical performance. In communication systems, however, acoustic echo is a well-known problem: it can severely affect the quality and intelligibility of a voice conversation. Echo cancellation is therefore essential for improving voice quality in communication systems.

Figure 1-1. Basic configuration of a teleconferencing system (diagram labels: Room 1, echo, AEC, echo, Room 2).
The basic application considered in this thesis is a teleconferencing system (figure 1-1). The principal issue in acoustic echo cancellation (AEC) is to estimate the impulse response between the loudspeaker and the microphone of a hands-free communication device. Acoustic echo appears when the conference room operates with open microphones and loudspeakers in full duplex mode. The transmitted signal is picked up by the open microphone and sent back to the room in which it originated. As a result, a person hears his own voice coming back from the far room with a delay. This undesired signal is the acoustic echo, and it affects the clarity and intelligibility of speech. The main goal of this thesis is to improve communication quality by eliminating acoustic echoes. To this end, several adaptive filtering algorithms are presented and their performances are compared in order to establish which method is best suited to this application.

Objectives and conclusions

Nowadays, hands-free communication devices are involved in many popular applications, such as mobile telephony and teleconferencing systems. Due to their specific features, they can be used in a wide range of environments with different acoustic characteristics. In this context, an important issue that has to be addressed when dealing with such devices is the acoustic coupling between the loudspeaker and the microphone. In teleconferencing systems, the main challenge is to achieve good voice quality in a double-talk situation. A common problem during a conversation is the acoustic echo, which degrades the speech signal so that the conversation is not heard clearly. The objective of this thesis was to improve the quality of communication by analyzing adaptive algorithms and applying them to acoustic echo cancellation.
The structures and performances of the classical adaptive algorithms were analyzed: Least Mean Square (LMS), Normalized LMS (NLMS), Variable Step-Size LMS (VSS-LMS), Variable Step-Size NLMS (VSS-NLMS) and Recursive Least Squares (RLS), together with four further variable step-size algorithms based on NLMS: the Non-Parametric VSS-NLMS (NPVSS-NLMS), the Simple VSS-NLMS (SVSS-NLMS), the New NPVSS-NLMS and the Practical VSS-NLMS (PVSS-NLMS). The aim was to highlight the pros and cons of each algorithm and to give a better understanding of the adaptation process. Several criteria were specified (convergence speed, computational complexity, robustness) in order to determine which of the above
algorithms obtain the best results. Their performances were analyzed through Matlab simulations, using three performance measures: the Mean Square Error (MSE, which indicates the convergence speed of the adaptive algorithms), the Echo Return Loss Enhancement (ERLE, which measures the level of echo suppression in the received signal) and the misalignment (which quantifies directly how well an adaptive filter converges to the impulse response of the system to be identified). Finally, the algorithms were tested in different acoustic echo cancellation settings: system identification and double-talk situations. In addition, to demonstrate that the adaptive algorithms can be implemented and applied in practice, the LMS and NLMS algorithms were implemented in VHDL on a Virtex-5 FPGA board from Xilinx.

The adaptive algorithms presented in this thesis were analyzed taking into account two situations that may occur in a teleconferencing system:
1. Single-talk scenario (only the far-end signal is present). In this case the performance of the classical adaptive algorithms (LMS, NLMS, VSS-LMS, VSS-NLMS and RLS) was studied from the point of view of a system identification application, i.e. room impulse response estimation. The most reliable solution to this problem is an adaptive filter that generates at its output a replica of the echo, which is then subtracted from the microphone signal.
2. Double-talk situation (the far-end and the near-end signals occur simultaneously). To improve the sound quality in this situation, two methods were studied. The first uses double-talk detectors (DTD), which slow down or switch off the adaptation of the adaptive filter during double-talk periods to prevent it from diverging. The second does not use a DTD in the structure of the acoustic echo canceller (AEC).
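The single-talk identification scenario and the ERLE and misalignment measures described above can be illustrated with a minimal sketch. It is written in Python rather than Matlab and is not the thesis code: the function names, the step size `mu = 0.5`, the regularization `delta`, the white-noise far-end signal and the 8-tap echo path are all assumptions chosen for illustration.

```python
import math
import random


def nlms_echo_canceller(far_end, mic, taps, mu=0.5, delta=1e-6):
    """Basic NLMS echo canceller; returns (weights, error signal)."""
    w = [0.0] * taps      # current estimate of the echo-path impulse response
    buf = [0.0] * taps    # most recent far-end samples, newest first
    errors = []
    for x_n, d_n in zip(far_end, mic):
        buf = [x_n] + buf[:-1]
        y_n = sum(wi * xi for wi, xi in zip(w, buf))   # echo replica
        e_n = d_n - y_n                                # residual after cancellation
        norm = delta + sum(xi * xi for xi in buf)      # input energy (regularized)
        w = [wi + mu * e_n * xi / norm for wi, xi in zip(w, buf)]
        errors.append(e_n)
    return w, errors


def mean_square(signal):
    """Time-averaged squared value, used here as the MSE estimate."""
    return sum(s * s for s in signal) / len(signal)


def erle_db(mic, errors, eps=1e-12):
    """ERLE: microphone power over residual power, in dB."""
    return 10.0 * math.log10((mean_square(mic) + eps) / (mean_square(errors) + eps))


def misalignment_db(h, w, eps=1e-12):
    """Normalized distance between true and estimated impulse response, in dB."""
    num = sum((hi - wi) ** 2 for hi, wi in zip(h, w))
    den = sum(hi * hi for hi in h)
    return 10.0 * math.log10((num + eps) / den)


def simulate_single_talk(h, n_samples, noise_std=0.01, seed=0):
    """White-noise far-end signal filtered by echo path h, plus background noise."""
    rng = random.Random(seed)
    far_end = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mic = []
    for n in range(n_samples):
        echo = sum(h[k] * far_end[n - k] for k in range(min(len(h), n + 1)))
        mic.append(echo + rng.gauss(0.0, noise_std))
    return far_end, mic


if __name__ == "__main__":
    h = [0.5, -0.3, 0.2, 0.1, -0.05, 0.03, 0.02, -0.01]  # hypothetical echo path
    far_end, mic = simulate_single_talk(h, 3000)
    w, errors = nlms_echo_canceller(far_end, mic, taps=len(h))
    print("misalignment [dB]:", misalignment_db(h, w))
    print("ERLE over last 500 samples [dB]:", erle_db(mic[-500:], errors[-500:]))
```

A real room impulse response has hundreds or thousands of taps and the far-end signal is speech rather than white noise, so convergence in practice is considerably slower than in this toy setting.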
In the latter case, four types of variable step-size adaptive algorithms were used; the near-end speech is estimated from the error signal and then transmitted to the far-end talker.

In the first method, the AEC consists of both an adaptive filter and a DTD. The detection of near-end speech is based on a decision statistic: the adaptive filter coefficients are adapted only when the decision statistic is larger than a threshold. The decision statistic is equal to one when the microphone signal contains only the far-end echo, and less than one when it contains both the far-end echo and near-end speech. Therefore, in addition to the classical performance measures (MSE and ERLE), the adaptive algorithms were also studied from the point of view of double-talk detection (the
adjustment of the adaptive filter coefficients depends on the value of the decision statistic).

In the second method, the structure of the AEC is based only on the variable step-size adaptive algorithms. Their capabilities and performances were analyzed in terms of classical criteria (convergence speed, robustness and tracking capabilities) and from a practical point of view (available parameters). The near-end signal has a great influence on the behavior of the adaptive filter, which can be seriously affected in this case, up to divergence. In practice, the near-end speech is not available, so it has to be estimated from the known parameters, which are difficult to control in practice. Thus, for real-world applications, it is highly desirable to use non-parametric algorithms. Variable step-size NLMS adaptive algorithms have therefore been proposed that estimate the near-end speech using different approaches: based on the error signal (SVSS-NLMS), on the cross-correlation vector between the input and the error signal (New NPVSS-NLMS), and on the microphone signal and the input signal of the adaptive filter (PVSS-NLMS). Among these, the most non-parametric is the PVSS-NLMS algorithm, since it does not require the tuning of unknown parameters.

In terms of misalignment, the PVSS-NLMS algorithm outperforms the SVSS-NLMS and New NPVSS-NLMS algorithms, with or without a double-talk detector, since it can handle double-talk without diverging. In terms of MSE and ERLE, the performance of the PVSS-NLMS algorithm is almost identical with and without a double-talk detector, which means that the algorithm can be used in a double-talk situation without a double-talk detector.
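The idea of tying the step size to an estimate of the near-end signal power can be illustrated with a short sketch. The rule used below is the non-parametric VSS-NLMS step size as it is commonly given in the literature; it is not taken from the thesis, whose SVSS, New NPVSS and PVSS variants differ in how the near-end power is estimated. The forgetting factor `lam`, the constants `delta` and `eps`, the helper names and the assumption that the near-end standard deviation `sigma_v` is known are all illustrative.

```python
import math
import random


def npvss_nlms(far_end, mic, taps, sigma_v, lam=0.99, delta=1e-3, eps=1e-8):
    """NLMS with an NPVSS-style step size; returns (weights, error signal).

    sigma_v is the (assumed known) standard deviation of the near-end signal.
    """
    w = [0.0] * taps
    buf = [0.0] * taps        # most recent far-end samples, newest first
    sigma_e2 = 0.0            # running estimate of the error power
    errors = []
    for x_n, d_n in zip(far_end, mic):
        buf = [x_n] + buf[:-1]
        e_n = d_n - sum(wi * xi for wi, xi in zip(w, buf))
        sigma_e2 = lam * sigma_e2 + (1.0 - lam) * e_n * e_n
        sigma_e = math.sqrt(sigma_e2)
        if sigma_e >= sigma_v:
            # Error power above near-end power: residual echo remains, so adapt.
            energy = delta + sum(xi * xi for xi in buf)
            mu = (1.0 - sigma_v / (eps + sigma_e)) / energy
        else:
            # Error is explained by the near-end signal alone: freeze the filter.
            mu = 0.0
        w = [wi + mu * e_n * xi for wi, xi in zip(w, buf)]
        errors.append(e_n)
    return w, errors


def make_echo(h, n_samples, sigma_v, seed=0):
    """White-noise far-end signal through echo path h, plus near-end noise."""
    rng = random.Random(seed)
    far_end = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mic = []
    for n in range(n_samples):
        echo = sum(h[k] * far_end[n - k] for k in range(min(len(h), n + 1)))
        mic.append(echo + rng.gauss(0.0, sigma_v))
    return far_end, mic


def misalignment_db(h, w):
    """Normalized distance between true and estimated impulse response, in dB."""
    num = sum((hi - wi) ** 2 for hi, wi in zip(h, w))
    den = sum(hi * hi for hi in h)
    return 10.0 * math.log10(num / den + 1e-12)


if __name__ == "__main__":
    h = [0.5, -0.3, 0.2, 0.1, -0.05, 0.03, 0.02, -0.01]  # hypothetical echo path
    far_end, mic = make_echo(h, 4000, sigma_v=0.05)
    w, _ = npvss_nlms(far_end, mic, taps=len(h), sigma_v=0.05)
    print("misalignment [dB]:", misalignment_db(h, w))
```

The step size is large while the error power is well above the near-end power and shrinks toward zero as the two become equal, which is what lets such algorithms keep adapting without a separate double-talk detector. Here `sigma_v` is simply passed in; estimating it from the available signals is the hard part in practice.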
Compared with the classical adaptive algorithms from chapter 3, the variable step-size algorithms mentioned above perform much better both in terms of mean square error (faster convergence) and in terms of ERLE (30-40 dB echo attenuation). In addition, the adaptation of the filter coefficients is not slowed down or switched off during double-talk periods, and the filter still converges to an optimal solution. The PVSS-NLMS algorithm has therefore proved to be suitable for real-world AEC applications.

Because the best performance was obtained with adaptive algorithms based on NLMS, section 5.4 shows once more the improved behavior of the NLMS algorithm compared with the LMS algorithm; both were implemented in VHDL, downloaded to and tested on a Virtex-5 FPGA device from Xilinx. Even though the NLMS algorithm is more complex, both in terms of computational complexity and in terms of its VHDL implementation, its performance proved to be superior to that of the
LMS algorithm. The major difference between the basic method [PET04], which is used for implementing the adaptive filters in this work, and the methods presented in [ELH06], [ANG09], [ANH10] and [KHA09] is the number of clock cycles needed per iteration. The method used here takes log2(N) clock cycles to adapt all N coefficients, because multiple operations of the FIR filter are executed in parallel. The other methods take at least N clock cycles, because each coefficient of the FIR filter is processed separately. The LMS and NLMS implementations presented here therefore execute much faster, which makes them suitable for applications that require high processing speed between filter iterations or a large number of filter coefficients, while allowing the system clock frequency to be lowered in order to reduce overall power consumption. The method thus brings a major advantage for all adaptive filtering applications that require a large number of coefficients.

The purpose of this thesis was to improve the quality of communication in teleconferencing systems. The contributions made to this end fall into three main research directions:

The first part is dedicated to modeling and simulating the classical adaptive algorithms in Matlab, with a focus on eliminating the acoustic echo. The simulations also cover double-talk situations, in which correct operation was ensured by implementing and using double-talk detectors;

In the second part of the thesis, the adaptive algorithms with variable step size were modeled and analyzed in Matlab.
These were implemented and analyzed with and without double-talk detectors (the Geigel detector and the method proposed by Iqbal); they proved to work fairly well under double-talk conditions. In this thesis the algorithms were studied considering not only their robustness against variations of the near-end speech signal (using the misalignment) but also the cancellation of the acoustic echo (using MSE and ERLE), whereas in [PAL08], [PAL09] and [PAL10] the algorithms were studied only in terms of robustness against variations of the speech signal;

In the last part of the thesis, the LMS and NLMS algorithms were implemented in VHDL, then programmed and tested on a Virtex-5 FPGA device from Xilinx. The improvements to the method proposed in [PET04] are:
o conversion of the binary values of the signal samples from two's complement (C2) to sign-magnitude (SM) before multiplication and division operations, and from SM back to C2 before addition and subtraction operations, to make it easier to obtain the expected result;
o binary multiplication using a fixed-point representation of the terms, so that the products are similar to the scaled representations of the values used in Matlab;
o extension of the algorithm from a fixed number of coefficients to any number of coefficients that is a natural power of 2;
o detection and handling of underflow and overflow conditions in addition and subtraction operations, to keep the results and the algorithm performance close to those obtained when modeling the algorithm in Matlab;
o a division operation for the NLMS algorithm, implemented as fixed-point binary division using the comparison method.
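The arithmetic steps listed above can be mimicked in software to see how they fit together. The following Python sketch is only an illustration, not the VHDL design from the thesis. The 16-bit word with 12 fractional bits, the saturation on overflow, the truncation of products and the reading of the "comparison method" as shift-compare-subtract (restoring) division are all assumptions.

```python
WIDTH = 16                          # word length in bits (assumed)
FRAC = 12                           # number of fractional bits (assumed)
MASK = (1 << WIDTH) - 1
MAX_MAG = (1 << (WIDTH - 1)) - 1    # largest representable magnitude


def to_fx(value):
    """Real number -> two's complement (C2) fixed-point bit pattern."""
    return int(round(value * (1 << FRAC))) & MASK


def from_fx(word):
    """C2 fixed-point bit pattern -> real number."""
    signed = word - (1 << WIDTH) if word >> (WIDTH - 1) else word
    return signed / (1 << FRAC)


def c2_to_sm(word):
    """C2 bit pattern -> (sign bit, magnitude)."""
    sign = (word >> (WIDTH - 1)) & 1
    magnitude = ((~word + 1) & MASK) if sign else word
    return sign, magnitude


def sm_to_c2(sign, magnitude):
    """(sign bit, magnitude) -> C2 bit pattern."""
    return ((~magnitude + 1) & MASK) if sign else magnitude


def fx_mul(a, b):
    """Multiply in sign-magnitude form, rescale, saturate, return C2."""
    sa, ma = c2_to_sm(a)
    sb, mb = c2_to_sm(b)
    magnitude = min((ma * mb) >> FRAC, MAX_MAG)   # truncate extra fraction bits
    return sm_to_c2(sa ^ sb, magnitude)


def fx_div(a, b):
    """Divide magnitudes bit by bit: shift, compare, subtract (restoring)."""
    sa, ma = c2_to_sm(a)
    sb, mb = c2_to_sm(b)
    if mb == 0:
        raise ZeroDivisionError("fixed-point division by zero")
    dividend = ma << FRAC                         # keep FRAC fractional bits
    quotient = 0
    remainder = 0
    for bit in range(WIDTH + FRAC - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= mb:                       # the comparison step
            remainder -= mb
            quotient |= 1
    return sm_to_c2(sa ^ sb, min(quotient, MAX_MAG))


def fx_add(a, b):
    """Add in C2 form and saturate on overflow or underflow."""
    sa = a - (1 << WIDTH) if a >> (WIDTH - 1) else a
    sb = b - (1 << WIDTH) if b >> (WIDTH - 1) else b
    total = max(-(1 << (WIDTH - 1)), min(MAX_MAG, sa + sb))
    return total & MASK


if __name__ == "__main__":
    print(from_fx(fx_mul(to_fx(1.5), to_fx(-0.5))))
    print(from_fx(fx_div(to_fx(1.5), to_fx(-0.5))))
    print(hex(fx_add(to_fx(7.0), to_fx(7.0))))
```

Working on magnitudes keeps the multiplier and the divider unsigned, with the sign of the result obtained from a single XOR. Whether the hardware saturates or handles overflow in another way is not stated in the text, so the clamping in `fx_add` is only one possible choice.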