
DESIGN AND IMPLEMENTATION OF AN ADAPTIVE NOISE CANCELING SYSTEM IN WAVELET TRANSFORM DOMAIN

A Thesis Presented to The Graduate Faculty of the University of Akron

In Partial Fulfillment of the Requirements for the Degree Master of Science

Vladan Bajic

August, 2005

DESIGN AND IMPLEMENTATION OF AN ADAPTIVE NOISE CANCELING SYSTEM IN WAVELET TRANSFORM DOMAIN

Vladan Bajic

Thesis Approved:
Advisor: Dr. Okechukwu C. Ugweje
Committee Member: Dr. George Giakos
Committee Member: Dr. Igor Tsukerman

Accepted:
Department Chair: Dr. Alex De Abreu
Dean of the College: Dr. George Haritos
Dean of the Graduate School: Dr. George Newkome

Date

ABSTRACT

This thesis is focused on the analysis and performance comparison of two methods of implementing adaptive filtering algorithms, namely the normalized time domain Least Mean Squares (NLMS) algorithm and the wavelet transform domain LMS (WLMS) algorithm. A brief theoretical development of both methods is given, as well as the advantages of performing the LMS class of algorithms in the wavelet transform domain. Both algorithms are then implemented on a real time Digital Signal Processing (DSP) system used for audio signal processing. Two different wavelets, Daubechies 2 and Daubechies 3, are used for the comparison. Results are presented showing the performance of each program, in both the time and frequency domains. The time domain results show a very important characteristic of adaptive filters, namely the convergence speed of the algorithms under different types of input signals. The frequency domain results are shown for the same algorithms and the same input signals; the frequency domain characteristics are analyzed after the adaptive filter has converged. The noise produced by the different algorithms is shown across the spectrum, and the distorting effects are analyzed, along with the trade-off of convergence speed versus added noise. Different LMS algorithm implementations on a real time system are also analyzed. The overall results show the convergence speed improvement obtained when using the WLMS algorithms over the NLMS algorithm.

ACKNOWLEDGEMENTS

I would like to acknowledge the help offered to me by many people during the completion of this thesis, and to extend my gratitude to them. I would like to thank my adviser, Dr. Okechukwu Ugweje, for his guidance and assistance throughout the project, and for making it possible. I would also like to acknowledge the faculty of the Electrical and Computer Engineering Department, especially my committee members, Dr. Igor Tsukerman and Dr. George Giakos, for their help; likewise, Dr. Dale Mugler of the Theoretical and Applied Mathematics Department for his support. To my parents, for their continuous love, support, and belief in me: I will remain forever grateful for all the sacrifices they made to enable me to pursue my goals and dreams. Thank you to all of my colleagues at Audio-Technica US Inc., whose support during the past few years provided the much needed drive and motivation. In particular, I am thankful to Ms. Jacquelyn Green for her unlimited patience, support and guidance, and to Mr. Kelly Statham and Mr. Chris Henderson for their technical expertise and help. It would never have been possible to achieve this goal without all of your help. I would also like to acknowledge the support of many others who helped me along the way.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER
I. INTRODUCTION
1.1 Introduction
1.2 Motivation
1.3 Contribution of Thesis
1.4 Thesis Outline

II. THEORETICAL BACKGROUND OF ADAPTIVE FILTERS
2.1 Introduction
2.2 Adaptive Filtering Algorithms
2.2.1 Least Mean Squares Algorithms
2.2.1.1 Auto-Correlation of Input Signal
2.2.2 Normalized Least Mean Squares Algorithms
2.2.3 Recursive Least Squares Algorithms
2.3 Wavelet Transforms
2.4 Wavelet Transform Domain LMS Algorithm
2.5 Past and Current Work
2.6 Conclusion

III. DESCRIPTION OF THE EXPERIMENT
3.1 Introduction
3.2 Description of Hardware
3.2.1 Signal Description
3.2.2 Hardware Implementations
3.3 Description of Software
3.4 Implementation Issues

IV. EXPERIMENTAL RESULTS AND ANALYSIS
4.1 Introduction
4.2 Experimental Measurements
4.3 Pink Noise Tests With Equal MSE Level
4.3.1 Frequency Domain Analysis of Post Convergence Performance
4.3.2 Convergence Speed Performance
4.4 Bandpass Noise Tests With Equal MSE Level
4.4.1 Frequency Domain Analysis of Post Convergence Performance
4.4.2 Convergence Speed Performance
4.5 Bandpass Noise Results With Maximized Convergence
4.5.1 Frequency Domain Analysis of Post Convergence Performance
4.5.2 Convergence Speed Performance

V. SUMMARY AND CONCLUSION
5.1 Summary of Thesis
5.2 Summary of Results
5.3 Future Work

REFERENCES

APPENDIX A. LEAST MEAN SQUARES ALGORITHM

LIST OF TABLES
3.1 Comparison of the Computational Cost for Different Algorithms
4.1 Step Sizes for Different Algorithms

LIST OF FIGURES
2.1 Time domain LMS algorithm
2.2 Filter and decimate operation used for wavelet decomposition
2.3 Wavelet transform domain LMS algorithm
2.4 Level two wavelet transform decomposition
3.1 Block diagram of system used for the experiment
3.2 The DSP board used in the experiment
3.3 The Audio Precision System 2 used in the experiment
3.4 The overall system setup
3.5 Flowchart of the algorithms
4.1 Post convergence error, NLMS algorithm with pink noise
4.2 Post convergence error, WLMS algorithm using DB2 with pink noise
4.3 Post convergence error, WLMS algorithm using DB3 with pink noise
4.4 Convergence of NLMS algorithm with pink noise
4.5 Convergence of WLMS algorithm using DB2 with pink noise
4.6 Convergence of WLMS algorithm using DB3 with pink noise
4.7 Post convergence error, NLMS algorithm with bandpass noise
4.8 Post convergence error, WLMS algorithm using DB2 with bandpass noise
4.9 Post convergence error, WLMS algorithm using DB3 with bandpass noise
4.10 Convergence of NLMS algorithm with bandpass noise
4.11 Convergence of WLMS algorithm using DB2 with bandpass noise
4.12 Convergence of WLMS algorithm using DB3 with bandpass noise
4.13 Post convergence error, NLMS algorithm with bandpass noise and maximized step size
4.14 Post convergence error, WLMS algorithm using DB2 with bandpass noise and maximized step size
4.15 Post convergence error, WLMS algorithm using DB3 with bandpass noise and maximized step size
4.16 Convergence of NLMS algorithm with bandpass noise and maximized step size
4.17 Convergence of WLMS algorithm using DB2 with bandpass noise and maximized step size
4.18 Convergence of WLMS algorithm using DB3 with bandpass noise and maximized step size

CHAPTER I

INTRODUCTION

1.1 Introduction

In the field of signal processing, there is a significant need for a special class of digital filters known as adaptive filters. Adaptive filters are commonly used in many different configurations, and for certain applications they have a great advantage over standard digital filters. They can adapt their filter coefficients to the environment according to preset rules. The filters are capable of learning from the statistics of the current conditions, and of changing their coefficients in order to achieve a certain goal. They are used in many situations where it is impossible or inconvenient to use a pre-designed filter. In order for a filter to be designed, knowledge of the desired response is required a priori. When such knowledge is not available, due to the changing nature of the filter's requirements, it is impossible to design a standard digital filter. In such situations, adaptive filters are desirable. Adaptive filters continuously change their impulse response in order to satisfy the given conditions, and by doing so, change the very characteristics of their response. There are certain rules that the filters use in order to adapt. The rules depend upon the configuration in which the filter is used, and also on the desired goal of the filter. Applications of adaptive filters include noise canceling, echo canceling, adaptive beamforming, and equalization for telecommunication systems, to mention just a few. The algorithms used

to perform the adaptation, and the configuration of the filter, depend directly on the intended use of the filter. However, the basic computational engine that performs the adaptation of the filter coefficients can be the same for different algorithms, and it is based on the statistics of the input signals to the system. Two classes of adaptive filtering algorithms, namely Recursive Least Squares (RLS) [1] and Least Mean Squares (LMS) [3], are capable of performing the adaptation of the filter coefficients. Each algorithm presents certain advantages and disadvantages with regard to its performance and implementation. In most cases, the ultimate choice of algorithm depends on trade-offs between performance gains and implementation costs. Therefore, there is a constant need and effort across the signal processing community to improve the current algorithms, in terms of both increasing the performance and reducing the computational complexity. This thesis describes techniques to increase the performance of certain classes of adaptive filters while keeping their computational demand reasonably low. In addition, it presents real time experiments performed on the LMS algorithm. The basic idea is to apply a wavelet transform prior to processing of the signal in order to obtain faster convergence of the filter coefficients. The system is implemented on a Digital Signal Processing (DSP) board [17]. Two different algorithms, the time domain LMS and the wavelet transform domain LMS (WLMS), are presented, analyzed, and compared. The results obtained from the experiment are presented and analyzed in detail.

1.2 Motivation

As mentioned above, in the field of adaptive signal processing there are mainly two classes of algorithms used to force the filter to adapt: LMS and RLS. Their implementations and adaptation properties are the determining factors in the choice of application. The main requirements and performance measures for adaptive filters are the convergence speed and the asymptotic error. The convergence speed enables us to measure how quickly the filter converges to the desired value. This is the primary property of the filter. Since each adaptive filter, by its structure, is learning the properties of the signal from its past samples, the convergence speed measures how quickly the filter is learning. It is a major requirement and a limiting factor for most applications of adaptive filters. The asymptotic error represents the amount of error that the filter introduces at steady state, after it has converged to the desired value. The RLS filters, due to their computational structure, have considerably better properties than the LMS filters, both in terms of convergence speed and asymptotic error [1], [2]. The RLS filters, which outperform the LMS filters, obtain their solution for the weight updates directly from the Mean Square Error (MSE). However, they are very computationally demanding and also very dependent upon the precision of the input signal [1]. Both of these requirements make the RLS filters impractical for real time applications. Their computational requirements are significant, and imply the use of expensive and power demanding high-speed processors. Also, for systems lacking the appropriate dynamic range, the adaptation algorithms can become unstable. In most cases, regular 16-bit Analog-to-Digital (A/D) converters do not provide enough dynamic range (or precision)

for the filters to operate properly. These properties make the RLS filters unusable for most applications. The LMS class of algorithms provides a robust and computationally acceptable option for most applications. Most of the research in the area of adaptive filters today is performed on the LMS class of algorithms. However, the most significant problem with LMS algorithms is their slow convergence speed. The convergence speed is directly dependent upon the choice of certain parameters in the algorithm. There is a trade-off between the convergence speed and the asymptotic error that the algorithm creates: any increase in speed will create a larger error. The problem with slow convergence is especially evident in cases with a large number of filter coefficients, which are often necessary for systems with long time delay spreads. In order to compensate for these problems, and to be able to take advantage of the simplicity and robustness of the LMS class of algorithms, there have been many different attempts to upgrade the algorithm so that its convergence speed is improved [6], [11], [12], [13], [14]. The most common solution to this problem is to have the filter operate in a transform domain instead of the time domain. Although this approach has the disadvantage of having to transform the signal before processing and, in most cases, inverse transform the signal after processing, it still represents a computationally efficient way to adapt the filter weights with improved convergence. There are many different transforms that can be used for this purpose. The most common are the Fourier transforms, in which the adaptation of the filter coefficients is performed in the frequency domain. Other common transforms include the discrete cosine transform and the

Walsh-Hadamard transforms. In this thesis, we investigate the use of wavelet transform techniques in adaptive filtering. The wavelet transforms represent a convenient way to analyze the signal, for several reasons. For example, Discrete Wavelet Transforms (DWT) can divide the original signal into subbands, and are capable of perfect reconstruction [4] of a signal after processing. Also, DWT are not computationally demanding, and can be performed fairly easily on a real time DSP system [10]. They have the ability to take advantage of multirate processing, enabling the system to operate at lower rates depending on the subband [4], [8]. They also offer a choice of wavelet structure, and the ability to perform a dyadic division of the original signal into subbands. These are some of the advantages that the wavelet transforms offer. Hence, the motivation for this thesis is the search for a computationally efficient algorithm to perform LMS adaptive filtering, with improved convergence speed compared to the basic time domain LMS. The wavelet transform domain LMS is considered, and compared to the time domain LMS. The entire system, in both the time domain and the wavelet domain, is realized in hardware on an operational DSP chip performing both algorithms. The experimental results are presented and analyzed.

1.3 Contribution of Thesis

The theory of LMS adaptive filtering is well known. The basic time domain LMS and its variations, such as the Normalized LMS (NLMS), the leaky NLMS, the sign error LMS, and the frequency domain LMS, have been used extensively in industry. However, due

to the constant need for improvements in performance and computational efficiency, new algorithms are constantly under investigation. The wavelet transform domain LMS algorithms have been introduced and analyzed in the past; many important published papers on the subject are summarized in Chapter 2. However, because the wavelet transform is a fairly new method of processing signals, not much published work is available on the subject of using wavelet transforms for LMS algorithms, unlike the Fourier transform domain LMS (frequency domain LMS). This is especially true for real time systems. The core of this thesis is an experimental study of using wavelet transforms to improve the performance of time domain LMS algorithms. The contribution of the thesis can be itemized as follows:

a) A theoretical derivation of the NLMS and the WLMS adaptive filtering algorithms, including the mechanisms used in implementation.

b) The NLMS and the WLMS algorithms are implemented on a real time embedded system. The advantages and disadvantages of both techniques from the implementation standpoint were investigated.

c) Results from experiments are obtained and analyzed for both algorithms with different input signals. The performance of NLMS and WLMS was compared, and conclusions were drawn with respect to the Mean Squared Error (MSE) level and the convergence speed. Furthermore, two different WLMS algorithms were implemented by using different wavelets: the Daubechies 2 wavelet and the Daubechies 3 wavelet.

d) The costs of implementing the algorithms are analyzed and considered as a trade-off factor with respect to the performance criteria.

1.4 Thesis Outline

The work presented in this thesis is organized as follows. In Chapter 2, the theoretical background of adaptive filters and the associated algorithms is presented. It includes the mathematical theory of adaptive filters, an introduction to wavelet transforms, the development of the analytical expressions used in the experiment, and the advantages of using transforms in the adaptation of filter weights. Recent developments in the field of adaptive filtering are also summarized. In Chapter 3, the description of the experiment is presented, including the hardware used to embed the algorithm, since it presents important implementation issues. An analysis of the software used, and a detailed discussion of the issues encountered while implementing the algorithms, are also presented, along with the trade-offs in the different design decisions made while implementing the algorithms. The chapter concludes with a brief description of the measurement equipment used for the experiment. Chapter 4 presents the results obtained from the experiment, with a detailed explanation of the results, as well as observations and comparisons between the results obtained from the time domain and transform domain LMS algorithms. Several results obtained for different algorithms under different conditions are analyzed. The summary of the thesis, and conclusions based on the results obtained, are presented in Chapter 5. Suggestions for further improvement of the algorithm, and possible directions that could be taken to improve system performance, are also presented.

CHAPTER II

THEORETICAL BACKGROUND OF ADAPTIVE FILTERS

2.1 Introduction

Adaptive filters in general are used very often in industry. The application for which they are intended determines the particular configuration of the filter. The most common uses are in noise/echo cancellation algorithms, system identification, beamforming applications, and many more. The application considered in this thesis is noise cancellation. Analysis is provided in the time domain, as well as in the wavelet transform domain. Consider the adaptive filter representation shown in Figure 2.1. The adaptive filter has two inputs: the primary input d(n), which represents the desired signal corrupted with undesired noise, and the reference signal x(n), which is the undesired noise to be filtered out of the system. The primary input is therefore comprised of two portions: one is the desired signal, and the other is the noise signal corrupting the desired portion of the primary signal. The goal of an adaptive noise canceling system is to reduce the noise portion, and to obtain the uncorrupted desired signal. In order to achieve this task, a reference of the noise signal is needed. That reference is fed to the system, and it is called the reference signal x(n). However, the reference signal is typically not the same signal as the noise portion of the primary signal; it can vary in amplitude, phase or time delay. Therefore the reference

signal cannot simply be subtracted from the primary signal to obtain the desired portion at the output. The basic idea is for the adaptive filter to predict the amount of noise in the primary signal, and then subtract that noise from it. The prediction is based on filtering the reference signal x(n), which contains a solid reference of the noise present in the primary signal. The noise in the reference signal is filtered to compensate for the amplitude, phase and time delay, and then subtracted from the primary signal.

Figure 2.1: Time domain LMS algorithm.

This filtered noise is the system's prediction of the noise portion of the primary signal, and it is also known as the regressor signal y(n). The resulting signal is called the error signal e(n), and it represents the output of the system. Ideally, the resulting error signal would be only the desired portion of the primary signal.

In practice, it is difficult to achieve this, but it is possible to significantly reduce the amount of noise in the primary signal. This is the overall goal of adaptive filters. This goal is achieved by constantly changing (or adapting) the filter coefficients (weights). The adaptation rules determine their performance, and the requirements of the system used to implement the filters. A good example to illustrate the principles of adaptive noise canceling is the removal of noise from the pilot's microphone in an airplane. Due to the high environmental noise produced by the airplane engines, the pilot's voice in the microphone is corrupted with a high amount of noise, and can be very difficult to understand. In order to overcome the problem, an adaptive filter can be used. In this particular case, the desired signal is the pilot's voice. This signal is corrupted with the noise from the airplane's engines. Combined, the pilot's voice and the engine noise constitute the primary signal d(n). The reference signal for this application would be a signal containing only the engine noise, which can easily be obtained from a microphone placed near the engines. This signal would not contain the pilot's voice, and for this application it is the reference signal x(n). The adaptive filter shown in Figure 2.1 can be used for this application. The filter output (regressor signal) y(n) is the system's estimate of the engine noise as received in the pilot's microphone. This estimate is subtracted from the primary signal (pilot's voice plus engine noise), so that the output of the system e(n) should contain only the pilot's voice, without any noise from the airplane's engines. It is not possible to subtract the engine noise from the pilot's microphone directly, since the engine noise received in the pilot's microphone and the engine noise received in the reference microphone are not the same signal. There are differences in amplitude and time delay. Also, these differences are not fixed. They change in time with the pilot's

microphone position with respect to the airplane engine, and many other factors. Therefore, designing a fixed filter to perform the task would not obtain the desired results; the application requires an adaptive solution. There are many forms of adaptive filters, and their performance depends on the objective set forth in the design. Theoretically, the major goal of any noise canceling system is to reduce the undesired portion of the primary signal as much as possible, while preserving the integrity of the desired portion of the primary signal. Since there are many adaptive filters operating in a variety of ways with different goals, the performance measures also change based on the desired application of the filter. Ultimately, the goal of an adaptive noise canceling filter is noise removal. The extent and the rate at which this goal is achieved determine the algorithms that can be used for the application. As noted above, the filter produces an estimate of the noise in the primary signal, adjusted for magnitude, phase and time delay. This estimate is then subtracted from the noise corrupted primary signal, to obtain the desired signal. In order for the filter to work well, the adaptive algorithm has to adjust the filter coefficients such that the output of the filter is a good estimate of the noise present in the primary signal. In order to determine the amount by which the noise in the primary signal is reduced, the mean squared error technique is used. The Minimum Mean Squared Error (MMSE) is defined as [1]

$$\min_{W} E\left\{[d(n) - XW]^2\right\} = \min_{W} E\left\{[d(n) - y(n)]^2\right\}, \qquad (2.1)$$

where d is the desired signal, and X and W are the vectors of the input reference signal and the filter coefficients, respectively. This represents a measure of how well the newly constructed filter (given as the convolution product y(n) = XW) estimates the noise present in the primary signal. The goal is to reduce this error to a minimum. Therefore, the algorithms that perform adaptive noise cancelation are constantly searching for a coefficient vector W which produces the minimum mean squared error. Minimizing the mean square of the error signal minimizes the noise portion of the primary signal, but not the desired portion [3]. To best understand this principle, recall that the primary signal is made of the desired portion and the noise portion. The reference signal x(n) is a reference of the noise portion of the primary signal, and therefore is correlated with it. However, the reference signal is not correlated with the desired portion of the primary signal. Therefore, minimizing the mean square of the error signal minimizes only the noise in the primary signal. This principle can be mathematically described as follows. If we denote the desired portion of the primary signal by s, and the noise portion of the primary signal by c, it follows that d(n) = s(n) + c(n). As shown in Figure 2.1, the output of the system can be written as [3]

$$\begin{aligned} e(n) &= d(n) - y(n) \\ e(n) &= s(n) + c(n) - y(n) \\ e^2(n) &= s^2(n) + (c(n) - y(n))^2 + 2 s(n)(c(n) - y(n)) \\ E[e^2(n)] &= E[s^2(n)] + E[(c(n) - y(n))^2] + 2 E[s(n)(c(n) - y(n))]. \end{aligned}$$

Due to the fact that s(n) is uncorrelated with both c(n) and y(n), as noted earlier, the last term is equal to zero, so we have [3]

$$\begin{aligned} E[e^2(n)] &= E[s^2(n)] + E[(c(n) - y(n))^2] \\ \min_W E[e^2(n)] &= \min_W E[s^2(n)] + \min_W E[(c(n) - y(n))^2], \end{aligned}$$

and since s(n) is independent of W, we have

$$\min_W E[e^2(n)] = E[s^2(n)] + \min_W E[(c(n) - y(n))^2].$$

Therefore, minimizing the error signal minimizes the mean square of the difference between the noise portion of the primary signal c(n) and the filter output (regressor signal) y(n) = XW. Adjusting the weights to achieve this task does not affect the desired portion of the primary signal [3]. The smallest possible output is achieved when E[e^2(n)] = E[s^2(n)], resulting in a noise free output [3]. The optimal filter coefficients W^O that perform this task are given by [2]

$$R_{xx} W^O = P_{xd}, \qquad (2.2)$$

where R_xx is the auto-correlation matrix of the reference signal, and P_xd is the cross-correlation vector of the reference and primary signals. The mathematical derivation of (2.2) is given in Appendix A. This equation gives the expression for the optimal filter coefficients. All of the algorithms are measured based on how closely the actual coefficients approach this optimal solution, and how quickly it can be achieved.
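As a concrete illustration of (2.2), the short NumPy sketch below estimates R_xx and P_xd from sample averages on a toy noise-canceling problem and solves the normal equations for the optimal weights. The four-tap channel, the signal lengths, and the variable names are illustrative assumptions for this sketch, not values from the thesis.

```python
import numpy as np

# Minimal sketch of the normal equations (2.2): estimate R_xx and P_xd
# from sample averages, then solve R_xx W = P_xd for the optimal weights.
# The "true" 4-tap noise path and signal lengths are illustrative choices.
rng = np.random.default_rng(0)
N = 4                                # number of filter taps (assumed)
x = rng.standard_normal(100_000)     # reference noise x(n)
h_true = np.array([0.8, -0.3, 0.2, 0.1])
c = np.convolve(x, h_true)[:len(x)]  # noise portion of the primary signal
s = np.sin(2 * np.pi * 0.01 * np.arange(len(x)))  # desired signal s(n)
d = s + c                            # primary signal d(n) = s(n) + c(n)

# Tap-delay-line matrix whose row n is [x(n), x(n-1), ..., x(n-N+1)]
X = np.column_stack([np.roll(x, k) for k in range(N)])
X[:N, :] = 0.0                       # zero out the wrap-around samples

R_xx = X.T @ X / len(x)              # sample auto-correlation matrix
P_xd = X.T @ d / len(x)              # sample cross-correlation vector
W_opt = np.linalg.solve(R_xx, P_xd)  # solves R_xx W = P_xd, eq. (2.2)

print(W_opt)  # approaches h_true, so e(n) = d(n) - XW approaches s(n)
```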

Obtaining the coefficients in this manner is the ultimate goal of the adaptive noise canceling filters. It is evident from (2.2) that the optimal weights can be written as

$$W^O = R_{xx}^{-1} P_{xd}. \qquad (2.3)$$

This shows what the filter coefficients should be in order for the filter to obtain the best solution, that is, an MMSE solution. From (2.1), the MMSE can be written as [1]

$$J_{ms} = E\left\{[d - XW^O]^2\right\} = E\left\{(d - XW^O)(d - XW^O)\right\} = E\left\{(d - XW^O)\, d\right\} = \sigma_d^2 - P_{xd}^t R_{xx}^{-1} P_{xd},$$

where J_ms is the minimum mean squared error, σ_d² is the variance of the desired signal, and E[.] denotes the expected value. This shows that the minimum squared error depends on the variance of the desired signal σ_d². The variance plays a significant role in the performance of adaptive filters.

2.2 Adaptive Filtering Algorithms

In adaptive filtering, many different algorithms have been developed. They vary widely in structure and approach, therefore yielding different results. Since most of the filters are designed to be implemented in real time, the amount of processing required is crucial in the application and design of adaptive systems. Due to these requirements, two classes of adaptive filtering algorithms have been developed. These are the Least Mean Squares (LMS) algorithm [3], and the Recursive Least Squares (RLS) algorithm [1], [2].

2.2.1 Least Mean Squares Algorithms

The LMS algorithm is a practical implementation used to find the optimal filter coefficients in a real time environment. Its major advantage is the simplicity of realization. The computations do not involve matrix inversion, and are therefore considerably more practical to implement. The basic idea of the LMS algorithm is the gradient search method [2]. This technique involves initially estimating the coefficient vector, and then iteratively updating it in the appropriate direction. At each point in time, the coefficient vector changes from its current estimate to the next estimate of the optimal coefficients. It is shown in Appendix A that the error is a quadratic function of the coefficients that never goes negative (see A.2). Therefore, adjusting the coefficients to minimize the mean squared error means descending along this quadratic function, with the objective of getting to the minimum of the function [3], the point where the error is minimized. This occurs when W = W^O, which implies that at this point the current filter coefficients equal the optimal coefficient values. Without directly evaluating (2.3), which would involve a matrix inversion, we cannot know the value of W^O. However, we can be assured that the algorithm moves the filter coefficients in the appropriate direction with each iteration by taking the first derivative of the mean squared error function. That is, the value is moved in the direction that reduces the error. The new value can be determined iteratively, and can be expressed as [3]

$$W(n+1) = W(n) - \mu \nabla J, \qquad (2.4)$$

where ∇J is the gradient of the mean squared error, n is the time index, and µ is the step size. The step size is a crucial factor in the algorithm's performance and implementation.

It determines the convergence rate, and the asymptotic error that the algorithm creates [2], [7]. The LMS algorithm estimates an instantaneous gradient by assuming that the square of a single error sample is an estimate of the mean squared error. The gradient is obtained by differentiating the squared error signal e²(n), defined in (A.1), with respect to W, and is given by [3]

$$\nabla J = \left[ \frac{\partial J_{ms}}{\partial w_0}, \frac{\partial J_{ms}}{\partial w_1}, \ldots, \frac{\partial J_{ms}}{\partial w_{N-1}} \right]^t = \frac{\partial E[e^2(n)]}{\partial W} = 2\, e(n) \frac{\partial e(n)}{\partial W} = -2\, e(n) X(n). \qquad (2.5)$$

Substituting (2.5) into (2.4), we obtain the basic form of the coefficient update for the LMS algorithm, which can be expressed as

$$W(n+1) = W(n) + 2 \mu\, e(n) X(n). \qquad (2.6)$$

This provides the amount by which the filter coefficients are updated in each iteration of the program. It is evident from (2.6) that the filter coefficients are updated iteratively in a straightforward manner. The second term in (2.6) is the correction factor (i.e. 2µe(n)X(n)). It is made up of the step size µ, which determines the amount by which the coefficients are allowed to change, and the current error sample e(n) multiplied by the current input signal vector X(n).

The weight update is performed at each program iteration for each new set of input samples. The correction factor is an estimate of the direction and the magnitude by which the current set of filter coefficients needs to change in order to reduce the amount of error. Since the LMS algorithm is iterative, it does not provide the optimal coefficient vector at each evaluation, but it moves the current set of coefficients in the appropriate direction. After a number of iterations, the filter coefficients can achieve their optimal state. How quickly the filter coefficients reach the optimal set is called the convergence speed, which depends on a number of different conditions such as the step size, the error magnitude, the reference signal magnitude, and the statistical properties of the input signal. Statistical properties such as the auto-correlation matrix and its eigenvalue spread are very important parameters in the LMS algorithm. It is important to mention that the weight update given by (2.6) is simple and easy to implement in a real time system. This accounts for the widespread use of LMS algorithms in adaptive filtering. However, the approximations and assumptions made in its derivation are a limiting factor. Recall the block diagram representation of the time domain LMS algorithm shown in Figure 2.1. This setup is implemented for Adaptive Noise Canceling (ANC) applications. As shown in Figure 2.1, N samples of the input reference signal are collected and used as the input to a Finite Impulse Response (FIR) filter with weight vector W. The filter output y(n) is then subtracted from the primary signal d(n). This creates the error signal, which is the output of the algorithm. The error is also used as an input to the LMS weight update. The LMS update gives the new weights, which are used in the next program iteration. The entire LMS algorithm can be summarized as follows:

a) The filter output is calculated by convolving the reference input with the current coefficient set: y(n) = Wᵗ(n) X(n).

b) The error is calculated by subtracting the filter output from the primary signal: e(n) = d(n) − y(n).

c) The filter coefficients are updated using W(n+1) = W(n) + 2µ e(n) X(n).

An important parameter regarding the algorithm's stability and convergence speed is the step size, which is directly related to the performance and stability of the adaptive algorithm. The step size µ is a small positive constant which determines the amount of change introduced to each coefficient during every update. If µ is made too small, the algorithm will take more time to converge, since it will need more steps to arrive at the bottom of the performance function curve [2], [3]. On the other hand, if µ is too large, the algorithm will step past the bottom of the curve, and it will never achieve a result that is close enough to the desired solution. Therefore, the asymptotic error is increased. Furthermore, the algorithm can become unstable should the step size be made too large.
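To make steps a)–c) concrete, the following minimal NumPy sketch runs the time domain LMS noise canceler of Figure 2.1 on synthetic signals. The signal model, filter length, and step size below are illustrative assumptions chosen only so that the loop converges; they are not the values used in the experiment.

```python
import numpy as np

# Minimal sketch of the time domain LMS noise canceler (steps a-c).
# All signal parameters below are illustrative assumptions.
rng = np.random.default_rng(1)
N = 8                    # filter length (assumed)
mu = 0.01                # step size (assumed; must satisfy (2.7))
n_samples = 20_000

x = rng.standard_normal(n_samples)                   # reference noise x(n)
s = np.sin(2 * np.pi * 0.02 * np.arange(n_samples))  # desired signal s(n)
h = np.array([0.6, -0.4, 0.25, 0.1])                 # unknown noise path (assumed)
d = s + np.convolve(x, h)[:n_samples]                # primary signal d(n)

W = np.zeros(N)                                      # initial coefficients
e = np.zeros(n_samples)
for n in range(N, n_samples):
    X = x[n:n - N:-1]              # tap vector [x(n), ..., x(n-N+1)]
    y = W @ X                      # a) filter output y(n) = W^t(n) X(n)
    e[n] = d[n] - y                # b) error e(n) = d(n) - y(n)
    W += 2 * mu * e[n] * X         # c) update W(n+1) = W(n) + 2 mu e(n) X(n)

# After convergence, e(n) should approximate the desired signal s(n).
print(np.mean((e[-2000:] - s[-2000:]) ** 2))  # residual MSE near steady state
```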

As any particular weight w_j of the vector W (where W = [w₀ w₁ ... w_{N−1}]) approaches the minimum of the quadratic function representing the mean squared error J_ms, it does so in steps of size µe(n)x(n). Once this minimum of J_ms is approached at a distance less than the next step size, the next step will continue along J_ms past the minimum point. At that moment, the sign of ∂J_ms/∂w_j will change, and the next step will go in the opposite direction along J_ms. That step will take it past the minimum point again, and with every new step the weight w_j will increase and decrease by µe(n)x(n−j) around the optimal point. It may never get to the bottom of the performance curve. So, by making the constant µ small, the oscillations of the weight around its optimal value will be small, resulting in weights that are always close to their optimal values. However, a small µ also means a long convergence time, since the algorithm will require more steps to get to the minimum point of the performance function (the bottom of the quadratic curve). Hence, in the LMS algorithm there is a trade-off between the speed and the precision obtained at the minimum point. The faster it is, the more the weights have to oscillate around their optimal values. The limits on the value of the step size for the algorithm to be stable are given by [1]

$$0 < \mu < \frac{2}{\lambda_{max}}, \qquad (2.7)$$

where λ_max is the largest eigenvalue of the auto-correlation matrix of the input signal. The value of λ_max changes with the input signal, and µ has to be selected in such a manner as to satisfy (2.7). Otherwise, the algorithm can become unstable, even though an FIR filter is used. Since the filter weights are not fixed but adaptive, it is possible for the algorithm to diverge, and not converge to any particular output. Therefore, care must be taken when selecting the appropriate step size.

2.2.1.1 Auto-Correlation of the Input Signal

The auto-correlation matrix of the input reference signal statistically determines the convergence properties of the algorithm. In [2], it is shown that the difference in outputs between an adaptive filter using the optimal filter coefficients and one using non-optimal coefficients depends only on the properties of the reference signal. It is a function of the input auto-correlation matrix, given as [2], [3]

$$R_{xx} = E\left[X(n)X^t(n)\right] = E\begin{bmatrix} x^2(n) & x(n)x(n-1) & \cdots & x(n)x(n-N+1) \\ x(n-1)x(n) & x^2(n-1) & \cdots & x(n-1)x(n-N+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(n-N+1)x(n) & x(n-N+1)x(n-1) & \cdots & x^2(n-N+1) \end{bmatrix}. \qquad (2.8)$$

From (2.8), observe that the input auto-correlation matrix is a symmetric matrix with the power of the signal appearing along the main diagonal. For real signals it is symmetric, and for complex signals it is conjugate symmetric. It is also a positive definite matrix, except in some rare cases when it can be positive semi-definite, as shown in [1], [2]. When R_xx is positive definite it is invertible, and therefore the optimal weight vector W^O = R_xx⁻¹ P_xd has a unique solution. That is, the matrix is non-singular. This is the desired case. It is possible, however, for the matrix to be singular. The eigenvalue decomposition of R_xx is a very useful indicator of the properties of the input signal, and therefore of the performance of the algorithm.

The eigenvalues of the auto-correlation matrix decouple the input signal into different modes, each mode corresponding to the appropriate coefficient. This can best be seen if a matrix Λ is created such that Λ = diag(λ₁, λ₂, λ₃, ..., λ_{N−1}), where λ_n are the eigenvalues of the auto-correlation matrix. It is shown in [2] that ΛW^O = P_xd, where P_xd is the cross-correlation vector between the input reference and primary signals. Since Λ is a diagonal matrix, it can be shown that

$$\lambda_i w_i^O = p_i \;\implies\; w_i^O = \frac{p_i}{\lambda_i}, \qquad (2.9)$$

which shows that each uncoupled weight w_i^O can be expressed as the ratio of the appropriate cross-correlation term to its corresponding eigenvalue. From (2.9) it is easily seen that if R_xx is singular, then one or more of its eigenvalues will be zero, resulting in some filter coefficients being undriven. In that case, no change in error is observed for any change in an undriven coefficient. However, if the matrix R_xx is nonsingular, the solutions of (2.2) (also known as the normal equations [2]) are unique. While it is possible for R_xx to be singular, for most well-behaved input signals (wide sense stationary) R_xx will be nonsingular, and the unique solutions for the optimal weights are obtainable. Another important parameter is the convergence speed. Ultimately, the convergence speed of the algorithm depends directly on the spread of the eigenvalues of R_xx [1], [2], [7]. The ratio ρ = λ_max/λ_min determines how quickly the algorithm converges to the desired value. The convergence speed increases as ρ approaches unity. It is shown in [1] that the slowest convergence mode is

$$\text{Slowest convergence} = \frac{\rho - 1}{\rho + 1} = \frac{\lambda_{max} - \lambda_{min}}{\lambda_{max} + \lambda_{min}}. \qquad (2.10)$$

Minimizing the slowest convergence mode calls for reducing the ratio λ_max/λ_min as close as possible to unity, and that means reducing the spread of the eigenvalues of the matrix R_xx. This is a very important principle of the LMS algorithms. As indicated above, adaptive filters, due to their nature of constantly changing their impulse response, can become unstable. The stability of the LMS algorithm can also be analyzed through the eigenvalues of R_xx [1], [2], [3], [11]. In order to make the algorithm stable, there are limits on the step size µ, given in (2.7), since the eigenvalues determine the bounds on the step size, which in turn determines the speed of adaptation. The eigenvalue spread of the auto-correlation matrix will be wide if the input reference signal is highly correlated [1], [2], [3], [6]. The smaller eigenvalues of R_xx result in a slower convergence, and the larger eigenvalues limit the step size, and therefore the learning capability of the filter [1]. Better results are obtained if the reference signal is white, that is, uncorrelated from sample to sample. In that case, R_xx = σ_x² I, where σ_x² is the variance of the input reference signal and I is the identity matrix, resulting in all of the signal's power being concentrated on the main diagonal of the matrix R_xx. This follows from the definition of the auto-correlation matrix. For that reason, many methods to pre-whiten the input signal have been developed [6], [11], [22]. These methods seek to reduce the spread of the eigenvalues, and in that manner increase the convergence speed while keeping the algorithm stable. This is typically done by applying a certain transform to the input signal. The transforms commonly used are the Discrete Fourier Transform (DFT) [1], [22], and the Discrete Cosine Transform (DCT) [1], [11]. Unlike most research, we apply wavelet transform techniques in order to reduce the spread of the eigenvalues.
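The effect described above is easy to see numerically. The sketch below compares a white reference signal with a colored one, estimates R_xx, its eigenvalue spread ρ, the stability bound of (2.7), and the slowest convergence mode of (2.10). The first-order autoregressive model for the colored signal, its coefficient, and the filter length are illustrative assumptions, not part of the thesis setup.

```python
import numpy as np
from scipy.linalg import toeplitz

# Sketch: eigenvalue spread of R_xx for white vs. colored reference signals.
# The AR(1) coefficient a and the filter length N are illustrative assumptions.
N, a = 8, 0.9
white = np.random.default_rng(2).standard_normal(200_000)
colored = np.empty_like(white)               # x(n) = a x(n-1) + w(n)
colored[0] = white[0]
for n in range(1, len(white)):
    colored[n] = a * colored[n - 1] + white[n]

for name, x in [("white", white), ("colored", colored)]:
    # Sample autocorrelation r(0..N-1); R_xx is Toeplitz for a stationary x
    r = np.array([np.dot(x[: len(x) - k], x[k:]) / len(x) for k in range(N)])
    R = toeplitz(r)
    lam = np.linalg.eigvalsh(R)              # eigenvalues, ascending
    rho = lam[-1] / lam[0]                   # eigenvalue spread
    print(f"{name}: spread rho = {rho:.1f}, "
          f"mu bound = {2 / lam[-1]:.4f}, "  # stability limit from (2.7)
          f"slowest mode = {(rho - 1) / (rho + 1):.4f}")  # from (2.10)
```

For the white signal ρ stays near unity, while the highly correlated AR(1) signal produces a large spread, a tighter step size bound, and a slowest mode close to one, which is exactly the slow-convergence regime that the transform domain methods are meant to avoid.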

2.2.2 Normalized Least Mean Squares Algorithms

A very commonly used subclass of least mean squares algorithms is called the Normalized Least Mean Squares (NLMS) algorithm [3]. These algorithms are upgrades to the standard LMS algorithms, intended to increase the convergence speed and stability of the algorithms. The reason for their development is the dependence of LMS adaptive filters on the step size µ. The trade-offs regarding the step size can have a profound impact on the applications. Increasing µ increases the speed, but also the asymptotic error, and hence decreases stability. For that reason, a scaling factor for µ is added to (2.6) in order to constantly scale the step size, and in doing so to increase the algorithm's performance while increasing its stability. This is achieved by normalizing the step size µ with an estimate of the input signal power, such that

$$\mu(n) = \frac{\mu}{\gamma + \|X(n)\|^2}, \qquad (2.11)$$

where ‖X(n)‖² is the squared norm of the input signal vector, and γ is a small positive constant used to ensure that if ‖X(n)‖² is zero or close to it, instability due to division by zero is avoided. The denominator in (2.11) is used in the algorithm as an instantaneous estimate of the signal power. Since the new step size µ(n) depends on the input signal power, it will change with each iteration of the program. When the input signal is large, the correction factor of the LMS (2µe(n)X(n)) also increases, due to its direct dependence on the X(n) vector. Therefore, µ has to be made small enough to ensure stability.

On the other hand, when the input is small, the correction factor is also small, therefore reducing the convergence speed. The normalization factor, however, is inversely proportional to the instantaneous signal power. So, when the input signal is weak it increases the step size proportionally, and when the input is large it reduces the step size accordingly. It acts both to increase the stability and to increase the convergence speed, by allowing a larger step size to be used. Therefore, from (2.6), the weight update equation for the NLMS is given by [1]

$$W(n+1) = W(n) + \frac{2\mu}{\gamma + \|X(n)\|^2}\, e(n) X(n). \qquad (2.12)$$

Another way to approach the issue is to view the normalization factor as a normalization of the input signal magnitude. The value ‖X(n)‖² is the squared norm of the input signal vector, and it can be seen as a factor used to influence the update expression mostly in terms of direction, and not magnitude. The squared norm (power) of the input signal vector is calculated as

$$\|X(n)\|^2 = \sum_{k=0}^{N-1} x^2(n-k), \qquad (2.13)$$

where N is the length of the adaptive filter. In practice, however, the signal power estimate is calculated by using the recursive equation [2], [3]

$$P(n+1) = \beta P(n) + (1 - \beta)\, x^2(n), \qquad (2.14)$$

where β (0 ≤ β ≤ 1) is known as the forgetting factor. Essentially, (2.14) presents an estimate of the input signal power. The value of β determines the memory of the filter. If β is made large (close to 1), then the estimate of the signal power P(n+1) depends predominantly on the value of the previous power estimate P(n), and the new estimate changes slowly with each new sample x(n).
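A minimal sketch of the NLMS update of (2.12), using the recursive power estimate of (2.14) in place of the full norm of (2.13); since P(n) estimates per-sample power, ‖X(n)‖² is approximated by N·P(n). The default step size, forgetting factor, and regularization constant below are illustrative assumptions.

```python
import numpy as np

# Sketch of one NLMS iteration using the recursive power estimate (2.14).
# mu, beta, and gamma defaults are illustrative assumptions.
def nlms_step(W, X, d, P, mu=0.5, beta=0.99, gamma=1e-6):
    """One NLMS update: returns (new_weights, error, new_power_estimate).

    X is the tap vector [x(n), x(n-1), ..., x(n-N+1)], newest sample first.
    """
    P = beta * P + (1.0 - beta) * X[0] ** 2    # (2.14): recursive power estimate
    e = d - W @ X                              # error e(n) = d(n) - y(n)
    # (2.12), with ||X(n)||^2 approximated by N * P(n)
    W = W + (2.0 * mu / (gamma + len(X) * P)) * e * X
    return W, e, P
```

In a real time loop, P simply carries over from one iteration to the next, so the norm never has to be recomputed from all N stored samples; this is precisely the computational saving that motivates (2.14).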

Generally, β is supposed to be close to unity. The reason for using (2.14) rather than (2.13) in real time systems is to reduce the number of calculations required to compute the signal power, and also to increase the memory of the signal power estimate. If (2.13) is used, the signal power estimate cannot extend past the number of previous input samples stored in the system memory. With (2.14), however, the memory of the power estimate can be extended by increasing the value of the constant β. Also, since (2.14) is recursive, it can be implemented efficiently on a per sample basis. The NLMS algorithms are the most commonly used in practical adaptive filtering, due to their simplicity in both structure and implementation. They do add more complexity to the standard form of the LMS, due to the fact that a division by the normalization factor is necessary. However, the normalization factor is computed once per program iteration, and the total added complexity is not beyond what is achievable in a standard DSP application. Also, the step size trade-offs are significantly helped by the added flexibility.

2.2.3 Recursive Least Squares Algorithms

The Recursive Least Squares (RLS) class of algorithms operates in a different manner than the LMS. These algorithms seek the solution directly from (2.3). The optimal coefficient vector is calculated in each iteration, providing the filter with better performance with respect to the LMS class of algorithms [2]. However, this performance improvement comes at the cost of inverting the matrix R_xx, which involves a significantly larger number of required computations, many of which are simply not practical in real time system implementations. Implementing most of the RLS algorithms would require powerful DSP chips or FPGA

processors. A comparison between the different algorithms available and their implementation costs is given in Table 4.1. There are algorithms involving more efficient methods to implement RLS based adaptive filters. Some of these algorithms reduce the number of computations required to obtain the matrix R_xx⁻¹(n+1) from the matrix R_xx⁻¹(n). They are classified as the fast RLS algorithms [1]. Most of the fast RLS algorithms require substantial computing power and, in most cases, sufficient resolution of the input signals. For most fast RLS algorithms, sixteen bit precision is not sufficient, and it can lead to instability of the algorithm. Primarily for these reasons, the LMS algorithms and their variations in the transform domain remain the more commonly used approach for adaptive filtering applications.

2.3 Wavelet Transforms

In the past, Fourier transforms were used to project a signal onto a subspace of orthogonal vectors, but Fourier transforms have an undesired property when analyzing certain signals. Due to the uncertainty principle, Fourier transforms cannot accurately present the signal in both the time and frequency domains, particularly in the case of real time system realizations. The wavelet transforms have an advantage in this respect. For the purposes of this thesis, wavelet transforms were used to improve the properties of time domain LMS algorithms. Wavelet transforms have some desirable and useful properties for analyzing real time signals, and they present a flexible tool for multi-resolution analysis of continuous time signals [4]. Their implementation is fairly easy and straightforward. In a wavelet transform, the signal is changed from the time domain into a weighted sum of translates and dilates of a mother wavelet. The multi-resolution analysis provides a means of

constructing orthonormal bases of wavelets spanning the space L²(R) of square integrable functions. They can be grouped by their scaling constants into disjoint subsets spanning proper and orthogonal subspaces of L²(R). These subsets, which correspond to different scales, are said to represent the signal at different resolution levels [6]. Wavelet transforms can be classified as Continuous Wavelet Transforms (CWT) or Discrete Wavelet Transforms (DWT). Due to the digital nature of the experiment, in this thesis we are primarily concerned with the DWT. A continuous time signal can be represented in the wavelet domain as [6]

$$x(t) = \sum_{j,k \in Z} d_{jk}\, \psi_{jk}(t) = \sum_{j=0}^{\infty} \sum_{k \in Z} d_{jk}\, \psi_{jk}(t) + \sum_{k \in Z} c_{0k}\, \varphi_{0k}(t),$$

where Z is the set of integers, φ_jk(t) = 2^{j/2} φ(2^j t − k) is an orthonormal basis derived from the scaling function φ(t) (for the subspace V_j ⊂ V_{j+1}), and the wavelet function ψ_jk(t) = 2^{j/2} ψ(2^j t − k) constitutes an orthonormal basis (for the subspace W_j, the orthogonal complement of V_j in V_{j+1}) derived from ψ(t) [6]. Also, d_jk and c_0k are the wavelet coefficients used to represent the original time domain input vector X(n) in the wavelet domain. These subspaces constitute a multiresolution analysis on L²(R). The values of k and j determine the time domain shift and the scale of the mother wavelet, respectively. It is through these two parameters that the different resolution levels, and therefore the different subspaces, are created. In practice, the infinite sums are truncated and replaced by [6]

$$x(t) = \sum_{j=0}^{J} \sum_{k} d_{jk}\, \psi_{jk}(t),$$

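As a concrete instance of the dilation-and-translation notation (the Haar wavelet is used here purely for illustration; the transform matrices later in this chapter are built from Daubechies filters):

```latex
% Haar scaling and wavelet functions
\varphi(t) = \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise} \end{cases}
\qquad
\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases}

% One dilated and translated basis element at scale j = 1, shift k = 3:
\psi_{1,3}(t) = 2^{1/2}\,\psi(2t - 3)
% supported on [3/2, 2), with unit norm in L^2(R).
```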
In the discrete form, the signal x(t) is sampled to obtain x(k), and the wavelet transform projects this signal onto the wavelet subspace W_j as

x_j(k) = \sum_{k \in Z} d_{jk} \psi_{jk}(n) + \sum_{k \in Z} c_{0k} \varphi_{0k}(n).

The wavelet coefficients in the discrete domain can be approximated as [4], [6]

d_{jk} = \sum_{n} x(n) \psi_{jk}(n),   (2.15)

c_{0k} = \sum_{n} x(n) \varphi_{0k}(n),   (2.16)

where d_{jk} and c_{0k} are the discrete wavelet coefficients. Expressions (2.15) and (2.16) define the discrete wavelet transform; the choice of wavelet determines \psi_{jk}(n) and \varphi_{0k}(n). In practice, the wavelet transform is realized as an FIR filtering operation, convolving the input signal with a combination of a lowpass filter and a highpass filter. The lowpass and highpass filters are derived from the wavelet function \psi_{jk}(t) and the scaling function \varphi_{0k}(t) through h(k) and g(k), given by [4]

h(k) = \frac{1}{\sqrt{2}} \left\langle \varphi\!\left(\frac{t}{2}\right), \varphi(t - k) \right\rangle,   (2.17)

g(k) = \frac{1}{\sqrt{2}} \left\langle \psi\!\left(\frac{t}{2}\right), \varphi(t - k) \right\rangle,   (2.18)

where h(k) and g(k) are the lowpass and highpass filters developed by Mallat [9]. The lowpass and highpass filters are obtained by dilating, time shifting, and sampling the original mother wavelet in accordance with (2.17) and (2.18).
These filters can be used to construct the orthonormal bases for the subspaces V and W if the following conditions are satisfied [4]:

|H(\omega)|^2 + |H(\omega + \pi)|^2 = 2,
|G(\omega)|^2 + |G(\omega + \pi)|^2 = 2,
G(\omega)H^{*}(\omega) + G(\omega + \pi)H^{*}(\omega + \pi) = 0,

where H(\omega) and G(\omega) are the Fourier transforms of h(t) and g(t), respectively. These conditions are the key properties that allow wavelets to be used as a means of orthogonalizing the auto-correlation matrix of the input reference signal. A combination of a lowpass and a highpass filter is used to decompose the input signal from the time domain to the wavelet domain: the output of the lowpass filter is the approximation portion of the wavelet transform, and the output of the highpass filter is the details portion. Decomposition of the input signal is obtained by filtering the input signal with h(n) and g(n) (the sampled representations of h(t) and g(t)) and downsampling the outputs of both filters. Mathematically,

a_1(n) = \sum_{k=0}^{L} h(k)\, x(2n + k),   (2.19)

d_1(n) = \sum_{k=0}^{L} g(k)\, x(2n + k),   (2.20)

where a_1(n) are the wavelet coefficients of the approximation portion, d_1(n) are the wavelet coefficients of the details portion, and L is the length of the wavelet filters. This is the first level of wavelet decomposition.
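A minimal Python sketch of this filter-and-decimate step (the Daubechies 4 coefficients are standard published values; the helper name and structure are illustrative, not the thesis' code):

```python
import numpy as np

# Daubechies 4 lowpass coefficients; the highpass filter follows from the
# alternating-sign, time-reversed (quadrature mirror) relation.
h = np.array([0.4829629131445341, 0.8365163037378079,
              0.2241438680420134, -0.1294095225512604])
g = np.array([(-1)**k * h[::-1][k] for k in range(len(h))])

def dwt_level(x, h, g):
    """One decomposition level per (2.19)-(2.20): filter by h and g and
    keep every second output sample (downsampling by two)."""
    L = len(h)
    n_out = (len(x) - L) // 2 + 1
    a = np.array([h @ x[2*n:2*n+L] for n in range(n_out)])  # approximation
    d = np.array([g @ x[2*n:2*n+L] for n in range(n_out)])  # details
    return a, d
```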
In (2.19) and (2.20), it is important to note that each time a new wavelet coefficient is calculated, the input signal vector is shifted by two samples, resulting in a downsampling factor of two. The operation is presented in Figure 2.2. In the frequency domain, this operation can be seen as a decomposition of the input signal into subbands: the approximation and details outputs represent the low band and the high band portions of the original input signal. The quality of the filters, as well as the computational complexity of the implementation, depends on the type of wavelets used.

Figure 2.2: Filter and decimate operation used for wavelet decomposition. (The input x(n) is filtered by h(n) and g(n), and each filter output is downsampled by two to give a_1(n) and d_1(n).)

There are many different types of wavelets that are readily available; typically, the input signal properties and the available computational power determine the type of wavelet used. The first level wavelet transform is obtained as shown in Figure 2.2. The subsequent levels are obtained by further filtering the approximation (or detail) portion of the first level transform [4], [8]. To do so, the approximation or detail coefficients of the previous level are used as the input to the same filters, convolving them with h(k) and g(k). If the approximation section of the wavelet transform is used for further decomposition, the low portion of the original bandwidth is split again; if the details section is used, the high portion of the bandwidth is split.
That is, the second level coefficients are obtained as

a_2(n) = \sum_{k=0}^{L} h(k)\, a_1(2n + k),   (2.21)

d_2(n) = \sum_{k=0}^{L} g(k)\, a_1(2n + k),   (2.22)

where a_1(n) and d_1(n) are the first level transforms, a_2(n) and d_2(n) are the second level wavelet transforms, and L is the length of the wavelet filters. In this manner the decomposition can be continued, yielding the subband decomposition of the input signal that best suits the desired goal of the system. Since the bandwidth is segmented into low and high portions (dyadically), any further split using the same filters divides only the selected portion of the band, again into low and high subbands. From (2.17) and (2.18), it can be seen that the filters are designed so that the time scale factor from one transform level to the next is two; therefore, the bandwidth is also split by a factor of two (dyadically) each time the decomposition is performed. Also, due to the shift by two of the input signal in (2.19), (2.20), (2.21), and (2.22), the length of the vector obtained after each filtering operation is one half of the length of the original vector.

A common approach to performing and analyzing wavelet transforms is to express them in matrix form [4], [8], [11]. The matrix contains the lowpass and highpass filters. The size of the matrix is N×N, where N is the length of the adaptive filter. The length of the filter used to obtain the wavelet transforms depends on the choice of wavelet.
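Continuing the dwt_level sketch above (illustrative usage only), the second level is obtained by feeding the first level approximation back through the same filter pair:

```python
import numpy as np

x = np.random.randn(1024)        # stand-in reference signal
a1, d1 = dwt_level(x, h, g)      # first level, (2.19)-(2.20)
a2, d2 = dwt_level(a1, h, g)     # second level, (2.21)-(2.22)
print(len(x), len(a1), len(a2))  # each level roughly halves the vector length
```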
For the Daubechies 4 wavelet, an 8×8 matrix can be defined as

D_N = \begin{bmatrix}
h_1(1) & h_1(2) & h_1(3) & h_1(4) & 0 & 0 & 0 & 0 \\
0 & 0 & h_1(1) & h_1(2) & h_1(3) & h_1(4) & 0 & 0 \\
0 & 0 & 0 & 0 & h_1(1) & h_1(2) & h_1(3) & h_1(4) \\
h_1(3) & h_1(4) & 0 & 0 & 0 & 0 & h_1(1) & h_1(2) \\
g_1(1) & g_1(2) & g_1(3) & g_1(4) & 0 & 0 & 0 & 0 \\
0 & 0 & g_1(1) & g_1(2) & g_1(3) & g_1(4) & 0 & 0 \\
0 & 0 & 0 & 0 & g_1(1) & g_1(2) & g_1(3) & g_1(4) \\
g_1(3) & g_1(4) & 0 & 0 & 0 & 0 & g_1(1) & g_1(2)
\end{bmatrix},   (2.23)

where h_1(n) and g_1(n) are the coefficients of the lowpass and highpass wavelet filters, respectively. The rows of the matrix are orthonormal [8], comprised of the lowpass and highpass wavelet filter coefficients. The output of the transform is given by V(n) = D_N X(n), where X(n) is the input signal vector of length N. The output vector is [8]

V(n) = [a_1(1)\; a_1(2)\; \dots\; a_1(N/2)\; d_1(1)\; d_1(2)\; \dots\; d_1(N/2)]^t.

The first half of the vector is the approximation section, and the second half is the details section. The shift by two in each of the rows reflects the downsampling by two performed at each level of the transform [11]. As mentioned above, the length of each newly obtained vector (approximations and details) is N/2, one half of the original input vector. To obtain the second level of the transform, either of the two vectors (approximations or details) is used as the input signal and multiplied by D_{N/2}, obtained in the same manner as D_N. In our experiment, the approximation section is used to obtain the subsequent levels of the transform.
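A hedged sketch of this matrix construction (assuming the circulant shift-by-two structure of (2.23) and the standard Daubechies 4 coefficients; names are illustrative):

```python
import numpy as np

def wavelet_matrix(h, N):
    """Build the N x N transform matrix of (2.23): rows of h shifted by two
    with circular wraparound, followed by the corresponding rows of g."""
    g = np.array([(-1)**k * h[::-1][k] for k in range(len(h))])
    D = np.zeros((N, N))
    for r in range(N // 2):
        for k in range(len(h)):
            D[r, (2*r + k) % N] = h[k]           # lowpass (approximation) rows
            D[N//2 + r, (2*r + k) % N] = g[k]    # highpass (details) rows
    return D

h = np.array([0.48296291, 0.83651630, 0.22414387, -0.12940952])  # db4
D8 = wavelet_matrix(h, 8)
print(np.allclose(D8 @ D8.T, np.eye(8)))   # rows are orthonormal: True
V = D8 @ np.random.randn(8)                # V(n) = D_N X(n)
```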
2.4 Wavelet Transform Domain LMS Algorithm

Since wavelet transforms are orthogonal transforms, they can be used to reduce the eigenvalue spread of the input signal auto-correlation matrix [6], [11], which is an efficient tool for improving the performance of the LMS algorithm. The effectiveness of the improvement is determined by how much the transform reduces the eigenvalue spread relative to its pre-transform value, and it also depends on the properties of the input signal: different transform methods yield different results for different input signals. Recall that the conventional time domain LMS algorithm converges to an approximately optimal solution for the weights, but how quickly it converges depends directly on the properties of the input signal and the step size used in the algorithm [6]. For input signals with a large disparity in eigenvalues, the weights corresponding to the small eigenvalues may converge slowly. On the other hand, if the step size is increased to compensate for this problem, the weights corresponding to the large eigenvalues can become unstable, or yield a large convergence error [11]. So, in order to have a properly behaved time domain LMS algorithm, it is necessary to find a good trade-off value for the step size. This problem has inspired a subclass of LMS algorithms known as transform domain LMS (TLMS) algorithms. The basic idea of TLMS is to apply an orthogonal transform to the input reference signal in order to orthogonalize its auto-correlation matrix. If this is achieved, then different step sizes can be used for different modes, compensating for the large variations in the eigenvalues. Such an algorithm converges faster than the general time domain algorithm [22].
This goal is achieved by first taking an orthogonal transform of the input reference signal, which maps the input signal onto a subspace of orthogonal vectors. When the auto-correlation matrix of the input reference signal is orthogonalized, it is diagonal or nearly diagonal. In a sense, the transform applied to the reference signal prior to the LMS algorithm has the effect of pre-whitening, or uncorrelating, the input signal, reducing the eigenvalue spread and therefore increasing the convergence speed of the algorithm. Different orthogonal transforms can be used to achieve this objective: popular choices include the Discrete Fourier Transform (DFT) [1], [22], the Discrete Cosine Transform (DCT) [1], and the Walsh Hadamard Transform (WHT) [1]. However, this thesis is focused on the Discrete Wavelet Transform (DWT) [6], [11]. The basic algorithm for the discrete wavelet transform LMS (WLMS) is shown in Figure 2.3. The input signal is split into subbands using the DWT, and the adaptive LMS algorithm is then applied to each subband. After filtering, the output signals from the subbands are added together, and the sum is subtracted from the primary (desired) signal, yielding the output of the system. Assuming an N point time domain input vector

X(n) = [x(n)\; x(n-1)\; x(n-2)\; \dots\; x(n-N+1)],

the wavelet transform is given by

V(n) = D_N X(n) = [v(n)\; v(n-1)\; v(n-2)\; \dots\; v(n-N+1)],

where D_N is the wavelet transform matrix described in Section 2.3.
Figure 2.3: Wavelet transform domain LMS algorithm.

The auto-correlation matrix of the transformed input vector is given by [6]

R_{vv} = E[V(n)V(n)^t] = D_N E[X(n)X(n)^t] D_N^t = D_N R_{xx} D_N^t,   (2.24)

where D_N is given in (2.23) and R_{xx} is given in (2.8). Expression (2.24) is the auto-correlation matrix of the transformed input, which determines the convergence properties of the wavelet domain LMS in the same way R_{xx} does for the time domain LMS. Analogously to the time domain analysis, the cross-correlation vector between the primary input and the transformed reference vector is given by [11]

P_{vd} = E[d(n)V(n)] = E[d(n) D_N X(n)] = D_N P_{xd},

where P_{xd} is given in (A.6).
This implies that the optimal weight vector can be written as

W_{opt} = R_{vv}^{-1} P_{vd} = R_{vv}^{-1} D_N R_{xx} R_{xx}^{-1} P_{xd} = R_{vv}^{-1} D_N R_{xx} W_{opt,time},   (2.25)

where W_{opt,time} is the optimal weight vector of the time domain filter. Expression (2.25) gives the relationship between the time domain and wavelet domain optimal weight vectors. As in the time domain, the iterative weight update is given by [6], [11], [12]

W(n+1) = W(n) + 2\mu\, e(n) V(n),

where W(n+1) is the updated weight vector, W(n) is the previous weight vector, \mu is the step size, e(n) is the error (which, in noise canceling, is the system output), and V(n) is the transformed input. By comparing the input signal auto-correlation matrices in the wavelet and time domains (R_{vv} and R_{xx}), it can be shown that [6]

\frac{\lambda^{v}_{max}}{\lambda^{v}_{min}} \le \frac{\lambda_{max}}{\lambda_{min}},

where \lambda^{v}_{max}/\lambda^{v}_{min} is the eigenvalue ratio in the transform domain and \lambda_{max}/\lambda_{min} is the eigenvalue ratio in the time domain, indicating that the spread of eigenvalues in the transform domain is no greater than in the time domain. The wavelet transforms used in this thesis were configured as shown in Figure 2.4: the transforms split the input signal into the approximation and details sections, giving the first level of the transform, and the approximation section of the first level is then transformed to the second level using the same filters as in the first level. Performing the transforms in this manner keeps splitting the lower band of the original band of interest.
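One subtlety worth noting: an orthogonal transform by itself leaves the eigenvalues unchanged, since R_vv = D_N R_xx D_N^t is a similarity transformation; the spread reduction is realized once each subband is normalized by its own power, as done later in (2.26). A hedged numerical illustration for an AR(1) reference (reusing the wavelet_matrix helper sketched earlier; parameter values are illustrative):

```python
import numpy as np

rho, N = 0.95, 8
idx = np.arange(N)
Rxx = rho ** np.abs(np.subtract.outer(idx, idx))   # Toeplitz AR(1) auto-correlation
D = wavelet_matrix(np.array([0.48296291, 0.83651630,
                             0.22414387, -0.12940952]), N)
Rvv = D @ Rxx @ D.T                                # (2.24); same eigenvalues as Rxx
S = np.diag(1.0 / np.sqrt(np.diag(Rvv)))           # per-band power normalization
spread = lambda R: np.linalg.eigvalsh(R)[-1] / np.linalg.eigvalsh(R)[0]
print(spread(Rxx), spread(S @ Rvv @ S))            # normalized spread is much smaller
```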
Figure 2.4: Level two wavelet transform decomposition. (The input x(n) is filtered by h(n) and g(n) and downsampled to give a_1(n) and d_1(n); a_1(n) is filtered and downsampled again to give a_2(n) and d_2(n).)

This configuration of the wavelet transforms results in the band of interest being split dyadically, as shown in Figure 2.4. The input signal x(n) is filtered by the lowpass and highpass wavelet filters h(n) and g(n) to produce the approximation and details sections a_1(n) and d_1(n); this is the first level wavelet transform. The approximation section is then filtered again by the same filters h(n) and g(n) to produce the second level approximation and details portions a_2(n) and d_2(n); this is the second level wavelet transform. For audio signals it is beneficial to split the band in this particular manner, due to the nature of human perception of voice. Due to the dyadic nature of the subdivisions in the wavelet transform domain, it can be advantageous to use exponentially weighted values of the step size instead of a constant. Hence, the update equation can be written as [6]

W_j(n+1) = W_j(n) + 2\left(\frac{\mu}{2^{j}}\right) e(n) V_j(n),

where j is the subband index. This is because if the input signal is white (or approximately white), the power of the signal is spread relatively evenly across the bandwidth in the frequency domain, depending on the degree of whiteness of the input signal.
In this thesis, the wavelet transforms were configured in such a manner that the low band is split with each additional level of the transform. Therefore, the total power of the input signal in each successive subband decreases by a factor of two as the transform level increases. If the step size is also decreased by a factor of two for each subsequent subband, the amount of adaptation performed by the algorithm is kept constant across subbands. However, this holds for approximately white input signals; for other input signal distributions, this approach does not necessarily yield better results than a constant step size LMS algorithm. Theoretically, the best results are obtained when the step size is normalized by the power of the signal [6], [11], [12]. Each subband can be normalized by the power of the signal present in that particular band, which leads to self-normalization of the step size according to the signal content of the band. This procedure highlights the true advantage of the transform domain LMS: each subband has an entirely self-normalized step size. The bands that need more aggressive adaptation increase their corresponding step sizes, while the rest keep their step sizes at low levels, yielding a reduced post-convergence error. The weight update of the normalized transform domain LMS is given by [6]

W_j(n+1) = W_j(n) + 2\left(\frac{\mu}{E[|V_j(k)|^2]}\right) e(n) V_j(n),   (2.26)

where E[|V_j(k)|^2] is the power of the signal in the transform domain for subband j; it is the sum of the squares of the diagonal elements of the auto-correlation matrix, analogous to the time domain signal power calculation explained earlier. In practice, (2.26) uses an estimate of the input signal power in each subband, computed for each of the transform levels separately via the recursion [11]
P_j(n+1) = \beta P_j(n) + (1 - \beta) |V_j(n)|^2,

where \beta (0 \le \beta \le 1) is a constant known as the forgetting factor. The power normalized wavelet domain LMS offers the best opportunity for noise canceling, since each of the bands is self-normalized. The entire algorithm as used in this thesis is summarized below:

a) First, take the wavelet transform of the input reference signal. This is achieved with the following expressions for the approximation and details portions of the first transform level,

a_1(n) = \sum_{k} h(k)\, x(2n + k),
d_1(n) = \sum_{k} g(k)\, x(2n + k),

and for each subsequent level j,

a_j(n) = \sum_{k} h(k)\, a_{j-1}(2n + k),
d_j(n) = \sum_{k} g(k)\, a_{j-1}(2n + k).

b) Then apply the transformed input to the wavelet transform domain adaptive filter by first forming the transformed input vectors

V_j(n) = [a_j(n)\; a_j(n-1)\; \dots\; a_j(1)],
V_{j-1}(n) = [a_{j-1}(n)\; a_{j-1}(n-1)\; \dots\; a_{j-1}(1)],
\vdots
V_1(n) = [d_1(n)\; d_1(n-1)\; \dots\; d_1(1)].
c) The filter output is calculated as

y(n) = \sum_{j} W_j^t(n) V_j(n).

d) The error is then calculated as

e(n) = d(n) - y(n).

e) The filter coefficient vectors are updated according to

W_j(n+1) = W_j(n) + \frac{2\mu\, e(n) V_j(n)}{E[|V_j(k)|^2]}.
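Pulling steps a) through e) together, here is a minimal per-sample sketch in the matrix form V(n) = D_N X(n) (reusing the wavelet_matrix helper sketched earlier; for simplicity each transform coefficient is normalized by its own recursive power estimate rather than by a per-subband sum, a simplification of (2.26)):

```python
import numpy as np

def wlms(x, d, N=8, mu=0.1, beta=0.99, eps=1e-8):
    """Wavelet transform domain LMS noise canceller (matrix form).
    x: reference input, d: primary input, N: adaptive filter length."""
    db4 = np.array([0.48296291, 0.83651630, 0.22414387, -0.12940952])
    D = wavelet_matrix(db4, N)            # transform matrix of (2.23)
    w = np.zeros(N)                       # one weight per transform coefficient
    P = np.full(N, eps)                   # recursive power estimates
    e = np.zeros(len(x))
    for n in range(N, len(x)):
        X = x[n-N:n][::-1]                # step a): time domain input vector X(n)
        V = D @ X                         # wavelet domain input, V(n) = D_N X(n)
        y = w @ V                         # step c): filter output y(n)
        e[n] = d[n] - y                   # step d): error, the canceller output
        P = beta * P + (1 - beta) * V**2  # power recursion, cf. P_j(n+1)
        w += 2 * mu * e[n] * V / (P + eps)  # step e): power-normalized update
    return e, w
```

Fed a primary input d containing signal plus correlated noise and the raw noise as reference x, the error sequence e converges toward the clean signal.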
2.5 Past and Current Work

The idea of using wavelets as a transform method for improving the performance of the LMS class of algorithms has been widely investigated. In the past, many different orthogonal transform methods were used toward the same goal, including the Discrete Fourier Transform (DFT) [1], [22], the Discrete Cosine Transform (DCT) [1], and the Walsh Hadamard Transform (WHT) [1]. The advantages and disadvantages of each transform method for enhancing the performance of adaptive filters are still an area under development. The LMS algorithm for adaptive noise cancellation was developed by Widrow [3], who presented the different configurations and algorithms that can be used for adaptive noise cancellation, including analytical expressions and performance criteria. This presentation is considered by many to be the cornerstone of subsequent developments in the field of adaptive filtering. Probably the most interesting work on the WLMS is that of Erdol and Basbug [6], which presents the analysis and results of a wavelet domain LMS. The wavelets used in [6] are configured in a slightly different manner than the ones used in this thesis: the wavelet transforms provide the input reference signal in the multi-resolution space, but the construction is not based on Mallat's multi-resolution algorithm [9], the filters being formed by scaling and sampling the convolutional square of the wavelet ψ(t). It was shown that the eigenvalue spread of the input signal auto-correlation matrix can be reduced by using the wavelet transforms. The paper also analyzes three possible update equations for the adaptive filter, namely the constant convergence factor, the exponentially weighted convergence factor, and the time-varying convergence factor, and presents simulation results showing the effects of wavelet transforms on the adaptive filters. Shamma [11] presented different applications and simulation results for the wavelet transform domain LMS. The paper covers not only the adaptive noise canceller used in our experiment, but also adaptive beamforming applications. It is an intuitive work that bridged WLMS theory and practical implementations: it describes the application of wavelet transforms in matrix form, provides simulation results of applying the WLMS in adaptive equalizer applications used in telecommunication systems, and compares the wavelet transform against the DCT, DHT, and DFT. Ogunfunmi and Dang [12] describe a performance analysis of the WLMS in different systems and with different parameters; their results were also compared to the performance of the DCT, DFT, and DHT.

2.6 Conclusion

In this chapter, the fundamentals of the time domain and the wavelet transform domain LMS algorithms were reviewed. This serves as the foundation for the experimental implementation in the next chapter. The performance criteria and the dependence of the algorithms on the statistical properties of the input signals were discussed. The dependence of the LMS class of algorithms on the eigenvalue spread of the auto-correlation matrix was explained.
The problem of slow convergence of the LMS and NLMS algorithms was introduced, and the WLMS was proposed as a solution.
CHAPTER III

DESCRIPTION OF THE EXPERIMENT

3.1 Introduction

In this chapter, we present the description of the experiment on wavelet transform adaptive filtering. The hardware description, experimental setup, software description, and experimental methodology are presented. The trade-offs encountered, and the decisions made throughout the experiment, are also explained. A detailed description of the program used to implement the algorithms is given, and the computational cost of the implementation is analyzed.

3.2 Description of Hardware

The experiment was designed with audio signal processing applications in mind. In order to implement the system, a circuit board was designed to perform the task. The board contains an analog section, analog-to-digital converters, a Digital Signal Processor (DSP) chip, and digital-to-analog converters. A detailed description of the components that process the signal as it flows through the board is given in the following section. A block diagram representation of the equipment used in the thesis is given in Figure 3.1. In order to generate the input signals and analyze the output signals, an Audio Precision System 2 (AP 2322) [26] was used. The Audio Precision system is designed for a variety of audio applications, and it is widely used in the audio industry for analyzing signals and systems [26].
A variety of signals can be generated, including sinusoids, white noise, and pink noise, and many different methods are available for analyzing the signals, both in the time and frequency domains. The system is controlled by the accompanying software, APWIN Version 2.24, which is used to control the hardware and to generate and analyze the results. The plots presented in Chapter 4 of this thesis were generated with this software.

Figure 3.1: Block diagram of the system used for the experiment. (Oscilloscope, Audio Precision System 2, PC with control software, and the board with the DSP.)

In Figure 3.1, the signals are generated by the Audio Precision unit, taken to the board under test for processing, and the output is then fed back to the Audio Precision unit to be analyzed. The board was designed specifically for processing audio signals with a DSP processor [17]. The custom design of the board allows the EEPROM chips containing the software programs to be easily exchanged. Figure 3.2 shows the board used in the experiment. Figure 3.3 shows the Audio Precision System 2: the front panel of the unit with its input and output interfacing connectors. The unit also contains an internal DSP chip, which is used by the System 2 to generate output signals and analyze input signals [26]. The unit is accompanied by the software used to control the hardware shown in Figure 3.3.
The software enables the user to control the properties of the output signal generated by the unit, and it also controls the analysis tools built into the hardware. The unit is a common tool for professional audio system analysis. Figure 3.4 depicts the overall system used in the experiment: it shows the Audio Precision System 2 connected to the board under test, and a computer running the test. It also shows the oscilloscope used for visual inspection of the signals during the test; the oscilloscope data is not otherwise used in this thesis. In the figure, the inputs and outputs of the board under test are connected to the appropriate connectors on the Audio Precision unit.

Figure 3.2: The DSP board used in the experiment.
Figure 3.3: The Audio Precision System 2 used in the experiment.

Figure 3.4: The overall system setup.

3.2.1 Signal Description

Upon entering the DSP board, the signal goes through a chain of elements designed to process it. The chain starts with the analog section, which filters the signal to baseband.