IN357: ADAPTIVE FILTERS

Course book: Chap. 9 of Statistical Digital Signal Processing and Modeling, M. Hayes, 1996 (also builds on Chap. 7.2).

David Gesbert
Signal and Image Processing Group (DSB)
http://www.ifi.uio.no/~gesbert
March 2003

Department of Informatics
Outline

- Motivations for adaptive filtering
- The adaptive FIR filter
- Steepest descent and optimization theory
- Steepest descent in adaptive filtering
- The LMS algorithm
- Performance of LMS
- The RLS algorithm
- Performance of RLS
- Example: adaptive beamforming in mobile networks
Motivations for adaptive filtering

Goal: extend optimum (e.g., Wiener) filters to the case where the data is not stationary or the underlying system is time-varying:

- {d(n)}: desired random process (unobserved), may be non-stationary.
- {x_0(n)}, {x_1(n)}, ..., {x_{p-1}(n)}: the p observed random processes, each possibly non-stationary.

[Figure: block diagram — the p observations x_0(n), ..., x_{p-1}(n) feed a filter W_n that must be adjusted over time n; its output is the estimated signal d̂(n), which is subtracted from the desired signal d(n) to form the error signal e(n).]
Cases of non-stationarity

The filter W must be adjusted over time, and is denoted W(n), in order to track the non-stationarity.

Example 1: finding the Wiener solution to the linear prediction of a speech signal. The speech signal is non-stationary beyond approximately 20 ms of observation, so d(n) and {x_i(n)} are non-stationary.

Example 2: finding the adaptive beamformer that tracks the location of a mobile user in a wireless network. Here d(n) is stationary (a sequence of modulation symbols), but the {x_i(n)} are not, because the channel is changing.
Approaches to the problem

Two solutions to track the filter W(n):

- (Adaptive filtering) Use a long training signal for d(n) and adjust W(n) continuously so as to minimize the power of e(n).
- (Block filtering) Split time into short intervals over which the data is approximately stationary, and recompute the Wiener solution for every block.
Vector formulation (time-varying filter)

W(n) = [w_0(n), w_1(n), ..., w_{p-1}(n)]^T
X(n) = [x_0(n), x_1(n), ..., x_{p-1}(n)]^T
d̂(n) = W(n)^T X(n)

where ^T is the transpose operator.
Time-varying optimum linear filtering

e(n) = d(n) − d̂(n)
J(n) = E|e(n)|^2, which varies with n due to non-stationarity,

where E(·) is the expectation. Find W(n) such that J(n) is minimum at time n: W(n) is then the optimum linear filter in the Wiener sense at time n.
Finding the solution

The solution W(n) is given by the time-varying Wiener-Hopf equations:

R_x(n) W(n) = r_dx(n)     (1)

where

R_x(n) = E[X*(n) X(n)^T]     (2)
r_dx(n) = E[d(n) X*(n)]     (3)
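For illustration only (not from the course material), here is a minimal NumPy sketch that replaces the expectations in (2)-(3) by sample averages and solves (1) directly; the data is assumed real-valued (so the conjugates drop out), and the function name and toy signal are illustrative assumptions.

    import numpy as np

    def wiener_filter(X, d):
        """Solve the Wiener-Hopf equations R_x W = r_dx from data.
        X: (N, p) array, one length-p observation vector X(n) per row.
        d: (N,) array of desired samples d(n)."""
        N = X.shape[0]
        R = X.T @ X / N        # sample estimate of R_x = E[X X^T]
        r = X.T @ d / N        # sample estimate of r_dx = E[d X]
        return np.linalg.solve(R, r)

    # Toy usage: d(n) generated by a fixed 3-tap filter plus noise.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 3))
    w_true = np.array([0.5, -0.3, 0.1])
    d = X @ w_true + 0.01 * rng.standard_normal(1000)
    print(wiener_filter(X, d))   # close to w_true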
Adaptive algorithms

The time-varying statistics used in (1) are unknown but can be estimated. Adaptive algorithms aim at estimating and tracking the solution W(n), given the observations {x_i(n)}, i = 0..p−1, and a training sequence for d(n).

Two key approaches:

- Steepest-descent (also called gradient-search) algorithms.
- The Recursive Least Squares (RLS) algorithm.

Tracking is formulated by

W(n+1) = W(n) + ΔW(n)     (4)

where ΔW(n) is the correction applied to the filter at time n.
Steepest descent in optimization theory

Assumption: stationary case.

Idea: a local minimum of the cost function J(W) can be found by moving, from the current point, in the direction of steepest descent on the surface of J(W), i.e., along the negative gradient:

- W(0) is an arbitrary initial point.
- W(n+1) = W(n) − µ (∂J/∂W)|_{W=W(n)}

where µ is a small step size (µ << 1). Because J(·) is quadratic here, there is only one minimum, toward which W(n) will converge.
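As a minimal sketch of the idea (my own illustration, not from the book), the loop below descends a two-dimensional quadratic bowl J(W) = (1/2) W^T R W − r^T W, whose unique minimum is W_o = R^{-1} r; R, r, and µ are arbitrary assumed values.

    import numpy as np

    R = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive-definite "R_x"
    r = np.array([1.0, -1.0])                # "r_dx"
    mu = 0.1                                  # step size, must satisfy 0 < mu < 2/lambda_max
    W = np.zeros(2)                           # arbitrary initial point W(0)
    for _ in range(200):
        grad = R @ W - r                      # gradient dJ/dW of the quadratic cost
        W = W - mu * grad                     # W(n+1) = W(n) - mu * grad
    print(W, np.linalg.solve(R, r))           # both ~ W_o

Here lambda_max ≈ 2.21, so any mu below about 0.9 converges; larger values diverge, which matches the bound quoted later for the Wiener case.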
The steepest-descent Wiener algorithm

Derivation of the gradient expression:

J(W) = E[e(n) e*(n)],  where  e(n) = d(n) − d̂(n) = d(n) − W^T X(n)

∂J/∂W = E[(∂e(n)/∂W) e*(n) + e(n) (∂e*(n)/∂W)]
      = E[0 + e(n) (∂e*(n)/∂W)]
      = −E[e(n) X*(n)]

(Complex-gradient convention: the derivative is taken with respect to W*, so that ∂e(n)/∂W* = 0 while ∂e*(n)/∂W* = −X*(n).)
Algorithm: the steepest-descent Wiener algorithm

- W(0) is an arbitrary initial point.
- W(n+1) = W(n) + µ E[e(n) X*(n)]

W(n) will converge to W_o = R_x^{-1} r_dx (the Wiener solution) if 0 < µ < 2/λ_max, where λ_max is the maximum eigenvalue of R_x (see p. 501 for the proof).

Problem: E[e(n) X*(n)] is unknown!
The Least Mean Square (LMS) algorithm

Idea: E[e(n) X*(n)] is replaced by its instantaneous value e(n) X*(n).

- W(0) is an arbitrary initial point.
- W(n+1) = W(n) + µ e(n) X*(n)
- Repeat with n+1, n+2, ...
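A compact LMS sketch (illustrative, assuming real-valued data so the conjugate drops out; the function name is mine):

    import numpy as np

    def lms(X, d, mu):
        """LMS: W(n+1) = W(n) + mu * e(n) * X(n), real-valued data.
        X: (N, p) observation vectors, d: (N,) training sequence."""
        N, p = X.shape
        W = np.zeros(p)                 # arbitrary initial point W(0)
        for n in range(N):
            e = d[n] - W @ X[n]         # error e(n) = d(n) - W(n)^T X(n)
            W = W + mu * e * X[n]       # instantaneous-gradient update
        return W

On the toy data of the earlier Wiener sketch, lms(X, d, mu=0.01) approaches w_true; increasing mu speeds convergence but increases the residual fluctuation, as discussed next.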
The Least Mean Square (LMS) algorithm

Lemma: W(n) converges in the mean toward W_o = R_x^{-1} r_dx if 0 < µ < 2/λ_max (see p. 507), i.e.:

E[W(n)] → R_x^{-1} r_dx  when  n → ∞     (5)

Important remarks:

- The variance of W(n) around its mean is a function of µ. Thus µ allows a trade-off between speed of convergence and accuracy of the estimate: a small µ results in higher accuracy but slower convergence.
- The algorithm is derived under the assumption of stationarity, but it can be used in a non-stationary environment as a tracking method.
A faster-converging algorithm

Idea: build a running estimate of the statistics R_x(n), r_dx(n), and solve the Wiener-Hopf equations at each time:

R_x(n) W(n) = r_dx(n)     (6)

where

R_x(n) = Σ_{k=0}^{n} λ^{n−k} X*(k) X(k)^T     (7)
r_dx(n) = Σ_{k=0}^{n} λ^{n−k} d(k) X*(k)     (8)

and λ is the forgetting factor (λ < 1, close to 1).
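For illustration (again assuming real-valued data, with names of my choosing), the direct approach below accumulates the weighted statistics (7)-(8) recursively but still solves the linear system at every step — the O(p^3)-per-step cost that the RLS recursion on the next slides removes:

    import numpy as np

    def weighted_wiener(X, d, lam):
        """Solve R_x(n) W(n) = r_dx(n) at each step, with exponentially
        weighted estimates R_x(n) = lam*R_x(n-1) + X(n)X(n)^T, etc."""
        N, p = X.shape
        R = 1e-3 * np.eye(p)            # small regularization so R is invertible early on
        r = np.zeros(p)
        for n in range(N):
            R = lam * R + np.outer(X[n], X[n])
            r = lam * r + d[n] * X[n]
            W = np.linalg.solve(R, r)   # matrix "inversion" at each step -- what RLS avoids
        return W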
Recursive Least Squares (RLS)

To avoid inverting a matrix at each step, one finds a recursive solution for W(n):

R_x(n) = λ R_x(n−1) + X*(n) X(n)^T     (9)
r_dx(n) = λ r_dx(n−1) + d(n) X*(n)     (10)
W(n) = W(n−1) + ΔW(n−1)     (11)

Question: how to determine the right correction ΔW(n−1)?
Answer: using the matrix inversion lemma (Woodbury's identity).
Matrix inversion lemma

We define P(n) = R_x(n)^{-1}. The M.I.L. is used to update P(n−1) to P(n) directly:

(A + u v^H)^{-1} = A^{-1} − (A^{-1} u v^H A^{-1}) / (1 + v^H A^{-1} u)

We apply it to

R_x(n)^{-1} = (λ R_x(n−1) + X*(n) X(n)^T)^{-1}     (12)
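A quick numerical check of the identity (a sketch of my own, with a random real A and vectors u, v, so ^H reduces to ^T):

    import numpy as np

    # Verify (A + u v^T)^{-1} == A^{-1} - A^{-1} u v^T A^{-1} / (1 + v^T A^{-1} u)
    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # well-conditioned A
    u = rng.standard_normal(4)
    v = rng.standard_normal(4)
    Ai = np.linalg.inv(A)
    lhs = np.linalg.inv(A + np.outer(u, v))
    rhs = Ai - (Ai @ np.outer(u, v) @ Ai) / (1 + v @ Ai @ u)
    print(np.allclose(lhs, rhs))   # True

The key point is that the rank-one update (9) lets P(n) be obtained from P(n−1) with only matrix-vector products, no inversion.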
Matrix inversion lemma

Applying the lemma to (12), with A = λ R_x(n−1), u = X*(n), v^H = X(n)^T:

R_x(n)^{-1} = λ^{-1} R_x(n−1)^{-1} − (λ^{-2} R_x(n−1)^{-1} X*(n) X(n)^T R_x(n−1)^{-1}) / (1 + λ^{-1} X(n)^T R_x(n−1)^{-1} X*(n))     (13)
The RLS algorithm

W(0) = 0     (15)
P(0) = δ^{-1} I     (16)

For n = 1, 2, ...:

Z(n) = P(n−1) X*(n)     (17)
G(n) = Z(n) / (λ + X(n)^T Z(n))     (18)
α(n) = d(n) − W(n−1)^T X(n)     (19)
W(n) = W(n−1) + α(n) G(n)     (20)
P(n) = λ^{-1} (P(n−1) − G(n) Z(n)^H)     (21)

where δ << 1 is a small arbitrary initialization parameter.
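A compact sketch of (15)-(21) for real-valued data (so X*(n) = X(n) and ^H = ^T; the function name and default parameter values are illustrative assumptions):

    import numpy as np

    def rls(X, d, lam=0.99, delta=1e-2):
        """RLS per (15)-(21), real-valued data (conjugates/Hermitians drop out).
        X: (N, p) observation vectors, d: (N,) training sequence."""
        N, p = X.shape
        W = np.zeros(p)                     # W(0) = 0
        P = np.eye(p) / delta               # P(0) = delta^{-1} I, delta << 1
        for n in range(N):
            x = X[n]
            Z = P @ x                       # Z(n) = P(n-1) X(n)
            G = Z / (lam + x @ Z)           # gain vector G(n)
            alpha = d[n] - W @ x            # a priori error alpha(n)
            W = W + alpha * G               # W(n) = W(n-1) + alpha(n) G(n)
            P = (P - np.outer(G, Z)) / lam  # P(n) = (P(n-1) - G(n) Z(n)^T) / lam
        return W

On the toy data used above, rls(X, d) typically reaches the neighborhood of w_true in far fewer samples than LMS, at the cost of the extra matrix-vector work per step.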
RLS vs. LMS

Complexity: RLS is more complex because of the matrix multiplications; LMS is simpler to implement.

Convergence speed: LMS is slower, because its speed depends on the amplitude of the gradient and on the eigenvalue spread of the correlation matrix. RLS is faster because it always points at the right solution (it solves the problem exactly at each step).

Accuracy: in LMS the accuracy is controlled via the step size µ; in RLS, via the forgetting factor λ. In both cases, very high accuracy in the stationary regime can be obtained at the cost of convergence speed.
Application

The LMS applied to the problem of adaptive beamforming... To be developed in class.