THE UNIVERSITY OF NAIROBI
SCHOOL OF ENGINEERING
DEPARTMENT OF ELECTRICAL AND INFORMATION ENGINEERING

FINAL YEAR PROJECT

KALMAN FILTER FOR LONG TERM ESTIMATION OF HYDROELECTRIC CAPACITY FOR KENGEN

By MWENDWA GIDEON MULATYA
REGISTRATION NUMBER: F17/1423/2011
SUPERVISOR: DR. G.S.O. ODHIAMBO
EXAMINER: PROF. V.K. ODUOL

Project report submitted in partial fulfillment of the requirement for the award of the Degree of Bachelor of Science in Electrical & Electronic Engineering of The University of Nairobi.

Date of Submission: 18th May, 2016

DECLARATION OF ORIGINALITY

NAME OF STUDENT: Mwendwa Gideon Mulatya
REGISTRATION NUMBER: F17/1423/2011
COLLEGE: Architecture and Engineering
FACULTY/SCHOOL/INSTITUTE: Engineering
DEPARTMENT: Electrical and Information Engineering
COURSE NAME: Bachelor of Science in Electrical and Information Engineering
TITLE OF WORK: Kalman filter for long term estimation of hydroelectric capacity for KenGen

1) I understand what plagiarism is and I am aware of the university policy in this regard.
2) I declare that this final year project report is my original work and has not been submitted elsewhere for examination, award of a degree or publication. Where other people's work or my own work has been used, this has properly been acknowledged and referenced in accordance with the University of Nairobi's requirements.
3) I have not sought or used the services of any professional agencies to produce this work.
4) I have not allowed, and shall not allow anyone to copy my work with the intention of passing it off as his/her own work.
5) I understand that any false claim in respect of this work shall result in disciplinary action, in accordance with University anti-plagiarism policy.

Signature: Date:

DEDICATION

This project is dedicated to my parents, Mr. and Mrs. Mwendwa, for the unrelenting support they have provided me in my education and their unwavering belief in me. I am forever in their debt.

ACKNOWLEDGEMENTS

I am grateful to the Almighty God for His abundance of grace for every moment under the sun. I wish to express my gratitude to my supervisor, Dr. G.S.O. Odhiambo, for his guidance and support in the undertaking of this project. His depth of knowledge and insight into the topic and other broader issues was incredibly useful and refreshing. For that I am eternally grateful. Finally, I would like to extend my sincere gratitude to my parents, Mr. and Mrs. Mwendwa, and my entire family for their unwavering support.

TABLE OF CONTENTS

DECLARATION OF ORIGINALITY
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER ONE: INTRODUCTION
1.0 General introduction
1.1 Problem statement
1.2 Objectives
1.3 Project scope
1.5 Report organization
CHAPTER TWO: LITERATURE REVIEW
2.0 Introduction
2.1 A brief introduction to time series analysis
2.1.1 Definition of a time series
2.1.2 Components of a time series
2.1.3 Time series analysis
2.2 Estimation theory
2.2.1 Bayesian estimation
2.2.2 Gaussian distribution
2.3 State space modelling
2.4 The Kalman filter
2.4.1 The discrete Kalman filter algorithm (conventional Kalman filter)
  Time update equations
  Measurement update equations
2.4.2 The Kalman filter as a filter
2.4.3 Kalman filter as a predictor
2.4.4 Kalman filter as a smoother
2.4.5 Merits of the Kalman filter
2.5 The Kalman filter and the RLS algorithm
2.5.1 Adaptive filtering
2.5.2 The Recursive Least Square formulation
2.5.3 The Adaptive Kalman filter (AKF)
2.5.4 Shaping the memory of the estimator
CHAPTER THREE: METHODOLOGY
3.0 Introduction
3.1 Study area and data
3.2 Modelling the filter
3.3 Initializing of the filter parameters
3.4 Training and forecasting
3.4.1 One-step ahead prediction
3.4.2 n-step ahead prediction
3.5 Tuning
3.5.1 Determining the optimal filter length
3.5.2 Determining the forgetting factor
3.5.3 Determining the inverse covariance matrix
3.6 Smoothing and decomposition of the time series
CHAPTER FOUR: RESULTS AND ANALYSIS
4.1 Masinga time series
4.2 Kamburu time series
4.3 Turkwel time series
4.4 Nile time series
4.5 Implication of the data on KenGen's hydroelectric capacity
CHAPTER FIVE: CONCLUSION
REFERENCES
APPENDIX
A.1: MATLAB function for implementing the adaptive Kalman filter algorithm ("main.m")
A.2: MATLAB code for calculating the MSE for different filter lengths
A.3: MATLAB function for smoothing and decomposition of the time series

LIST OF FIGURES

Figure 1: Time series decomposition
Figure 2: Kalman filter circuit
Figure 3: Kalman filter equations
Figure 4: Basic adaptive filter
Figure 5: Moving rectangular weighting
Figure 6: Exponential data weighting
Figure 7: Kenya's electricity generation
Figure 8: Map of the Nile River basin
Figure 9: Adaptive predictor design
Figure 10: FIR transversal structure
Figure 11: Predicted Masinga dam levels
Figure 12: MSE for different filter lengths
Figure 13: Decomposition of the Masinga time series
Figure 14: Dam levels - Kamburu time series
Figure 15: Mean Square Error for different filter lengths - Kamburu time series
Figure 16: Decomposition of the Kamburu time series
Figure 17: Dam levels - Turkwel time series
Figure 18: Mean Square Error for different filter lengths - Turkwel time series
Figure 19: Decomposition of the Turkwel time series
Figure 20: Nile water levels
Figure 21: MSE for different filter lengths
Figure 22: Decomposed Nile time series

LIST OF ABBREVIATIONS

LS - Least Squares
LMS - Least Mean Square
RLS - Recursive Least Square
KF - Kalman Filter
AKF - Adaptive Kalman Filter
ARIMA - Autoregressive Integrated Moving Average
MSE - Mean Square Error
EWP - Exponentially Weighted Past

ABSTRACT

The demand for accurate and reliable predictions and forecasts in virtually every activity has necessitated the development of numerous estimation algorithms. The Kalman filter is one such tool, ideal for estimating the state of a dynamic system from a series of incomplete or noisy measurements. The target of this project is to make use of the Kalman filter for estimating the long term hydroelectric capacity of KenGen as inferred from past climatic records. It further compares the data from Kenyan rivers with that of the River Nile to identify any climatic similarities shared within the region. In order to achieve this, the Kalman filter is used to develop a model that comprehensively describes the available river level measurement data and is capable of predicting the future water levels of the rivers supplying KenGen's hydroelectric dams.

CHAPTER ONE: INTRODUCTION

1.0 General introduction
In Kenya, hydropower is a major contributor to the total electrical power supplied to the grid. It forms a significantly cheap source of energy and supplies most of the country's electricity demand. Since it is generated by harnessing the power of moving water stored in dams or diverted from rivers, its generation is directly dependent on the prevailing climatic conditions in the region. Given the important contribution of this kind of electrical power, it is critical to understand the climatic trends contained in past records and thereby effectively forecast probable future courses so as to plan accordingly. Such information is beneficial in planning the operations of the individual power plants and of the power system as a whole to ensure its reliability.

1.1 Problem statement
To adequately model past climatic data and use it to predict probable future values so as to infer KenGen's future hydroelectric capacity.

1.2 Objectives
To use available climate data to predict the future power generation capacity of Kenyan rivers and to compare this data with the long historical record of the River Nile.

1.3 Project scope
The focus of this project will be on:
- Modelling the various datasets using the adaptive Kalman filter.
- Using the developed models for extrapolation into the future.
- Decomposing the data into various components for further analysis.

1.5 Report organization
The report is organized into the following chapters: Chapter 2 contains the literature review on the Kalman filter and the adaptive Kalman filter. Chapter 3 discusses the project methodology and implementation. Chapter 4 discusses the results and findings of the project. Chapter 5 contains the conclusions and future work.

CHAPTER TWO: LITERATURE REVIEW

2.0 Introduction
This chapter provides an overview of the original Kalman filter formulation and of how the filter can be used in an adaptive context. It begins by offering a concise review of some conceptual tools, including relevant topics in estimation theory and state space analysis, that form the foundation of the filter's operation. A brief introduction to time series analysis is also offered at the outset of this chapter to highlight the general statistical background and the need for the Kalman filter.

2.1 A brief introduction to time series analysis

2.1.1 Definition of a time series
A time series can be defined as a sequential set of data points, measured over successive times. Mathematically, it is defined as a set of vectors X(t), where t = 0, 1, 2, ..., with t representing the elapsed time. The variable X(t) is treated as a random variable, while the measurements in a time series are arranged in proper chronological order [1].
A time series may be described as either univariate or multivariate. A univariate time series contains records of a single variable, while a multivariate time series comprises records of several variables. A time series can also be continuous or discrete. A continuous time series contains records at every instant of time, while a discrete time series contains records taken at equally spaced time intervals.

2.1.2 Components of a time series
A time series is composed of four main components that can be decomposed from the observed data: trend, seasonal, cyclical and irregular components.
Trend can be defined as the general tendency of a time series to increase, decrease or stagnate over a long span of time. It accounts for the gradual shifting of the time series over a long period of time.
Seasonal variations can be described as fluctuations exhibited over specified time frames in the time series. They account for regular patterns of variability within specific time periods.
Cyclical variations can be defined as medium-term changes in the series, caused by circumstances that repeat in cycles. They are the regular pattern of sequences of values above and below the trend line. Cycles are exhibited over longer periods of time (usually two or more years).

Irregular or random components in a time series are the random fluctuations around the average of the series. They result from unpredictable influences which are not regular and do not repeat in a particular pattern.
Using these components, a time series can be described by two broad models:
The additive model, where the time series is the sum of its components. It is appropriate when there is no exponential growth in the series and the amplitude of the seasonal component remains constant over time:

z(t) = μ(t) + γ(t) + c(t) + I(t)

The multiplicative model, where the time series is the product of its components. It is appropriate when there is exponential growth in the series and the amplitude of the seasonal component grows with the level of the series:

z(t) = μ(t) × γ(t) × c(t) × I(t)

where z(t) represents the observation while μ(t), γ(t), c(t) and I(t) respectively represent the trend, seasonal, cyclical and irregular variation at time t.

Figure 1: Time series decomposition

2.1.3 Time series analysis
In order to understand the nature of a given time series and gain meaningful insight from the data it contains, a suitable model is fitted to the given time series and the corresponding parameters are estimated from the known data values. It is this process of fitting a time series to its proper model that is termed time series analysis [1].

The objectives of time series analysis include:
- Compact description of the data in mathematical notation, such as the mean, variance and autocorrelation structure of the data.
- Interpretation of the data through decomposing the time series into its components.
- Forecasting, which entails predicting the probable future course of the time series.
- Hypothesis testing.
- Facilitating simulations of physical phenomena.

Time series forecasting involves analysis of past observations to develop a suitable mathematical model that captures the underlying data-generating process of the series. The model is then used to predict probable future events. It is therefore of great importance to fit an appropriate model to a time series in order to generate meaningful and reliable forecasts. The selected forecasting model should include features that capture all the important qualitative properties of the data; these include, but are not limited to, the patterns of variation in level and trend, effects of seasonality, and correlations among variables. The assumptions underlying the selected model should also agree with intuition about how the series is likely to behave in the future. Since it is impossible to predict with certainty what will happen in the future, time series are described as being non-deterministic in nature.
There are several methods used for time series analysis. The most commonly used include:
- Deterministic modelling / numerical analysis modelling.
- Stochastic modelling / time series modelling.
- State space modelling.

The time series analysis approach employed in this project is the state space modelling method, since it can easily incorporate various time series models such as univariate and multivariate ARIMA models, the linear regression model and the structural time series model.

2.2 Estimation theory
Estimation theory is a branch of probability and statistics that deals with the problem of deriving information about properties of random variables and stochastic processes, given a set of observed samples [2].

2.2.1 Bayesian estimation
Bayesian theory is a branch of mathematical probability theory that enables modelling of uncertainty about the world and outcomes of interest by incorporating prior knowledge and observational evidence [3]. In other terms, Bayesian probability determines what is likely to be true based on past information. Bayesian analysis therefore interprets probability as a conditional measure of uncertainty. The mathematical culmination of Bayesian probability is Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)    (2.1)

where P(A) denotes the probability of event A and P(A|B) is the conditional probability representing the probability of event A happening given that event B happens. In the Bayes theorem equation, B is the evidence, P(A) is the prior, P(B|A) is the likelihood, and P(A|B) is the posterior.

2.2.2 Gaussian distribution
The Gaussian distribution is a computationally efficient, unimodal, continuous way of representing probabilities that models how the real world works. In Bayesian terms, Gaussians describe the probability of a measurement by expressing the precision of the measurement and the variance associated with the measurements. The expression x ~ N(μ, σ²) denotes a continuous random process x that is normally distributed about the mean μ with variance σ² [4]. The probability density function for this case is given as:

f(x) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²)),  for −∞ < x < ∞    (2.2)

The Gaussian distribution is popular for modelling randomness, such as the noise experienced in measurements, for the following reasons:
- It adequately models many naturally occurring processes.
- It is computationally efficient: multiplying Gaussians yields a new Gaussian that describes the new belief derived from the prior belief pertaining to a given system.
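As a simple illustration of this multiplicative property, the following Matlab sketch (illustrative values only, not one of the project listings) fuses a Gaussian prior with a Gaussian measurement; the result is again Gaussian, with a variance smaller than either input:

    % Fuse a Gaussian prior N(mu1, var1) with a Gaussian measurement N(mu2, var2).
    % The product of the two densities is (up to scale) another Gaussian.
    mu1 = 10; var1 = 4;      % prior belief, e.g. about a dam level
    mu2 = 12; var2 = 1;      % noisy measurement of the same quantity
    var_post = (var1*var2) / (var1 + var2);            % posterior variance
    mu_post  = (mu1*var2 + mu2*var1) / (var1 + var2);  % posterior mean
    fprintf('posterior belief: N(%.2f, %.2f)\n', mu_post, var_post);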

2.3 State space modelling
State-space modelling provides an elegant and convenient approach to the analysis of dynamic systems. It offers simplified notation for estimation and control problems [5]. It also provides a flexible approach to time series analysis, since it allows a general treatment of virtually any linear time series model through the general algorithms of the Kalman filter and smoother [6]. The distinguishing feature of state space time series models is that observations are regarded as made up of distinct components such as trend, seasonal, regression elements and disturbance terms, each of which is modelled separately [7]. The approach relies on the dynamics of the state variables, and on the relation between the observed variables and the state variables, to draw statistical inference about the unobserved states. This makes it particularly useful for models involving unobserved states.
The general state-space model is represented by the following two equations:

x_t = A x_{t−1} + B u_t + w_{t−1}    (2.3)
z_t = H x_t + v_t    (2.4)

Equation (2.3) is the process model and represents how a new state is modelled as a linear combination of the previous state and some process noise. Equation (2.4) is the measurement model and describes how the process measurements are derived from the internal state with the inclusion of some measurement noise [5].
Advantages of using the state space model approach for time series analysis include:
- It provides for general treatment of virtually any linear time series model through the general algorithms of the Kalman filter and the associated smoother [6].
- It facilitates the computation of the likelihood function.
- It offers flexibility when dealing with data irregularities such as observations at mixed frequencies and missing observations.
- It allows observations to be added one at a time, with the estimating equations updated to produce new estimates.
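To make equations (2.3) and (2.4) concrete, the short Matlab sketch below (illustrative values, not one of the project listings) simulates a scalar state-space model, a random walk observed in noise; the same structure generalizes directly to vector states:

    % Simulate x(t) = A*x(t-1) + w(t-1),  z(t) = H*x(t) + v(t)
    % for scalar A and H, with no control input (B*u = 0).
    A = 1; H = 1;            % random walk, observed directly
    Q = 0.01; R = 0.25;      % process and measurement noise variances
    N = 100;
    x = zeros(N,1); z = zeros(N,1);
    z(1) = H*x(1) + sqrt(R)*randn;
    for t = 2:N
        x(t) = A*x(t-1) + sqrt(Q)*randn;   % process model, equation (2.3)
        z(t) = H*x(t)   + sqrt(R)*randn;   % measurement model, equation (2.4)
    end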

indirect, uncertain and inaccurate observations by minimizing the mean square error of the estimated parameters if all noise is Gaussian. It is also a recursive algorithm since it entails two phases i.e. prediction and correction which are executed cyclically for every new measurement being processed as it arrives [8] [5]. The Kalman filter is considered to be a filter since it separates the noise from the parameters of interest. The process of finding the best estimate from noisy data may be viewed as filtering out the noise [8]. In summary, the Kalman filter is used for inferring missing information from indirect and noisy measurements and predicting the probable future courses of dynamic systems such as prices of traded commodities, future wind speeds, water levels in dams. 2.4.1 The Discrete Kalman filter algorithm (conventional Kalman filter) There exist several derivations of the Kalman filter. The discrete Kalman filter algorithm is the original basic and most common derivation where the measurements occur and the state is estimated at discrete points in time [5].This description of the Kalman filter makes use of the state-space form and assumes availability of knowledge on the noise covariance matrices involving both the process noise and measurement noise. The role of the Kalman filter is to update knowledge of the new estimate recursively once a new observation becomes available. The filter therefore performs two operations within each iteration; 1. Prediction step performed with the dynamic model 2. Correction step performed with the observation model It is this predictor-corrector type of operation that makes the Kalman filter an optimal estimator since it strives to minimize the estimated error covariance with every iteration. These two stages are sequentially repeated with the state of the previous time step as the initial value for a new iteration as illustrated by Figure 2.

Figure 2: Kalman filter circuit

The Kalman filter attempts to estimate the system state vector, x_t, governed by the linear stochastic difference equations (2.3) and (2.4), repeated here for convenience:

x_t = A x_{t−1} + B u_t + w_{t−1}  (state equation)    (2.3)

with the measurement vector, z_t, given by:

z_t = H x_t + v_t  (observation equation)    (2.4)

The parameter w_t represents the process noise while v_t represents the measurement noise; they are assumed to be independent of each other, white and Gaussian. They are taken to have normal probability distributions:

p(w) ~ N(0, Q), where Q is the process noise covariance    (2.5)
p(v) ~ N(0, R), where R is the measurement noise covariance    (2.6)

The other terms contained in the basic Kalman filter equations (2.3) and (2.4) designate the following parameters:
- x_t represents the state vector containing the terms of interest for the system at time t.
- u_t represents the vector containing any control input applied to the system.
- A represents the state transition matrix, which relates the state at the previous time step to the state at the current step in the absence of either a driving function or process noise.
- B represents the control input matrix, which relates the optional control input to the state x.

- H represents the transformation matrix in the measurement equation (2.4), mapping the state vector parameters into the measurement domain.

In the computational analysis of the filter, one considers the states x̂⁻_t and x̂_t, which represent the a priori state estimate and the a posteriori state estimate at time t respectively, the latter given the measurement z_t. It therefore follows that the a priori estimate error is described by the equation:

e⁻_t = x_t − x̂⁻_t    (2.7)

and the a posteriori error is described by the equation:

e_t = x_t − x̂_t    (2.8)

The a priori estimate error covariance then becomes:

P⁻_t = E[e⁻_t e⁻_tᵀ]    (2.9)

and the a posteriori estimate error covariance becomes:

P_t = E[e_t e_tᵀ]    (2.10)

In explaining the Kalman filter and its operation, one first formulates the a posteriori state estimate as a function of the a priori estimate and a weighted difference between an actual measurement and a measurement prediction. This is the probabilistic origin of the filter and is expressed as:

x̂_t = x̂⁻_t + K (z_t − H x̂⁻_t)    (2.11)

The quantity (z_t − H x̂⁻_t) is referred to as the innovation or the residual and measures the difference between the predicted value and the measured value. It is multiplied by a matrix K which minimizes the a posteriori error covariance. The matrix K is a weighting factor known as the Kalman gain. From equation (2.11), it can be seen that the process of obtaining the next estimate (which is essentially the primary task of the Kalman filter) depends only on the previous state, the measured value and the Kalman gain. The previous estimate and the measured value are already known, simplifying the task to that of determining the Kalman gain. One form of the Kalman gain matrix K that minimizes the error covariance is:

K_t = P⁻_t Hᵀ (H P⁻_t Hᵀ + R)⁻¹    (2.12)

which can equivalently be written as:

K_t = P⁻_t Hᵀ / (H P⁻_t Hᵀ + R)    (2.13)

From equation (2.13), it can be noted that as the measurement error covariance R approaches zero, the gain K weights the residual more heavily.

When using the Kalman filter, the initial state of the system, x_0, and the initial error covariance, P_0, must be specified, after which estimates of the state vector and its corresponding covariance matrix are recursively calculated for every subsequent time step. This is done by the set of equations associated with the Kalman filter stages shown in Figure 3, with their detailed explanations following thereafter.

Figure 3: Kalman filter equations

Time update equations

x̂⁻_t = A x̂_{t−1} + B u_t    (2.14)
P⁻_t = A P_{t−1} Aᵀ + Q    (2.15)

Mean: The linear equation x̂⁻_t = A x̂_{t−1} + B u_t represents a system of equations, where matrix A holds the coefficients of the states x̂_{t−1}. Since A contains the state transition for a given time step, the product A x̂_{t−1} computes the state after that transition. Similarly, B is the control function computing the contribution of the control input u_t to the state after the transition. The state x̂⁻_t can therefore be viewed as the updated mean of the state variables at time t.

Covariance:

The equation P⁻_t = A P_{t−1} Aᵀ + Q computes the new covariance matrix, which contains the variances of the state variables on the principal diagonal and the covariances between the state variables in the off-diagonal elements. Usually, the covariance matrix is initialized with no correlation between the state variables; however, the equation makes use of the process model to automatically compute the covariance between the state variables during each time update iteration [4]. The covariance due to the prediction can also be modelled as the expected value of the error in the prediction step:

P⁻_t = E[(A e_{t−1} + w_{t−1})(A e_{t−1} + w_{t−1})ᵀ]    (2.16)
     = A E[e_{t−1} e_{t−1}ᵀ] Aᵀ + Q    (2.17)
     = A E[e eᵀ] Aᵀ + Q    (2.18)

It can be noted that E[e eᵀ] is simply P, hence the form:

P⁻_t = A P_{t−1} Aᵀ + Q    (2.19)

Measurement update equations

K_t = P⁻_t Hᵀ (H P⁻_t Hᵀ + R)⁻¹    (2.20)
x̂_t = x̂⁻_t + K_t (z_t − H x̂⁻_t)    (2.21)
P_t = (I − K_t H) P⁻_t    (2.22)

System uncertainty: Considering the system uncertainty or innovation covariance, S_t = H P⁻_t Hᵀ + R, the term H P⁻_t Hᵀ projects the a priori covariance matrix into the measurement space. It essentially transforms the coordinate system to facilitate computations in the measurement domain. Once the covariance is in measurement space, the observation noise is accounted for by adding the observation error covariance, R. It can be seen that the system uncertainty expression contained in equation (2.20) and the covariance equation (2.15) bear a strong similarity, illustrated by equations (2.23) and (2.24) below:

S_t = H P⁻_t Hᵀ + R    (2.23)
P⁻_t = A P_{t−1} Aᵀ + Q    (2.24)

This is because the two equations are functionally similar: both put the covariance P into a different space by means of the function H or A. Once in the new space, the noise matrix associated with the particular space is added to the result.

Kalman gain: The underlying concept of the Kalman filter algorithm is generating estimates based on two measurements, or on a measurement and a prediction, depending on which is thought to be more accurate. Given a prediction and a measurement, a more precise estimate needs to be formed from the two. If there is more certainty about the measurement, then the estimate will be closer to it; similarly, if there is more certainty about the prediction, the estimate will be nearer to it. The Kalman gain equation computes a ratio based on how much the prediction is trusted over the measurement. It can be thought of as:

K ≈ uncertainty in prediction / (uncertainty in prediction + uncertainty in measurement)    (2.25)

Residual:

y_t = z_t − H x̂⁻_t    (2.26)

The measurement function H converts the state variables into equivalent measurements. Once that is done, the transformed states can be subtracted from the measurement to give the residual, the difference between measurement and prediction.

State update:

x̂_t = x̂⁻_t + K_t y_t

The new state is computed along the residual, weighted by the Kalman gain. After scaling the residual, the K_t term converts the result back to state space for addition with the a priori estimate x̂⁻_t to yield the new a posteriori estimate. Combining equations (2.21) and (2.26), the new estimate can be viewed as:

x̂_t = x̂⁻_t + K_t (z_t − H x̂⁻_t)    (2.27)

Covariance update:

P_t = (I − K_t H) P⁻_t

This equation adjusts the size of the uncertainty contained in the covariance matrix at the previous time step by a factor of the Kalman gain. With the measurement function H constant, the updated covariance is determined by the Kalman gain. The Kalman gain, K, is a ratio of how much of the prediction versus the measurement is used. If K is large then (I − K_t H) is small, and P_t is consequently made smaller than it was at the previous time step. If (I − K_t H) is large and K is small, then P_t becomes larger than it was at the previous time step.
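Putting the time update (2.14)-(2.15) and the measurement update (2.20)-(2.22) together, a minimal scalar implementation might look as follows (a sketch only, reusing the simulated measurements z from the sketch in section 2.3; the operations carry over unchanged to the matrix case):

    % Discrete Kalman filter over the measurement sequence z.
    A = 1; H = 1; Q = 0.01; R = 0.25;
    xhat = 0; P = 1;                       % initial state estimate and covariance
    xhat_log = zeros(size(z));
    for t = 1:numel(z)
        % Time update (prediction), equations (2.14)-(2.15)
        xhat_prior = A * xhat;             % no control input assumed
        P_prior    = A * P * A' + Q;
        % Measurement update (correction), equations (2.20)-(2.22)
        K    = P_prior * H' / (H * P_prior * H' + R);   % Kalman gain
        xhat = xhat_prior + K * (z(t) - H * xhat_prior);
        P    = (1 - K*H) * P_prior;        % (I - KH)*P in the matrix case
        xhat_log(t) = xhat;                % a posteriori estimate at time t
    end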

2.4.2 The Kalman filter as a filter
One of the most attractive features of the Kalman filter is that it can be employed as a filter, a predictor and a smoother. The purpose of filtering is the extraction of a signal while ignoring the noise it may contain. As a filter, the Kalman filter makes use of observations up to and including the time at which the state of the dynamic system is to be estimated [8]:

x̂_{t|t} = f(z_1, ..., z_t)    (2.28)

The Kalman filter is therefore producing current estimates of the dynamic system.

2.4.3 Kalman filter as a predictor
The importance of prediction is the forecasting of likely future outcomes based on previous observations [8]. As a predictor, the Kalman filter uses observations strictly prior to the time at which the state of the dynamic system is to be estimated:

x̂_{t|t−n} = f(z_1, ..., z_{t−n}),  n > 0    (2.29)

The Kalman filter therefore forecasts when it produces future estimates.

2.4.4 Kalman filter as a smoother
As a smoother, the Kalman filter makes use of observations beyond the time at which the state of the dynamic system is to be estimated [8]:

x̂_{t|T} = f(z_1, ..., z_T),  T > t    (2.30)

The Kalman filter acts as a smoother when finding past estimates.

2.4.5 Merits of the Kalman filter
The Kalman filter has been hailed as one of the greatest discoveries in recent times for reasons such as:

- Its varied applications in engineering, mathematics and even economics.
- Its easy formulation and implementation, given a basic understanding.
- Its convenient implementation as a computer algorithm.
- Its convenient form for online, real-time processing.
- Good results, due to its optimality and structure.

2.5 The Kalman filter and the RLS algorithm

2.5.1 Adaptive filtering
Adaptive filtering is the capability of a filtering algorithm to perform self-learning: as time advances, the filter sets its output in conformity with the required performance. An adaptive filter is therefore capable of modifying its response in real time with the aim of improving its performance. The basic scheme for applying an adaptive filter is shown in Figure 4, with x_t representing the vector of the noisy input signal at discrete instant t, and the error signal designated as e_t = d_t − y_t, where d_t is the desired output response free from the effects of noise and y_t is the actual network response given an input of x_t [9] [10].

Figure 4: Basic adaptive filter

Adaptive filtering has numerous applications, with each application varying the basic adaptive structure to meet its objective. The most common applications, and hence configurations, are:
1. Adaptive prediction of the value of a random input signal.
2. Adaptive forward modelling of an unknown system.
3. Adaptive inverse modelling of an unknown system.

4. Adaptive interference cancelling, which increases the signal-to-noise ratio of a signal by decreasing the noise power in the signal through attempting to eliminate the noise signals [2].

In practice, the adaptation algorithm is implemented through two classical methods, the gradient method and the least squares method (the LMS and RLS algorithms respectively). Given the scope of this project, the Recursive Least Square algorithm is investigated in section 2.5.2 below.

2.5.2 The Recursive Least Square formulation
The Recursive Least Square algorithm is based on the least squares (LS) curve fitting problem, in which a curve is fit to a given set of data points by minimizing the sum of the squares of the errors from all points to the curve [11]. The data points are assumed to follow a general nth order linear regression relationship of the form:

x(t) = a_1 x_1(t) + a_2 x_2(t) + ... + a_n x_n(t)    (2.31)

with the measurement y(t) of x(t) assumed to be corrupted by zero-mean noise e(t). The corresponding observation equation takes the form:

y(t) = φᵀ(t) a + e(t)    (2.32)

where φ(t) = [x_1(t), x_2(t), ..., x_n(t)]ᵀ and a = [a_1, a_2, ..., a_n]ᵀ. The least squares optimization after t samples is defined as the minimization of the cost function:

J(â, t) = Σ_{i=1}^{t} [y(i) − φᵀ(i) â]²    (2.33)

In order to obtain the actual least value of the cost function, all the partial derivatives of J with respect to each of the parameter estimates should simultaneously be set to zero:

∂J/∂â = −2 Σ_{i=1}^{t} φ(i) [y(i) − φᵀ(i) â]    (2.34)

Σ_{i=1}^{t} φ(i) [y(i) − φᵀ(i) â] = 0    (2.35)

where ∂J/∂â denotes the gradient of J with respect to all the elements of â. As a result:

[Σ_{i=1}^{t} φ(i) φᵀ(i)] â(t) = Σ_{i=1}^{t} φ(i) y(i)    (2.36)

If the matrix Σ_{i=1}^{t} φ(i) φᵀ(i) is non-singular and invertible, equation (2.36) above can be re-written as:

â(t) = P(t) b(t)    (2.37)

where P(t) = [Σ_{i=1}^{t} φ(i) φᵀ(i)]⁻¹ and b(t) = Σ_{i=1}^{t} φ(i) y(i). P(t) and b(t) can be re-defined recursively as:

P⁻¹(t) = P⁻¹(t−1) + φ(t) φᵀ(t)    (2.39)

b(t) = b(t−1) + φ(t) y(t)    (2.40)

In order to develop a recursive version of equation (2.37), equation (2.39) is pre-multiplied by P(t) and post-multiplied by P(t−1) to give [11]:

P(t−1) = P(t) + P(t) φ(t) φᵀ(t) P(t−1)    (2.41)

Multiplying by φ(t) then yields:

P(t−1) φ(t) = P(t) φ(t) [1 + φᵀ(t) P(t−1) φ(t)]    (2.42)

followed by post-multiplying by [1 + φᵀ(t) P(t−1) φ(t)]⁻¹, which results in:

P(t) φ(t) = P(t−1) φ(t) [1 + φᵀ(t) P(t−1) φ(t)]⁻¹    (2.43)

Substituting from equation (2.43) back into equation (2.41) gives:

P(t) = P(t−1) − P(t−1) φ(t) [1 + φᵀ(t) P(t−1) φ(t)]⁻¹ φᵀ(t) P(t−1)    (2.45)

Equation (2.45) above is termed the matrix inversion lemma [12], since it provides an alternative to the matrix inversion required for the solution (2.37). It is now possible to derive the equivalent recursive equation for updating the estimate of the parameter vector. Substituting (2.40) into (2.37):

â(t) = P(t) [b(t−1) + φ(t) y(t)]    (2.46)

and expanding b(t−1) = P⁻¹(t−1) â(t−1) with the help of (2.39) and (2.45), after simplification:

â(t) = â(t−1) + P(t−1) φ(t) [1 + φᵀ(t) P(t−1) φ(t)]⁻¹ [y(t) − φᵀ(t) â(t−1)]    (2.48)

The final result therefore becomes:

â(t) = â(t−1) + K(t) [y(t) − φᵀ(t) â(t−1)]    (2.49)

where

K(t) = P(t−1) φ(t) [1 + φᵀ(t) P(t−1) φ(t)]⁻¹    (2.50)

Equations (2.45), (2.49) and (2.50) form the general RLS algorithm. The RLS algorithm is computationally convenient since it eliminates the need for the direct matrix inversion in the solution (2.37). Implementing the RLS algorithm requires initializing â(0) and P(0) for the estimated parameter vector and the matrix P respectively. Normally, from experimentation, â(0) is an arbitrary finite vector, usually the zero vector, while P(0) is a diagonal matrix generally having large diagonal elements (of the order of 10⁶), so as to yield convergence and performance comparable to the stage-wise solution of the same problem.
It can be noted that there exists a one-to-one correspondence between the Kalman filter and RLS. The RLS algorithm can therefore be considered a special case of the KF which can be applied to parameter estimation, i.e. it does not require prior knowledge of the system's measurement and process noises [13] [9]. This gives rise to the adaptive Kalman filter, which offers rapid convergence of the adaptive estimator when neither the spectral characteristics of the system output nor the measurements y(t) are known beforehand.
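As a minimal sketch of this recursion, the Matlab fragment below implements equations (2.45), (2.49) and (2.50) for a small simulated regression problem (illustrative data; the project's own implementation is the appendix A.1 listing):

    % Recursive least squares: equations (2.45), (2.49) and (2.50).
    n = 3;
    a_true = [0.5; -0.2; 1.0];             % parameters to recover (simulation only)
    a_hat  = zeros(n,1);                   % a_hat(0): arbitrary finite vector
    P      = 1e6 * eye(n);                 % P(0): diagonal with large elements
    for t = 1:500
        phi = randn(n,1);                  % regressor vector phi(t)
        y_t = a_true'*phi + 0.1*randn;     % noisy measurement y(t)
        K   = P*phi / (1 + phi'*P*phi);    % gain, equation (2.50)
        a_hat = a_hat + K*(y_t - phi'*a_hat);  % parameter update, equation (2.49)
        P   = P - K*(phi'*P);              % matrix inversion lemma, equation (2.45)
    end
    disp(a_hat')                           % converges towards a_true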

2.5.3 The Adaptive Kalman filter (AKF)
The adaptive Kalman filter offers a means of realizing a process for which the system model is not well defined [14]. The filter allows the model parameters to vary and adapts to the incoming data accordingly. Unlike the standard discrete Kalman filter, whose parameters were assumed known, the adaptive Kalman filter achieves optimality by varying its parameters with every iteration while taking into account the information provided by new measurements. If the unknown parameter is a random constant, the adaptive Kalman filter makes use of the measurement data to drive the model parameters towards the process parameter value. However, when the unknown parameter is time-varying, the adaptive Kalman filter allows the model parameters to track the process parameters; the model parameters vary slowly, never reaching a steady value. The rate at which the adaptive Kalman filter tracks the time-varying process is governed by an adaptive control that also ensures the filter remains adaptive.
In the standard Kalman filter, the state estimates become more accurate with time, resulting in a reduced error covariance reflecting this accuracy. Consequently, the Kalman gain places less weight on newer measurements. In the adaptive case, the parameters might be varying with time, limiting the accuracy of the state estimates. The adaptive control therefore increases the a priori error matrix, reducing the confidence in the parameter estimates and preventing the Kalman gain from becoming too small to give sufficient weight to new measurements. In this way, the filter is able to remain adaptive. The adaptive control is also called the weighting factor or the forgetting factor, since it emphasizes the recent measurements and tends to forget the past. This property is responsible for the tracking capability of the adaptive algorithm.

2.5.4 Shaping the memory of the estimator
The main function of the adaptive Kalman filter is to vary the model parameters and hence adapt to the incoming data. In order for such parametric variation to occur, the effect of obsolete data has to be eliminated from the system. This necessitates a means of restricting or shaping the estimator's memory to a limited number of samples. The common procedures for shaping the memory of an estimation scheme are:
- Basing the estimation only on the most recent portion of the data.
- Weighting the data exponentially into the past with an exponentially fading memory, characterized by a decay time constant Te [11].

The above-mentioned methods can best be visualized as moving windows or weighting functions, as shown in Figure 5.

Figure 5: Moving rectangular weighting

The moving Rectangular Window (RW) approach involves selecting an observation span/window of s samples which provides estimates of the intended accuracy, based on the nature of the application. In order to obtain new estimates, this method discards the old samples and takes up an equivalent number of new samples every time a solution based on s samples is completed.
The moving Exponentially-Weighted-Past (EWP) window method (Figure 6) introduces exponential weighting into the past into the least squares problem formulation, by substituting EWP averaging for the finite-time averaging operations of the moving rectangular window algorithm. This can also be visualized as an EWP least squares cost function J_EWP of the form:

J_EWP(â, t) = Σ_{i=1}^{t} λ^{t−i} [y(i) − φᵀ(i) â]²    (2.51)

Figure 6: Exponential data weighting

where 0 < λ < 1.0 is an exponential forgetting factor related to the time constant Te of the exponential weighting by the expression λ = e^{−Ts/Te}, Ts being the sampling interval in time units relevant to the specific application. When λ = 1.0, J_EWP reduces to the usual least squares cost function of equation (2.33). The EWP algorithm therefore does not define its memory by a number of samples but relies on progressively reducing the importance attached to old data [11].
Minimization of equation (2.51) is done in a similar fashion to the recursive least squares formulation of section 2.5.2, but taking into consideration the EWP averaging of the enclosed variables as opposed to the entire span 1 ≤ i ≤ t. This yields the final adaptive filter equations:

ŷ(t) = φᵀ(t) â(t−1)    (2.52)
e(t) = y(t) − ŷ(t)    (2.53)
K(t) = P(t−1) φ(t) [λ + φᵀ(t) P(t−1) φ(t)]⁻¹    (2.54)
P(t) = [P(t−1) − K(t) φᵀ(t) P(t−1)] / λ    (2.55)
â(t) = â(t−1) + K(t) e(t)    (2.56)

Equation (2.52) is responsible for computing the next estimate, while equation (2.53) computes the a posteriori error between the measurement and the generated estimate. Equation (2.54) calculates the Kalman gain necessary for weighting the filter's coefficients so as to attain optimality. Equations (2.55) and (2.56) are the update equations responsible for updating the inverse matrix and the filter coefficients respectively.
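In code, the forgetting factor enters only the gain and inverse-covariance updates of the RLS sketch from section 2.5.2 (the fragment below reuses its variables phi, y_t, P and a_hat; setting λ = 1 recovers the standard recursion):

    % Exponentially weighted RLS update, one iteration (equations (2.52)-(2.56)).
    lambda = 0.95;                          % forgetting factor, 0 < lambda < 1
    yhat  = phi' * a_hat;                   % next estimate, equation (2.52)
    e     = y_t - yhat;                     % a posteriori error, equation (2.53)
    K     = P*phi / (lambda + phi'*P*phi);  % Kalman gain, equation (2.54)
    P     = (P - K*(phi'*P)) / lambda;      % inverse matrix update, equation (2.55)
    a_hat = a_hat + K*e;                    % coefficient update, equation (2.56)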

CHAPTER THREE: METHODOLOGY

3.0 Introduction
This chapter presents the implementation scheme: how the Kalman filter algorithm was used to model the different sets of hydrological data and to make predictions of their respective future courses, offering insight into the future hydrological capacity of KenGen. It also highlights how the decomposition process was carried out to separate the various datasets into their time series components for comparison purposes.
For the implementation of this project, the simulation environment of choice was Matlab (version R2013a). This decision was based on the availability of built-in specialized functions, advanced matrix computation capabilities and enhanced graph plotting options inherent in the Matlab environment, coupled with a user-friendly interface that facilitates convenient visualization of the data.
In order to achieve the project's objectives, a series of design tasks was undertaken in several stages, as highlighted below:
1. Modelling the filter design.
2. Initialization of the modelled filter and tracking the measurements by making one-step ahead predictions.
3. Tuning to ensure optimal filter performance.
4. Forecasting by making n-step ahead predictions.

5. Decomposition of the datasets to extract meaningful information.

3.1 Study area and data
In order to estimate the long term hydrological capacity for KenGen, hydrological data from Masinga power station, Kamburu power station and Turkwel power station were used to represent the country's overall hydroelectric situation. The monthly hydrological data was sourced from KenGen and contained dam level measurements for the respective hydroelectric plants for the period from 2001 to 2010.
Masinga Power Station is part of the Seven Forks scheme, which is responsible for generating most of Kenya's hydroelectric power. The power station has a design capacity of 40 MW and a total reservoir capacity of 1.56 billion cubic meters. It is the largest dam in the Seven Forks scheme and is responsible for controlling water flow into the subsequent dams located downstream. It is situated along the border of Embu and Machakos counties and is supplied by the Tana River, whose source is in the Central Kenya highlands.
Kamburu Power Station is also a member of the Seven Forks scheme and has a design capacity of 93 MW and a reservoir capacity of 123 million cubic meters. It is also located at the border of Machakos and Embu counties. Like Masinga Power Station, it is supplied by the Tana River.
Turkwel Power Station is one of the major hydroelectric power stations in Kenya and has a design capacity of 106 MW. It is geographically located in the north-western region of Kenya in West Pokot County. It is served by the Turkwel River, which originates from Mount Elgon along the Kenya-Uganda border.
The selected power stations comprehensively represent the distribution of hydroelectricity-generating rivers in Kenya and offer an inclusive representation of the country's climatic zones. This information is illustrated in Figure 7 below.

Figure 7: Kenya's electricity generation

For comparison purposes, the River Nile data obtained included the ancient records of the lowest annual water levels of the Nile during 622-1284, made by the Nilometer at the island of Roda near Cairo, Egypt, and more contemporary data from Dongola in Sudan from 1900 to 2000, extracted from the Global River Discharge Database (RivDIS v1.1).

Figure 8: Map of the Nile River basin

3.2 Modelling the filter
In modelling the unknown system represented by the different datasets, an adaptive prediction filter scheme utilizing the Kalman filter algorithm was implemented using an adaptive FIR structure, illustrated in Figure 9.

Figure 9: Adaptive predictor design

The FIR filter structure was used to generate the next prediction based on the input data sequence and its filter coefficients, as varied by the Kalman filter algorithm. The FIR filter was defined by equation (3.1) below:

y(t) = Σ_{i=1}^{L} w_i x(t−i)    (3.1)

where L is the filter length and w_i are the filter coefficients. The FIR filter was arranged in a transversal structure that introduced delays to the input sequence x(t), as illustrated in Figure 10. The optimal number of taps for the filter structure was then determined as highlighted in subsection 3.5.1.

Figure 10: FIR transversal structure

The desired system response, d(t), was the plot generated from the various datasets under investigation. The filter's input vector consisted of the successive water level samples as presented by the respective datasets.
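The sketch below (hypothetical variable names; the full routine is the appendix A.1 listing, not reproduced here) shows how the transversal structure forms each one-step-ahead prediction from the last L water-level samples before the coefficients are updated with equations (2.52)-(2.56):

    % One-step-ahead prediction with an L-tap FIR transversal filter.
    % y: measured water-level series (column vector); w: adapted coefficients.
    L = 69;                             % filter length (number of taps)
    w = zeros(L,1);                     % coefficients, adapted at every step
    yhat = zeros(size(y));
    for t = L+1:numel(y)
        x_t = y(t-1:-1:t-L);            % delayed input vector [y(t-1); ...; y(t-L)]
        yhat(t) = w' * x_t;             % FIR output, equation (3.1)
        % ... w and the inverse covariance matrix are then updated here
        %     with the exponentially weighted recursion (2.52)-(2.56) ...
    end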

3.3 Initializing of the filter parameters
The filter parameters were initialized as follows:
- The filter coefficient vector was always initialized with its elements set to zero, i.e. w = [0 0 ... 0]ᵀ, since there was no a priori information on the filter coefficients.
- The initial inverse covariance matrix P was always set to an identity matrix of the filter length's dimensions (filter length × filter length) with a pre-multiplier of 10⁶.
- The filter length was initially selected arbitrarily, ranging from two up to the entire length of the dataset under investigation.
- The forgetting factor was initially set to 0.9 and then arbitrarily set to 0.95 so as to put more weight on the most recent samples; the inverse covariance matrix pre-multiplier was kept at 10⁶.

Using the initialized values, the Matlab function in appendix A.1 was executed to implement the adaptive Kalman filter algorithm, both for modelling the unknown system and for forecasting probable future measurements. The obtained simulations were, however, not optimal, hence the need for tuning, as explained in section 3.5.

3.4 Training and forecasting
The prediction problem was considered as a modelling problem in which a model was built between the filter's input and output to describe the various datasets.

3.4.1 One-step ahead prediction
The model was constructed through one-step ahead predictions that relied on the available stream of measured data, which served to correct the estimated values and optimize the filter parameters. This was the process of filtering. Every estimated measurement was modelled as a function of the previous measured data, as illustrated in equation (3.2) below:

ŷ(t) = f(y(t−1), y(t−2), ..., y(t−L))    (3.2)

In order to develop a comprehensive regression model, the previous data contained in the memory of the filter was used to formulate the filter's input vector. This input vector was then multiplied with the filter coefficients, which were a function of the Kalman gain and the forgetting factor that guaranteed optimality at every time step.

3.4.2 n-step ahead prediction
In the implementation of long-term prediction, the filter was required to make n-step ahead predictions into the future, where n represents the length of the prediction.

However, there were no measurements available to model the desired system response. The developed model was instead used to predict the future measurements based on previous values, as dictated by the adaptive algorithm. The predicted values were therefore used in place of the unknown measurements, as illustrated by equation (3.3) below:

ŷ(t+1) = f(ŷ(t), y(t−1), y(t−2), ...)    (3.3)

This approach facilitated the recursive prediction of new probable values from ŷ(t+1) to ŷ(t+n).

3.5 Tuning
Tuning was performed to investigate the effectiveness of the adaptive Kalman filter in recovering the desired system response in the presence of noise. The varied parameters were the forgetting factor, the initial inverse covariance matrix and the number of taps used. Since adaptive filters operate by attempting to reduce a cost function, the filter parameters were varied to reduce the Mean Square Error, the cost function associated with the Kalman filter algorithm.

3.5.1 Determining the optimal filter length
The Matlab program presented in appendix A.2 was executed to calculate the Mean Square Error associated with various filter lengths, ranging from two up to the length of the dataset under investigation. For every filter length considered, the corresponding MSE was computed according to the cost function of equation (2.51), with the summation limits varied to run from the selected filter length + 1 up to the dataset length, i.e. effectively:

MSE(L) = (1 / (N − L)) Σ_{t=L+1}^{N} [y(t) − ŷ(t)]²

so that the errors involved during the tracking/training process were not used to compute the MSE. A plot of filter length against the corresponding MSE was then made and used to judiciously select the number of filter taps. The filter length was further selected to offer ample memory length to enable long-term future prediction. This was done by investigating the MSE-against-filter-length plots together with the predictions-against-time plots for the dataset under investigation. A trade-off between minimal MSE and sufficient memory length was therefore made to select the optimal filter length.
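A sketch of this sweep, in the spirit of the appendix A.2 listing (the helper akf_predict below is a hypothetical stand-in for the adaptive filter routine of appendix A.1):

    % Sweep candidate filter lengths and record the prediction MSE of each.
    N = numel(y);
    lengths = 2:(N-1);
    mse = zeros(size(lengths));
    for k = 1:numel(lengths)
        L = lengths(k);
        yhat = akf_predict(y, L, 0.95);   % hypothetical: one-step-ahead predictions
        err  = y(L+1:N) - yhat(L+1:N);    % exclude the training span
        mse(k) = mean(err.^2);
    end
    plot(lengths, mse), xlabel('Filter length'), ylabel('MSE')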

3.5.2 Determining the forgetting factor
The forgetting factor was varied between zero and one. It was noted that higher forgetting factors yielded better responses, hence settling at a value of 0.95.

3.5.3 Determining the inverse covariance matrix
The order of the initial inverse covariance matrix was varied for different filter lengths and forgetting factors. It was then left constant at 10⁶ I, where I is an identity matrix of the filter length's order, since this always yielded the best simulations.

3.6 Smoothing and decomposition of the time series
Smoothing was conducted as post-processing on all the data obtained from modelling and forecasting. The simulation results were smoothed using a 13-term moving average filter, whose Matlab code is listed in appendix A.3. This smoothing technique was utilized because the data exhibited seasonal variations with a periodicity of 12 (measurements within a span of one year). Smoothing revealed the underlying trend in the prediction data. The data was then decomposed into its seasonal component, which was removed from the rest of the data so as to clearly display the underlying trend.
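A minimal sketch of this step, in the spirit of the appendix A.3 listing (monthly data with period 12 assumed; endpoint effects of the convolution are ignored here):

    % 13-term centered moving average for monthly data: weights of 1/24 at the
    % two endpoints and 1/12 elsewhere, so each calendar month is weighted equally.
    h = [1/24, ones(1,11)/12, 1/24];
    trend = conv(y, h, 'same');            % smoothed series (trend estimate)
    detr  = y - trend;                     % detrended series
    seasonal = zeros(12,1);
    for m = 1:12
        seasonal(m) = mean(detr(m:12:end));    % average effect of each month
    end
    s_full   = repmat(seasonal, ceil(numel(y)/12), 1);
    deseason = y - s_full(1:numel(y));     % seasonally adjusted series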

CHAPTER FOUR: RESULTS AND ANALYSIS

4.1 Masinga time series

Data length (N): 101 measurements
Filter length: 69
MSE: 41.89

Figure 11: Predicted Masinga dam levels

The optimal filter length was 69, arrived at after calculating the MSE for all possible filter lengths. The MSE for different filter lengths is provided in Figure 12.

Figure 12: MSE for different filter lengths

The decomposition yielded the results in Figure 13 below.

Figure 13: Decomposition of the Masinga time series

4.2 Kamburu time series

Data length (N): 101 measurements
Filter length: 70
MSE: 2.9526

Figure 14: Dam levels - Kamburu time series

Figure 15: Mean Square Error for different filter lengths - Kamburu time series

Figure 16: Decomposition of the Kamburu time series

4.3 Turkwel time series

Data length (N): 101 measurements
Filter length: 70
MSE: 10.5575

Figure 17: Dam levels - Turkwel time series

Figure 18: Mean Square Error for different filter lengths - Turkwel time series

Figure 19: Decomposition of the Turkwel time series

4.4 Nile time series

Data length (N): 362 measurements
Filter length: 198
Resulting MSE: 8.8326

Figure 20: Nile water levels

Figure 21: MSE for different filter lengths