Kalman Filters and Adaptive Windows for Learning in Data Streams

Kalman Filters and Adaptive Windows for Learning in Data Streams
Albert Bifet, Ricard Gavaldà
Universitat Politècnica de Catalunya
DS 06, Barcelona

Outline
1 Introduction
2 The Kalman Filter and the CUSUM Test
3 The ADWIN Algorithm
4 General Framework
5 K-ADWIN
6 Experimental Validation of K-ADWIN
7 Conclusions

Introduction
Data Streams
- Potentially infinite sequence
- High amount of data: sublinear space
- High speed of arrival: small constant time per example
- Estimation and prediction
- Distribution and concept drift
K-ADWIN: combination of a Kalman filter with ADWIN, an adaptive window of recently seen data items.

Introduction
Problem
Given an input sequence x_1, x_2, ..., x_t, ..., output at instant t a prediction x̂_{t+1} minimizing the prediction error |x̂_{t+1} - x_{t+1}|, taking distribution changes over time into account.

Introduction
Time Change Detectors and Predictors: A General Framework
(Diagram, built up over three slides.) The input x_t feeds an Estimator, which outputs the Estimation; a Change Detector monitors the Estimator and raises an Alarm; a Memory module supplies the Estimator with recently seen items.

Introduction
Our generic proposal: use a change detector and a memory module.
Our particular proposal, K-ADWIN:
- Kalman filter as estimator
- ADWIN as change detector with memory [BG06]
Application: estimate statistics from data streams. In data mining algorithms based on counters, replace the counters with estimators.

Introduction
Data Mining Algorithms with Concept Drift
(Diagrams.) Adapting a counter-based data mining algorithm:
- No concept drift: input → DM Algorithm (Counters 1-5) → output.
- Concept drift, first diagram: input → DM Algorithm with a Static Model and a Change Detector → output.
- Concept drift, second diagram: the Counters inside the DM Algorithm are replaced by Estimators 1-5.

The Kalman Filter and the CUSUM Test
The Kalman Filter
- Optimal recursive algorithm; minimum mean-square error estimator.
- Estimates the state x ∈ R^n of a discrete-time controlled process
    x_k = A x_{k-1} + B u_k + w_{k-1}
  from a measurement z ∈ R^m given by
    z_k = H x_k + v_k.
- The random variables w_k and v_k represent the process and measurement noise, respectively. They are assumed to be independent of each other, white, and normally distributed:
    p(w) ~ N(0, Q),  p(v) ~ N(0, R).
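For concreteness, here is a minimal numpy sketch of one predict/update cycle of the matrix Kalman filter above. It is an illustrative implementation of the standard recursion under the slide's notation, not code from the paper.

```python
import numpy as np

def kalman_step(x, P, z, A, B, u, H, Q, R):
    """One predict/update cycle of the discrete-time Kalman filter.

    x, P : previous state estimate and its error covariance
    z    : current measurement
    A, B : state transition and control matrices, u: control input
    H    : measurement matrix
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate the state and its covariance through the model.
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q

    # Update: blend the prediction with the measurement via the Kalman gain.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

With A = H = 1 and no control input, this reduces to the one-dimensional recursion shown on the next slide.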

The Kalman Filter and the CUSUM Test
The Kalman Filter
The difference equations of our (one-dimensional) discrete-time controlled process are
    K_k = P_{k-1} / (P_{k-1} + R)
    X_k = X_{k-1} + K_k (z_k - X_{k-1})
    P_k = P_{k-1} (1 - K_k) + Q
The performance of the Kalman filter depends on the accuracy of the a-priori assumptions:
- linearity of the stochastic difference equation
- the estimates of the covariances Q and R, which are assumed to be fixed and known, with zero-mean normal noise.
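A minimal sketch of the one-dimensional recursion above, applied to a stream of measurements; the initial values x0 and P0 are illustrative assumptions.

```python
def scalar_kalman(measurements, Q, R, x0=0.0, P0=1.0):
    """Scalar Kalman filter: the three update equations from the slide."""
    x, P = x0, P0
    estimates = []
    for z in measurements:
        K = P / (P + R)         # K_k = P_{k-1} / (P_{k-1} + R)
        x = x + K * (z - x)     # X_k = X_{k-1} + K_k (z_k - X_{k-1})
        P = P * (1 - K) + Q     # P_k = P_{k-1} (1 - K_k) + Q
        estimates.append(x)
    return estimates
```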

The Kalman Filter and the CUSUM Test
The CUSUM Test
The cumulative sum (CUSUM) algorithm is a change detection algorithm that raises an alarm when the mean of the input data differs significantly from zero. The CUSUM test is memoryless, and its accuracy depends on the choice of the parameters υ and h. It is as follows:
    g_0 = 0,  g_t = max(0, g_{t-1} + ε_t - υ)
    if g_t > h then raise alarm and reset g_t = 0
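A small sketch of the CUSUM test as stated above; the input is assumed to be the stream of residuals ε_t, and υ and h are the user-chosen parameters.

```python
def cusum(residuals, upsilon, h):
    """One-sided CUSUM test on a stream of residuals (epsilon_t).

    Returns the time steps at which an alarm is raised; g is reset to 0
    after each alarm, as in the slide.
    """
    g, alarms = 0.0, []
    for t, eps in enumerate(residuals):
        g = max(0.0, g + eps - upsilon)
        if g > h:
            alarms.append(t)
            g = 0.0
    return alarms
```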

The ADWIN Algorithm
Algorithm ADWIN [BG06]

ADWIN: ADAPTIVE WINDOWING ALGORITHM
1 Initialize window W
2 for each t > 0
3   do W ← W ∪ {x_t} (i.e., add x_t to the head of W)
4      repeat drop elements from the tail of W
5      until |μ̂_{W0} - μ̂_{W1}| < ε_c holds
6            for every split of W into W = W0 · W1
7      output μ̂_W

Example: W = 101010110111111. ADWIN examines every split W = W0 · W1 (W1 holding the most recent items), moving the split point from the tail towards the head:
  W0 = 1, W1 = 01010110111111;  W0 = 10, W1 = 1010110111111;  ...;  W0 = 10101011, W1 = 0111111.
At the split W0 = 101010110, W1 = 111111 we get |μ̂_{W0} - μ̂_{W1}| ≥ ε_c: CHANGE DETECTED!
Elements are then dropped from the tail of W, giving W = 01010110111111, and the test repeats.
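A deliberately naive sketch of the windowing loop above: it stores every element of W and checks every split explicitly. The exact form of the cut threshold eps_cut below is an assumption for illustration only; the paper derives its own bound (see the theorem later in this section).

```python
import math

def eps_cut(n0, n1, n, delta):
    # Hoeffding-style threshold; assumed form, for illustration only.
    m = 1.0 / (1.0 / n0 + 1.0 / n1)          # harmonic mean of the two sizes
    return math.sqrt((1.0 / (2 * m)) * math.log(4.0 * n / delta))

class NaiveAdwin:
    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = []

    def update(self, x):
        """Add x to the head of W, then shrink W while some split signals change."""
        self.window.insert(0, x)             # index 0 = head of W = most recent item
        change = True
        while change and len(self.window) > 1:
            change = False
            n = len(self.window)
            for i in range(1, n):            # every split W = W0 . W1 (W1 most recent)
                w1, w0 = self.window[:i], self.window[i:]
                mu0 = sum(w0) / len(w0)
                mu1 = sum(w1) / len(w1)
                if abs(mu0 - mu1) >= eps_cut(len(w0), len(w1), n, self.delta):
                    self.window.pop()        # drop one element from the tail of W
                    change = True
                    break
        return sum(self.window) / len(self.window)   # estimate = mean of W
```

ADWIN2 replaces this explicit list with the bucket structure sketched later in this section, which is what yields the logarithmic memory and time bounds.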

The ADWIN Algorithm
Window Management Models
W = 101010110111111
- Equal & fixed-size subwindows: 1010 1011011 1111
  D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004.
- Total window against subwindow: 10101011011 1111
  J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004.
- ADWIN: all adjacent subwindows; the split point sweeps over the whole window:
  1 | 01010110111111,  10 | 1010110111111,  ...,  10101011011111 | 1.

The ADWIN Algorithm
Algorithm ADWIN [BG06]
ADWIN has rigorous guarantees:
- on the rate of false positives
- on the rate of false negatives
- on the relation between the size of the current window and the rate of change

The ADWIN Algorithm
Algorithm ADWIN [BG06]
Theorem. At every time step we have:
1 (Few false positives guarantee) If μ_t remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.
2 (Few false negatives guarantee) If for any partition of W into two parts W0 W1 (where W1 contains the most recent items) we have |μ_{W0} - μ_{W1}| > ε, and if
    ε ≥ 4 · sqrt( (3 max{μ_{W0}, μ_{W1}} / min{n_0, n_1}) · ln(4n/δ) )
then with probability 1 - δ, ADWIN shrinks W to W1, or shorter.

The ADWIN Algorithm
Algorithm ADWIN2 [BG06]
ADWIN2, using a data stream sliding window model, can provide the exact counts of 1's in O(1) time per point:
- tries O(log W) cutpoints
- uses O((1/ε) log W) memory words
- processing time per example is O(log W) amortized and O(log² W) worst-case.

Example (buckets of the sliding window model; Content = number of 1's in a bucket, Capacity = number of items it summarizes):
  Sliding window:            1010101 | 101 | 11 | 1 | 1        Content: 4 2 2 1 1      Capacity: 7 3 2 1 1
  Insert new item:           1010101 | 101 | 11 | 1 | 1 | 1    Content: 4 2 2 1 1 1    Capacity: 7 3 2 1 1 1
  Compressing buckets:       1010101 | 101 | 11 | 11 | 1       Content: 4 2 2 2 1      Capacity: 7 3 2 2 1
  Compressing buckets:       1010101 | 10111 | 11 | 1          Content: 4 4 2 1        Capacity: 7 5 2 1
  Detecting change, delete last bucket:  10111 | 11 | 1        Content: 4 2 1          Capacity: 5 2 1
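A rough sketch of the bucket bookkeeping behind the sliding-window model above: each bucket stores its capacity and its content (count of 1's), a new item enters as a capacity-1 bucket, buckets are compressed by merging, and the oldest bucket is dropped when change is detected. The merging rule and the max_per_capacity bound used here are illustrative assumptions; ADWIN2's exponential histogram has its own precise invariants, which are what give the O(log W) guarantees.

```python
class BucketWindow:
    """Buckets of [capacity, content = count of 1's]; index 0 = newest bucket."""

    def __init__(self, max_per_capacity=2):
        self.max_per_capacity = max_per_capacity
        self.buckets = []

    def insert(self, bit):
        # The newest item always enters as its own bucket of capacity 1.
        self.buckets.insert(0, [1, int(bit)])
        self._compress()

    def _compress(self):
        # Whenever too many buckets share a capacity, merge the two oldest of them.
        merged = True
        while merged:
            merged = False
            by_capacity = {}
            for j, (cap, _) in enumerate(self.buckets):
                by_capacity.setdefault(cap, []).append(j)
            for idxs in by_capacity.values():
                if len(idxs) > self.max_per_capacity:
                    a, b = idxs[-2], idxs[-1]        # two oldest buckets of this size
                    self.buckets[a][0] += self.buckets[b][0]
                    self.buckets[a][1] += self.buckets[b][1]
                    del self.buckets[b]
                    merged = True
                    break

    def drop_oldest_bucket(self):
        # Called when change is detected: forget the oldest part of the window.
        if self.buckets:
            self.buckets.pop()

    def length_and_ones(self):
        return (sum(c for c, _ in self.buckets), sum(o for _, o in self.buckets))
```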

General Framework
Time Change Detectors and Predictors
- Type I (example: Kalman Filter): x_t → Estimator → Estimation.
- Type II (example: Kalman Filter + CUSUM): Estimator plus a Change Detector that raises an Alarm.
- Type III (example: Adaptive Kalman Filter): Estimator plus Memory.
- Type IV (example: ADWIN; Kalman Filter + ADWIN): Estimator plus Change Detector and Memory.

General Framework
Time Change Detectors and Predictors: A General Framework

                      No memory                  Memory
No Change Detector    Type I:                    Type III:
                      Kalman Filter              Adaptive Kalman Filter
                                                 (Q, R estimated from window)
Change Detector       Type II:                   Type IV:
                      Kalman Filter + CUSUM      ADWIN
                                                 Kalman Filter + ADWIN
                                                 (Q, R estimated from window)

K-ADWIN
K-ADWIN = ADWIN + Kalman Filtering
x_t → Kalman filter → Estimation; ADWIN acts as the change detector (Alarm) and as the memory of the filter.
R = W²/50 and Q = 200/W, where W is the length of the window maintained by ADWIN.
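A minimal sketch of the K-ADWIN combination, reusing the scalar Kalman update and the NaiveAdwin class from the earlier sketches: ADWIN maintains the window of recent items, and its current length W sets the filter's covariances as R = W²/50 and Q = 200/W, as stated above. Everything apart from those two formulas (initial state, interface, the naive window) is an illustrative assumption.

```python
class KAdwin:
    """K-ADWIN: Kalman filter whose Q and R track the ADWIN window length."""

    def __init__(self, delta=0.01):
        self.adwin = NaiveAdwin(delta)   # window + change detector (sketch above)
        self.x = 0.0                     # Kalman state estimate
        self.P = 1.0                     # Kalman error covariance

    def update(self, z):
        self.adwin.update(z)             # feed the window; it shrinks on change
        W = max(1, len(self.adwin.window))
        R = W * W / 50.0                 # R = W^2 / 50
        Q = 200.0 / W                    # Q = 200 / W
        K = self.P / (self.P + R)
        self.x = self.x + K * (z - self.x)
        self.P = self.P * (1 - K) + Q
        return self.x                    # current estimate
```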

Experimental Validation of K-ADWIN
Tracking Experiments
(Plots omitted.) Errors on the tracking task:
- Kalman filter (R = 1000, Q = 1): error = 854.97
- ADWIN: error = 674.66
- K-ADWIN: error = 530.13

Experimental Validation of K-ADWIN
Naïve Bayes
Data set that describes the weather conditions for playing some game.
Example:
  outlook    temp.   humidity   windy   play
  sunny      hot     high       false   no
  sunny      hot     high       true    no
  overcast   hot     high       false   yes
  rainy      mild    high       false   yes
  rainy      cool    normal     false   yes
  rainy      cool    normal     true    no
  overcast   cool    normal     true    yes
Assume we have to classify the following new instance:
  outlook    temp.   humidity   windy   play
  sunny      cool    high       true    ?

Experimental Validation of K-ADWIN
Naïve Bayes
Assume we have to classify the new instance (outlook = sunny, temp. = cool, humidity = high, windy = true). We classify it as
    ν_NB = argmax_{ν_j ∈ {yes, no}} P(ν_j) P(sunny | ν_j) P(cool | ν_j) P(high | ν_j) P(true | ν_j)
Conditional probabilities can be estimated directly as frequencies:
    P(a_i | ν_j) = (number of instances with attribute a_i and class ν_j) / (total number of training instances with class ν_j)
Create one estimator for each frequency that needs estimation.
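A small sketch of the "one estimator per frequency" idea, using the outlook/play columns of the weather table above: P(sunny | yes) is obtained from two frequency estimators fed with 0/1 observations. The RunningMean class is a stand-in; in K-ADWIN each such slot would hold a Kalman + ADWIN estimator instead of a lifetime counter.

```python
class RunningMean:
    """Stand-in frequency estimator (plain running mean)."""
    def __init__(self):
        self.n, self.s = 0, 0.0
    def update(self, x):
        self.n, self.s = self.n + 1, self.s + x
    def estimate(self):
        return self.s / self.n if self.n else 0.0

# One estimator per frequency that needs estimation, e.g. for P(sunny | yes):
p_yes = RunningMean()            # fed 1 when the example's class is "yes", else 0
p_sunny_and_yes = RunningMean()  # fed 1 when class is "yes" AND outlook is "sunny"

# (outlook, play) pairs from the weather table above
for outlook, label in [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
                       ("rainy", "yes"), ("rainy", "yes"), ("rainy", "no"),
                       ("overcast", "yes")]:
    p_yes.update(1.0 if label == "yes" else 0.0)
    p_sunny_and_yes.update(1.0 if (label == "yes" and outlook == "sunny") else 0.0)

# P(sunny | yes) = P(sunny and yes) / P(yes); with adaptive estimators this
# ratio tracks drifting frequencies instead of lifetime counts.
p_sunny_given_yes = p_sunny_and_yes.estimate() / max(p_yes.estimate(), 1e-12)
```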

Experimental Validation of K-ADWIN
We test a Naïve Bayes predictor and k-means clustering.
Method: replace counters by estimators.
Synthetic data where the change is controllable.
Naïve Bayes: we compare the accuracy of
- a static model, trained on 1000 samples at every instant, and
- a dynamic model, with the probability counters replaced by estimators,
computing the ratio % Dynamic/Static, with tests using 2000 samples.

Experimental Validation of K-ADWIN
Naïve Bayes Predictor

Estimator                   %Static   %Dynamic   %Dynamic/Static
ADWIN                       83.36%    80.30%     96.33%
Kalman Q = 1, R = 1000      83.22%    71.13%     85.48%
Kalman Q = 1, R = 1         83.21%    56.91%     68.39%
Kalman Q = .25, R = .25     83.26%    56.91%     68.35%
Adaptive Kalman             83.24%    76.21%     91.56%
CUSUM Kalman                83.30%    50.65%     60.81%
K-ADWIN                     83.24%    81.39%     97.77%
Fixed-sized Window 32       83.28%    67.64%     81.22%
Fixed-sized Window 128      83.30%    75.40%     90.52%
Fixed-sized Window 512      83.28%    80.47%     96.62%
Fixed-sized Window 2048     83.24%    82.19%     98.73%

Experimental Validation of K-ADWIN
k-means Clustering (σ = 0.15)

Estimator                   Static   Dynamic
ADWIN                       9.72     21.54
Kalman Q = 1, R = 1000      9.72     19.72
Kalman Q = 1, R = 100       9.71     17.60
Kalman Q = .25, R = .25     9.71     22.63
Adaptive Kalman             9.72     18.98
CUSUM Kalman                9.72     18.29
K-ADWIN                     9.72     17.30
Fixed-sized Window 32       9.72     25.70
Fixed-sized Window 128      9.72     36.42
Fixed-sized Window 512      9.72     38.75
Fixed-sized Window 2048     9.72     39.64
Fixed-sized Window 8192     9.72     43.39
Fixed-sized Window 32768    9.72     53.82

Experimental Validation of K-ADWIN
Results
- No estimator ever does much better than K-ADWIN.
- K-ADWIN does much better than every other estimator in at least one context.
- Tracking problem: K-ADWIN and ADWIN automatically do about as well as the Kalman filter with the best set of fixed covariance parameters.
- Naïve Bayes and k-means: K-ADWIN does somewhat better than ADWIN and far better than any memoryless Kalman filter.

Conclusions
Conclusions and Future Work
- K-ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.
- It gives better results than either memoryless Kalman filtering or sliding windows with linear estimators.
Future work:
- Tests on real-world data, not only synthetic data.
- Other learning algorithms, e.g. algorithms for induction of decision trees.