CS345a: Data Mining Jure Leskovec and Anand Rajaraman Stanford University


HW3 is out. Poster session is on the last day of classes: Thu March 11 at 4:15. Reports are due March 14. Final is March 18 at 12:15. Open book, open notes. No laptop.

3/2/2010, Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Which is the best linear separator?
Data: examples $(x_1, y_1), \dots, (x_n, y_n)$.
Example $i$: $x_i = (x_i^{(1)}, \dots, x_i^{(d)})$, $y_i \in \{-1, +1\}$.
Inner product: $w \cdot x = \sum_{j=1}^{d} w^{(j)} x^{(j)}$.

Separating hyperplane: $w \cdot x = 0$.
Confidence: $\gamma_i = (w \cdot x_i)\, y_i$.
For all datapoints: $\gamma = \min_i \gamma_i$.
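As a small illustration of the confidence and margin definitions, here is a sketch in plain Python. The toy data and the weight vector `w` are made up for the example, and `w` is not normalized, so the numbers scale with the length of `w`.

```python
def dot(w, x):
    """Inner product w . x = sum_j w(j) * x(j)."""
    return sum(wj * xj for wj, xj in zip(w, x))

def confidences(w, data):
    """Confidence of every example: gamma_i = (w . x_i) * y_i."""
    return [y * dot(w, x) for x, y in data]

def margin(w, data):
    """Margin of the whole dataset: the smallest confidence."""
    return min(confidences(w, data))

# Hypothetical toy data: (feature vector, label in {-1, +1}).
data = [((2.0, 1.0), +1), ((1.0, 3.0), +1),
        ((-1.0, -2.0), -1), ((-3.0, -1.0), -1)]
w = (1.0, 1.0)  # assumed separator, chosen by hand

print(confidences(w, data))  # [3.0, 4.0, 3.0, 4.0]
print(margin(w, data))       # 3.0
```

A positive margin means every example lies on the correct side of the hyperplane $w \cdot x = 0$.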

Maximize the margin: good according to intuition, theory & practice.
$$\max_{w,\gamma}\ \gamma \quad \text{s.t. } \forall i,\ y_i (w \cdot x_i) \ge \gamma$$

Canonical hyperplanes. Projection of $x_i$ on the plane $w \cdot x = 0$.

Maximizing the margin:
$$\max_{w,\gamma}\ \gamma \quad \text{s.t. } \forall i,\ y_i (w \cdot x_i) \ge \gamma$$
Equivalent:
$$\min_{w}\ \tfrac{1}{2}\|w\|^2 \quad \text{s.t. } \forall i,\ y_i (w \cdot x_i) \ge 1$$
SVM with "hard" constraints.

If data is not separable, introduce a penalty:
$$\min_{w}\ \tfrac{1}{2}\|w\|^2 + C \cdot (\text{number of mistakes}) \quad \text{s.t. } \forall i,\ y_i (w \cdot x_i) \ge 1$$
Choose C based on cross validation.
How to penalize mistakes?

Introduce slack variables $\xi_i$:
$$\min_{w,\ \xi_i \ge 0}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t. } \forall i,\ y_i (w \cdot x_i) \ge 1 - \xi_i$$
Hinge loss. For each datapoint:
If margin > 1, don't care.
If margin < 1, pay linear penalty.

SVM in the "natural" form:
$$\arg\min_{w} f(w), \quad \text{where } f(w) = \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\{0,\ 1 - y_i (w \cdot x_i)\}$$
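To make the "natural" form concrete, the sketch below evaluates $f(w)$ on three hand-made points. The data, the weight vector and the value of `C` are illustrative assumptions, not values from the lecture.

```python
def hinge(margin_value):
    """Hinge loss: zero once the margin reaches 1, linear penalty below."""
    return max(0.0, 1.0 - margin_value)

def svm_objective(w, data, C):
    """f(w) = 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = sum(hinge(y * sum(wj * xj for wj, xj in zip(w, x)))
               for x, y in data)
    return reg + C * loss

# Hypothetical toy data: margins under w are 2.0, 0.5 and 1.0.
data = [((2.0, 0.0), +1), ((0.5, 0.0), +1), ((-1.0, 0.0), -1)]
w = (1.0, 0.0)

print(svm_objective(w, data, C=10.0))  # 5.5
```

Only the second point has margin below 1, so it alone contributes a penalty of 0.5, scaled by C.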

Use a quadratic solver: minimize a quadratic function subject to linear constraints.
$$\min_{w,\ \xi_i \ge 0}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t. } \forall i,\ y_i (w \cdot x_i) \ge 1 - \xi_i$$
Stochastic gradient descent:
Minimize: $f(w) = \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\{0,\ 1 - y_i (w \cdot x_i)\}$
Update: $w_{t+1} \leftarrow w_t - \eta_t\, f'(w_t)$, with the gradient $f'(w_t)$ evaluated on a single example through the loss derivative $L'(x_i, y_i)$.
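A minimal sketch of stochastic gradient descent on the hinge-loss objective follows. Several choices here are assumptions rather than the lecture's: the regularizer is split evenly across the n examples so that the per-example gradients sum to the full gradient, the learning rate decays as `eta0 / (1 + t/n)`, and the toy data is invented.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def objective(w, data, C):
    """f(w) = 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i))."""
    hinge = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in data)
    return 0.5 * dot(w, w) + C * hinge

def sgd_svm(data, C=1.0, eta0=0.1, epochs=50, seed=0):
    """SGD on f(w), one example per update (assumed schedule and split)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0][0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            eta = eta0 / (1.0 + t / n)        # assumed decay schedule
            active = y * dot(w, x) < 1.0      # hinge is active below margin 1
            for j in range(d):
                # Subgradient of (1/2n)||w||^2 + C * hinge_i at w.
                grad_j = w[j] / n - (C * y * x[j] if active else 0.0)
                w[j] -= eta * grad_j
            t += 1
    return w

# Hypothetical linearly separable toy data.
toy = [((2.0, 1.0), +1), ((1.0, 2.0), +1), ((3.0, 3.0), +1), ((2.0, 3.0), +1),
       ((-2.0, -1.0), -1), ((-1.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1)]

w = sgd_svm(toy)
print(w, objective(w, toy, 1.0))
```

Each update touches one example only, which is why the cost per step does not grow with the size of the dataset.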

Example by Leon Bottou: Reuters RCV1 document corpus.
m = 781k training examples, 23k test examples.
d = 50k features.
[Table: training time]


What if we subsample the dataset? SGD on the full dataset vs. conjugate gradient on n training examples.

Need to choose the learning rate $\eta_t$ in the update $w_{t+1} \leftarrow w_t - \eta_t\, L'(w_t)$.
Leon suggests:
Select a small subsample.
Try various rates $\eta$.
Pick the one that most reduces the loss.
Use $\eta$ for the next 100k iterations on the full dataset.
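The rate-selection recipe can be sketched as below. This is an illustration under assumptions: each candidate rate is tried for one epoch from $w = 0$ with a constant rate, the loss is the hinge-loss objective from earlier slides, and the function names, candidate list and toy data are hypothetical.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def objective(w, data, C):
    hinge = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in data)
    return 0.5 * dot(w, w) + C * hinge

def one_epoch(data, C, eta, seed):
    """One SGD pass from w = 0 with a constant rate eta."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0][0])
    w = [0.0] * d
    order = list(range(n))
    rng.shuffle(order)
    for i in order:
        x, y = data[i]
        active = y * dot(w, x) < 1.0
        w = [wj - eta * (wj / n - (C * y * xj if active else 0.0))
             for wj, xj in zip(w, x)]
    return w

def pick_rate(data, C, candidates, sample_size, seed=0):
    """Try each rate on a small subsample; keep the one with lowest loss."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    losses = {eta: objective(one_epoch(sample, C, eta, seed), sample, C)
              for eta in candidates}
    best = min(losses, key=losses.get)
    return best, losses

# Hypothetical toy data and candidate rates.
toy = [((2.0, 1.0), +1), ((1.0, 2.0), +1), ((3.0, 3.0), +1), ((2.0, 3.0), +1),
       ((-2.0, -1.0), -1), ((-1.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1)]
candidates = [0.001, 0.01, 0.1, 1.0]

best, losses = pick_rate(toy, C=1.0, candidates=candidates, sample_size=4)
print(best, losses)
```

The selected rate would then be reused for a long stretch of iterations on the full dataset before repeating the search.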

Stopping criteria: how many iterations of SGD?
Early stopping with cross validation:
Create a validation set.
Monitor the cost function on the validation set.
Stop when the loss stops decreasing.
Early stopping a priori:
Extract two disjoint subsamples A and B of the training data.
Determine the number of epochs k by training on A, stop by validating on B.
Train for k epochs on the full dataset.
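The "a priori" variant might look like the sketch below. The half/half split, the constant learning rate, the use of the hinge-loss objective as the monitored cost, and all names and data are assumptions made for the example.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def objective(w, data, C):
    hinge = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in data)
    return 0.5 * dot(w, w) + C * hinge

def run_epoch(w, data, C, eta, rng):
    """One in-place SGD pass over data in random order."""
    n = len(data)
    order = list(range(n))
    rng.shuffle(order)
    for i in order:
        x, y = data[i]
        active = y * dot(w, x) < 1.0
        for j in range(len(w)):
            w[j] -= eta * (w[j] / n - (C * y * x[j] if active else 0.0))

def epochs_a_priori(data, C=1.0, eta=0.05, max_epochs=100, seed=0):
    """Train on subsample A; stop when the loss on B stops decreasing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    A, B = shuffled[:half], shuffled[half:]   # two disjoint subsamples
    w = [0.0] * len(data[0][0])
    best_loss, k = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        run_epoch(w, A, C, eta, rng)
        loss = objective(w, B, C)
        if loss < best_loss:
            best_loss, k = loss, epoch
        else:
            break                              # validation loss stopped decreasing
    return k

def train_with_early_stopping(data, C=1.0, eta=0.05, seed=0):
    """Train for k epochs on the full dataset, with k found on A and B."""
    k = epochs_a_priori(data, C, eta, seed=seed)
    rng = random.Random(seed + 1)
    w = [0.0] * len(data[0][0])
    for _ in range(k):
        run_epoch(w, data, C, eta, rng)
    return w, k

# Hypothetical toy data.
toy = [((2.0, 1.0), +1), ((1.0, 2.0), +1), ((3.0, 3.0), +1), ((2.0, 3.0), +1),
       ((-2.0, -1.0), -1), ((-1.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1)]

w, k = train_with_early_stopping(toy)
print(k, w)
```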

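Kernel function: $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$.
Does the SVM kernel trick still work? Yes (but not without a price):
Represent $w$ with its kernel expansion: $w = \sum_i \alpha_i\, \Phi(x_i)$.
Usually $dL(w)/dw$ is a multiple of $\Phi(x_j)$ for the current example $x_j$.
Then update $w$ at epoch $t$ by combining the coefficients: the existing $\alpha_t$ is scaled by a factor of the form $(1 - \cdot)$ and the term for the current example is added.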

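[Shalev-Shwartz et al., ICML '07]
We had before:
$$f(w) = \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\{0,\ 1 - y_i (w \cdot x_i)\}$$
Can replace C with $\lambda$:
$$f(w) = \tfrac{\lambda}{2}\|w\|^2 + \tfrac{1}{n} \sum_{i=1}^{n} \max\{0,\ 1 - y_i (w \cdot x_i)\}$$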

[Shalev-Shwartz et al., ICML '07]
$A_t = S$: subgradient method.
$|A_t| = 1$: stochastic gradient.
Each iteration takes a subgradient step followed by a projection step.

[Shalev-Shwartz et al., ICML '07]
Choosing $|A_t| = 1$ and a linear kernel over $\mathbb{R}^n$.
Theorem [Shalev-Shwartz et al. '07]: the runtime required for Pegasos to find an $\epsilon$-accurate solution with probability $> 1 - \delta$:
Depends on the number of features n.
Does not depend on the number of examples m.
Depends on the "difficulty" of the problem ($\lambda$ and $\epsilon$).
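For the case $|A_t| = 1$ with a linear kernel, a Pegasos-style loop can be sketched as follows. The step size $\eta_t = 1/(\lambda t)$ and the projection onto the ball of radius $1/\sqrt{\lambda}$ follow the published description of Pegasos as I understand it; the toy data, the value of `lam` and the iteration count are assumptions for illustration.

```python
import math
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def pegasos(data, lam=0.1, iterations=1000, seed=0):
    """Pegasos-style SGD with one random example per step (|A_t| = 1)."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    radius = 1.0 / math.sqrt(lam)
    for t in range(1, iterations + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)
        active = y * dot(w, x) < 1.0          # checked before shrinking w
        # Subgradient step on lam/2 ||w||^2 + hinge loss of the sampled example.
        w = [(1.0 - eta * lam) * wj for wj in w]
        if active:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
        # Projection step: keep ||w|| <= 1 / sqrt(lam).
        norm = math.sqrt(dot(w, w))
        if norm > radius:
            w = [wj * radius / norm for wj in w]
    return w

# Hypothetical toy data.
toy = [((2.0, 1.0), +1), ((1.0, 2.0), +1), ((3.0, 3.0), +1), ((2.0, 3.0), +1),
       ((-2.0, -1.0), -1), ((-1.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1)]

w = pegasos(toy)
print(w)
```

The work per iteration depends only on the number of features, which matches the claim that the runtime does not depend on the number of examples.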

SVM and structured output prediction.
Setting:
Assume: data is i.i.d. from $P(X, Y)$.
Given: training sample $S = ((x_1, y_1), \dots, (x_n, y_n))$.
Goal: find a function $h: X \to Y$ from input space X to output space Y, where the outputs are complex objects.

Examples: natural language parsing.
Given a sequence of words x, predict the parse tree y.
Dependencies from structural constraints, since y has to be a tree.
x: The dog chased the cat
y: (S (NP (Det The) (N dog)) (VP (V chased) (NP (Det the) (N cat))))

Approach: view as a multi-class classification task. Every complex output is one class.
Problems:
Exponentially many classes!
How to predict efficiently?
How to learn efficiently?
Potentially huge model!
Manageable number of features?
[Figure: candidate parse trees $y_1, y_2, \dots, y_k$ for x = "The dog chased the cat"]

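Feature vector $\Psi(x, y)$ describes the match between x and y.
Learn a single weight vector $w$ and rank by $w^T \Psi(x, y)$.
Hard-margin optimization problem:
$$\min_{w}\ \tfrac{1}{2}\|w\|^2 \quad \text{s.t. } \forall i,\ \forall y \ne y_i:\ w^T \Psi(x_i, y_i) \ge w^T \Psi(x_i, y) + 1$$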

[Yue et al., SIGIR '07]
Ranking: given a query x, predict a ranking y.
Dependencies between results (e.g. avoid redundant hits).
Loss function over rankings (e.g. AvgPrec).
x: SVM
y:
1. Kernel-Machines
2. SVM-Light
3. Learning with Kernels
4. SV Meppen Fan Club
5. Service Master & Co.
6. School of Volunteer Management
7. SV Mattersburg Online

[Yue et al., SIGIR '07]
Given: a complete (weak) ranking of documents for a query.
Predict: ranking for the input query and document set.
The true labeling is a ranking where the relevant documents are all ranked in the front.
An incorrect labeling is any other ranking.
There are intractably many rankings, thus an intractable number of constraints!

[Yue et al., SIGIR '07]
Let x be a set of documents/query examples.
Let y denote a weak ranking (pairwise orderings): $y_{ij} \in \{-1, +1\}$.
SVM objective function: $\tfrac{1}{2}\|w\|^2 + C\,\xi$.
Constraints are defined for each incorrect ranking y' over the set of documents x:
$$\forall y' \ne y:\quad w^T \Psi(y, x) \ge w^T \Psi(y', x) + \Delta(y, y') - \xi$$
$\Delta(y, y')$ measures how far the prediction y' is from the target y.

[Yue et al., SIGIR '07]
Loss: average precision is the average of the precision scores at the rank locations of each relevant document.
Ex: a ranking with relevant documents at ranks 1, 3 and 5 has average precision
$$\tfrac{1}{3}\left(\tfrac{1}{1} + \tfrac{2}{3} + \tfrac{3}{5}\right) \approx 0.76$$
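The average precision definition is easy to check in code. The sketch below takes a ranking as a list of 0/1 relevance flags in rank order (a representation chosen for this example) and reproduces the 0.76 figure for relevant documents at ranks 1, 3 and 5.

```python
def average_precision(relevance):
    """Mean of precision@rank over the ranks that hold a relevant document."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)   # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0

ranking = [1, 0, 1, 0, 1]   # relevant documents at ranks 1, 3 and 5
ap = average_precision(ranking)
print(round(ap, 2))  # 0.76
```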

[Yue et al., SIGIR '07]
Minimize:
$$\tfrac{1}{2}\|w\|^2 + C\,\xi$$
subject to:
$$\forall y' \ne y:\quad w^T \Psi(y, x) \ge w^T \Psi(y', x) + \Delta(y, y') - \xi$$
where:
$$\Psi(y', x) = \sum_{i:\,\text{rel}}\ \sum_{j:\,!\text{rel}} y'_{ij}\,(x_i - x_j)$$
and:
$$\Delta(y, y') = 1 - \text{AvgPrec}(y')$$
After learning $w$, predict by sorting on $w^T x_i$.

[Yue et al., SIGIR '07]
Original SVM problem:
Exponential constraints.
Most are dominated by a small set of "important" constraints.
Structural SVM approach:
Repeatedly finds the next most violated constraint until the set of constraints is a good approximation.


REPEAT
  FOR each training example
    Compute the most violated constraint
  ENDFOR
  IF the constraint is violated by more than $\epsilon$
    Add the constraint to the working set
    Optimize StructSVM over the working set
  ENDIF
UNTIL the working set has not changed during the iteration
[Jo06] [JoFinYu08]

Cutting plane algorithm:
STEP 1: Solve the SVM objective function using only the current working set of constraints.
STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints.
STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set by some small constant, add that constraint to the working set.
Repeat STEP 1-3 until no additional constraints are added. Return the most recent model that was trained in STEP 1.
STEP 1-3 is guaranteed to loop for at most a polynomial number of iterations. [Tsochantaridis et al. 2005]

Structural SVM is an oracle framework.
Requires a subroutine for finding the most violated constraint.
Depends on the formulation of the loss function and the joint feature representation.
Exponential number of constraints!
Efficient algorithm in the case of optimizing Mean Avg. Prec. (MAP): MAP is invariant on the order of documents within a relevance class.

[Yue et al., SIGIR '07]
$$H(y'; w) = \Delta(y, y') + \sum_{i:\,\text{rel}}\ \sum_{j:\,!\text{rel}} y'_{ij}\,(w^T x_i - w^T x_j)$$
Observation: MAP is invariant on the order of documents within a relevance class.
Swapping two relevant or two non-relevant documents does not change MAP.
Joint SVM score is optimized by sorting by document score, $w^T x_j$.
Reduces to finding an interleaving between two sorted lists of documents.

$$H(y'; w) = \Delta(y, y') + \sum_{i:\,\text{rel}}\ \sum_{j:\,!\text{rel}} y'_{ij}\,(w^T x_i - w^T x_j)$$
Start with the perfect ranking.
Consider swapping adjacent relevant/non-relevant documents.
Find the best feasible ranking of the non-relevant document.
Repeat for the next non-relevant document.
Never want to swap past the previous non-relevant document.
Repeat until all non-relevant documents have been considered.
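For intuition about what the greedy procedure searches for, the sketch below finds the ranking that maximizes $H(y'; w)$ by brute force over all permutations. This is not the efficient algorithm of Yue et al.; it is only feasible for a handful of documents. It assumes $y'_{ij} = +1$ when document i is ranked ahead of document j and $-1$ otherwise, takes the document scores $w^T x_i$ as given numbers, and uses made-up data.

```python
from itertools import permutations

def average_precision(relevance):
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def most_violated_ranking(scores, relevant):
    """Brute-force argmax of H(y'; w) = Delta(y, y') + sum y'_ij (s_i - s_j)."""
    docs = range(len(scores))
    best, best_h = None, float("-inf")
    for order in permutations(docs):
        pos = {doc: rank for rank, doc in enumerate(order)}
        # Delta(y, y') = 1 - AvgPrec(y')
        delta = 1.0 - average_precision([relevant[doc] for doc in order])
        # Sum over (relevant i, non-relevant j) pairs of y'_ij * (s_i - s_j).
        pairs = sum((1 if pos[i] < pos[j] else -1) * (scores[i] - scores[j])
                    for i in docs if relevant[i]
                    for j in docs if not relevant[j])
        h = delta + pairs
        if h > best_h:
            best, best_h = order, h
    return best, best_h

relevant = [True, True, False, False]   # hypothetical relevance labels

# With w = 0 all scores are zero, so H reduces to the loss Delta.
order_zero, h_zero = most_violated_ranking([0.0, 0.0, 0.0, 0.0], relevant)
# With well separated scores the perfect ranking maximizes H.
order_sep, h_sep = most_violated_ranking([3.0, 2.0, -2.0, -3.0], relevant)
print(order_zero, h_zero)
print(order_sep, h_sep)
```

With zero scores the most violated ranking puts every non-relevant document first; once the scores separate the classes by a wide gap, the perfect ranking itself maximizes H.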

SVM formulation:
SVMs optimize a tradeoff between model complexity and MAP loss.
Exponential number of constraints (one for each incorrect ranking).
Structural SVMs find a small subset of important constraints.
Requires a sub-procedure to find the most violated constraint.
Find most violated constraint:
Loss function is invariant to reordering of relevant documents.
SVM score imposes an ordering of the relevant documents.
Finding an interleaving of two sorted lists.
Loss function has certain monotonic properties.
Efficient algorithm.