Statistical Tests: More Complicated Discriminants

PHY310: Lecture 14
Statistical Tests: More Complicated Discriminants

Road Map
- When the likelihood discriminant will fail
- The Multi Layer Perceptron discriminant
- Other heuristics

Constructing the Test Statistic

Suppose you've got measurements $\vec{x} = (x_1, x_2, \ldots, x_n)$ and likelihood functions for the alternate hypotheses, $L(\vec{x}; H_0) = g(\vec{x} \mid H_0)$ and $L(\vec{x}; H_1) = g(\vec{x} \mid H_1)$.

The best test statistic is the likelihood ratio (it's called "optimal"):

$t(\vec{x}) = \frac{L(\vec{x}; H_0)}{L(\vec{x}; H_1)}$, or equivalently $\log t = \log L(\vec{x}; H_0) - \log L(\vec{x}; H_1)$

The problem is that we usually can't construct the full likelihood.
- Approximate it as a product of independent likelihood functions:
  $L(\vec{x}; H_0) = L(x_1; H_0)\, L(x_2; H_0) \cdots L(x_n; H_0)$
- Independent likelihoods are much easier to construct.
- The resulting statistic may not be optimal, but remember: any test statistic that works is great.
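As a concrete sketch of this recipe (not from the lecture): the product-of-independent-likelihoods discriminant in Python, assuming each variable is modeled by a one-dimensional Gaussian. The means and widths below are invented placeholders; in practice they would come from fits to simulated data.

```python
from scipy.stats import norm

# Hypothetical per-variable Gaussian pdfs for each hypothesis.
H0_PARAMS = [(0.0, 1.0), (1.0, 2.0)]   # (mean, sigma) for x1, x2 under H0
H1_PARAMS = [(2.0, 1.0), (-1.0, 2.0)]  # (mean, sigma) for x1, x2 under H1

def log_t(x):
    """log t = sum_i [log L(x_i; H0) - log L(x_i; H1)],
    treating the variables as independent."""
    log_l0 = sum(norm.logpdf(xi, mu, sig) for xi, (mu, sig) in zip(x, H0_PARAMS))
    log_l1 = sum(norm.logpdf(xi, mu, sig) for xi, (mu, sig) in zip(x, H1_PARAMS))
    return log_l0 - log_l1

# Classify by the sign of log(t): positive favors H0.
print(log_t([0.1, 1.2]))
```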

Independent Likelihoods Can Fail

- The likelihood method discussed last lecture is very powerful, but sometimes the full likelihood has bad internal correlations, and the independent approximation won't work.
- Example: two classes of data.
  - The x variable is Gaussian.
  - The y variable is correlated with x, but the two classes have a different offset.
- There is almost no overlap between the data sets, so we expect the discriminant to work very well.
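A minimal sketch of such a toy data set (the offsets and correlation strength are invented for illustration): the 2-D classes barely overlap, yet each 1-D marginal overlaps heavily, which is exactly where the independent approximation breaks down.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_class(n, offset):
    """x is Gaussian; y follows x, shifted by a class-dependent offset."""
    x = rng.normal(0.0, 1.0, n)
    y = x + offset + rng.normal(0.0, 0.2, n)  # y strongly correlated with x
    return np.column_stack([x, y])

class_one = make_class(1000, offset=+0.5)
class_two = make_class(1000, offset=-0.5)
# In (x, y) the classes are well separated, but the marginal
# distributions of x (and of y) for the two classes overlap heavily,
# so a product of independent per-variable likelihoods does poorly.
```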

The Simple Likelihood Discriminant

- Black points are measurements correctly identified as data class one.
- Red points are measurements correctly identified as data class two.
- Green points are measurements that were incorrectly identified.

Comparison of the Discriminants

- Not a very good approximation!
- Observation: we only need to know the location of the log(t) = 0 surface (the "discriminant surface").

Approximating Functions with Multi Layer Perceptron Networks

- Neural nets don't really have anything to do with neurons.
  - They are a way to parameterize a function, modeled on a simple mathematical model of how neurons work.
- Advantages of this parameterization:
  - It can fit an arbitrarily complicated function.
  - You don't need to know the value of the function! You only need to know whether a change improves the approximation.
- This is not a new idea: the original papers and research date from the 1950s and early 1960s.
- Figure: sin(x) parameterized using an MLP neural net with one input [x], two hidden layers with 3 perceptrons, and one output [sin(x)].

Neurons as Mathematical Functions

- A neuron is a function:
  - The input is a vector of values, $A = (a_1, a_2, \ldots, a_n)$.
  - The output is an activation strength, $b = f\left(\sum_i w_i a_i + w_0\right)$.
- The threshold function is usually implemented as a sigmoid, $f(x) = \frac{1}{1 + e^{-x}}$, but it can be anything. Popular alternatives:
  - Gaussian for input layers
  - Linear for output layers
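A minimal Python sketch of a single neuron as the function above (the weight values are arbitrary placeholders):

```python
import numpy as np

def sigmoid(x):
    """Threshold function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron(a, w, w0):
    """Activation strength b = f(sum_i w_i * a_i + w_0)."""
    return sigmoid(np.dot(w, a) + w0)

# A five-input neuron, as in the figure, with placeholder weights.
a = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
w = np.array([0.1, 0.4, -0.3, 0.8, 0.2])
print(neuron(a, w, w0=-0.5))
```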

MLP Networks

- The network consists of an input layer, an internal (hidden) layer, and an output layer.
- There is a weight for each connection between neurons and a threshold for each neuron. In this network, there are 9 weights and 4 thresholds.
- These are free parameters that can be used to parameterize a function.
- There are several methods to find the weights, but all are beyond this class. All that is needed to fit the weights is a method of determining whether a change in a weight makes the output value closer to correct.
- You don't need to know the correct output value to find the weights!
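The quoted counts are consistent with a 2-input, 3-hidden-neuron, 1-output network (an assumption about the slide's figure): 2*3 + 3*1 = 9 weights and 3 + 1 = 4 thresholds. A minimal forward-pass sketch with placeholder values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 2 inputs -> 3 hidden neurons: 6 connection weights, 3 thresholds.
W_hidden = np.array([[0.5, -0.2],
                     [0.1,  0.9],
                     [-0.7, 0.3]])
b_hidden = np.array([0.1, -0.4, 0.2])

# 3 hidden -> 1 output: 3 weights, 1 threshold (totals: 9 and 4).
w_out = np.array([0.6, -0.8, 0.4])
b_out = 0.05

def mlp(x):
    """Forward pass through the 2-3-1 network."""
    hidden = sigmoid(W_hidden @ x + b_hidden)
    return sigmoid(w_out @ hidden + b_out)

print(mlp(np.array([1.0, -1.0])))
```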

The MLP Parameterization of Sin(x)

- To parameterize a function, you need:
  - A training set: random inputs, and the desired value for each input.
  - A test set: random inputs and the desired value for each input, used to determine how well the network is parameterizing the function.
- A three layer network (2 hidden layers) can provide an arbitrarily good parameterization.
- Figure: the resulting network for sin(x).
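A sketch of this exercise using scikit-learn's MLPRegressor as a stand-in for whatever NN package the lecture used; the two hidden layers of 3 neurons follow the network described above, and the rest of the setup is invented:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Training set and test set: random inputs and the desired value for each.
x_train = rng.uniform(-np.pi, np.pi, (2000, 1))
y_train = np.sin(x_train).ravel()
x_test = rng.uniform(-np.pi, np.pi, (500, 1))
y_test = np.sin(x_test).ravel()

# Two hidden layers of 3 neurons each.
net = MLPRegressor(hidden_layer_sizes=(3, 3), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(x_train, y_train)
print("test-set R^2:", net.score(x_test, y_test))
```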

Using an MLP to Parameterize the Discriminant Surface

- To discriminate between classes of data, we need to find the hypersurface where log(t) = 0:
  $\log t = \log L(\vec{x}; H_0) - \log L(\vec{x}; H_1) = 0$
- Off the surface, we don't really care about the value of log(t).
- Parameterizing the likelihood ratio surface:
  - Training set: a set of simulated measurements and the desired log likelihood ratio (usually -1 and 1) for each measurement. The desired value is used to determine whether a weight change is improving the parameterization.
  - Test set: a set of simulated measurements and the desired log likelihood ratio for each measurement.
- The training sets need to be pretty large, or the surface may be very poorly approximated: if there are n input variables and two hidden layers containing $m_1$ and $m_2$ nodes, there are $(n+1)m_1 + (m_1+1)m_2 + m_2 + 1$ weights (see the sketch below).
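That weight count as a small function, to make the growth with network size concrete:

```python
def mlp_weight_count(n, m1, m2):
    """Free parameters in an n -> m1 -> m2 -> 1 network:
    (n+1)*m1 + (m1+1)*m2 + m2 + 1."""
    return (n + 1) * m1 + (m1 + 1) * m2 + m2 + 1

# Even modest networks have many parameters to pin down:
print(mlp_weight_count(n=2, m1=5, m2=5))     # 51
print(mlp_weight_count(n=10, m1=20, m2=10))  # 441
```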

Constructing the MLP Discriminant

- Choose a canned NN implementation (I'm using ROOT).
- Choose the network:
  - Number of internal layers
  - Number of neurons per layer
- Generate a set of simulated data:
  - Use half for the training set
  - Use half for the test set
- Train the network. (A sketch of these steps appears below.)
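The lecture uses ROOT; as an illustrative stand-in, here is the same recipe with scikit-learn's MLPClassifier, trained on the toy correlated classes from earlier (the network size is an arbitrary choice):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Generate simulated data: the two correlated toy classes from before.
def make_class(n, offset):
    x = rng.normal(0.0, 1.0, n)
    y = x + offset + rng.normal(0.0, 0.2, n)
    return np.column_stack([x, y])

X = np.vstack([make_class(2000, +0.5), make_class(2000, -0.5)])
labels = np.array([0] * 2000 + [1] * 2000)

# Use half for the training set, half for the test set.
idx = rng.permutation(len(X))
train, test = idx[:2000], idx[2000:]

# Choose the network and train it.
net = MLPClassifier(hidden_layer_sizes=(5, 5), max_iter=2000, random_state=0)
net.fit(X[train], labels[train])
print("test-set accuracy:", net.score(X[test], labels[test]))
```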

Results of the MLP Network

- The MLP network is a high order fit to the log(t) = 0 surface.
- It has many free parameters, so any surface can be described.
- With lots of free parameters, the fit can fall into a local minimum!
- The result is more or less a step function.

Likelihood Method vs MLP Method

- For some distributions, the MLP can provide much better discrimination than the likelihood method, but...
- Figure: the likelihood method compared with the MLP method.

Cost of Using an MLP Network

- With MLP networks, you are parameterizing the discriminant with:
  - A non-linear function
  - A large number of free parameters
  - All caveats and warnings about multidimensional minimization apply!
- Understanding (or visualizing) the discriminant:
  - It is very difficult to understand the effect of one variable on the output. With a likelihood, the single variable pdf is the entire story.
  - It is very difficult to understand the shape of the discriminant surface. With a likelihood, the surface comes from the product of the individual p.d.f.s, and you can study each p.d.f. independently.
- My biased recommendation:
  - Stick with a likelihood based discriminant that can be understood.
  - Use an MLP network to check whether the discriminant is nearly optimal.
  - If it's not optimal, look for a better set of variables.
  - Avoid using an MLP network for final results unless it provides a dramatic improvement; consider using another higher-order discriminant.

K Nearest Neighbors

- A new definition for "brute force".
- Start with a large sample of example events.
- Find the k nearest neighbors to your test event (just like the name); k is just an integer, for example 15.
- Vote to find the right classification: count the number of examples from each class; the most frequent is the class for the event.
- Example: take a region around the test point containing 9 red neighbors and 6 black neighbors; the test point is assigned RED.
- A partial assignment is possible: the test point is 60% red and 40% black.
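A minimal brute-force sketch of the vote, assuming Euclidean distance (the example events are placeholders):

```python
import numpy as np
from collections import Counter

def knn_vote(examples, labels, test_point, k=15):
    """Find the k nearest examples and vote; also report the
    partial (fractional) assignment for each class."""
    dists = np.linalg.norm(examples - test_point, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    fractions = {cls: n / k for cls, n in votes.items()}
    return votes.most_common(1)[0][0], fractions

# Placeholder example sample: two 2-D classes.
rng = np.random.default_rng(2)
examples = np.vstack([rng.normal(+1.0, 1.0, (100, 2)),
                      rng.normal(-1.0, 1.0, (100, 2))])
labels = np.array(["red"] * 100 + ["black"] * 100)
print(knn_vote(examples, labels, np.array([0.5, 0.5]), k=15))
```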

Decision Tree Classification

- "Twenty Questions" on steroids.
- Decision trees are a heuristic, not really a statistical test.
- Sequentially choose cuts to classify data. This really is twenty questions: 20 yes/no questions can isolate 1 outcome out of a million ($2^{20} \approx 10^6$).
- The questions are chosen to optimize the information entropy.
- Advantages:
  - The explanation is very simple (see above).
  - It's a white box: you can look at each individual decision and understand its effect.
  - Easy to implement.
- Disadvantages:
  - Not all that common in physics.
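A minimal sketch with scikit-learn's entropy-based decision tree (an illustrative stand-in, using invented toy data); the printout shows the white-box advantage, since every cut can be read off directly:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)

# Invented toy data: two 2-D classes.
X = np.vstack([rng.normal(+1.0, 1.0, (500, 2)),
               rng.normal(-1.0, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)

# criterion="entropy" chooses each cut to optimize information entropy.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Every individual decision can be printed and inspected.
print(export_text(tree, feature_names=["x", "y"]))
```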

Finally

- The likelihood discriminant is still your best choice, but be on the lookout for when it is sub-optimal.
- When the likelihood is sub-optimal, you have moved into an area of active research.
- MLP neural nets are old and understood. Unfortunately, after 40 years of research, nobody knows how to understand how each variable affects the output.
- K-nearest neighbors is easy, but usually it's not computationally viable.
- Decision trees are the latest hot item, and a simple heuristic. Not all that common in physics (yet).

The End