Machine Learning for Language Technology


Machine Learning for Language Technology: Generative and Discriminative Models
Joakim Nivre
Uppsala University, Department of Linguistics and Philology

Generative Models

Naive Bayes is a generative probabilistic model because it models the joint distribution of inputs and outputs:

$$P(x, y) = P(y)\,P(x \mid y) = P(y) \prod_{i=1}^{n} P(f_i(x) \mid y)$$
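As a minimal sketch of the factorization above, the following Python computes the joint probability for binary features. The tables `prior` and `likelihood` hold made-up illustrative numbers, not estimates from any data set, and the function name `joint` is hypothetical.

```python
# Hypothetical parameters for two classes and two binary features.
prior = {"spam": 0.4, "ham": 0.6}        # P(y)
likelihood = {                            # P(f_i(x) = 1 | y)
    "spam": [0.8, 0.1],
    "ham": [0.2, 0.5],
}

def joint(features, y):
    """P(x, y) = P(y) * prod_i P(f_i(x) | y) under the Naive Bayes assumption."""
    p = prior[y]
    for value, p_on in zip(features, likelihood[y]):
        # Use P(f_i = 1 | y) when the feature fires, its complement otherwise.
        p *= p_on if value == 1 else 1.0 - p_on
    return p
```

Because the model is generative, summing `joint` over every feature combination and every class gives 1.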

Pros and Cons of Generative Models

Pros:
- Estimation is straightforward (MLE or MAP)
- Informative: other distributions can be derived
  - Marginalization: $P(x) = \sum_{y} P(x, y)$
  - Conditionalization: $P(y \mid x) = \dfrac{P(x, y)}{\sum_{y'} P(x, y')}$

Cons:
- Unnecessary to model the input distribution (for classification)
- Necessary to make rigid independence assumptions
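The two derivations can be sketched on a small joint table. The numbers in `joint` are invented for illustration, and the helper names `marginal` and `conditional` are hypothetical.

```python
# Hypothetical joint distribution P(x, y) over two inputs and two labels.
joint = {
    ("buy", "spam"): 0.3, ("buy", "ham"): 0.1,
    ("hello", "spam"): 0.1, ("hello", "ham"): 0.5,
}
labels = ["spam", "ham"]

def marginal(x):
    """Marginalization: P(x) = sum over y of P(x, y)."""
    return sum(joint[(x, y)] for y in labels)

def conditional(y, x):
    """Conditionalization: P(y | x) = P(x, y) / sum over y' of P(x, y')."""
    return joint[(x, y)] / marginal(x)
```

Both functions need only the joint table, which is the sense in which a generative model is "informative".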

Discriminative Models

A discriminative (or conditional) probabilistic model only models the conditional distribution of outputs given inputs:

$$P(y \mid x)$$

We can discriminate between different outputs for an input, but we cannot generate input-output pairs.

Pros and Cons of Discriminative Models

Pros:
- Only models the distribution relevant for classification
- Less rigid independence assumptions

Cons:
- Estimation is harder (MLE or MAP)
- Less informative: other distributions cannot be derived

From Naive Bayes to Logistic Regression

Logistic regression:

$$P(y \mid x) = \frac{\exp[w \cdot f(x, y)]}{\sum_{y'} \exp[w \cdot f(x, y')]}$$

- Models the conditional distribution directly
- The discriminative counterpart of Naive Bayes
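One way to see why logistic regression is called the discriminative counterpart of Naive Bayes is to rewrite the Naive Bayes posterior. This is a sketch that is not on the slides; it only applies conditionalization to the Naive Bayes joint and moves the product into the exponent.

```latex
\begin{align*}
P(y \mid x)
  &= \frac{P(y) \prod_{i=1}^{n} P(f_i(x) \mid y)}
          {\sum_{y'} P(y') \prod_{i=1}^{n} P(f_i(x) \mid y')} \\
  &= \frac{\exp\!\left[\log P(y) + \sum_{i=1}^{n} \log P(f_i(x) \mid y)\right]}
          {\sum_{y'} \exp\!\left[\log P(y') + \sum_{i=1}^{n} \log P(f_i(x) \mid y')\right]}
\end{align*}
```

The exponent is a sum of per-feature terms, so the posterior has the same exponentiate-and-normalize shape as the logistic regression formula. The difference is that Naive Bayes fixes those terms through the joint model, whereas logistic regression fits the weights directly.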

Quiz

Which of the following statements are true?
1. If we know $P(x, y)$, we can derive $P(x \mid y)$.
2. If we know $P(x \mid y)$, we can derive $P(x, y)$.
3. If we know $P(x, y)$, we can derive $P(x)$.
4. If we know $P(x)$, we can derive $P(x, y)$.

Machine Learning for Language Technology: Linear Models

Introduction

- We want to build a discriminative (or conditional) classifier:

  $$f(x) = \arg\max_{y} P(y \mid x)$$

- We will do this in two steps:
  1. Build a feature-based linear classifier
  2. Normalize to a conditional probability model
- The result will be a log-linear model

Feature Representations

- We assume a mapping from input-output pairs $(x, y)$ to a high-dimensional feature vector:
  $f(x, y) : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^m$
- For any vector $v \in \mathbb{R}^m$, let $v_j$ be the $j$th value

Examples

- $x$ is a document and $y$ is a label:

  $$f_j(x, y) = \begin{cases} 1 & \text{if } x \text{ contains the word interest and } y = \text{financial} \\ 0 & \text{otherwise} \end{cases}$$

  $$f_j(x, y) = \%\text{ of words in } x \text{ with punctuation and } y = \text{scientific}$$

- $x$ is a word and $y$ is a part-of-speech tag:

  $$f_j(x, y) = \begin{cases} 1 & \text{if } x = \text{bank and } y = \text{Verb} \\ 0 & \text{otherwise} \end{cases}$$

Examples

- $x$ is a name, $y$ is a label classifying the name. Each feature is 1 if its condition holds and 0 otherwise:
  - $f_0(x, y) = 1$ if $x$ contains George and $y$ = Person
  - $f_1(x, y) = 1$ if $x$ contains Washington and $y$ = Person
  - $f_2(x, y) = 1$ if $x$ contains Bridge and $y$ = Person
  - $f_3(x, y) = 1$ if $x$ contains General and $y$ = Person
  - $f_4(x, y) = 1$ if $x$ contains George and $y$ = Object
  - $f_5(x, y) = 1$ if $x$ contains Washington and $y$ = Object
  - $f_6(x, y) = 1$ if $x$ contains Bridge and $y$ = Object
  - $f_7(x, y) = 1$ if $x$ contains General and $y$ = Object
- $x$ = General George Washington, $y$ = Person → $f(x, y)$ = [1 1 0 1 0 0 0 0]
- $x$ = George Washington Bridge, $y$ = Object → $f(x, y)$ = [0 0 0 0 1 1 1 0]
- $x$ = George Washington George, $y$ = Object → $f(x, y)$ = [0 0 0 0 1 1 0 0]

Block Feature Vectors

- $x$ = General George Washington, $y$ = Person → $f(x, y)$ = [1 1 0 1  0 0 0 0]
- $x$ = George Washington Bridge, $y$ = Object → $f(x, y)$ = [0 0 0 0  1 1 1 0]
- $x$ = George Washington George, $y$ = Object → $f(x, y)$ = [0 0 0 0  1 1 0 0]
- Each equal-size block corresponds to one label (here: the first four positions for Person, the last four for Object)
- Non-zero values are allowed only in one block
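The block layout can be sketched in a few lines of Python. The names `WORDS`, `LABELS`, and `feature_vector` are hypothetical, and "contains" is interpreted here as an exact, case-sensitive token match, which is an assumption.

```python
WORDS = ["George", "Washington", "Bridge", "General"]
LABELS = ["Person", "Object"]

def feature_vector(x, y):
    """Return f(x, y) as a list of len(WORDS) * len(LABELS) indicators.

    Block k holds the word indicators for LABELS[k]; all other blocks stay zero.
    """
    tokens = set(x.split())
    vec = [0] * (len(WORDS) * len(LABELS))
    offset = LABELS.index(y) * len(WORDS)   # start of the block for label y
    for j, word in enumerate(WORDS):
        if word in tokens:
            vec[offset + j] = 1
    return vec
```

Since the indicators are binary, a repeated word (as in "George Washington George") still contributes a single 1.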

Linear Classifiers

- Linear classifier: the score (or probability) of a particular classification is based on a linear combination of features and their weights
- Let $w \in \mathbb{R}^m$ be a weight vector for $f(x, y) \in \mathbb{R}^m$
- The weight $w_i$ reflects the significance of feature $f_i(x, y)$:
  - $w_i > 0$: $f_i(x, y)$ favors class $y$
  - The larger $w_i$ is, the stronger the association
- Example:
  - $w_3 = 1.7 \Rightarrow$ the word General favors the class Person
  - $w_7 = -0.9 \Rightarrow$ the word General disfavors the class Object

Linear Classifiers

- The score of a class $y$ is the inner product of $f(x, y)$ and $w$:

  $$f(x, y) \cdot w = \sum_{i=1}^{m} f_i(x, y)\, w_i$$

- The highest-scoring class wins:

  $$f(x) = \arg\max_{y} f(x, y) \cdot w = \arg\max_{y} \sum_{i=1}^{m} f_i(x, y)\, w_i$$
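A minimal sketch of scoring and classification for the name example, taking $w_3 = 1.7$ and $w_7 = -0.9$ (negative, since that feature disfavors Object). Setting all other weights to zero is an assumption made only for illustration, and the function names are hypothetical.

```python
def score(f, w):
    """Inner product f(x, y) . w."""
    return sum(f_i * w_i for f_i, w_i in zip(f, w))

def classify(candidates, w):
    """candidates maps each label y to its feature vector f(x, y)."""
    return max(candidates, key=lambda y: score(candidates[y], w))

# Weights for f_0 .. f_7; only w_3 and w_7 are non-zero in this sketch.
w = [0.0, 0.0, 0.0, 1.7, 0.0, 0.0, 0.0, -0.9]

# Feature vectors of x = "General George Washington" paired with each label.
candidates = {
    "Person": [1, 1, 0, 1, 0, 0, 0, 0],
    "Object": [0, 0, 0, 0, 1, 1, 0, 1],
}
best = classify(candidates, w)
```

Here Person scores 1.7 and Object scores -0.9, so `best` is `"Person"`.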

Binary Linear Classifier

Divides all points.

[Figure: a binary linear classifier dividing all points]

Multiclass Linear Classifier

Defines regions of space:
- i.e., + are all points $(x, y)$ where $+ = \arg\max_{y} w \cdot f(x, y)$

[Figure: regions of space defined by a multiclass linear classifier]

Quiz

- Suppose that these are all the features used in a spam filter:
  - $f_1(x, y) = 1$ if $x$ contains buy and $y$ = SPAM (0 otherwise)
  - $f_2(x, y) = 1$ if $x$ contains buy and $y$ = HAM (0 otherwise)
- Suppose the corresponding weights are:
  - $w_1 = 1.0$
  - $w_2 = 0.0$
- Which of the following statements is false?
  1. The SPAM score for a document containing the word buy is 1.0
  2. The HAM score for a document containing the word buy is 0.0
  3. The SPAM score for a document not containing the word buy is 0.0
  4. The HAM score for a document not containing the word buy is 1.0

Machine Learning for Language Technology: Log-Linear Models

Introduction

- We want to build a discriminative (or conditional) classifier:

  $$f(x) = \arg\max_{y} P(y \mid x) \quad \text{where} \quad \sum_{i=1}^{n} P(y_i \mid x) = 1$$

- A linear classifier outputs scores in the range $(-\infty, \infty)$
- We need to do two things:
  1. Make sure that all scores are positive
  2. Normalize scores to sum to 1

Log-Linear Models

- Linear model:

  $$f(x, y) \cdot w$$

- Make scores positive:

  $$\exp[f(x, y) \cdot w]$$

- Normalize:

  $$P(y \mid x) = \frac{\exp[f(x, y) \cdot w]}{\sum_{i=1}^{n} \exp[f(x, y_i) \cdot w]}$$
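The two steps (exponentiate, then normalize) can be sketched as a single function. The name `log_linear_probs` is hypothetical. Subtracting the maximum score before exponentiating is a common numerical-stability measure that is not on the slides; it leaves the result unchanged because the same factor cancels in numerator and denominator.

```python
import math

def log_linear_probs(scores):
    """Map linear scores f(x, y_i) . w to probabilities P(y_i | x)."""
    shift = max(scores)                           # guards against overflow in exp
    exps = [math.exp(s - shift) for s in scores]  # every value is now positive
    total = sum(exps)
    return [e / total for e in exps]              # values now sum to 1
```

For example, scores of `0.0` and `math.log(3.0)` yield probabilities of 0.25 and 0.75.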

Log-Linear Models

- Crash course in exponentiation:
  - Note: $\exp x = a^x$ (for some base $a > 1$)
  - $0 < \exp x < 1$ if $x < 0$
  - $\exp x = 1$ if $x = 0$
  - $1 < \exp x$ if $x > 0$
- The inverse of exponentiation is the logarithm: $\log \exp x = x$
- Hence, the log-linear model is linear in log(arithmic) space

Log-Linear Models

- Suppose we have (only) two classes with the following scores:

  $$f(x, y_1) \cdot w = 1.0 \qquad f(x, y_2) \cdot w = -2.0$$

- Using base 2, we have:

  $$\exp[f(x, y_1) \cdot w] = 2 \qquad \exp[f(x, y_2) \cdot w] = 0.25$$

- Normalizing, we get:

  $$P(y_1 \mid x) = \frac{\exp[f(x, y_1) \cdot w]}{\exp[f(x, y_1) \cdot w] + \exp[f(x, y_2) \cdot w]} = \frac{2}{2 + 0.25} = 0.89$$

  $$P(y_2 \mid x) = \frac{\exp[f(x, y_2) \cdot w]}{\exp[f(x, y_1) \cdot w] + \exp[f(x, y_2) \cdot w]} = \frac{0.25}{2 + 0.25} = 0.11$$
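The arithmetic can be checked with a short sketch, using the scores 1.0 and -2.0 (the second is negative, since its base-2 exponential is 0.25). The function `normalize` and its `base` parameter are hypothetical names. The second call uses base $e$ as a comparison that is not on the slide.

```python
import math

def normalize(scores, base=2.0):
    """Exponentiate each score in the given base, then normalize to sum to 1."""
    exps = [base ** s for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs_base2 = normalize([1.0, -2.0])                # exponentials are 2.0 and 0.25
probs_base_e = normalize([1.0, -2.0], base=math.e)  # same scores, natural base
```

Rounded to two decimals, `probs_base2` is `[0.89, 0.11]`. Changing the base changes the probabilities but not which class has the highest one.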

Quiz 2

- Suppose a (plain) linear classifier for spam filtering assigns SPAM and HAM the same score for a document $d$.
- What is $P(\text{SPAM} \mid d)$ in the normalized log-linear model?
  - Impossible to tell

Machine Learning for Language Technology: Logistic Regression

Logistic Regression

- We know how to do classification with a log-linear model:

  $$f(x) = \arg\max_{y} P(y \mid x) = \arg\max_{y} \frac{\exp[f(x, y) \cdot w]}{\sum_{i=1}^{n} \exp[f(x, y_i) \cdot w]}$$

- But how do we learn the weights?

Maximum Likelihood Estimation

- For a generative model like NB, we maximize the joint likelihood of the $n$ training examples:

  $$\arg\max \prod_{i=1}^{n} P(y_i)\, P(x_i \mid y_i)$$

- We can use relative frequencies to get the joint MLE
- Now we want to maximize the conditional likelihood:

  $$\arg\max \prod_{i=1}^{n} P(y_i \mid x_i)$$

- Bad news: there is no analytical solution
- Good news: the log-likelihood is concave (equivalently, its negative is convex), so numerical optimization cannot get stuck in a spurious local optimum
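The relative-frequency estimate for the generative case can be sketched as plain counting. The training pairs in `data` are invented, each input is reduced to a single word for brevity, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical training pairs (x_i, y_i); each x_i is a single observed word.
data = [("buy", "spam"), ("buy", "spam"), ("hello", "ham"), ("buy", "ham")]

label_counts = Counter(y for _, y in data)
pair_counts = Counter(data)

def p_label(y):
    """MLE of P(y): the fraction of training examples with label y."""
    return label_counts[y] / len(data)

def p_word_given_label(x, y):
    """MLE of P(x | y): the fraction of label-y examples whose input is x."""
    return pair_counts[(x, y)] / label_counts[y]
```

No iteration is needed here. The conditional likelihood of a log-linear model, by contrast, has no such closed form.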

Gradient Ascent

- Concavity of the log-likelihood (convexity of its negative) guarantees that any local maximum is a global maximum
- Gradient ascent:
  1. Guess an initial weight vector $w^0$ (e.g., all components of $w^0$ set to 0.0)
  2. Repeat until convergence:
     2.1 Use the gradient at $w^i$ to determine the ascent direction
     2.2 Update $w^{i+1} \leftarrow w^i$ + gradient step
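The procedure can be sketched for a small log-linear model. The slides do not spell out the gradient; the code uses the standard result that the gradient of the conditional log-likelihood is the observed feature counts minus the feature counts expected under the model. The fixed step size, the fixed iteration count (in place of a convergence test), the toy data, and all function names are assumptions for illustration.

```python
import math

def probs(x, labels, feature_fn, w):
    """P(y | x) for every label under the log-linear model."""
    scores = [sum(f * wi for f, wi in zip(feature_fn(x, y), w)) for y in labels]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def log_likelihood(data, labels, feature_fn, w):
    """Sum over the training pairs of log P(y_i | x_i)."""
    return sum(math.log(probs(x, labels, feature_fn, w)[labels.index(y)])
               for x, y in data)

def gradient(data, labels, feature_fn, w):
    """Observed minus expected feature values, summed over the training data."""
    grad = [0.0] * len(w)
    for x, y in data:
        for j, fj in enumerate(feature_fn(x, y)):
            grad[j] += fj                      # observed feature values
        for p, label in zip(probs(x, labels, feature_fn, w), labels):
            for j, fj in enumerate(feature_fn(x, label)):
                grad[j] -= p * fj              # expected feature values
    return grad

def gradient_ascent(data, labels, feature_fn, dim, step=0.5, iterations=200):
    w = [0.0] * dim                            # step 1: initial guess, all zeros
    for _ in range(iterations):                # step 2: fixed number of updates
        g = gradient(data, labels, feature_fn, w)
        w = [wi + step * gi for wi, gi in zip(w, g)]
    return w

LABELS = ["SPAM", "HAM"]

def spam_features(x, y):
    """Two indicator features: buy paired with SPAM, buy paired with HAM."""
    has_buy = "buy" in x.split()
    return [float(has_buy and y == "SPAM"), float(has_buy and y == "HAM")]

data = [("buy now", "SPAM"), ("buy cheap", "SPAM"),
        ("buy milk", "HAM"), ("see you", "HAM")]
w = gradient_ascent(data, LABELS, spam_features, dim=2)
```

On this toy data, two of the three documents containing buy are SPAM, and the learned model's P(SPAM | x) for such a document approaches 2/3.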

Linear Models and Logistic Regression

- Linear model:
  - Classifier score is a linear combination of weighted features
- Logistic regression (log-linear model):
  - Learn weights to maximize conditional likelihood
  - Only one of many possible ways to learn weights
- This does not matter for classification:

  $$\arg\max_{y} \frac{\exp[f(x, y) \cdot w]}{\sum_{i=1}^{n} \exp[f(x, y_i) \cdot w]} = \arg\max_{y} f(x, y) \cdot w$$
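The equality of the two argmax expressions follows in two steps, sketched here with the same symbols as the slides:

```latex
\begin{align*}
\arg\max_{y} \frac{\exp[f(x, y) \cdot w]}{\sum_{i=1}^{n} \exp[f(x, y_i) \cdot w]}
  &= \arg\max_{y} \exp[f(x, y) \cdot w]
     && \text{(the denominator does not depend on } y\text{)} \\
  &= \arg\max_{y} f(x, y) \cdot w
     && \text{(}\exp\text{ is strictly increasing)}
\end{align*}
```

Normalization is therefore needed only when probabilities themselves are required, not to pick the best class.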

Naive Bayes and Logistic Regression

Naive Bayes                        Logistic Regression
Generative model                   Discriminative model
Estimates P(x, y)                  Estimates P(y | x)
MLE has closed-form solution       MLE requires numerical optimization
Strong independence assumptions    No independence assumptions
Better on small training sets      Better on medium-sized training sets


More information

Predicting outcomes of professional DotA 2 matches

Predicting outcomes of professional DotA 2 matches Predicting outcomes of professional DotA 2 matches Petra Grutzik Joe Higgins Long Tran December 16, 2017 Abstract We create a model to predict the outcomes of professional DotA 2 (Defense of the Ancients

More information

Lecture 3 - Regression

Lecture 3 - Regression Lecture 3 - Regression Instructor: Prof Ganesh Ramakrishnan July 25, 2016 1 / 30 The Simplest ML Problem: Least Square Regression Curve Fitting: Motivation Error measurement Minimizing Error Method of

More information

Learning Dota 2 Team Compositions

Learning Dota 2 Team Compositions Learning Dota 2 Team Compositions Atish Agarwala atisha@stanford.edu Michael Pearce pearcemt@stanford.edu Abstract Dota 2 is a multiplayer online game in which two teams of five players control heroes

More information

Logs and Exponentials Higher.notebook February 26, Daily Practice

Logs and Exponentials Higher.notebook February 26, Daily Practice Daily Practice 2.2.2015 Daily Practice 3.2.2015 Today we will be learning about exponential functions and logs. Homework due! Need to know for Unit Test 2: Expressions and Functions Adding and subtracng

More information

Privacy preserving data mining multiplicative perturbation techniques

Privacy preserving data mining multiplicative perturbation techniques Privacy preserving data mining multiplicative perturbation techniques Li Xiong CS573 Data Privacy and Anonymity Outline Review and critique of randomization approaches (additive noise) Multiplicative data

More information

You could identify a point on the graph of a function as (x,y) or (x, f(x)). You may have only one function value for each x number.

You could identify a point on the graph of a function as (x,y) or (x, f(x)). You may have only one function value for each x number. Function Before we review exponential and logarithmic functions, let's review the definition of a function and the graph of a function. A function is just a rule. The rule links one number to a second

More information

, x {1, 2, k}, where k > 0. (a) Write down P(X = 2). (1) (b) Show that k = 3. (4) Find E(X). (2) (Total 7 marks)

, x {1, 2, k}, where k > 0. (a) Write down P(X = 2). (1) (b) Show that k = 3. (4) Find E(X). (2) (Total 7 marks) 1. The probability distribution of a discrete random variable X is given by 2 x P(X = x) = 14, x {1, 2, k}, where k > 0. Write down P(X = 2). (1) Show that k = 3. Find E(X). (Total 7 marks) 2. In a game

More information

A Signal Space Theory of Interferences Cancellation Systems

A Signal Space Theory of Interferences Cancellation Systems A Signal Space Theory of Interferences Cancellation Systems Osamu Ichiyoshi Human Network for Better 21 Century E-mail: osamu-ichiyoshi@muf.biglobe.ne.jp Abstract Interferences among signals from different

More information

Leveraging Game Phase in Arimaa

Leveraging Game Phase in Arimaa Abstract Leveraging Game Phase in Arimaa Vivek Choksi, Neema Ebrahim-Zadeh, Vasanth Mohan Stanford University Past research into AI techniques for the game Arimaa has dealt with refining the evaluation

More information

Hamming Codes as Error-Reducing Codes

Hamming Codes as Error-Reducing Codes Hamming Codes as Error-Reducing Codes William Rurik Arya Mazumdar Abstract Hamming codes are the first nontrivial family of error-correcting codes that can correct one error in a block of binary symbols.

More information

5.5 Properties of Logarithms. Work with the Properties of Logarithms. 296 CHAPTER 5 Exponential and Logarithmic Functions

5.5 Properties of Logarithms. Work with the Properties of Logarithms. 296 CHAPTER 5 Exponential and Logarithmic Functions 296 CHAPTER 5 Exponential and Logarithmic Functions The Richter Scale Problems 3 and 32 use the following discussion: The Richter scale is one way of converting seismographic readings into numbers that

More information

Graphs and Network Flows IE411. Lecture 14. Dr. Ted Ralphs

Graphs and Network Flows IE411. Lecture 14. Dr. Ted Ralphs Graphs and Network Flows IE411 Lecture 14 Dr. Ted Ralphs IE411 Lecture 14 1 Review: Labeling Algorithm Pros Guaranteed to solve any max flow problem with integral arc capacities Provides constructive tool

More information

Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007)

Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007) Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007) Qin Huazheng 2014/10/15 Graph-of-word and TW-IDF: New Approach

More information

Why Should We Care? Everyone uses plotting But most people ignore or are unaware of simple principles Default plotting tools are not always the best

Why Should We Care? Everyone uses plotting But most people ignore or are unaware of simple principles Default plotting tools are not always the best Elementary Plots Why Should We Care? Everyone uses plotting But most people ignore or are unaware of simple principles Default plotting tools are not always the best More importantly, it is easy to lie

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Lecture # 5 Image Enhancement in Spatial Domain- I ALI JAVED Lecturer SOFTWARE ENGINEERING DEPARTMENT U.E.T TAXILA Email:: ali.javed@uettaxila.edu.pk Office Room #:: 7 Presentation

More information

MATH CALCULUS & STATISTICS/BUSN - PRACTICE EXAM #1 - SPRING DR. DAVID BRIDGE

MATH CALCULUS & STATISTICS/BUSN - PRACTICE EXAM #1 - SPRING DR. DAVID BRIDGE MATH 205 - CALCULUS & STATISTICS/BUSN - PRACTICE EXAM # - SPRING 2006 - DR. DAVID BRIDGE TRUE/FALSE. Write 'T' if the statement is true and 'F' if the statement is false. Tell whether the statement is

More information

Game Theory. Chapter 2 Solution Methods for Matrix Games. Instructor: Chih-Wen Chang. Chih-Wen NCKU. Game Theory, Ch2 1

Game Theory. Chapter 2 Solution Methods for Matrix Games. Instructor: Chih-Wen Chang. Chih-Wen NCKU. Game Theory, Ch2 1 Game Theory Chapter 2 Solution Methods for Matrix Games Instructor: Chih-Wen Chang Chih-Wen Chang @ NCKU Game Theory, Ch2 1 Contents 2.1 Solution of some special games 2.2 Invertible matrix games 2.3 Symmetric

More information

Mathacle. Name: Date:

Mathacle. Name: Date: Quiz Probability 1.) A telemarketer knows from past experience that when she makes a call, the probability that someone will answer the phone is 0.20. What is probability that the next two phone calls

More information

DECISION TREE TUTORIAL

DECISION TREE TUTORIAL Kardi Teknomo DECISION TREE TUTORIAL Revoledu.com Decision Tree Tutorial by Kardi Teknomo Copyright 2008-2012 by Kardi Teknomo Published by Revoledu.com Online edition is available at Revoledu.com Last

More information

Econ 172A - Slides from Lecture 18

Econ 172A - Slides from Lecture 18 1 Econ 172A - Slides from Lecture 18 Joel Sobel December 4, 2012 2 Announcements 8-10 this evening (December 4) in York Hall 2262 I ll run a review session here (Solis 107) from 12:30-2 on Saturday. Quiz

More information

Image Filtering. Median Filtering

Image Filtering. Median Filtering Image Filtering Image filtering is used to: Remove noise Sharpen contrast Highlight contours Detect edges Other uses? Image filters can be classified as linear or nonlinear. Linear filters are also know

More information

Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes

Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes 216 7th International Conference on Intelligent Systems, Modelling and Simulation Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes Yuanyuan Guo Department of Electronic Engineering

More information

Latest trends in sentiment analysis - A survey

Latest trends in sentiment analysis - A survey Latest trends in sentiment analysis - A survey Anju Rose G Punneliparambil PG Scholar Department of Computer Science & Engineering Govt. Engineering College, Thrissur, India anjurose.ar@gmail.com Abstract

More information

1 of 5 7/16/2009 6:57 AM Virtual Laboratories > 13. Games of Chance > 1 2 3 4 5 6 7 8 9 10 11 3. Simple Dice Games In this section, we will analyze several simple games played with dice--poker dice, chuck-a-luck,

More information

SEMANTIC ANNOTATION AND RETRIEVAL OF MUSIC USING A BAG OF SYSTEMS REPRESENTATION

SEMANTIC ANNOTATION AND RETRIEVAL OF MUSIC USING A BAG OF SYSTEMS REPRESENTATION SEMANTIC ANNOTATION AND RETRIEVAL OF MUSIC USING A BAG OF SYSTEMS REPRESENTATION Katherine Ellis University of California, San Diego kellis@ucsd.edu Emanuele Coviello University of California, San Diego

More information

Learning with Confidence: Theory and Practice of Information Geometric Learning from High-dim Sensory Data

Learning with Confidence: Theory and Practice of Information Geometric Learning from High-dim Sensory Data Learning with Confidence: Theory and Practice of Information Geometric Learning from High-dim Sensory Data Professor Lin Zhang Department of Electronic Engineering, Tsinghua University Co-director, Tsinghua-Berkeley

More information

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. MATH 1324 Review for Test 3 SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. Find the value(s) of the function on the given feasible region. 1) Find the

More information

DOWNLINK TRANSMITTER ADAPTATION BASED ON GREEDY SINR MAXIMIZATION. Dimitrie C. Popescu, Shiny Abraham, and Otilia Popescu

DOWNLINK TRANSMITTER ADAPTATION BASED ON GREEDY SINR MAXIMIZATION. Dimitrie C. Popescu, Shiny Abraham, and Otilia Popescu DOWNLINK TRANSMITTER ADAPTATION BASED ON GREEDY SINR MAXIMIZATION Dimitrie C Popescu, Shiny Abraham, and Otilia Popescu ECE Department Old Dominion University 231 Kaufman Hall Norfol, VA 23452, USA ABSTRACT

More information

Hybrid Discriminative/Class-Specific Classifiers for Narrow-Band Signals

Hybrid Discriminative/Class-Specific Classifiers for Narrow-Band Signals To appear IEEE Trans. on Aerospace and Electronic Systems, October 2007. Hybrid Discriminative/Class-Specific Classifiers for Narrow-Band Signals Brian F. Harrison and Paul M. Baggenstoss Naval Undersea

More information