Learning Structured Predictors


1 Learning Structured Predictors Xavier Carreras Xerox Research Centre Europe

2 Supervised (Structured) Prediction Learning to predict: given training data { (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)) }, learn a predictor x -> y that works well on unseen inputs x. Non-Structured Prediction: outputs y are atomic. Binary prediction: y in {-1, +1}. Multiclass prediction: y in {1, 2, ..., L}. Structured Prediction: outputs y are structured. Sequence prediction: y are sequences. Parsing: y are trees. ...

3 Named Entity Recognition y per - qnt - - org org - time x Jim bought 300 shares of Acme Corp. in 2006

4 Named Entity Recognition y per - qnt - - org org - time x Jim bought 300 shares of Acme Corp. in 2006 y per per - - loc x Jack London went to Paris y per per - - loc x Paris Hilton went to London y per - - loc x Jackie went to Lisdon

5 Part-of-speech Tagging y NNP NNP VBZ NNP. x Ms. Haag plays Elianti.

6 Syntactic Parsing [figure: dependency tree for "Unesco is now holding its biennial meetings in New York.", with arcs labeled with syntactic functions such as SBJ, OBJ, VC, TMP, NMOD, PMOD, NAME, LOC, ROOT] x are sentences, y are syntactic dependency trees

7 Machine Translation [figure and example rules from Galley et al. 2006: spans and complement-spans determine which syntax-tree translation rules are extracted; a minimal rule is extracted from each constituent in the frontier set] x are sentences in Chinese, y are sentences in English aligned to x

8 Object Detection (Kumar and Hebert 2003) x are images y are grids labeled with object types

10 Today's Goals Introduce basic concepts for structured prediction. We will restrict ourselves to sequence prediction. What can we borrow from standard classification? Learning paradigms and algorithms, in essence, work here too; however, the computations behind the algorithms are prohibitive. What can we borrow from HMMs and other structured formalisms? Representations of structured data in feature spaces, and inference/search algorithms for tractable computations. E.g., algorithms for HMMs (Viterbi, forward-backward) will play a major role in today's methods.

12 Sequence Prediction y per per - - loc x Jack London went to Paris

13 Sequence Prediction x = x_1 x_2 ... x_n are input sequences, x_i in X. y = y_1 y_2 ... y_n are output sequences, y_i in {1, ..., L}. Goal: given training data { (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)) }, learn a predictor x -> y that works well on unseen inputs x. What is the form of our prediction model?

14 Exponentially-many Solutions Let Y = {-, per, loc}. The solution space (all output sequences) is a trellis over the positions of "Jack London went to Paris", with one node per label in Y at each position. Each path through the trellis is a possible solution. For an input sequence of size n, there are |Y|^n possible outputs.

16 Approach 1: Local Classifiers? Jack London went to Paris. Decompose the sequence into n classification problems: a classifier predicts the label of each individual position, ŷ_i = argmax_{l in {loc, per, -}} w · f(x, i, l). f(x, i, l) represents an assignment of label l to x_i; w is a vector of parameters, with a weight for each feature of f. Use standard classification methods to learn w. At test time, predict the best sequence by simply concatenating the best label for each position.

18 Indicator Features f(x, i, l) is a vector of d features representing label l for x_i: [ f_1(x, i, l), ..., f_j(x, i, l), ..., f_d(x, i, l) ]. What's in a feature f_j(x, i, l)? Anything we can compute using x, i and l; anything that indicates whether l is (or is not) a good label for x_i. Indicator features are binary-valued features looking at a simple pattern of x, the target position i, and the candidate label l for position i. For example: f_j(x, i, l) = 1 if x_i = London and l = loc, 0 otherwise; f_k(x, i, l) = 1 if x_{i+1} = went and l = loc, 0 otherwise.

19 Feature Templates Feature templates generate many indicator features mechanically. A feature template is identified by a type and a number of values. Example: the template word extracts the current word: f_{word,a,w}(x, i, l) = 1 if x_i = w and l = a, 0 otherwise. A feature of this type is identified by the tuple <word, a, w>. The template generates a feature for every label a in Y and every word w, e.g.: a = loc, w = London; a = -, w = London; a = loc, w = Paris; a = per, w = Paris; a = per, w = John; a = -, w = the.

20 Feature Templates Feature templates generate many indicator features mechanically. A feature template is identified by a type and a number of values. Example: the template word extracts the current word: f_{word,a,w}(x, i, l) = 1 if x_i = w and l = a, 0 otherwise. A feature of this type is identified by the tuple <word, a, w>. The template generates a feature for every label a in Y and every word w, e.g.: a = loc, w = London; a = -, w = London; a = loc, w = Paris; a = per, w = Paris; a = per, w = John; a = -, w = the. In feature-based models: define feature templates manually; instantiate the templates on every set of values in the training data (this generates a very high-dimensional feature space); define the parameter vector w indexed by such feature tuples; let the learning algorithm choose the relevant features.
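
The following is a small Python sketch (not from the lecture; the template names and toy data are illustrative) of how a template such as word can be instantiated mechanically on training data to build a high-dimensional indicator feature space.

    # Toy sketch: instantiating "word"-style feature templates on training data.
    # Template and pattern names are illustrative, not the lecture's exact ones.
    def word_template(x, i, label):
        """Yield indicator feature identifiers for labelling position i of x."""
        yield ("word", label, x[i])                    # current word + label
        yield ("is_upper", label, x[i][0].isupper())   # simple shape pattern
        if i + 1 < len(x):
            yield ("next_word", label, x[i + 1])       # next word + label

    train = [(["Jack", "London", "went", "to", "Paris"],
              ["per", "per", "-", "-", "loc"])]

    # Instantiate the templates on every value seen in the training data;
    # each distinct tuple becomes one dimension of the feature space.
    feature_index = {}
    for x, y in train:
        for i, label in enumerate(y):
            for feat in word_template(x, i, label):
                feature_index.setdefault(feat, len(feature_index))

    print(len(feature_index), "indicator features instantiated")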

21 More Features for NE Recognition per Jack London went to Paris In practice, construct f(x, i, l) by... Define a number of simple patterns of x and i current word x i is xi capitalized? xi has digits? prefixes/suffixes of size 1, 2, 3,... is x i a known location? is x i a known person? next word previous word current and next words together other combinations Define feature templates by combining patterns with labels l Generate actual features by instantiating templates on training data

22 More Features for NE Recognition per per - Jack London went to Paris. In practice, construct f(x, i, l) by defining a number of simple patterns of x and i: the current word x_i; is x_i capitalized? does x_i contain digits? prefixes/suffixes of x_i of size 1, 2, 3, ...; is x_i a known location? is x_i a known person? the next word; the previous word; the current and next words together; other combinations. Define feature templates by combining these patterns with labels l. Generate the actual features by instantiating the templates on the training data. Main limitation: these features can't capture interactions between labels!

23 Approach 2: HMM for Sequence Prediction [figure: tagged sentence "Jack London went to Paris" (per per - - loc), annotated with π_per, T_per,per and O_per,London] Define an HMM where each label is a state. Model parameters: π_l, the probability of starting with label l; T_{l,l'}, the probability of transitioning from l to l'; O_{l,x}, the probability of generating symbol x given label l. Predictions: p(x, y) = π_{y_1} O_{y_1,x_1} ∏_{i>1} T_{y_{i-1},y_i} O_{y_i,x_i}. Learning: relative counts + smoothing. Prediction: Viterbi algorithm.

24 Approach 2: Representation in HMM [figure: tagged sentence "Jack London went to Paris" (per per - - loc), annotated with π_per, T_per,per and O_per,London] Label interactions are captured in the transition parameters. But interactions between labels and input symbols are quite limited: only O_{y_i,x_i} = p(x_i | y_i). It is not clear how to exploit patterns such as: capitalization, digits; prefixes and suffixes; next word, previous word; combinations of these with label transitions. Why? The HMM independence assumptions: given label y_i, token x_i is independent of anything else.

26 Local Classifiers vs. HMM Local Classifiers. Form: w · f(x, i, l). Learning: standard classifiers. Prediction: independent for each x_i. Advantage: feature-rich. Drawback: no label interactions. HMM. Form: π_{y_1} O_{y_1,x_1} ∏_{i>1} T_{y_{i-1},y_i} O_{y_i,x_i}. Learning: relative counts. Prediction: Viterbi. Advantage: label interactions. Drawback: no fine-grained features.

27 Approach 3: Global Sequence Predictors y: per per - - loc x: Jack London went to Paris. Learn a single classifier from x to y: predict(x_{1:n}) = argmax_{y in Y^n} w · f(x, y). Next questions: How do we represent entire sequences in f(x, y)? There are exponentially many sequences y for a given x; how do we solve the argmax problem?

29 Factored Representations y: per per - - loc x: Jack London went to Paris. How do we represent entire sequences in f(x, y)? Look at individual assignments y_i (standard classification). Look at bigrams of output labels <y_{i-1}, y_i>. Look at trigrams of output labels <y_{i-2}, y_{i-1}, y_i>. Look at n-grams of output labels <y_{i-n+1}, ..., y_{i-1}, y_i>. Look at the full label sequence y (intractable). A factored representation will lead to a tractable model.

33 Bigram Feature Templates y per per - - loc x Jack London went to Paris. A template for word + bigram: f_{wb,a,b,w}(x, i, y_{i-1}, y_i) = 1 if x_i = w and y_{i-1} = a and y_i = b, 0 otherwise. E.g., f_{wb,per,per,London}(x, 2, per, per) = 1; f_{wb,per,per,London}(x, 3, per, -) = 0; f_{wb,per,-,went}(x, 3, per, -) = 1.

34 More Templates for NER [examples: "Jack London went to Paris" tagged per per - - loc or per loc - - loc; "My trip to London ..." with London tagged loc] f_{w,per,per,London}(...) = 1 iff x_i = London and y_{i-1} = per and y_i = per. f_{w,per,loc,London}(...) = 1 iff x_i = London and y_{i-1} = per and y_i = loc. f_{prep,loc,to}(...) = 1 iff x_{i-1} = to and x_i matches /[A-Z]/ and y_i = loc. f_{city,loc}(...) = 1 iff y_i = loc and world-cities(x_i) = 1. f_{fname,per}(...) = 1 iff y_i = per and first-names(x_i) = 1.

39 Representations Factored at Bigrams y: per per - - loc x: Jack London went to Paris. f(x, i, y_{i-1}, y_i): a d-dimensional feature vector of a label bigram at position i; each dimension is typically a boolean indicator (0 or 1). f(x, y) = Σ_{i=1}^n f(x, i, y_{i-1}, y_i): a d-dimensional feature vector of the entire y; an aggregated representation obtained by summing the bigram feature vectors; each dimension is now a count of a feature pattern.
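
As an illustration (not from the lecture), here is a Python sketch of the aggregated representation f(x, y) = Σ_i f(x, i, y_{i-1}, y_i) stored as a sparse vector of feature counts; the templates used are toy stand-ins.

    from collections import Counter

    def bigram_features(x, i, prev_label, label):
        """Toy indicator features for the label bigram at position i."""
        feats = [("word", label, x[i]), ("bigram", prev_label, label)]
        if i + 1 < len(x):
            feats.append(("next_word", label, x[i + 1]))
        return feats

    def global_features(x, y, start="<s>"):
        """f(x, y) = sum_i f(x, i, y_{i-1}, y_i), stored sparsely as counts."""
        counts = Counter()
        prev = start                      # dummy label for y_0
        for i, label in enumerate(y):
            counts.update(bigram_features(x, i, prev, label))
            prev = label
        return counts                     # each dimension is a feature count

    x = ["Jack", "London", "went", "to", "Paris"]
    y = ["per", "per", "-", "-", "loc"]
    print(global_features(x, y))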

40 Linear Sequence Prediction predict(x_{1:n}) = argmax_{y in Y^n} w · f(x, y), where f(x, y) = Σ_{i=1}^n f(x, i, y_{i-1}, y_i). Note the linearity of the expression: w · f(x, y) = w · Σ_{i=1}^n f(x, i, y_{i-1}, y_i) = Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i). Next questions: How do we solve the argmax problem? How do we learn w?

43 Predicting with Factored Sequence Models Consider a fixed w. Given x_{1:n}, find: argmax_{y in Y^n} Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i). Use the Viterbi algorithm, which takes O(n |Y|^2). Notational change: since w and x_{1:n} are fixed, we will use s(i, a, b) = w · f(x, i, a, b).

44 Viterbi for Factored Sequence Models Given scores s(i, a, b) for each position i and output bigram <a, b>, find: argmax_{y in Y^n} Σ_{i=1}^n s(i, y_{i-1}, y_i). Use the Viterbi algorithm, which takes O(n |Y|^2). Intuition: output sequences that share bigrams will share scores. [figure: trellis from position 1 to n; at position i-1, the best subsequences ending in per, loc, - are connected by scores such as s(i, loc, per) to the best subsequences at position i ending in per, loc, -]

45 Intuition for Viterbi Consider a fixed x_{1:n}. Assume we have the best sub-sequences up to position i-1: the best subsequence with y_{i-1} = per, the best with y_{i-1} = loc, and the best with y_{i-1} = -. What is the best sequence up to position i with y_i = loc?

47 Intuition for Viterbi Consider a fixed x_{1:n}. Assume we have the best sub-sequences up to position i-1: the best subsequence with y_{i-1} = per, the best with y_{i-1} = loc, and the best with y_{i-1} = -. What is the best sequence up to position i with y_i = loc? Extend each of them with the corresponding score s(i, per, loc), s(i, loc, loc), or s(i, -, loc), and keep the best.

48 Viterbi for Linear Factored Predictors ŷ = argmax_{y in Y^n} Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i). Definition: the score of the optimal sequence for x_{1:i} ending with a in Y is δ(i, a) = max_{y in Y^i : y_i = a} Σ_{j=1}^i s(j, y_{j-1}, y_j). Use the following recursions, for all a in Y: δ(1, a) = s(1, y_0 = null, a); δ(i, a) = max_{b in Y} [ δ(i-1, b) + s(i, b, a) ]. The optimal score for x is max_{a in Y} δ(n, a). The optimal sequence ŷ can be recovered through back-pointers.
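
Below is a minimal Python sketch of these recursions with back-pointers. The score function stands in for s(i, a, b) = w · f(x, i, a, b); the toy scores at the end are only there so the example runs on its own.

    def viterbi(n, labels, score, start="<s>"):
        """argmax over label sequences of sum_i score(i, y_{i-1}, y_i)."""
        # delta[i-1][a]: best score over positions 1..i ending with label a
        delta = [{a: score(1, start, a) for a in labels}]
        back = [{a: start for a in labels}]
        for i in range(2, n + 1):
            row, ptr = {}, {}
            for a in labels:
                best_b = max(labels, key=lambda b: delta[-1][b] + score(i, b, a))
                row[a] = delta[-1][best_b] + score(i, best_b, a)
                ptr[a] = best_b
            delta.append(row)
            back.append(ptr)
        # recover the optimal sequence through the back-pointers
        last = max(labels, key=lambda a: delta[-1][a])
        y = [last]
        for i in range(n - 1, 0, -1):
            y.append(back[i][y[-1]])
        return list(reversed(y)), delta[-1][last]

    labels = ["per", "loc", "-"]
    toy_score = lambda i, prev, cur: 1.0 if (i, cur) in {(1, "per"), (5, "loc")} else 0.0
    print(viterbi(5, labels, toy_score))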

49 Linear Factored Sequence Prediction predict(x_{1:n}) = argmax_{y in Y^n} w · f(x, y). Factored representation, e.g. based on bigrams. Flexible, arbitrary features of the full x and the factors. Efficient prediction using Viterbi. Next, learning w: probabilistic log-linear models (local learning, a.k.a. Maximum-Entropy Markov Models; global learning, a.k.a. Conditional Random Fields) and margin-based methods (Structured Perceptron, Structured SVM).

50 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w

51 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1

52 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1 w Upper,per = +1

53 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1 w Upper,per = +1 w Upper,loc = +1

54 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1 w Upper,per = +1 w Upper,loc = +1 w Word,per,Maria = +2

55 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1 w Upper,per = +1 w Upper,loc = +1 w Word,per,Maria = +2 w Word,per,Jack = +2

56 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1 w Upper,per = +1 w Upper,loc = +1 w Word,per,Maria = +2 w Word,per,Jack = +2 w NextW,per,went = +2

57 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1 w Upper,per = +1 w Upper,loc = +1 w Word,per,Maria = +2 w Word,per,Jack = +2 w NextW,per,went = +2 w NextW,org,played = +2

58 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1 w Upper,per = +1 w Upper,loc = +1 w Word,per,Maria = +2 w Word,per,Jack = +2 w NextW,per,went = +2 w NextW,org,played = +2 w PrevW,org,against = +2

59 The Learner s Game Training Data per - - Maria is beautiful loc - - Lisbon is beautiful per - - loc Jack went to Lisbon loc - - Argentina is nice per per - - loc loc Jack London went to South Paris org - - org Argentina played against Germany Weight Vector w w Lower,- = +1 w Upper,per = +1 w Upper,loc = +1 w Word,per,Maria = +2 w Word,per,Jack = +2 w NextW,per,went = +2 w NextW,org,played = +2 w PrevW,org,against = +2 w UpperBigram,per,per = +2 w UpperBigram,loc,loc = +2 w NextW,loc,played = -1000

60 Log-linear Models for Sequence Prediction y per per - - loc x Jack London went to Paris

61 Log-linear Models for Sequence Prediction Model the conditional distribution: Pr(y | x; w) = exp{w · f(x, y)} / Z(x; w), where x = x_1 x_2 ... x_n with x_i in X, and y = y_1 y_2 ... y_n with y_i in Y, Y = {1, ..., L}. f(x, y) represents x and y with d features; w in R^d are the parameters of the model; Z(x; w) is a normalizer called the partition function, Z(x; w) = Σ_{z in Y^n} exp{w · f(x, z)}. To predict the best sequence: predict(x_{1:n}) = argmax_{y in Y^n} Pr(y | x).

62 Log-linear Models: Name Let's take the log of the conditional probability: log Pr(y | x; w) = log [ exp{w · f(x, y)} / Z(x; w) ] = w · f(x, y) - log Σ_{y'} exp{w · f(x, y')} = w · f(x, y) - log Z(x; w). Partition function: Z(x; w) = Σ_{y'} exp{w · f(x, y')}; log Z(x; w) is a constant for a fixed x. In log space the computations are linear, i.e., we model log-probabilities using a linear predictor.

63 Making Predictions with Log-Linear Models For tractability, assume f(x, y) decomposes into bigrams: f(x_{1:n}, y_{1:n}) = Σ_{i=1}^n f(x, i, y_{i-1}, y_i). Given w and x_{1:n}, find: argmax_{y_{1:n}} Pr(y_{1:n} | x_{1:n}; w) = argmax_y exp{Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i)} / Z(x; w) = argmax_y exp{Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i)} = argmax_y Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i). We can use the Viterbi algorithm.

65 Parameter Estimation in Log-Linear Models Pr(y | x; w) = exp{w · f(x, y)} / Z(x; w). How do we estimate w given training data? Two approaches: MEMMs assume that Pr(y | x; w) decomposes; CRFs assume that f(x, y) decomposes.

67 Maximum Entropy Markov Models (MEMMs) (McCallum, Freitag, Pereira 2000) Similarly to HMMs: Pr(y_{1:n} | x_{1:n}) = Pr(y_1 | x_{1:n}) Pr(y_{2:n} | x_{1:n}, y_1) = Pr(y_1 | x_{1:n}) ∏_{i=2}^n Pr(y_i | x_{1:n}, y_{1:i-1}) (chain rule) = Pr(y_1 | x_{1:n}) ∏_{i=2}^n Pr(y_i | x_{1:n}, y_{i-1}) (MEMM assumption). Assumption under MEMMs: Pr(y_i | x_{1:n}, y_{1:i-1}) = Pr(y_i | x_{1:n}, y_{i-1}).

68 Parameter Estimation in MEMMs Decompose the sequential problem: Pr(y_{1:n} | x_{1:n}) = Pr(y_1 | x_{1:n}) ∏_{i=2}^n Pr(y_i | x_{1:n}, i, y_{i-1}). Learn local log-linear distributions (i.e., MaxEnt): Pr(y | x, i, y') = exp{w · f(x, i, y', y)} / Z(x, i, y'), where x is an input sequence, y' and y are tags, and f(x, i, y', y) is a feature vector of x, the position to be tagged, the previous tag and the current tag. Sequence learning is reduced to multi-class logistic regression.
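
A sketch of this reduction in Python, assuming scikit-learn is available: each position contributes one local training example with features of (x, i, y_{i-1}), and a multiclass logistic regression models Pr(y_i | x, i, y_{i-1}). The feature patterns and toy data are illustrative.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    train = [(["Jack", "London", "went", "to", "Paris"],
              ["per", "per", "-", "-", "loc"])]

    def local_feats(x, i, prev_tag):
        # toy local feature patterns of the input and the previous tag
        return {"word=" + x[i]: 1.0,
                "prev_tag=" + prev_tag: 1.0,
                "is_upper=" + str(x[i][0].isupper()): 1.0}

    X_dicts, y_local = [], []
    for x, y in train:
        prev = "<s>"
        for i, tag in enumerate(y):
            X_dicts.append(local_feats(x, i, prev))
            y_local.append(tag)
            prev = tag

    vec = DictVectorizer()
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(X_dicts), y_local)
    # clf.predict_proba(...) gives Pr(y_i | x, i, y_{i-1}); chaining these local
    # distributions (greedily, or with Viterbi) yields the sequence prediction.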

69 Conditional Random Fields (Lafferty, McCallum, Pereira 2001) Log-linear model of the conditional distribution: Pr(y | x; w) = exp{w · f(x, y)} / Z(x), where x = x_1 x_2 ... x_n with x_i in X, and y = y_1 y_2 ... y_n with y_i in Y, Y = {1, ..., L}. f(x, y) is a feature vector of x and y; w are the model parameters. To predict the best sequence: ŷ = argmax_{y in Y^n} Pr(y | x). Assumption in CRFs (for tractability): f(x, y) decomposes into factors.

70 Parameter Estimation in CRFs Given a training set { (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)) }, estimate w. Define the conditional log-likelihood of the data: L(w) = Σ_{k=1}^m log Pr(y^(k) | x^(k); w). L(w) measures how well w explains the data; a good value for w gives a high value of Pr(y^(k) | x^(k); w) for all k = 1 ... m. We want the w that maximizes L(w).

71 Learning the Parameters of a CRF We pose it as a concave optimization problem. Find: w* = argmax_{w in R^d} L(w) - (λ/2) ||w||^2, where the first term is the log-likelihood of the data and the second term is a regularization term that penalizes solutions with large norm (similar to norm minimization in SVMs). λ is a parameter that controls the trade-off between fitting the data and model complexity.

72 Learning the Parameters of a CRF Find w* = argmax_{w in R^d} L(w) - (λ/2) ||w||^2. In general there is no analytical solution to this optimization, so we use iterative techniques, i.e., gradient-based optimization: 1. Initialize w = 0. 2. Take derivatives of L(w) - (λ/2) ||w||^2 and compute the gradient. 3. Move w in steps proportional to the gradient. 4. Repeat steps 2 and 3 until convergence. Fast and scalable algorithms exist.
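
The iterative scheme above, written as a short Python sketch; crf_gradient is a placeholder for the gradient of L(w) described on the next slides, and the fixed step size and iteration count are illustrative simplifications.

    import numpy as np

    def train_crf(data, dim, crf_gradient, lam=0.1, step=0.1, iters=100):
        w = np.zeros(dim)                            # 1. initialize w = 0
        for _ in range(iters):
            # 2. gradient of L(w) - (lam/2) ||w||^2
            grad = crf_gradient(w, data) - lam * w
            w += step * grad                         # 3. move w along the gradient
        return w                                     # 4. stop (here: fixed #iterations)

    # toy call just to show the interface, with a dummy gradient
    print(train_crf(data=None, dim=3, crf_gradient=lambda w, d: 1.0 - w))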

73 Computing the Gradient in CRFs Consider a parameter w_j and its associated feature f_j: ∂L(w)/∂w_j = Σ_{k=1}^m f_j(x^(k), y^(k)) - Σ_{k=1}^m Σ_{y in Y^n} Pr(y | x^(k); w) f_j(x^(k), y), where f_j(x, y) = Σ_{i=1}^n f_j(x, i, y_{i-1}, y_i). The first term is the observed value of f_j in the training examples; the second term is the expected value of f_j under the current w. At the optimum, observed = expected.

74 Computing the Gradient in CRFs The first term is easy to compute, by counting explicitly: Σ_{k=1}^m Σ_i f_j(x^(k), i, y^(k)_{i-1}, y^(k)_i). The second term is more involved, Σ_{k=1}^m Σ_{y in Y^n} Σ_i Pr(y | x^(k); w) f_j(x^(k), i, y_{i-1}, y_i), because it sums over all sequences y in Y^n. But there is an efficient solution...

75 Computing the Gradient in CRFs For an example (x^(k), y^(k)): Σ_{y in Y^n} Pr(y | x^(k); w) Σ_{i=1}^n f_j(x^(k), i, y_{i-1}, y_i) = Σ_{i=1}^n Σ_{a,b in Y} μ^k_i(a, b) f_j(x^(k), i, a, b), where μ^k_i(a, b) = Pr(<i, a, b> | x^(k); w) = Σ_{y in Y^n : y_{i-1}=a, y_i=b} Pr(y | x^(k); w). The quantities μ^k_i can be computed efficiently in O(nL^2) using the forward-backward algorithm.

76 Forward-Backward for CRFs Assume a fixed x. Calculate, in O(n |Y|^2), μ_i(a, b) = Σ_{y in Y^n : y_{i-1}=a, y_i=b} Pr(y | x; w), for 1 ≤ i ≤ n and a, b in Y. Definition of the forward and backward quantities: α_i(a) = Σ_{y_{1:i} in Y^i : y_i=a} exp{ Σ_{j=1}^i w · f(x, j, y_{j-1}, y_j) }; β_i(b) = Σ_{y_{i:n} in Y^{n-i+1} : y_i=b} exp{ Σ_{j=i+1}^n w · f(x, j, y_{j-1}, y_j) }; Z = Σ_a α_n(a). Then μ_i(a, b) = α_{i-1}(a) exp{w · f(x, i, a, b)} β_i(b) Z^{-1}. Similarly to Viterbi, α_i(a) and β_i(b) can be computed efficiently in a recursive manner.
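
A Python sketch of these recursions. score(i, a, b) stands for w · f(x, i, a, b); for clarity the sums use plain exponentials (a practical implementation would work in log space, or rescale α and β, for numerical stability).

    import math

    def forward_backward(n, labels, score, start="<s>"):
        # alpha[i-1][a]: sum over prefixes y_{1:i} with y_i = a of exp(prefix score)
        alpha = [{a: math.exp(score(1, start, a)) for a in labels}]
        for i in range(2, n + 1):
            alpha.append({a: sum(alpha[-1][b] * math.exp(score(i, b, a))
                                 for b in labels) for a in labels})
        # beta[i-1][b]: sum over suffixes y_{i:n} with y_i = b of exp(suffix score)
        beta = [None] * n
        beta[n - 1] = {b: 1.0 for b in labels}
        for i in range(n - 1, 0, -1):
            beta[i - 1] = {b: sum(math.exp(score(i + 1, b, a)) * beta[i][a]
                                  for a in labels) for b in labels}
        Z = sum(alpha[n - 1][a] for a in labels)
        # mu[(i, a, b)] = Pr(y_{i-1} = a, y_i = b | x) for i = 2..n
        mu = {(i, a, b): alpha[i - 2][a] * math.exp(score(i, a, b)) * beta[i - 1][b] / Z
              for i in range(2, n + 1) for a in labels for b in labels}
        return mu, Z

    # quick check: with all scores equal to 0, Z is the number of label sequences
    mu, Z = forward_backward(3, ["per", "loc", "-"], lambda i, a, b: 0.0)
    print(Z)  # 27.0 = 3**3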

78 CRFs: summary so far Log-linear models for sequence prediction, Pr(y | x; w). Computations factorize on label bigrams. Model form: argmax_{y in Y^n} Σ_i w · f(x, i, y_{i-1}, y_i). Prediction: uses Viterbi (from HMMs). Parameter estimation: gradient-based methods, in practice L-BFGS; the computation of the gradient uses forward-backward (from HMMs).

79 CRFs: summary so far Log-linear models for sequence prediction, Pr(y | x; w). Computations factorize on label bigrams. Model form: argmax_{y in Y^n} Σ_i w · f(x, i, y_{i-1}, y_i). Prediction: uses Viterbi (from HMMs). Parameter estimation: gradient-based methods, in practice L-BFGS; the computation of the gradient uses forward-backward (from HMMs). Next question: MEMMs or CRFs? HMMs or CRFs?

80 MEMMs and CRFs MEMMs: Pr(y | x) = ∏_{i=1}^n exp{w · f(x, i, y_{i-1}, y_i)} / Z(x, i, y_{i-1}; w). CRFs: Pr(y | x) = exp{Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i)} / Z(x). Both exploit the same factorization, i.e., the same features, and the same computations to compute argmax_y Pr(y | x). MEMMs are locally normalized; CRFs are globally normalized. MEMMs assume that Pr(y_i | x_{1:n}, y_{1:i-1}) = Pr(y_i | x_{1:n}, y_{i-1}), which leads to the label bias problem. MEMMs are cheaper to train (training reduces to multiclass learning). CRFs are easier to extend to other structures (next lecture).

81 HMMs for Sequence Prediction x are the observations, y are the hidden states. HMMs model the joint distribution Pr(x, y). Parameters (assume X = {1, ..., k} and Y = {1, ..., l}): π in R^l, with π_a = Pr(y_1 = a); T in R^{l×l}, with T_{a,b} = Pr(y_i = b | y_{i-1} = a); O in R^{l×k}, with O_{a,c} = Pr(x_i = c | y_i = a). Model form: Pr(x, y) = π_{y_1} O_{y_1,x_1} ∏_{i=2}^n T_{y_{i-1},y_i} O_{y_i,x_i}. Parameter estimation: maximum likelihood by counting events and normalizing.

82 HMMs and CRFs In CRFs: ŷ = argmax_y Σ_i w · f(x, i, y_{i-1}, y_i). In HMMs: ŷ = argmax_y π_{y_1} O_{y_1,x_1} ∏_{i=2}^n T_{y_{i-1},y_i} O_{y_i,x_i} = argmax_y log(π_{y_1} O_{y_1,x_1}) + Σ_{i=2}^n log(T_{y_{i-1},y_i} O_{y_i,x_i}). An HMM can be expressed as a factored linear model, with features f_j(x, i, y', y) and weights w_j: for i = 1 and y = a, weight log(π_a); for i > 1, y' = a and y = b, weight log(T_{a,b}); for y = a and x_i = c, weight log(O_{a,c}). Hence, HMMs are factored linear models.
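
A small Python sketch of this reduction (toy parameters, for illustration only): the HMM's log-parameters define scores s(i, a, b) of exactly the factored linear form, so the same Viterbi decoder applies.

    import math

    def hmm_as_scores(pi, T, O, x):
        """Return score(i, a, b): log-score of label b at position i (1-indexed)
        given previous label a, under HMM parameters (pi, T, O) and sentence x."""
        def score(i, a, b):
            emit = math.log(O[b][x[i - 1]])
            trans = math.log(pi[b]) if i == 1 else math.log(T[a][b])
            return trans + emit
        return score

    pi = {"per": 0.5, "-": 0.5}
    T = {"per": {"per": 0.6, "-": 0.4}, "-": {"per": 0.2, "-": 0.8}}
    O = {"per": {"Jack": 0.9, "went": 0.1}, "-": {"Jack": 0.1, "went": 0.9}}
    s = hmm_as_scores(pi, T, O, ["Jack", "went"])
    print(s(1, "<s>", "per"), s(2, "per", "-"))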

83 HMMs and CRFs: main differences Representation: HMM features are tied to the generative process; CRF features are very flexible, and can look at the whole input x paired with a label bigram (y_i, y_{i+1}). In practice, for prediction tasks, good discriminative features can improve accuracy a lot. Parameter estimation: HMMs focus on explaining the data, both x and y; CRFs focus on the mapping from x to y. A priori, it is hard to say which paradigm is better; it is the same dilemma as Naive Bayes vs. Maximum Entropy.

84 Structured Prediction Perceptron, SVMs, CRFs

85 Learning Structured Predictors Goal: given training data { (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)) }, learn a predictor x -> y with small error on unseen inputs. In a CRF: argmax_{y in Y^n} Pr(y | x; w) = argmax_y exp{Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i)} / Z(x; w) = argmax_y Σ_{i=1}^n w · f(x, i, y_{i-1}, y_i). To predict new values, Z(x; w) is not relevant. Parameter estimation: w is set to maximize the likelihood. Can we learn w more directly, focusing on errors?

87 The Structured Perceptron (Collins, 2002) Set w = 0. For t = 1 ... T, for each training example (x, y): 1. Compute z = argmax_z w · f(x, z). 2. If z ≠ y, update w <- w + f(x, y) - f(x, z). Return w.

88 The Structured Perceptron + Averaging (Freund and Schapire, 1998) (Collins, 2002) Set w = 0, w_a = 0. For t = 1 ... T, for each training example (x, y): 1. Compute z = argmax_z w · f(x, z). 2. If z ≠ y, update w <- w + f(x, y) - f(x, z). 3. w_a <- w_a + w. Return w_a / (mT), where m is the number of training examples.
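
A compact Python sketch of the averaged algorithm above, assuming sparse feature dicts and a decode routine that runs Viterbi under the current weights (both are stand-ins for the components defined earlier).

    from collections import defaultdict

    def averaged_structured_perceptron(data, global_features, decode, T=5):
        w = defaultdict(float)       # current weights
        w_sum = defaultdict(float)   # running sum for averaging
        m = len(data)
        for _ in range(T):
            for x, y in data:
                z = decode(x, w)                          # 1. current best prediction
                if z != y:                                # 2. additive update on mistakes
                    for feat, v in global_features(x, y).items():
                        w[feat] += v
                    for feat, v in global_features(x, z).items():
                        w[feat] -= v
                for feat, v in w.items():                 # 3. accumulate for averaging
                    w_sum[feat] += v
        return {feat: v / (m * T) for feat, v in w_sum.items()}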

89 Perceptron Updates: Example y per per - - loc z per loc - - loc x Jack London went to Paris. Let y be the correct output for x, and say we predict z instead under our current w. The update is: g = f(x, y) - f(x, z) = Σ_i f(x, i, y_{i-1}, y_i) - Σ_i f(x, i, z_{i-1}, z_i) = f(x, 2, per, per) - f(x, 2, per, loc) + f(x, 3, per, -) - f(x, 3, loc, -). Perceptron updates are typically very sparse.

90 Properties of the Perceptron Online algorithm. Often much more efficient than batch algorithms If the data is separable, it will converge to parameter values with 0 errors Number of errors before convergence is related to a definition of margin. Can also relate margin to generalization properties In practice: 1. Averaging improves performance a lot 2. Typically reaches a good solution after only a few (say 5) iterations over the training set 3. Often performs nearly as well as CRFs, or SVMs

91 Averaged Perceptron Convergence [figure: accuracy vs. number of training iterations; results on a validation set for a parsing task]

92 Margin-based Structured Prediction Let f(x, y) = Σ_{i=1}^n f(x, i, y_{i-1}, y_i). Model: argmax_{y in Y^n} w · f(x, y). Consider an example (x^(k), y^(k)): if there is some y ≠ y^(k) with w · f(x^(k), y^(k)) < w · f(x^(k), y), the model makes an error. Let y' = argmax_{y in Y^n : y ≠ y^(k)} w · f(x^(k), y) and define γ_k = w · (f(x^(k), y^(k)) - f(x^(k), y')). The quantity γ_k is a notion of margin on example k: γ_k > 0 means no mistakes on the example; a high γ_k means high confidence.

95 Mistake-augmented Margins (Taskar et al., 2004) x^(k) Jack London went to Paris; y^(k) per per - - loc has e(y^(k), ·) = 0; y' per loc - - loc has e = 1; y'' - - per per - has e = 5. Def: e(y, y') = Σ_{i=1}^n [[y_i ≠ y'_i]]; e.g. e(y^(k), y^(k)) = 0, e(y^(k), y') = 1, e(y^(k), y'') = 5. We want a w such that for all y ≠ y^(k): w · f(x^(k), y^(k)) > w · f(x^(k), y) + e(y^(k), y) (the higher the error of y, the larger the separation should be).

98 Structured Hinge Loss Define a mistake-augmented margin γ_{k,y} = w · f(x^(k), y^(k)) - w · f(x^(k), y) - e(y^(k), y), and γ_k = min_{y ≠ y^(k)} γ_{k,y}. Define the loss function on example k as: L(w, x^(k), y^(k)) = max_{y in Y^n} { w · f(x^(k), y) + e(y^(k), y) } - w · f(x^(k), y^(k)). This leads to an SVM for structured prediction. Given a training set, find: argmin_{w in R^d} Σ_{k=1}^m L(w, x^(k), y^(k)) + (λ/2) ||w||^2.
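
The sketch below (Python, illustrative) computes the structured hinge loss and its subgradient for one example, assuming a loss_augmented_decode routine: because the Hamming cost e(y^(k), y) also decomposes over positions, it can be folded into the per-position scores and solved with the same Viterbi.

    def hinge_loss_and_subgradient(x, y_gold, w, global_features, loss_augmented_decode):
        """Structured hinge loss max_y [w.f(x,y) + e(y_gold,y)] - w.f(x,y_gold)."""
        y_hat = loss_augmented_decode(x, y_gold, w)    # cost-augmented argmax
        hamming = sum(a != b for a, b in zip(y_gold, y_hat))
        f_gold = global_features(x, y_gold)
        f_hat = global_features(x, y_hat)
        dot = lambda f: sum(w.get(k, 0.0) * v for k, v in f.items())
        loss = dot(f_hat) + hamming - dot(f_gold)      # >= 0, since y_gold is a candidate
        # subgradient w.r.t. w: f(x, y_hat) - f(x, y_gold) (zero when y_hat == y_gold)
        subgrad = dict(f_hat)
        for k, v in f_gold.items():
            subgrad[k] = subgrad.get(k, 0.0) - v
        return loss, subgrad

Plugging this subgradient into a stochastic (sub)gradient step, together with the (λ/2) ||w||^2 term, is one common way to optimize the objective above.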

99 Regularized Loss Minimization Given a training set { (x^(1), y^(1)), ..., (x^(m), y^(m)) }, find: argmin_{w in R^d} Σ_{k=1}^m L(w, x^(k), y^(k)) + (λ/2) ||w||^2. Two common loss functions L(w, x^(k), y^(k)): the log-likelihood loss (CRFs), -log Pr(y^(k) | x^(k); w), and the hinge loss (SVMs), max_{y in Y^n} ( w · f(x^(k), y) + e(y^(k), y) ) - w · f(x^(k), y^(k)).

100 Learning Structured Predictors: summary so far Linear models for sequence prediction: argmax_{y in Y^n} Σ_i w · f(x, i, y_{i-1}, y_i). Computations factorize on label bigrams. Decoding: using Viterbi. Marginals: using forward-backward. Parameter estimation: perceptron, log-likelihood, SVMs; extensions from classification to the structured case. Optimization methods: stochastic (sub)gradient methods (LeCun et al. 98; Shalev-Shwartz et al. 07), Exponentiated Gradient (Collins et al. 08), SVM Struct (Tsochantaridis et al. 04), Structured MIRA (McDonald et al. 05).

101 Beyond Linear Sequence Prediction

102 Sequence Prediction, Beyond Bigrams It is easy to extend the scope of features to k-grams: f(x, i, y_{i-k+1:i-1}, y_i). In general, think of a state σ_i that remembers the relevant history: σ_i = y_{i-1} for bigrams; σ_i = y_{i-k+1:i-1} for k-grams; σ_i can be the state at time i of a deterministic automaton generating y. The structured predictor is argmax_{y in Y^n} Σ_i w · f(x, i, σ_i, y_i). Viterbi and forward-backward extend naturally, in O(nL^k).

103 Dependency Structures [figure: dependency tree for "* John saw a movie that he liked today"] Directed arcs represent dependencies between a head word and a modifier word. E.g.: movie modifies saw, John modifies saw, today modifies saw.

104 Dependency Parsing: arc-factored models (McDonald et al. 2005) [figure: dependency tree for "* John saw a movie that he liked today"] Parse trees decompose into single dependencies <h, m>: argmax_{y in Y(x)} Σ_{<h,m> in y} w · f(x, h, m). Some features: f_1(x, h, m) = [[ "saw" -> "movie" ]], f_2(x, h, m) = [[ distance = +2 ]]. Tractable inference algorithms exist (tomorrow's lecture).

105 Linear Structured Prediction Sequence prediction (bigram factorization): argmax_{y in Y(x)} Σ_i w · f(x, i, y_{i-1}, y_i). Dependency parsing (arc-factored): argmax_{y in Y(x)} Σ_{<h,m> in y} w · f(x, h, m). In general, we can enumerate the parts r in y: argmax_{y in Y(x)} Σ_{r in y} w · f(x, r).

106 Factored Sequence Prediction: from Linear to Non-linear score(x, y) = Σ_i s(x, i, y_{i-1}, y_i). Linear: s(x, i, y_{i-1}, y_i) = w · f(x, i, y_{i-1}, y_i). Non-linear, using a feed-forward neural network: s(x, i, y_{i-1}, y_i) = w_{y_{i-1},y_i} · h(f(x, i)), where h(f(x, i)) = σ(W_2 σ(W_1 σ(W_0 f(x, i)))). Remarks: The non-linear model computes a hidden representation of the input. It is still factored: Viterbi and forward-backward work. Parameter estimation becomes non-convex; use backpropagation.
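
A numpy sketch of such a non-linear factored scorer (the dimensions and the two-layer depth are arbitrary choices for illustration): a feed-forward network maps the input features f(x, i) to a hidden vector, and one output weight vector per label bigram produces s(x, i, y_{i-1}, y_i). Since the score still depends only on (i, y_{i-1}, y_i), Viterbi applies unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    L, d_in, d_h = 3, 20, 16                 # number of labels, input dim, hidden dim
    W0 = rng.normal(size=(d_h, d_in))        # first layer
    W1 = rng.normal(size=(d_h, d_h))         # second layer
    W_out = rng.normal(size=(L, L, d_h))     # one weight vector per bigram (a, b)
    sigma = np.tanh

    def hidden(f_xi):
        return sigma(W1 @ sigma(W0 @ f_xi))  # hidden representation h(f(x, i))

    def s(f_xi, a, b):
        return W_out[a, b] @ hidden(f_xi)    # s(x, i, y_{i-1}=a, y_i=b)

    print(s(rng.normal(size=d_in), 0, 2))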

107 Recurrent Sequence Prediction [figure: recurrent architecture with inputs x_1 ... x_n, hidden states h_1 ... h_n, and outputs y_1 ... y_n] Maintains a state: a hidden variable that keeps track of previous observations and predictions. Making exact predictions is not tractable; in practice, use greedy predictions or beam search. Learning is non-convex. Popular methods: RNNs, LSTMs, spectral models, ...

108 Thanks!


More information

Kalman Filtering, Factor Graphs and Electrical Networks

Kalman Filtering, Factor Graphs and Electrical Networks Kalman Filtering, Factor Graphs and Electrical Networks Pascal O. Vontobel, Daniel Lippuner, and Hans-Andrea Loeliger ISI-ITET, ETH urich, CH-8092 urich, Switzerland. Abstract Factor graphs are graphical

More information

Design of Parallel Algorithms. Communication Algorithms

Design of Parallel Algorithms. Communication Algorithms + Design of Parallel Algorithms Communication Algorithms + Topic Overview n One-to-All Broadcast and All-to-One Reduction n All-to-All Broadcast and Reduction n All-Reduce and Prefix-Sum Operations n Scatter

More information

Intelligent Agents & Search Problem Formulation. AIMA, Chapters 2,

Intelligent Agents & Search Problem Formulation. AIMA, Chapters 2, Intelligent Agents & Search Problem Formulation AIMA, Chapters 2, 3.1-3.2 Outline for today s lecture Intelligent Agents (AIMA 2.1-2) Task Environments Formulating Search Problems CIS 421/521 - Intro to

More information

Project. B) Building the PWM Read the instructions of HO_14. 1) Determine all the 9-mers and list them here:

Project. B) Building the PWM Read the instructions of HO_14. 1) Determine all the 9-mers and list them here: Project Please choose ONE project among the given five projects. The last three projects are programming projects. hoose any programming language you want. Note that you can also write programs for the

More information

1. Introduction to Game Theory

1. Introduction to Game Theory 1. Introduction to Game Theory What is game theory? Important branch of applied mathematics / economics Eight game theorists have won the Nobel prize, most notably John Nash (subject of Beautiful mind

More information

Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 3 Fall 2010

Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 3 Fall 2010 Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 3 Fall 21 Peter Bro Miltersen November 1, 21 Version 1.3 3 Extensive form games (Game Trees, Kuhn Trees)

More information

CandyCrush.ai: An AI Agent for Candy Crush

CandyCrush.ai: An AI Agent for Candy Crush CandyCrush.ai: An AI Agent for Candy Crush Jiwoo Lee, Niranjan Balachandar, Karan Singhal December 16, 2016 1 Introduction Candy Crush, a mobile puzzle game, has become very popular in the past few years.

More information

Training a Minesweeper Solver

Training a Minesweeper Solver Training a Minesweeper Solver Luis Gardea, Griffin Koontz, Ryan Silva CS 229, Autumn 25 Abstract Minesweeper, a puzzle game introduced in the 96 s, requires spatial awareness and an ability to work with

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or

More information

IBM SPSS Neural Networks

IBM SPSS Neural Networks IBM Software IBM SPSS Neural Networks 20 IBM SPSS Neural Networks New tools for building predictive models Highlights Explore subtle or hidden patterns in your data. Build better-performing models No programming

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Deep Learning Barnabás Póczos Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey Hinton Yann LeCun 2

More information

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation

More information

Information Extraction. CS6200 Information Retrieval (and a sort of advertisement for NLP in the spring)

Information Extraction. CS6200 Information Retrieval (and a sort of advertisement for NLP in the spring) Information Extraction CS6200 Information Retrieval (and a sort of advertisement for NLP in the spring) Informa(on Extrac(on Automa(cally extract structure from text annotate document using tags to iden(fy

More information

Outcome Forecasting in Sports. Ondřej Hubáček

Outcome Forecasting in Sports. Ondřej Hubáček Outcome Forecasting in Sports Ondřej Hubáček Motivation & Challenges Motivation exploiting betting markets performance optimization Challenges no available datasets difficulties with establishing the state-of-the-art

More information

Introduction to Markov Models. Estimating the probability of phrases of words, sentences, etc.

Introduction to Markov Models. Estimating the probability of phrases of words, sentences, etc. Introduction to Markov Models Estimating the probability of phrases of words, sentences, etc. But first: A few preliminaries on text preprocessing What counts as a word? A tricky question. CIS 421/521

More information

Introduction to Spring 2009 Artificial Intelligence Final Exam

Introduction to Spring 2009 Artificial Intelligence Final Exam CS 188 Introduction to Spring 2009 Artificial Intelligence Final Exam INSTRUCTIONS You have 3 hours. The exam is closed book, closed notes except a two-page crib sheet, double-sided. Please use non-programmable

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Perceptron Barnabás Póczos Contents History of Artificial Neural Networks Definitions: Perceptron, Multi-Layer Perceptron Perceptron algorithm 2 Short History of Artificial

More information

Using Administrative Records for Imputation in the Decennial Census 1

Using Administrative Records for Imputation in the Decennial Census 1 Using Administrative Records for Imputation in the Decennial Census 1 James Farber, Deborah Wagner, and Dean Resnick U.S. Census Bureau James Farber, U.S. Census Bureau, Washington, DC 20233-9200 Keywords:

More information

Dependency-based Convolutional Neural Networks for Sentence Embedding

Dependency-based Convolutional Neural Networks for Sentence Embedding Dependency-based Convolutional Neural Networks for Sentence Embedding ROOT? Mingbo Ma Liang Huang CUNY Bing Xiang Bowen Zhou IBM T. J. Watson ACL 2015 Beijing Convolutional Neural Network for NLP Kalchbrenner

More information

Dynamic Games: Backward Induction and Subgame Perfection

Dynamic Games: Backward Induction and Subgame Perfection Dynamic Games: Backward Induction and Subgame Perfection Carlos Hurtado Department of Economics University of Illinois at Urbana-Champaign hrtdmrt2@illinois.edu Jun 22th, 2017 C. Hurtado (UIUC - Economics)

More information

Module 3 Greedy Strategy

Module 3 Greedy Strategy Module 3 Greedy Strategy Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Introduction to Greedy Technique Main

More information

Antennas and Propagation. Chapter 5c: Array Signal Processing and Parametric Estimation Techniques

Antennas and Propagation. Chapter 5c: Array Signal Processing and Parametric Estimation Techniques Antennas and Propagation : Array Signal Processing and Parametric Estimation Techniques Introduction Time-domain Signal Processing Fourier spectral analysis Identify important frequency-content of signal

More information

Reinforcement Learning Agent for Scrolling Shooter Game

Reinforcement Learning Agent for Scrolling Shooter Game Reinforcement Learning Agent for Scrolling Shooter Game Peng Yuan (pengy@stanford.edu) Yangxin Zhong (yangxin@stanford.edu) Zibo Gong (zibo@stanford.edu) 1 Introduction and Task Definition 1.1 Game Agent

More information

Advanced Techniques for Mobile Robotics Location-Based Activity Recognition

Advanced Techniques for Mobile Robotics Location-Based Activity Recognition Advanced Techniques for Mobile Robotics Location-Based Activity Recognition Wolfram Burgard, Cyrill Stachniss, Kai Arras, Maren Bennewitz Activity Recognition Based on L. Liao, D. J. Patterson, D. Fox,

More information

DECISION TREE TUTORIAL

DECISION TREE TUTORIAL Kardi Teknomo DECISION TREE TUTORIAL Revoledu.com Decision Tree Tutorial by Kardi Teknomo Copyright 2008-2012 by Kardi Teknomo Published by Revoledu.com Online edition is available at Revoledu.com Last

More information

Game Tree Search. CSC384: Introduction to Artificial Intelligence. Generalizing Search Problem. General Games. What makes something a game?

Game Tree Search. CSC384: Introduction to Artificial Intelligence. Generalizing Search Problem. General Games. What makes something a game? CSC384: Introduction to Artificial Intelligence Generalizing Search Problem Game Tree Search Chapter 5.1, 5.2, 5.3, 5.6 cover some of the material we cover here. Section 5.6 has an interesting overview

More information

UNIVERSITY OF SOUTHAMPTON

UNIVERSITY OF SOUTHAMPTON UNIVERSITY OF SOUTHAMPTON ELEC6014W1 SEMESTER II EXAMINATIONS 2007/08 RADIO COMMUNICATION NETWORKS AND SYSTEMS Duration: 120 mins Answer THREE questions out of FIVE. University approved calculators may

More information

An Hybrid MLP-SVM Handwritten Digit Recognizer

An Hybrid MLP-SVM Handwritten Digit Recognizer An Hybrid MLP-SVM Handwritten Digit Recognizer A. Bellili ½ ¾ M. Gilloux ¾ P. Gallinari ½ ½ LIP6, Université Pierre et Marie Curie ¾ La Poste 4, Place Jussieu 10, rue de l Ile Mabon, BP 86334 75252 Paris

More information

Some algorithmic and combinatorial problems on permutation classes

Some algorithmic and combinatorial problems on permutation classes Some algorithmic and combinatorial problems on permutation classes The point of view of decomposition trees PhD Defense, 2009 December the 4th Outline 1 Objects studied : Permutations, Patterns and Classes

More information

TTIC 31230, Fundamentals of Deep Learning David McAllester, April AlphaZero

TTIC 31230, Fundamentals of Deep Learning David McAllester, April AlphaZero TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 AlphaZero 1 AlphaGo Fan (October 2015) AlphaGo Defeats Fan Hui, European Go Champion. 2 AlphaGo Lee (March 2016) 3 AlphaGo Zero vs.

More information

PROJECT 5: DESIGNING A VOICE MODEM. Instructor: Amir Asif

PROJECT 5: DESIGNING A VOICE MODEM. Instructor: Amir Asif PROJECT 5: DESIGNING A VOICE MODEM Instructor: Amir Asif CSE4214: Digital Communications (Fall 2012) Computer Science and Engineering, York University 1. PURPOSE In this laboratory project, you will design

More information

Modeling, Analysis and Optimization of Networks. Alberto Ceselli

Modeling, Analysis and Optimization of Networks. Alberto Ceselli Modeling, Analysis and Optimization of Networks Alberto Ceselli alberto.ceselli@unimi.it Università degli Studi di Milano Dipartimento di Informatica Doctoral School in Computer Science A.A. 2015/2016

More information

Module 3 Greedy Strategy

Module 3 Greedy Strategy Module 3 Greedy Strategy Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Introduction to Greedy Technique Main

More information

Lecture 5: Pitch and Chord (1) Chord Recognition. Li Su

Lecture 5: Pitch and Chord (1) Chord Recognition. Li Su Lecture 5: Pitch and Chord (1) Chord Recognition Li Su Recap: short-time Fourier transform Given a discrete-time signal x(t) sampled at a rate f s. Let window size N samples, hop size H samples, then the

More information

Simple Large-scale Relation Extraction from Unstructured Text

Simple Large-scale Relation Extraction from Unstructured Text Simple Large-scale Relation Extraction from Unstructured Text Christos Christodoulopoulos and Arpit Mittal Amazon Research Cambridge Alexa Question Answering Alexa, what books did Carrie Fisher write?

More information

Contents 1 Introduction Optical Character Recognition Systems Soft Computing Techniques for Optical Character Recognition Systems

Contents 1 Introduction Optical Character Recognition Systems Soft Computing Techniques for Optical Character Recognition Systems Contents 1 Introduction.... 1 1.1 Organization of the Monograph.... 1 1.2 Notation.... 3 1.3 State of Art.... 4 1.4 Research Issues and Challenges.... 5 1.5 Figures.... 5 1.6 MATLAB OCR Toolbox.... 5 References....

More information

Speech Processing. Simon King University of Edinburgh. additional lecture slides for

Speech Processing. Simon King University of Edinburgh. additional lecture slides for Speech Processing Simon King University of Edinburgh additional lecture slides for 2018-19 assignment Q&A writing exercise Roadmap Modules 1-2: The basics Modules 3-5: Speech synthesis Modules 6-9: Speech

More information

The game of Bridge: a challenge for ILP

The game of Bridge: a challenge for ILP The game of Bridge: a challenge for ILP S. Legras, C. Rouveirol, V. Ventos Véronique Ventos LRI Univ Paris-Saclay vventos@nukk.ai 1 Games 2 Interest of games for AI Excellent field of experimentation Problems

More information

Patterns and random permutations II

Patterns and random permutations II Patterns and random permutations II Valentin Féray (joint work with F. Bassino, M. Bouvel, L. Gerin, M. Maazoun and A. Pierrot) Institut für Mathematik, Universität Zürich Summer school in Villa Volpi,

More information

Learning, prediction and selection algorithms for opportunistic spectrum access

Learning, prediction and selection algorithms for opportunistic spectrum access Learning, prediction and selection algorithms for opportunistic spectrum access TRINITY COLLEGE DUBLIN Hamed Ahmadi Research Fellow, CTVR, Trinity College Dublin Future Cellular, Wireless, Next Generation

More information

Scheduling. Radek Mařík. April 28, 2015 FEE CTU, K Radek Mařík Scheduling April 28, / 48

Scheduling. Radek Mařík. April 28, 2015 FEE CTU, K Radek Mařík Scheduling April 28, / 48 Scheduling Radek Mařík FEE CTU, K13132 April 28, 2015 Radek Mařík (marikr@fel.cvut.cz) Scheduling April 28, 2015 1 / 48 Outline 1 Introduction to Scheduling Methodology Overview 2 Classification of Scheduling

More information

Mikko Myllymäki and Tuomas Virtanen

Mikko Myllymäki and Tuomas Virtanen NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,

More information

Artificial Neural Networks. Artificial Intelligence Santa Clara, 2016

Artificial Neural Networks. Artificial Intelligence Santa Clara, 2016 Artificial Neural Networks Artificial Intelligence Santa Clara, 2016 Simulate the functioning of the brain Can simulate actual neurons: Computational neuroscience Can introduce simplified neurons: Neural

More information