Introduction to Markov Models. Estimating the probability of phrases of words, sentences, etc.


1 Introduction to Markov Models Estimating the probability of phrases of words, sentences, etc.

2 But first: A few preliminaries on text preprocessing

3 What counts as a word? A tricky question. CIS 421/521 - Intro to AI 3

4 How to find Sentences??

5 Q1: How to estimate the probability of a given sentence W?
A crucial step in speech recognition (and lots of other applications).
First guess: "bag of words": $\hat{P}(W) = \prod_{w \in W} P(w)$
Given word lattice:
  form | subsidy   | for
  farm | subsidies | far
Unigram counts (in $1.7 \times 10^6$ words of AP text):
  form 183, subsidy 15, for [missing], farm 74, subsidies 55, far 570
The most likely word string given $\hat{P}(W)$ isn't quite right.

6 Predicting a word sequence II
Next guess: products of bigrams. For $W = w_1 w_2 w_3 \ldots w_n$,
$\hat{P}(W) = \prod_{i=1}^{n-1} P(w_i w_{i+1})$
Given word lattice:
  form | subsidy   | for
  farm | subsidies | far
Bigram counts (in $1.7 \times 10^6$ words of AP text):
  form subsidy 0      subsidy for 2
  form subsidies 0    subsidy far 0
  farm subsidy 0      subsidies for 6
  farm subsidies 4    subsidies far 0
Much better (if not quite right). (Q: the counts are tiny! Why?)
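As a rough illustration of the bigram guess, the sketch below scores every path through the lattice using the bigram counts listed on the slide. The names `lattice`, `path_score`, and `best` are illustrative, and dividing each count by the corpus size to get a bigram probability is an assumption about how the estimate is formed.

```python
from itertools import product
from math import prod

# Word lattice and bigram counts as listed on the slide
lattice = [("form", "farm"), ("subsidy", "subsidies"), ("for", "far")]
bigram_counts = {
    ("form", "subsidy"): 0, ("form", "subsidies"): 0,
    ("farm", "subsidy"): 0, ("farm", "subsidies"): 4,
    ("subsidy", "for"): 2, ("subsidy", "far"): 0,
    ("subsidies", "for"): 6, ("subsidies", "far"): 0,
}
N = 1.7e6  # corpus size in words (assumed normalizer for bigram probabilities)


def path_score(path):
    """Product of estimated bigram probabilities along one lattice path."""
    return prod(bigram_counts[pair] / N for pair in zip(path, path[1:]))


# Enumerate all 2 * 2 * 2 paths and keep the highest-scoring one
best = max(product(*lattice), key=path_score)
print(" ".join(best), path_score(best))
```

With these counts only one path has a nonzero score, which hints at why tiny counts are a problem: a single unseen bigram zeroes out the whole product.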

7 How can we estimate P(W) correctly?
Problem: the Naïve Bayes model for bigrams violates independence assumptions. Let's do this right.
Let $W = w_1 w_2 w_3 \ldots w_n$. Then, by the chain rule,
$P(W) = P(w_1) \cdot P(w_2 \mid w_1) \cdot P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 \ldots w_{n-1})$
We can estimate $P(w_2 \mid w_1)$ by the Maximum Likelihood Estimator
$\frac{\mathrm{Count}(w_1 w_2)}{\mathrm{Count}(w_1)}$
and $P(w_3 \mid w_1 w_2)$ by
$\frac{\mathrm{Count}(w_1 w_2 w_3)}{\mathrm{Count}(w_1 w_2)}$
and so on.

8 and finally, estimating $P(w_n \mid w_1 w_2 \ldots w_{n-1})$
Again, we can estimate $P(w_n \mid w_1 w_2 \ldots w_{n-1})$ with the MLE
$\frac{\mathrm{Count}(w_1 w_2 \ldots w_n)}{\mathrm{Count}(w_1 w_2 \ldots w_{n-1})}$
So to decide pat vs. pot in "Heat up the oil in a large p?t", compute for pot
$\frac{\mathrm{Count}(\text{"Heat up the oil in a large pot"})}{\mathrm{Count}(\text{"Heat up the oil in a large"})}$
UNLESS OUR CORPUS IS REALLY HUGE, BOTH COUNTS WILL BE 0, yielding 0/0.

9 The Web is HUGE!! (2016 version) 48.9/403=0.121

10 But what if we only have 100 million words for our estimates??

11 A BOTEC (back-of-the-envelope calculation) Estimate of What We Can Estimate
What parameters can we estimate with 100 million words of training data?
Assuming (for now) a uniform distribution over only 5000 words.
So even with $10^8$ words of data, for even trigrams we encounter the sparse data problem.
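The arithmetic behind this estimate can be sketched as follows, under the slide's own assumption of a uniform distribution over a 5000-word vocabulary and $10^8$ training tokens. The function and constant names are illustrative, and the number of n-gram tokens is approximated by the number of word tokens.

```python
# Back-of-the-envelope: expected observations per n-gram type, assuming a
# uniform distribution over a 5000-word vocabulary (the slide's assumption).
VOCAB_SIZE = 5_000
TRAINING_TOKENS = 100_000_000  # 10^8 words of training data


def expected_count_per_ngram(order: int) -> float:
    """Average number of times each possible n-gram of this order is seen."""
    possible_ngrams = VOCAB_SIZE ** order
    return TRAINING_TOKENS / possible_ngrams


for order, name in [(1, "unigram"), (2, "bigram"), (3, "trigram")]:
    print(f"{name}: {VOCAB_SIZE ** order:.3g} possible types, "
          f"{expected_count_per_ngram(order):.3g} expected occurrences each")
```

Under these assumptions each bigram is expected about 4 times, while each trigram is expected far less than once, so most trigram counts would be zero.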

12 Review: How can we estimate P(W) correctly?
Problem: the Naïve Bayes model for bigrams violates independence assumptions. Let's do this right.
Let $W = w_1 w_2 w_3 \ldots w_n$. Then, by the chain rule,
$P(W) = P(w_1) \cdot P(w_2 \mid w_1) \cdot P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 \ldots w_{n-1})$
We can estimate $P(w_2 \mid w_1)$ by the Maximum Likelihood Estimator
$\frac{\mathrm{Count}(w_1 w_2)}{\mathrm{Count}(w_1)}$
and $P(w_3 \mid w_1 w_2)$ by
$\frac{\mathrm{Count}(w_1 w_2 w_3)}{\mathrm{Count}(w_1 w_2)}$
and so on.

13 The Markov Assumption: Only the Immediate Past Matters

14 The Markov Assumption: Estimation
We estimate the probability of each $w_i$ given previous context by
$P(w_i \mid w_1 w_2 \ldots w_{i-1}) = P(w_i \mid w_{i-1})$
which can be estimated by
$\frac{\mathrm{Count}(w_{i-1} w_i)}{\mathrm{Count}(w_{i-1})}$
So we're back to counting only unigrams and bigrams!
AND we have a correct practical estimation method for P(W) given the Markov assumption!
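A minimal sketch of this estimator on a toy corpus is shown below. The corpus, the `<s>` sentence-start marker (used here so that the first word also has a one-word context), and the function names are assumptions for illustration, not part of the slides.

```python
from collections import Counter

# Toy corpus (hypothetical); "<s>" marks a sentence start so the first word
# also has a one-word context.
sentences = [
    ["<s>", "the", "mouse", "ran", "up", "the", "clock"],
    ["<s>", "the", "spider", "ran", "up", "the", "waterspout"],
]

unigram_counts = Counter()
bigram_counts = Counter()
for sent in sentences:
    unigram_counts.update(sent)
    bigram_counts.update(zip(sent, sent[1:]))


def bigram_mle(prev, word):
    """MLE of P(word | prev) = Count(prev word) / Count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0  # unseen context: the ratio would be 0/0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


def sentence_probability(words):
    """P(W) as a product of bigram MLEs (first-order Markov assumption)."""
    prob = 1.0
    for prev, word in zip(["<s>"] + words, words):
        prob *= bigram_mle(prev, word)
    return prob


print(sentence_probability(["the", "mouse", "ran", "up", "the", "waterspout"]))
```

Note that the example sentence never occurs in the toy corpus, yet it still receives a nonzero probability because every one of its bigrams was seen.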

15 Markov Models

16 Review (and crucial for upcoming homework): Cumulative Distribution Functions (CDFs)
The CDF of a random variable X is denoted by $F_X(x)$ and is defined by $F_X(x) = \Pr(X \le x)$.
F is monotonic nondecreasing: $\forall x \le y,\ F(x) \le F(y)$.
If X is a discrete random variable that attains values $x_1, x_2, \ldots, x_n$ with probabilities $p(x_1), p(x_2), \ldots$, then
$F_X(x_i) = \sum_{j \le i} p(x_j)$

17 CDF for a very small English corpus
Corpus: "the mouse ran up the clock. The spider ran up the waterspout."
P(the) = 4/12, P(ran) = P(up) = 2/12
P(mouse) = P(clock) = P(spider) = P(waterspout) = 1/12
Arbitrarily fix an order: w1 = the, w2 = ran, w3 = up, w4 = mouse, ...
F(the) = 4/12, F(ran) = 6/12, F(up) = 8/12, F(mouse) = 9/12, ...
[Figure: step plot of the CDF; x-axis: the, ran, up, mouse, clock, spider, waterspout; y-axis: 1/12 to 1]
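The CDF values for this corpus can be reproduced with a few lines of Python. This is a sketch: the tokenization (lowercasing and dropping periods) is an assumption, and the word order follows the slide.

```python
from collections import Counter
from itertools import accumulate

corpus = "the mouse ran up the clock. The spider ran up the waterspout."
tokens = corpus.lower().replace(".", "").split()
counts = Counter(tokens)
total = len(tokens)

# Word order as fixed on the slide
order = ["the", "ran", "up", "mouse", "clock", "spider", "waterspout"]

# Running sum of counts gives the CDF numerators over the common denominator
cumulative = list(accumulate(counts[w] for w in order))
cdf = {w: c / total for w, c in zip(order, cumulative)}

for w, c in zip(order, cumulative):
    print(f"F({w}) = {c}/{total}")
```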

18 Visualizing an n-gram based language model: the Shannon/Miller/Selfridge method
To generate a sequence of n words given unigram estimates:
Fix some ordering of the vocabulary $v_1, v_2, v_3, \ldots, v_k$.
For each word position i, $1 \le i \le n$:
  Choose a random value $r_i$ between 0 and 1.
  Choose $w_i$ = the first $v_j$ such that $F_V(v_j) \ge r_i$, i.e. the first $v_j$ such that $\sum_{m=1}^{j} P(v_m) \ge r_i$.
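The unigram procedure can be sketched as inverse-CDF sampling. The vocabulary and probabilities below are taken from the small-corpus example; the function names and the use of `bisect_left` to find the first cumulative value at or above $r_i$ are illustrative choices, not the course's reference implementation.

```python
import random
from bisect import bisect_left
from itertools import accumulate

vocab = ["the", "ran", "up", "mouse", "clock", "spider", "waterspout"]
probs = [4 / 12, 2 / 12, 2 / 12, 1 / 12, 1 / 12, 1 / 12, 1 / 12]

cdf = list(accumulate(probs))
cdf[-1] = 1.0  # guard against floating-point round-off in the last entry


def sample_word(r):
    """Return the first v_j whose cumulative probability F(v_j) >= r."""
    return vocab[bisect_left(cdf, r)]


def generate(n, rng):
    """Draw n words independently, one random r_i per position."""
    return [sample_word(rng.random()) for _ in range(n)]


print(" ".join(generate(8, random.Random(0))))
```

Because the CDF is nondecreasing, a binary search finds the same word as a left-to-right scan of the running sum.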

19 Visualizing an n-gram based language model: the Shannon/Miller/Selfridge method
To generate a sequence of n words given a 1st-order Markov model (i.e. conditioned on one previous word):
Fix some ordering of the vocabulary $v_1, v_2, v_3, \ldots, v_k$.
Use the unigram method to generate an initial word $w_1$.
For each remaining position i, $2 \le i \le n$:
  Choose a random value $r_i$ between 0 and 1.
  Choose $w_i$ = the first $v_j$ such that $\sum_{m=1}^{j} P(v_m \mid w_{i-1}) \ge r_i$.
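A sketch of the first-order version follows. It assumes the toy corpus is treated as a single token stream, uses alphabetical order as the fixed vocabulary ordering, and takes the initial word as a parameter rather than sampling it with the unigram method; all names are illustrative.

```python
import random
from collections import Counter, defaultdict

tokens = "the mouse ran up the clock the spider ran up the waterspout".split()

# Successor counts: followers[prev][word] = Count(prev word)
followers = defaultdict(Counter)
for prev, word in zip(tokens, tokens[1:]):
    followers[prev][word] += 1


def sample_next(prev, r):
    """First v_j (in a fixed order) with sum_{m<=j} P(v_m | prev) >= r."""
    successors = sorted(followers[prev])  # fixed vocabulary ordering
    total = sum(followers[prev].values())
    running = 0
    for word in successors:
        running += followers[prev][word]
        if running / total >= r:
            return word
    return successors[-1]  # fall-through guard for round-off


def generate(first_word, n, rng):
    """Extend from first_word until n words or a word with no successors."""
    words = [first_word]
    while len(words) < n and followers[words[-1]]:
        words.append(sample_next(words[-1], rng.random()))
    return words


print(" ".join(generate("the", 8, random.Random(0))))
```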

20 The Shannon/Miller/Selfridge method trained on Shakespeare (This and next two slides from Jurafsky)

21 Wall Street Journal just isn't Shakespeare

22 Shakespeare as corpus
N = 884,647 tokens, V = 29,066
Shakespeare produced 300,000 bigram types out of $V^2$ = 844 million possible bigrams.
So 99.96% of the possible bigrams were never seen (have zero entries in the table).
Quadgrams worse: what's coming out looks like Shakespeare because it is Shakespeare.
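The sparsity figures can be checked directly from the numbers on the slide. The quadgram bound is an added observation, not from the slide: a text of N tokens contains at most N - 3 quadgram tokens, so it can cover only a vanishing share of the $V^4$ possible quadgrams.

```python
N = 884_647                 # tokens in the Shakespeare corpus
V = 29_066                  # vocabulary size (word types)
seen_bigram_types = 300_000

possible_bigrams = V ** 2
unseen_fraction = 1 - seen_bigram_types / possible_bigrams

# Upper bound on the share of possible quadgrams that could have been seen
max_quadgram_coverage = (N - 3) / V ** 4

print(f"possible bigrams: {possible_bigrams:,}")
print(f"bigrams never seen: {unseen_fraction:.2%}")
print(f"quadgram coverage is at most {max_quadgram_coverage:.2e}")
```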

23 The Sparse Data Problem Again So we smooth. How likely is a 0 count? Much more likely than I let on!!!

24 English word frequencies are well described by Zipf's Law
Zipf (1949) characterized the relation between word frequency and rank as:
$f \propto 1/r$, i.e. $r = C/f$ (for constant C), so $\log(r) = \log(C) - \log(f)$
Purely Zipfian data plots as a straight line on a log-log scale.
*Rank (r): the numerical position of a word in a list sorted by decreasing frequency (f).
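To see why purely Zipfian data is a straight line on a log-log scale, the sketch below builds ideal frequencies $f = C/r$ (synthetic data, with an arbitrary constant C chosen for illustration) and fits a least-squares line to $\log f$ against $\log r$.

```python
import math

C = 1000.0                       # arbitrary constant for the synthetic data
ranks = range(1, 101)
freqs = [C / r for r in ranks]   # ideal Zipfian frequencies f = C / r

xs = [math.log(r) for r in ranks]
ys = [math.log(f) for f in freqs]

# Ordinary least-squares fit of log(f) = intercept + slope * log(r)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, log(C) = {math.log(C):.3f}")
```

The fitted slope is -1 and the intercept is log(C), matching $\log(f) = \log(C) - \log(r)$; real corpora deviate from this ideal line, especially at the extremes.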

25 Word frequency & rank in Brown Corpus vs Zipf. Lots of area under the tail of this curve! From: Interactive mathematics

26 Zipf's law for the Brown corpus

27 Exploiting Zipf to do Language ID

    # The following filters out Arabic words that are also frequent in Spanish and English...
    arabic_top_12 = ['7ata', 'ana', 'ma', 'w', 'bs', 'fe', 'b3d', '3adou',
                     'mn', 'kan', 'men', 'ahmed']
    # The following filters out Urdu words common in English
    urdu_top_17 = ['hai', 'ko', 'ki', 'main', 'na', 'se', 'ho', 'bhi', 'mein',
                   'ka', 'tum', 'nahi', 'meri', 'jo', 'wo', 'dil', 'hain']
    spanish_top_16 = ['de', 'la', 'que', 'el', 'en', 'y', 'es', 'un', 'los',
                      'por', 'se', 'para', 'con']
    english_top_20 = ['the', 'to', 'of', 'in', 'i', 'a', 'is', 'and', 'you',
                      'for', 'on', 'it', 'that', 'are', 'with', 'am', 'my',
                      'be', 'at', 'not', 'we']

28 All the code you need.

    # TO GET BEST LANGUAGE AS STRING: lid_pick_best(lid_process_tweet(tweet))
    import collections
    import re

    counts = collections.Counter()

    def lid_process_tweet(tweet):
        counts.clear()
        # Replace non-ASCII characters, then decode back to str for the regex
        text = tweet.encode('ascii', 'replace').decode('ascii').strip().lower()
        for word in re.split(r'[\.?!,]*\s+', text):
            # if not re.match(r'<pattern truncated in source>', word):
            for lang in languages:              # ('english', 'arabic', ...)
                if word in topwords[lang]:      # dict of word lists indexed by lang
                    counts[lang] += 1
        return counts.most_common()

    def lid_pick_best(count_list):
        if count_list:
            return count_list[0][0]
        else:
            return 'UNKNOWN'
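For readers who want to try the idea end to end, here is a self-contained Python 3 sketch of the same top-word counting approach. The `topwords` dictionary below is an abbreviated, hypothetical stand-in for the word lists on the previous slide, a local counter replaces the module-level one, and the slide's word filter is omitted because its pattern is not recoverable from the transcript.

```python
import collections
import re

# Abbreviated, hypothetical top-word lists; a fuller system would use the
# complete per-language lists from the previous slide.
topwords = {
    "english": {"the", "to", "of", "in", "is", "and", "you"},
    "spanish": {"de", "la", "que", "el", "en", "y", "es"},
}


def lid_process_tweet(tweet):
    """Count, per language, how many tokens appear in its top-word list."""
    counts = collections.Counter()
    for word in re.split(r"[.?!,]*\s+", tweet.strip().lower()):
        word = word.rstrip(".?!,")  # trailing punctuation on the last token
        for lang, words in topwords.items():
            if word in words:
                counts[lang] += 1
    return counts.most_common()


def lid_pick_best(count_list):
    """Return the language with the most hits, or 'UNKNOWN' if none."""
    return count_list[0][0] if count_list else "UNKNOWN"


print(lid_pick_best(lid_process_tweet("The cat is in the house.")))
```

The approach leans on Zipf's law: a handful of very frequent function words covers a large share of running text, so even short tweets tend to contain several of them.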

Introduction to Markov Models

Introduction to Markov Models Introduction to Markov Models But first: A few preliminaries Estimating the probability of phrases of words, sentences, etc. CIS 391 - Intro to AI 2 What counts as a word? A tricky question. How to find

More information

Introduction to Markov Models

Introduction to Markov Models Itroductio to Markov Models But first: A few prelimiaries o text preprocessig Estimatig the probability of phrases of words, seteces, etc. What couts as a word? A tricky questio. How to fid Seteces?? CIS

More information

Part of Speech Tagging & Hidden Markov Models (Part 1) Mitch Marcus CIS 421/521

Part of Speech Tagging & Hidden Markov Models (Part 1) Mitch Marcus CIS 421/521 Part of Speech Tagging & Hidden Markov Models (Part 1) Mitch Marcus CIS 421/521 NLP Task I Determining Part of Speech Tags Given a text, assign each token its correct part of speech (POS) tag, given its

More information

The revolution of the empiricists. Machine Translation. Motivation for Data-Driven MT. Machine Translation as Search

The revolution of the empiricists. Machine Translation. Motivation for Data-Driven MT. Machine Translation as Search The revolution of the empiricists Machine Translation Word alignment & Statistical MT Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Classical approaches

More information

Log-linear models (part 1I)

Log-linear models (part 1I) Log-linear models (part 1I) Lecture, Feb 2 CS 690N, Spring 2017 Advanced Natural Language Processing http://people.cs.umass.edu/~brenocon/anlp2017/ Brendan O Connor College of Information and Computer

More information

Log-linear models (part 1I)

Log-linear models (part 1I) Log-linear models (part 1I) CS 690N, Spring 2018 Advanced Natural Language Processing http://people.cs.umass.edu/~brenocon/anlp2018/ Brendan O Connor College of Information and Computer Sciences University

More information

Lecture 4: n-grams in NLP. LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han

Lecture 4: n-grams in NLP. LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han Lecture 4: n-grams in NLP LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han Objectives Frequent n-grams in English n-grams and statistical NLP n-grams and conditional probability Large

More information

The fundamentals of detection theory

The fundamentals of detection theory Advanced Signal Processing: The fundamentals of detection theory Side 1 of 18 Index of contents: Advanced Signal Processing: The fundamentals of detection theory... 3 1 Problem Statements... 3 2 Detection

More information

HW1 is due Thu Oct 12 in the first 5 min of class. Read through chapter 5.

HW1 is due Thu Oct 12 in the first 5 min of class. Read through chapter 5. Stat 100a, Introduction to Probability. Outline for the day: 1. Bayes's rule. 2. Random variables. 3. cdf, pmf, and density. 4. Expected value, continued. 5. All in with AA. 6. Pot odds. 7. Violette vs.

More information

Speech Recognition. Mitch Marcus CIS 421/521 Artificial Intelligence

Speech Recognition. Mitch Marcus CIS 421/521 Artificial Intelligence Speech Recognition Mitch Marcus CIS 421/521 Artificial Intelligence A Sample of Speech Recognition Today's class is about: First, why speech recognition is difficult. As you'll see, the impression we have

More information

24.09 Minds and Machines Fall 11 HASS-D CI

24.09 Minds and Machines Fall 11 HASS-D CI 24.09 Minds and Machines Fall 11 HASS-D CI self-assessment the Chinese room argument Image by MIT OpenCourseWare. 1 derived vs. underived intentionality Something has derived intentionality just in case

More information

Statistical Machine Translation. Machine Translation Phrase-Based Statistical MT. Motivation for Phrase-based SMT

Statistical Machine Translation. Machine Translation Phrase-Based Statistical MT. Motivation for Phrase-based SMT Statistical Machine Translation Machine Translation Phrase-Based Statistical MT Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University October 2009 Probabilistic

More information

Great Is the Love/Hay Gran Amor. Jaime Cortez. Unison Keyboard

Great Is the Love/Hay Gran Amor. Jaime Cortez. Unison Keyboard 887 OCP sheet music Great Is the Love/Hay Gran Amor aime Cortez Unison Keyboard The material that you have requested is copyrighted. Copyright la requires you to obtain a license from the copyright holder

More information

Bayesian Positioning in Wireless Networks using Angle of Arrival

Bayesian Positioning in Wireless Networks using Angle of Arrival Bayesian Positioning in Wireless Networks using Angle of Arrival Presented by: Rich Martin Joint work with: David Madigan, Eiman Elnahrawy, Wen-Hua Ju, P. Krishnan, A.S. Krishnakumar Rutgers University

More information

Specimen 2018 Morning Time allowed: 1 hour

Specimen 2018 Morning Time allowed: 1 hour SPECIMEN MATERIAL GCSE SPANISH Foundation Tier Paper 4 Writing F Specimen 2018 Morning Time allowed: 1 hour Materials: You will need no other materials. Instructions Use black ink or black ball-point pen.

More information

System Identification and CDMA Communication

System Identification and CDMA Communication System Identification and CDMA Communication A (partial) sample report by Nathan A. Goodman Abstract This (sample) report describes theory and simulations associated with a class project on system identification

More information

CS 188: Artificial Intelligence Spring Speech in an Hour

CS 188: Artificial Intelligence Spring Speech in an Hour CS 188: Artificial Intelligence Spring 2006 Lecture 19: Speech Recognition 3/23/2006 Dan Klein UC Berkeley Many slides from Dan Jurafsky Speech in an Hour Speech input is an acoustic wave form s p ee ch

More information

Backward induction is a widely accepted principle for predicting behavior in sequential games. In the classic

Backward induction is a widely accepted principle for predicting behavior in sequential games. In the classic Published online ahead of print November 9, 212 MANAGEMENT SCIENCE Articles in Advance, pp. 1 18 ISSN 25-199 (print) ISSN 1526-551 (online) http://dx.doi.org/1.1287/mnsc.112.1645 212 INFORMS A Dynamic

More information

The study of probability is concerned with the likelihood of events occurring. Many situations can be analyzed using a simplified model of probability

The study of probability is concerned with the likelihood of events occurring. Many situations can be analyzed using a simplified model of probability The study of probability is concerned with the likelihood of events occurring Like combinatorics, the origins of probability theory can be traced back to the study of gambling games Still a popular branch

More information

Alternation in the repeated Battle of the Sexes

Alternation in the repeated Battle of the Sexes Alternation in the repeated Battle of the Sexes Aaron Andalman & Charles Kemp 9.29, Spring 2004 MIT Abstract Traditional game-theoretic models consider only stage-game strategies. Alternation in the repeated

More information

/665 Natural Language Processing

/665 Natural Language Processing 601.465/665 Natural Language Processing Prof: Jason Eisner Webpage: http://cs.jhu.edu/~jason/465 syllabus, announcements, slides, homeworks 1 Goals of the field Computers would be a lot more useful if

More information

Statistical Analysis of Modern Communication Signals

Statistical Analysis of Modern Communication Signals Whitepaper Statistical Analysis of Modern Communication Signals Bob Muro Application Group Manager, Boonton Electronics Abstract The latest wireless communication formats like DVB, DAB, WiMax, WLAN, and

More information

Card counting meets hidden Markov models

Card counting meets hidden Markov models University of New Mexico UNM Digital Repository Electrical and Computer Engineering ETDs Engineering ETDs 2-7-2011 Card counting meets hidden Markov models Steven J. Aragon Follow this and additional works

More information

Motif finding. GCB 535 / CIS 535 M. T. Lee, 10 Oct 2004

Motif finding. GCB 535 / CIS 535 M. T. Lee, 10 Oct 2004 Motif finding GCB 535 / CIS 535 M. T. Lee, 10 Oct 2004 Our goal is to identify significant patterns of letters (nucleotides, amino acids) contained within long sequences. The pattern is called a motif.

More information

Log-linear models (part III)

Log-linear models (part III) Log-linear models (part III) Lecture, Feb 7 CS 690N, Spring 2017 Advanced Natural Language Processing http://people.cs.umass.edu/~brenocon/anlp2017/ Brendan O Connor College of Information and Computer

More information

Machine Learning for Language Technology

Machine Learning for Language Technology Machine Learning for Language Technology Generative and Discriminative Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Machine Learning for Language

More information

Guess the Mean. Joshua Hill. January 2, 2010

Guess the Mean. Joshua Hill. January 2, 2010 Guess the Mean Joshua Hill January, 010 Challenge: Provide a rational number in the interval [1, 100]. The winner will be the person whose guess is closest to /3rds of the mean of all the guesses. Answer:

More information

Naive Bayes text classification. Sumin Han

Naive Bayes text classification. Sumin Han Naive Bayes text classification Sumin Han (hsm69@kaist.ac.kr) Contents - Introduction Bayes theorem Likelihood Text categorization Tips & Reference 2 Introduction 3 Artificial Intelligence Rule-based AI

More information

Laws of Text. Lecture Objectives. Text Technologies for Data Science INFR Learn about some text laws. This lecture is practical 9/26/2018

Laws of Text. Lecture Objectives. Text Technologies for Data Science INFR Learn about some text laws. This lecture is practical 9/26/2018 Text Technologies for Data Science INFR11145 Laws of Text Instructor: Walid Magdy 26-Sep-2018 Lecture Objectives Learn about some text laws Zipf s law Benford s law Heap s law Clumping/contagion This lecture

More information

The Automatic Classification Problem. Perceptrons, SVMs, and Friends: Some Discriminative Models for Classification

The Automatic Classification Problem. Perceptrons, SVMs, and Friends: Some Discriminative Models for Classification Perceptrons, SVMs, and Friends: Some Discriminative Models for Classification Parallel to AIMA 8., 8., 8.6.3, 8.9 The Automatic Classification Problem Assign object/event or sequence of objects/events

More information

Midterm for Name: Good luck! Midterm page 1 of 9

Midterm for Name: Good luck! Midterm page 1 of 9 Midterm for 6.864 Name: 40 30 30 30 Good luck! 6.864 Midterm page 1 of 9 Part #1 10% We define a PCFG where the non-terminals are {S, NP, V P, V t, NN, P P, IN}, the terminal symbols are {Mary,ran,home,with,John},

More information

A Maximum Likelihood TOA Based Estimator For Localization in Heterogeneous Networks

A Maximum Likelihood TOA Based Estimator For Localization in Heterogeneous Networks Int. J. Communications, Network and System Sciences, 010, 3, 38-4 doi:10.436/ijcns.010.31004 Published Online January 010 (http://www.scirp.org/journal/ijcns/). A Maximum Likelihood OA Based Estimator

More information

November 8, Chapter 8: Probability: The Mathematics of Chance

November 8, Chapter 8: Probability: The Mathematics of Chance Chapter 8: Probability: The Mathematics of Chance November 8, 2013 Last Time Probability Models and Rules Discrete Probability Models Equally Likely Outcomes Crystallographic notation The first symbol

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

Machine Translation - Decoding

Machine Translation - Decoding January 15, 2007 Table of Contents 1 Introduction 2 3 4 5 6 Integer Programing Decoder 7 Experimental Results Word alignments Fertility Table Translation Table Heads Non-heads NULL-generated (ct.) Figure:

More information

Laboratory 1: Uncertainty Analysis

Laboratory 1: Uncertainty Analysis University of Alabama Department of Physics and Astronomy PH101 / LeClair May 26, 2014 Laboratory 1: Uncertainty Analysis Hypothesis: A statistical analysis including both mean and standard deviation can

More information

If a series of games (on which money has been bet) is interrupted before it can end, what is the fairest way to divide the stakes?

If a series of games (on which money has been bet) is interrupted before it can end, what is the fairest way to divide the stakes? Interrupted Games of Chance Berkeley Math Circle (Advanced) John McSweeney March 13th, 2012 1 The Problem If a series of games (on which money has been bet) is interrupted before it can end, what is the

More information

Ismaila Ba MSc Student, Department of Mathematics and Statistics Université de Moncton

Ismaila Ba MSc Student, Department of Mathematics and Statistics Université de Moncton Discrimination between statistical distributions for hydrometeorological frequency modeling Ismaila Ba MSc Student, Department of Mathematics and Statistics Université de Moncton INTRODUCTION The identification

More information

10/12/2015. SHRDLU: 1969 NLP solved?? : A sea change in AI technologies. SHRDLU: A demonstration proof. 1990: Parsing Research in Crisis

10/12/2015. SHRDLU: 1969 NLP solved?? : A sea change in AI technologies. SHRDLU: A demonstration proof. 1990: Parsing Research in Crisis SHRDLU: 1969 NLP solved?? 1980-1995: A sea change in AI technologies Example: Natural Language Processing The Great Wave off Kanagawa by Hokusai, ~1830 ] Person: PICK UP A BIG RED BLOCK. Computer: OK.

More information

Latest trends in sentiment analysis - A survey

Latest trends in sentiment analysis - A survey Latest trends in sentiment analysis - A survey Anju Rose G Punneliparambil PG Scholar Department of Computer Science & Engineering Govt. Engineering College, Thrissur, India anjurose.ar@gmail.com Abstract

More information

(Small Group Sydney, Emma, Carson, Lucas) What ya gonna do when the lake goes dry, honey What ya gonna do when the lake goes dry?

(Small Group Sydney, Emma, Carson, Lucas) What ya gonna do when the lake goes dry, honey What ya gonna do when the lake goes dry? The Crawdad Song You get a line and I ll get a pole, honey You get a line and I ll get a pole, babe You get a line and I ll get a pole, and we ll go down to the crawdad hole, honey sugar ba-by mine Sittin

More information

Total. STAT/MATH 394 A - Autumn Quarter Midterm. Name: Student ID Number: Directions. Complete all questions.

Total. STAT/MATH 394 A - Autumn Quarter Midterm. Name: Student ID Number: Directions. Complete all questions. STAT/MATH 9 A - Autumn Quarter 015 - Midterm Name: Student ID Number: Problem 1 5 Total Points Directions. Complete all questions. You may use a scientific calculator during this examination; graphing

More information

3.5 Marginal Distributions

3.5 Marginal Distributions STAT 421 Lecture Notes 52 3.5 Marginal Distributions Definition 3.5.1 Suppose that X and Y have a joint distribution. The c.d.f. of X derived by integrating (or summing) over the support of Y is called

More information

Peak-based EMG Detection Via CWT

Peak-based EMG Detection Via CWT 41 Chapter 3 Peak-based EMG Detection Via CWT 3.1 Existing Methods In the EMG signal detection problem, one of the main tasks is to identify transient peaks of the muscle responses, or Motor Evoked Potentials

More information

Lesson 6.1 Linear Equation Review

Lesson 6.1 Linear Equation Review Name: Lesson 6.1 Linear Equation Review Vocabulary Equation: a math sentence that contains Linear: makes a straight line (no Variables: quantities represented by (often x and y) Function: equations can

More information

CS 540: Introduction to Artificial Intelligence

CS 540: Introduction to Artificial Intelligence CS 540: Introduction to Artificial Intelligence Mid Exam: 7:15-9:15 pm, October 25, 2000 Room 1240 CS & Stats CLOSED BOOK (one sheet of notes and a calculator allowed) Write your answers on these pages

More information

Discriminative Training for Automatic Speech Recognition

Discriminative Training for Automatic Speech Recognition Discriminative Training for Automatic Speech Recognition 22 nd April 2013 Advanced Signal Processing Seminar Article Heigold, G.; Ney, H.; Schluter, R.; Wiesler, S. Signal Processing Magazine, IEEE, vol.29,

More information

1 What s in the shipping package?

1 What s in the shipping package? SST 900B 900 MHz RS 232/RS 485 Wireless Modem Quick Start Guide 1 What s in the shipping package? SST-900B Wireless Modem CA-0910 Quick Start CD 3dBi 900M Hz Antenna Guide 2 External switch introduction

More information

FAST LEMPEL-ZIV (LZ 78) COMPLEXITY ESTIMATION USING CODEBOOK HASHING

FAST LEMPEL-ZIV (LZ 78) COMPLEXITY ESTIMATION USING CODEBOOK HASHING FAST LEMPEL-ZIV (LZ 78) COMPLEXITY ESTIMATION USING CODEBOOK HASHING Harman Jot, Rupinder Kaur M.Tech, Department of Electronics and Communication, Punjabi University, Patiala, Punjab, India I. INTRODUCTION

More information

Recap from previous lecture. Information Retrieval. Topics for Today. Recall: Basic structure of an Inverted index. Dictionaries & Tolerant Retrieval

Recap from previous lecture. Information Retrieval. Topics for Today. Recall: Basic structure of an Inverted index. Dictionaries & Tolerant Retrieval Recap from previous lecture nformation Retrieval Dictionaries & Tolerant Retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University nverted indexes

More information

Some Parameter Estimators in the Generalized Pareto Model and their Inconsistency with Observed Data

Some Parameter Estimators in the Generalized Pareto Model and their Inconsistency with Observed Data Some Parameter Estimators in the Generalized Pareto Model and their Inconsistency with Observed Data F. Ashkar, 1 and C. N. Tatsambon 2 1 Department of Mathematics and Statistics, Université de Moncton,

More information

Efficiency and detectability of random reactive jamming in wireless networks

Efficiency and detectability of random reactive jamming in wireless networks Efficiency and detectability of random reactive jamming in wireless networks Ni An, Steven Weber Modeling & Analysis of Networks Laboratory Drexel University Department of Electrical and Computer Engineering

More information

The Log-Log Term Frequency Distribution

The Log-Log Term Frequency Distribution The Log-Log Term Frequency Distribution Jason D. M. Rennie jrennie@gmail.com July 14, 2005 Abstract Though commonly used, the unigram is widely known as being a poor model of term frequency; it assumes

More information

Algorithms and Data Structures

Algorithms and Data Structures Algorithms and Data Structures Self-Organizing Lists Marius Kloft Assumptions for Searching Until now, we implicitly assumed that every element of our list is searched with the same probability, i.e.,

More information

Chapter 11. Sampling Distributions. BPS - 5th Ed. Chapter 11 1

Chapter 11. Sampling Distributions. BPS - 5th Ed. Chapter 11 1 Chapter 11 Sampling Distributions BPS - 5th Ed. Chapter 11 1 Sampling Terminology Parameter fixed, unknown number that describes the population Statistic known value calculated from a sample a statistic

More information

Understanding Apparent Increasing Random Jitter with Increasing PRBS Test Pattern Lengths

Understanding Apparent Increasing Random Jitter with Increasing PRBS Test Pattern Lengths JANUARY 28-31, 2013 SANTA CLARA CONVENTION CENTER Understanding Apparent Increasing Random Jitter with Increasing PRBS Test Pattern Lengths 9-WP6 Dr. Martin Miller The Trend and the Concern The demand

More information

Veracity Managing Uncertain Data. Skript zur Vorlesung Datenbanksystem II Dr. Andreas Züfle

Veracity Managing Uncertain Data. Skript zur Vorlesung Datenbanksystem II Dr. Andreas Züfle Veracity Managing Uncertain Data Skript zur Vorlesung Datenbanksystem II Dr. Andreas Züfle Geo-Spatial Data Huge flood of geo-spatial data Modern technology New user mentality Great research potential

More information

Simple Large-scale Relation Extraction from Unstructured Text

Simple Large-scale Relation Extraction from Unstructured Text Simple Large-scale Relation Extraction from Unstructured Text Christos Christodoulopoulos and Arpit Mittal Amazon Research Cambridge Alexa Question Answering Alexa, what books did Carrie Fisher write?

More information

MAS.160 / MAS.510 / MAS.511 Signals, Systems and Information for Media Technology Fall 2007

MAS.160 / MAS.510 / MAS.511 Signals, Systems and Information for Media Technology Fall 2007 MIT OpenCourseWare http://ocw.mit.edu MAS.160 / MAS.510 / MAS.511 Signals, Systems and Information for Media Technology Fall 2007 For information about citing these materials or our Terms of Use, visit:

More information

The Self-Avoiding Walk (Probability And Its Applications) By Neal Madras;Gordon Slade

The Self-Avoiding Walk (Probability And Its Applications) By Neal Madras;Gordon Slade The Self-Avoiding Walk (Probability And Its Applications) By Neal Madras;Gordon Slade If you are searching for a book by Neal Madras;Gordon Slade The Self-Avoiding Walk (Probability and Its Applications)

More information

Outlier-Robust Estimation of GPS Satellite Clock Offsets

Outlier-Robust Estimation of GPS Satellite Clock Offsets Outlier-Robust Estimation of GPS Satellite Clock Offsets Simo Martikainen, Robert Piche and Simo Ali-Löytty Tampere University of Technology. Tampere, Finland Email: simo.martikainen@tut.fi Abstract A

More information

Discrete Structures for Computer Science

Discrete Structures for Computer Science Discrete Structures for Computer Science William Garrison bill@cs.pitt.edu 6311 Sennott Square Lecture #23: Discrete Probability Based on materials developed by Dr. Adam Lee The study of probability is

More information
