Statistical Machine Translation. Machine Translation Phrase-Based Statistical MT. Motivation for Phrase-based SMT


Machine Translation: Phrase-Based Statistical MT
Jörg Tiedemann, Department of Linguistics and Philology, Uppsala University, October 2009

Statistical Machine Translation

Probabilistic view on MT (E = target language, F = source language):

$\hat{E} = \arg\max_E P(E \mid F) = \arg\max_E P(F \mid E)\, P(E)$

Statistical Machine Translation: Language Modeling

Language modeling:
- (probabilistic) LM = predict likelihood of a given string
- What is the likelihood P(E) to observe sentence E? Exactly what we need!

Estimate probabilities from corpora: decompose into N-grams!
- unigram model: $P(E) = P(e_1)\, P(e_2) \cdots P(e_n)$
- bigram model: $P(E) = P(e_1)\, P(e_2 \mid e_1)\, P(e_3 \mid e_2) \cdots P(e_n \mid e_{n-1})$
- trigram model: $P(E) = P(e_1)\, P(e_2 \mid e_1)\, P(e_3 \mid e_1, e_2) \cdots P(e_n \mid e_{n-2}, e_{n-1})$

There would be much more to say about language modeling...

Motivation for Phrase-based SMT

Word-based SMT:
- statistical word alignment: $P(F \mid E)$
- language modeling: $P(E)$
- global decoding: $\arg\max_E P(F \mid E)\, P(E)$

Word-by-word translation is too weak!
- contextual dependencies
- non-compositional constructions
- n:m relations

Look at larger chunks!
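The bigram decomposition above can be illustrated with a minimal sketch. The function name `train_bigram_lm`, the `<s>` start symbol (which stands in for the unconditioned P(e_1) term) and the toy corpus are assumptions made for illustration; no smoothing is applied, so unseen bigrams get probability zero.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate bigram probabilities by relative frequency (MLE, no smoothing).

    Returns a function that maps a token list to P(E) under the bigram model.
    """
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens              # hypothetical start-of-sentence symbol
        contexts.update(padded[:-1])           # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))

    def prob(tokens):
        p = 1.0
        padded = ["<s>"] + tokens
        for prev, cur in zip(padded, padded[1:]):
            if contexts[prev] == 0:            # context never seen in training
                return 0.0
            p *= bigrams[(prev, cur)] / contexts[prev]
        return p

    return prob

# Toy corpus, invented for this example.
corpus = [["the", "house", "rose"],
          ["the", "house", "fell"],
          ["a", "house", "rose"]]
lm = train_bigram_lm(corpus)
print(lm(["the", "house", "rose"]))   # P(the|<s>) * P(house|the) * P(rose|house)
```

In practice a language model would add smoothing and an end-of-sentence symbol; both are left out here to stay close to the formula on the slide.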

Phrase-based SMT

Motivation:
- phrases = word N-grams
- less ambiguity, more context in translation table
- handle non-compositional expressions
- local reorderings covered by phrase translations

Phrase-based SMT

Translation model in PSMT:

$P(F \mid E) = \prod_{i=1}^{I} \phi(f_i \mid e_i)\, d(start_i, end_{i-1})$

- distortion d: reordering on phrase level
- phrases are extracted from word aligned parallel corpora
- phrase translation probabilities (MLE):

$\phi(f \mid e) = \frac{count(f, e)}{\sum_{f} count(f, e)}$

Toolkit: Moses

Phrase-based SMT

Phrase translation probabilities:
- need phrase alignments in parallel corpus
- induce them from word alignments (IBM models)
- score extracted phrases (MLE)

Statistical word alignment

Standard models:
- IBM models 1-5 (cascaded), EM training
- final parameters:
  - word translation probabilities (lexical model)
  - fertility probabilities
  - distortion probabilities (reordering)

Viterbi alignment:
- assign most likely links between words according to the statistical word alignment model from above
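As a small illustration of the MLE phrase score, the following sketch computes φ(f|e) as a relative frequency over a list of extracted phrase pairs. The function name `phrase_translation_probs` and the toy counts are assumptions for this example, not values from the lecture.

```python
from collections import Counter

def phrase_translation_probs(phrase_pairs):
    """phi(f|e) = count(f, e) / sum over f' of count(f', e).

    phrase_pairs is a list of (f, e) tuples, one per extracted occurrence.
    """
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for _, e in phrase_pairs)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

# Invented counts: "just" was extracted 4 times, "only" twice.
extracted = ([("bara", "just")] * 3
             + [("bara vi", "just")]
             + [("bara", "only")] * 2)
for (f, e), p in sorted(phrase_translation_probs(extracted).items()):
    print(f"{f} ||| {e} ||| {p}")
```

With very small training sets many target phrases are seen only once, so their score is 1 regardless of translation quality.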

Viterbi Word Alignment

[figure: word alignment example, not preserved]
- special NULL word (NULL la)
- EMPTY alignment possible (did)
- only 1:many (slap); not many:1, depending on alignment direction

Alignment tool: GIZA++

Viterbi Word Alignment from GIZA++

From the German-English Europarl corpus:

  # Sentence pair (5) source length 12 target length 11 alignment score : e-24
  ich bitte sie , sich zu einer schweigeminute zu erheben .
  NULL ({ }) please ({ }) rise ({ }) , ({ 4 }) then ({ 5 }) , ({ }) for ({ 6 }) this ({ 7 }) minute ({ 8 }) ' ({ }) s ({ }) silence ({ 9 10 }) . ({ 11 })

  # Sentence pair (6) source length 12 target length 10 alignment score : e-15
  ( das parlament erhebt sich zu einer schweigeminute . )
  NULL ({ }) ( ({ 1 }) the ({ 2 }) house ({ 3 }) rose ({ 4 5 }) and ({ }) observed ({ 6 }) a ({ 7 }) minute ({ 8 }) ' ({ }) s ({ }) silence ({ 9 }) ) ({ 10 })

Viterbi Word Alignment

Asymmetric alignment!
- no n:1 alignments
- can run IBM models in both directions!
- different links in source-to-target and target-to-source
- best alignment = merge both directions (?!)

How? Symmetrization heuristics!

Word Alignment Symmetrization

source-to-target word alignment: [figure not preserved]
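The alignment line of such GIZA++ output can be read with a short regular expression. The sketch below is one possible parser, assuming the `word ({ positions })` layout shown in the example; the function names are hypothetical, and the sample line is sentence pair (6) with the apostrophe token written out.

```python
import re

# One aligned token: a word followed by "({ i j ... })".
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")

def parse_giza_alignment(line):
    """Return a list of (word, [positions]) in sentence order, NULL word first."""
    return [(word, [int(i) for i in idx.split()])
            for word, idx in TOKEN_RE.findall(line)]

def alignment_links(parsed):
    """Turn the parsed line into a set of 1-based (word_index, position) links.

    Entry 0 is the NULL word and is skipped.
    """
    return {(i, j)
            for i, (_, positions) in enumerate(parsed)
            if i > 0
            for j in positions}

line = ("NULL ({ }) ( ({ 1 }) the ({ 2 }) house ({ 3 }) rose ({ 4 5 }) "
        "and ({ }) observed ({ 6 }) a ({ 7 }) minute ({ 8 }) ' ({ }) "
        "s ({ }) silence ({ 9 }) ) ({ 10 })")
parsed = parse_giza_alignment(line)
print(parsed[4])                     # the entry for "rose"
print(sorted(alignment_links(parsed)))
```

Each word on the annotated line may link to several positions in the other sentence, but each position appears at most once, which is the 1:many asymmetry described on the slides.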

Word Alignment Symmetrization

target-to-source word alignment: [figure not preserved]

Word Alignment Symmetrization

symmetrized word alignment: [figure not preserved]

Word Alignment Symmetrization

start with intersection, add adjacent links (from union)...

Phrase extraction

Get ALL phrase pairs that are consistent with word alignments
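The "start with intersection, add adjacent links" idea can be sketched as follows. This is a simplified variant of the grow heuristic written for illustration (horizontal and vertical neighbours only, and a link is added only if one of its two words is still unaligned); it is not claimed to match the exact heuristic used in the lecture or in any toolkit. The name `symmetrize` and the toy links are assumptions.

```python
def symmetrize(src2tgt, tgt2src):
    """Merge two directional alignments given as sets of (src_pos, tgt_pos) links."""
    alignment = set(src2tgt & tgt2src)       # high-precision starting point
    union = src2tgt | tgt2src                # candidate links to grow into
    neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    added = True
    while added:                             # repeat until no link can be added
        added = False
        for i, j in sorted(alignment):
            for di, dj in neighbours:
                cand = (i + di, j + dj)
                if cand not in union or cand in alignment:
                    continue
                src_free = all(a != cand[0] for a, _ in alignment)
                tgt_free = all(b != cand[1] for _, b in alignment)
                if src_free or tgt_free:     # at least one word is still unaligned
                    alignment.add(cand)
                    added = True
    return alignment

# Invented toy alignments for the two directions.
s2t = {(1, 1), (2, 2), (3, 3), (3, 4)}
t2s = {(1, 1), (2, 2), (3, 4), (5, 5)}
print(sorted(symmetrize(s2t, t2s)))
```

In this toy case the link (3, 3) is added because it neighbours (3, 4), while the isolated link (5, 5) from the union is left out.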

Phrase extraction

[figures: four slides illustrating phrase extraction, not preserved]
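Consistency with the word alignment means that no word inside a phrase pair is linked to a word outside it. The sketch below extracts, for every source span, the smallest target span covering its links and keeps the pair if it is consistent. It is a simplification: the usual extension over unaligned boundary words is omitted, so it returns all consistent pairs only when every word is aligned. The function name, the `max_len` parameter and the toy alignment are assumptions for illustration.

```python
def extract_phrases(src, tgt, links, max_len=7):
    """Extract phrase pairs consistent with links, a set of 0-based (src_idx, tgt_idx)."""
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to the source span.
            tgt_points = [j for (i, j) in links if s_start <= i <= s_end]
            if not tgt_points:
                continue
            t_start, t_end = min(tgt_points), max(tgt_points)
            if t_end - t_start >= max_len:
                continue
            # Consistency: no target word in the span links outside the source span.
            if any(t_start <= j <= t_end and not (s_start <= i <= s_end)
                   for (i, j) in links):
                continue
            pairs.add((" ".join(src[s_start:s_end + 1]),
                       " ".join(tgt[t_start:t_end + 1])))
    return pairs

# Toy alignment, invented for this example.
src = "Maria no dio una bofetada".split()
tgt = "Mary did not slap".split()
links = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3)}
for f, e in sorted(extract_phrases(src, tgt, links)):
    print(f"{f} ||| {e}")
```

Note that "dio" alone yields no phrase pair here: its only link goes to "slap", which is also linked to "una" and "bofetada" outside the span.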

Phrase extraction

[figures: two further slides illustrating phrase extraction, not preserved]

Scoring phrases

Simple Maximum likelihood estimation:

$\phi(f \mid e) = \frac{count(f, e)}{\sum_{f} count(f, e)}$

A huge phrase table! (with a lot of garbage?)

Phrase tables

Examples from a phrase table (Pirates of the Caribbean):

  Swedish                     | English                 | Score
  , det är ,                  | it 's                   |
  , det är ,                  | that 's                 | 1
  att bli besvikna            | be disappointed         | 1
  att bli en själv            | to becoming one         | 1
  bara vi                     | just                    | 0.1
  bara                        | just                    | 0.6
  bara                        | only                    |
  barbossa och hans besättning | barbossa and his crew  | 1
  barbossa och hans           | barbossa and his        | 1
  barbossa tänker göra . allt | barbossa is up to all   | 1

(The training set was too small to get reasonable counts!)

The final model for PB-SMT

$\hat{E} = \arg\max_E P(E \mid F) = \arg\max_E \left( \prod_{i=1}^{I} \phi(f_i \mid e_i)\, d(start_i, end_{i-1}) \cdot P(E) \cdot \omega^{length(E)} \right)$

Distortion d: Chance to move phrases to other positions
- fixed distortion limit (e.g. 6)
- simple penalty for moving: $\alpha^{|start_i - end_{i-1} - 1|}$
- OR lexicalized distortion (learned from alignment)

Word cost: $\omega^{length(E)}$ = bias for longer output

PB-SMT extension: Log-linear Models

Instead of the noisy-channel model $\hat{E} = \arg\max_E P(F \mid E)\, P(E)$:
- model the posterior directly: $\hat{E} = \arg\max_E P(E \mid F)$
- many feature functions $h_m(E, F)$ may influence $P(E \mid F)$:
  - phrase translation model E → F
  - phrase translation model F → E
  - lexical weights from underlying word alignment
  - a language model P(E)
  - lexicalized reordering model
  - length features (word/phrase costs/penalties)

$P(E \mid F)$ = weighted combination of feature functions!

PB-SMT extension: Log-linear Models

$P(E \mid F)$ = weighted ($\lambda_m$) combination of feature functions ($h_m$):

$P(E \mid F) = \frac{\exp\left( \sum_{m=1}^{M} \lambda_m h_m(E, F) \right)}{Z}$

$\hat{E} = \arg\max_E P(E \mid F) = \arg\max_E \log P(E \mid F) = \arg\max_E \sum_{m=1}^{M} \lambda_m h_m(E, F)$

How to learn weights $\lambda_m$?
- Minimum error rate training (MERT) on development set!
- Measure error in terms of BLEU scores (n-best list)
- Iterative adjustment of model parameters (slow but effective!)

Phrase table with multiple scores

That's what you will get from Moses:

  Swedish          | English         | Scores
  , det är ,       | it 's           |
  att bli besvikna | be disappointed |
  att bli en själv | to becoming one |
  bara vi          | just            |
  bara             | just            |
  bara             | naught but      |
  bara             | only            |

Scores:
- phrase translation probability $\phi(f \mid e)$
- lexical weighting $lex(f \mid e)$
- phrase translation probability $\phi(e \mid f)$
- lexical weighting $lex(e \mid f)$
- phrase penalty (always exp(1) ≈ 2.718)
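Because Z does not depend on E, picking the best translation under the log-linear model only requires the weighted feature sum. The following sketch shows that computation together with the simple distortion penalty; the function names, feature names, feature values and weights are all invented for illustration, and learning the weights (MERT) is not shown.

```python
def distortion_penalty(start_i, end_prev, alpha=0.5):
    """Simple penalty alpha ** |start_i - end_{i-1} - 1| (1-based word positions)."""
    return alpha ** abs(start_i - end_prev - 1)

def loglinear_score(features, weights):
    """Weighted feature sum: sum over m of lambda_m * h_m(E, F)."""
    return sum(weights[name] * value for name, value in features.items())

def best_hypothesis(candidates, weights):
    """Argmax over (translation, features) pairs; the normaliser Z cancels out."""
    return max(candidates, key=lambda cand: loglinear_score(cand[1], weights))

# Hypothetical log-valued features for two candidate translations.
candidates = [
    ("candidate A", {"tm": -1.0, "lm": -3.0}),
    ("candidate B", {"tm": -2.0, "lm": -1.0}),
]
print(best_hypothesis(candidates, {"tm": 1.0, "lm": 1.0})[0])
print(best_hypothesis(candidates, {"tm": 1.0, "lm": 0.1})[0])
print(distortion_penalty(4, 1))   # phrase starts two words past the monotone position
```

Changing the weights changes which candidate wins, which is why tuning them on a development set matters.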

Translation = decoding

Global search: $\hat{E} = \arg\max_E P(E \mid F)$
- many translation alternatives (huge phrase table)
- many ways to segment words into phrases
- re-ordering makes it even more complex

Very Expensive! Need search heuristics:
- pruning (discard weak hypotheses early)
- stack decoding (histograms & thresholds)
- reordering limits

Source: Maria no dio una bofetada a la
Output: Mary
- build translation left-to-right
- select foreign word to be translated
- select translation in phrase table
- add translation to partial translation (hypothesis)

Source: Maria no dio una bofetada a la
Output: Mary did not
- mark first (foreign) word as translated
- new example: one-to-many translation

Source: Maria no dio una bofetada a la
Output: Mary did not slap
- many-to-one translation

Source: Maria no dio una bofetada a la
Output: Mary did not slap the
- many-to-one translation

Source: Maria no dio una bofetada a la
Output: Mary did not slap the green
- example for re-ordering

Source: Maria no dio una bofetada a la
Output: Mary did not slap the green witch
- translation finished

Lattice of translation options

[figure not preserved]

Hypothesis expansion

[figures: three slides illustrating hypothesis expansion, not preserved]
- ... and continue adding more hypotheses
- exponential explosion of search space!

Hypothesis Stacks

- here: based on number of foreign words translated
- expand all hypotheses from one stack during translation
- place expanded hypotheses into appropriate stacks
- get n-best list of translations
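The stack organisation described above can be sketched in a few lines. This is a heavily simplified illustration, not the Moses decoder: it scores hypotheses with phrase probabilities and the simple distortion penalty only (no language model, no future-cost estimate), uses histogram pruning, and returns only the single best translation. The function name, the phrase table entries with their probabilities, and the completed example sentence ending in "bruja verde" are assumptions.

```python
import math

def stack_decode(src, phrase_table, stack_size=10, distortion_limit=6, alpha=0.5):
    """Return (translation, log score) or None if no full translation exists.

    Stack k holds hypotheses that have translated k source words. Hypotheses
    are keyed by (coverage, end of last source phrase), so equivalent ones are
    recombined and only the better-scoring one is kept.
    """
    n = len(src)
    stacks = [dict() for _ in range(n + 1)]
    stacks[0][(frozenset(), 0)] = (0.0, ())
    for covered in range(n):
        # Histogram pruning: expand only the best stack_size hypotheses.
        best = sorted(stacks[covered].items(),
                      key=lambda kv: kv[1][0], reverse=True)[:stack_size]
        for (coverage, last_end), (score, output) in best:
            for start in range(n):
                # 0-based start and exclusive last_end: equals |start_i - end_{i-1} - 1|.
                jump = abs(start - last_end)
                if jump > distortion_limit:
                    continue
                for end in range(start + 1, n + 1):
                    span = set(range(start, end))
                    if span & coverage:          # overlaps already translated words
                        continue
                    for e, p in phrase_table.get(" ".join(src[start:end]), []):
                        new_score = score + math.log(p) + jump * math.log(alpha)
                        key = (coverage | span, end)
                        stack = stacks[len(coverage) + len(span)]
                        if key not in stack or stack[key][0] < new_score:
                            stack[key] = (new_score, output + (e,))
    if not stacks[n]:
        return None
    score, output = max(stacks[n].values(), key=lambda v: v[0])
    return " ".join(output), score

# Hypothetical phrase table: source phrase -> [(translation, probability)].
table = {
    "Maria": [("Mary", 1.0)],
    "no": [("did not", 0.7), ("not", 0.3)],
    "dio una bofetada": [("slap", 1.0)],
    "a la": [("the", 0.6), ("to the", 0.4)],
    "bruja": [("witch", 0.8)],
    "verde": [("green", 0.8)],
    "bruja verde": [("green witch", 0.9)],
}
print(stack_decode("Maria no dio una bofetada a la bruja verde".split(), table))
```

Without a language model the decoder has no reason to prefer "green witch" over "witch green" at the word level, so the sketch relies on the two-word phrase "bruja verde" to cover the local reordering.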

Phrase-based SMT

More information: Homepage of the Moses toolkit

Summary PB-SMT

- phrase-based SMT = state-of-the-art in data-driven MT (?!)
- based on standard word alignment models
- phrase extraction heuristics & simple scoring
- simplistic re-ordering model
- huge phrase table = big memory of fragment translations
- heuristics for efficient decoding

Active research area! New developments all the time!

What's next?

Next lab session:
- build your own parallel corpus
- sentence & word alignment

Lecture:
- a quick look at other topics
- course summary

Last lab session:
- small-scale experiments with PB-SMT (Moses)
- basic training, evaluation
- shifting domains


More information

DSP First Lab 08: Frequency Response: Bandpass and Nulling Filters

DSP First Lab 08: Frequency Response: Bandpass and Nulling Filters DSP First Lab 08: Frequency Response: Bandpass and Nulling Filters Pre-Lab and Warm-Up: You should read at least the Pre-Lab and Warm-up sections of this lab assignment and go over all exercises in the

More information

Database Normalization as a By-product of MML Inference. Minimum Message Length Inference

Database Normalization as a By-product of MML Inference. Minimum Message Length Inference Database Normalization as a By-product of Minimum Message Length Inference David Dowe Nayyar A. Zaidi Clayton School of IT, Monash University, Melbourne VIC 3800, Australia December 8, 2010 Our Research

More information

CHANNEL MEASUREMENT. Channel measurement doesn t help for single bit transmission in flat Rayleigh fading.

CHANNEL MEASUREMENT. Channel measurement doesn t help for single bit transmission in flat Rayleigh fading. CHANNEL MEASUREMENT Channel measurement doesn t help for single bit transmission in flat Rayleigh fading. It helps (as we soon see) in detection with multi-tap fading, multiple frequencies, multiple antennas,

More information

Transcribing Continuous Speech Using Mismatched Crowdsourcing

Transcribing Continuous Speech Using Mismatched Crowdsourcing Transcribing Continuous Speech Using Mismatched Crowdsourcing Preethi Jyothi 1, Mark Hasegawa-Johnson 1,2 1 Beckman Institute, University of Illinois at Urbana-Champaign, US 2 Department of ECE, University

More information

MAS160: Signals, Systems & Information for Media Technology. Problem Set 4. DUE: October 20, 2003

MAS160: Signals, Systems & Information for Media Technology. Problem Set 4. DUE: October 20, 2003 MAS160: Signals, Systems & Information for Media Technology Problem Set 4 DUE: October 20, 2003 Instructors: V. Michael Bove, Jr. and Rosalind Picard T.A. Jim McBride Problem 1: Simple Psychoacoustic Masking

More information

The Case for Optimum Detection Algorithms in MIMO Wireless Systems. Helmut Bölcskei

The Case for Optimum Detection Algorithms in MIMO Wireless Systems. Helmut Bölcskei The Case for Optimum Detection Algorithms in MIMO Wireless Systems Helmut Bölcskei joint work with A. Burg, C. Studer, and M. Borgmann ETH Zurich Data rates in wireless double every 18 months throughput

More information

Electric Guitar Pickups Recognition

Electric Guitar Pickups Recognition Electric Guitar Pickups Recognition Warren Jonhow Lee warrenjo@stanford.edu Yi-Chun Chen yichunc@stanford.edu Abstract Electric guitar pickups convert vibration of strings to eletric signals and thus direcly

More information

Why Should We Care? More importantly, it is easy to lie or deceive people with bad plots

Why Should We Care? More importantly, it is easy to lie or deceive people with bad plots Elementary Plots Why Should We Care? Everyone uses plotting But most people ignore or are unaware of simple principles Default plotting tools (or default settings) are not always the best More importantly,

More information

Wavelets and wavelet convolution and brain music. Dr. Frederike Petzschner Translational Neuromodeling Unit

Wavelets and wavelet convolution and brain music. Dr. Frederike Petzschner Translational Neuromodeling Unit Wavelets and wavelet convolution and brain music Dr. Frederike Petzschner Translational Neuromodeling Unit 06.03.2015 Recap Why are we doing this? We know that EEG data contain oscillations. Or goal is

More information

SPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester

SPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester SPEECH TO SINGING SYNTHESIS SYSTEM Mingqing Yun, Yoon mo Yang, Yufei Zhang Department of Electrical and Computer Engineering University of Rochester ABSTRACT This paper describes a speech-to-singing synthesis

More information

Simple Measures of Visual Encoding. vs. Information Theory

Simple Measures of Visual Encoding. vs. Information Theory Simple Measures of Visual Encoding vs. Information Theory Simple Measures of Visual Encoding STIMULUS RESPONSE What does a [visual] neuron do? Tuning Curves Receptive Fields Average Firing Rate (Hz) Stimulus

More information

SELECTING RELEVANT DATA

SELECTING RELEVANT DATA EXPLORATORY ANALYSIS The data that will be used comes from the reviews_beauty.json.gz file which contains information about beauty products that were bought and reviewed on Amazon.com. Each data point

More information

Player Speed vs. Wild Pokémon Encounter Frequency in Pokémon SoulSilver Joshua and AP Statistics, pd. 3B

Player Speed vs. Wild Pokémon Encounter Frequency in Pokémon SoulSilver Joshua and AP Statistics, pd. 3B Player Speed vs. Wild Pokémon Encounter Frequency in Pokémon SoulSilver Joshua and AP Statistics, pd. 3B In the newest iterations of Nintendo s famous Pokémon franchise, Pokémon HeartGold and SoulSilver

More information

LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR

LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1 LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 2 STORAGE SPACE Uncompressed graphics, audio, and video data require substantial storage capacity. Storing uncompressed video is not possible

More information

Th P6 01 Retrieval of the P- and S-velocity Structure of the Groningen Gas Reservoir Using Noise Interferometry

Th P6 01 Retrieval of the P- and S-velocity Structure of the Groningen Gas Reservoir Using Noise Interferometry Th P6 1 Retrieval of the P- and S-velocity Structure of the Groningen Gas Reservoir Using Noise Interferometry W. Zhou* (Utrecht University), H. Paulssen (Utrecht University) Summary The Groningen gas

More information

HANDS-ON TRANSFORMATIONS: RIGID MOTIONS AND CONGRUENCE (Poll Code 39934)

HANDS-ON TRANSFORMATIONS: RIGID MOTIONS AND CONGRUENCE (Poll Code 39934) HANDS-ON TRANSFORMATIONS: RIGID MOTIONS AND CONGRUENCE (Poll Code 39934) Presented by Shelley Kriegler President, Center for Mathematics and Teaching shelley@mathandteaching.org Fall 2014 8.F.1 8.G.1a

More information

Dyck paths, standard Young tableaux, and pattern avoiding permutations

Dyck paths, standard Young tableaux, and pattern avoiding permutations PU. M. A. Vol. 21 (2010), No.2, pp. 265 284 Dyck paths, standard Young tableaux, and pattern avoiding permutations Hilmar Haukur Gudmundsson The Mathematics Institute Reykjavik University Iceland e-mail:

More information

CIS 2033 Lecture 6, Spring 2017

CIS 2033 Lecture 6, Spring 2017 CIS 2033 Lecture 6, Spring 2017 Instructor: David Dobor February 2, 2017 In this lecture, we introduce the basic principle of counting, use it to count subsets, permutations, combinations, and partitions,

More information

CMath 55 PROFESSOR KENNETH A. RIBET. Final Examination May 11, :30AM 2:30PM, 100 Lewis Hall

CMath 55 PROFESSOR KENNETH A. RIBET. Final Examination May 11, :30AM 2:30PM, 100 Lewis Hall CMath 55 PROFESSOR KENNETH A. RIBET Final Examination May 11, 015 11:30AM :30PM, 100 Lewis Hall Please put away all books, calculators, cell phones and other devices. You may consult a single two-sided

More information

Digital Speech Processing and Coding

Digital Speech Processing and Coding ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/

More information

Your Name and ID. (a) ( 3 points) Breadth First Search is complete even if zero step-costs are allowed.

Your Name and ID. (a) ( 3 points) Breadth First Search is complete even if zero step-costs are allowed. 1 UC Davis: Winter 2003 ECS 170 Introduction to Artificial Intelligence Final Examination, Open Text Book and Open Class Notes. Answer All questions on the question paper in the spaces provided Show all

More information

Speech Coding in the Frequency Domain

Speech Coding in the Frequency Domain Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.

More information

Combining Voice Activity Detection Algorithms by Decision Fusion

Combining Voice Activity Detection Algorithms by Decision Fusion Combining Voice Activity Detection Algorithms by Decision Fusion Evgeny Karpov, Zaur Nasibov, Tomi Kinnunen, Pasi Fränti Speech and Image Processing Unit, University of Eastern Finland, Joensuu, Finland

More information

Advanced Digital Design

Advanced Digital Design Advanced Digital Design Introduction & Motivation by A. Steininger and M. Delvai Vienna University of Technology Outline Challenges in Digital Design The Role of Time in the Design The Fundamental Design

More information

Mining for Statistical Models of Availability in Large-Scale Distributed Systems: An Empirical Study of

Mining for Statistical Models of Availability in Large-Scale Distributed Systems: An Empirical Study of Mining for Statistical Models of Availability in Large-Scale Distributed Systems: An Empirical Study of SETI@home Bahman Javadi 1, Derrick Kondo 1, Jean-Marc Vincent 1,2, David P. Anderson 3 1 Laboratoire

More information

AM Antenna Computer Modeling Course

AM Antenna Computer Modeling Course AM Antenna Computer Modeling Course Course Description The FCC now permits moment method computer modeling of many AM directional arrays as an alternative to traditional cut-and-try adjustments and field

More information

Notes 15: Concatenated Codes, Turbo Codes and Iterative Processing

Notes 15: Concatenated Codes, Turbo Codes and Iterative Processing 16.548 Notes 15: Concatenated Codes, Turbo Codes and Iterative Processing Outline! Introduction " Pushing the Bounds on Channel Capacity " Theory of Iterative Decoding " Recursive Convolutional Coding

More information