Yu Chen, Andreas Eisele, Martin Kay

LREC 2008, Marrakech, Morocco, May 29, 2008
Department of Computational Linguistics, Saarland University

Outline 1 2 3 4 5

SMT architecture
To build a phrase-based SMT system you need a parallel corpus and a toolkit such as Moses.
[Figure: parallel corpus -> alignment, phrase extraction, counting, smoothing -> translation model; monolingual corpus -> language model; source text -> SMT decoder (translation model + language model) -> target text.]
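The "counting" step of this pipeline is commonly a relative-frequency estimate over the extracted phrase pairs, phi(e|f) = count(f, e) / count(f). The following is a minimal Python sketch, assuming phrase pairs have already been extracted from the word-aligned corpus; the function name and the toy pairs are hypothetical, and smoothing and lexical weighting are deliberately left out.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (source phrase f, target phrase e)


def relative_frequency(pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """Estimate phi(e|f) = count(f, e) / count(f) from extracted phrase pairs."""
    pair_counts = Counter(pairs)
    source_counts: Counter = Counter()
    for (f, _e), c in pair_counts.items():
        source_counts[f] += c
    return {(f, e): c / source_counts[f] for (f, e), c in pair_counts.items()}


# Hypothetical toy extraction output (not taken from the slides).
extracted = [
    ("nicht", "not"),
    ("nicht", "not"),
    ("nicht", "do not"),
    ("nicht", "no"),
]
probs = relative_frequency(extracted)
print(probs[("nicht", "not")])     # 0.5
print(probs[("nicht", "do not")])  # 0.25
```

A pair seen once with a source phrase that occurs often gets a small but non-zero probability, which is how rare, noisy pairs end up in the table.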

Problems with Translation Models
Phrase-table entries for the German source phrase "nicht ." (Moses format: source ||| target ||| φ(f|e) lex(f|e) φ(e|f) lex(e|f) phrase penalty):
nicht . ||| . ||| 0.00035137 0.00703986 0.000659631 0.0023873 2.718
nicht . ||| 's fault . ||| 0.5 0.0095052 0.000659631 2.87847e-08 2.718
nicht . ||| 't . ||| 0.111111 0.418755 0.000659631 0.000876442 2.718
nicht . ||| 't do ! ||| 1 0.0249022 0.000659631 2.52495e-08 2.718
nicht . ||| 't stick . ||| 1 0.418755 0.000659631 2.87473e-08 2.718
nicht . ||| , as did ||| 0.0102041 6.20073e-05 0.000659631 3.20962e-08 2.718
nicht . ||| , as ||| 3.29272e-05 6.20073e-05 0.000659631 7.5149e-05 2.718
nicht . ||| , no . ||| 0.0714286 0.168673 0.000659631 0.00317554 2.718
nicht . ||| , they do not . ||| 1 0.288859 0.000659631 4.94651e-07 2.718
nicht . ||| , would not . ||| 1 0.70589 0.000659631 0.000160212 2.718
nicht . ||| , ||| 4.89461e-06 6.20073e-05 0.00329815 0.0094167 2.718

Example:
Source: Mit The Hunting Party und Mörderischer Frieden beschäftigen sich wieder zwei Filme mit dem Balkankrieg. Doch beide überzeugen nicht.
Reference: The Hunting Party and Mörderischer Frieden make two more films dealing with the Balkan War. But neither of them is convincing.
SMT output: The Hunting Party and Mörderischer peace deal another two films with the Balkans war. But both cases.
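If the five scores are read in Moses' standard order (φ(f|e), lex(f|e), φ(e|f), lex(e|f), phrase penalty), the third column φ(e|f) is 0.000659631 ≈ 1/1516 in ten of the eleven entries and 0.00329815 ≈ 5/1516 for ",". This suggests (an inference, not stated on the slide) that "nicht ." was counted about 1516 times and that most of these pairs were extracted only once, which is typical of noisy entries. A small Python sketch that parses such lines and recovers the implied pair counts under that assumption; `parse_entry` and the assumed source count of 1516 are illustrative.

```python
from typing import List, Tuple


def parse_entry(line: str) -> Tuple[str, str, List[float]]:
    """Split a Moses-style 'source ||| target ||| scores' line."""
    fields = [field.strip() for field in line.split("|||")]
    source, target, scores = fields[0], fields[1], fields[2]
    return source, target, [float(x) for x in scores.split()]


lines = [
    "nicht . ||| , they do not . ||| 1 0.288859 0.000659631 4.94651e-07 2.718",
    "nicht . ||| , ||| 4.89461e-06 6.20073e-05 0.00329815 0.0094167 2.718",
]

# Assumption: phi(e|f) = count(f, e) / count(f), with count(f) = 1516.
ASSUMED_SOURCE_COUNT = 1516
implied_counts = []
for line in lines:
    source, target, scores = parse_entry(line)
    phi_e_given_f = scores[2]  # third score in the standard Moses order
    implied_counts.append(round(phi_e_given_f * ASSUMED_SOURCE_COUNT))
    # Prints each target phrase with its implied pair count (1, then 5).
    print(repr(target), implied_counts[-1])
```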

What's triangulation?
- In the social sciences: the use of multiple cross-checked sources and methodologies
- In qualitative research: combining methods for more accurate and credible research
- In the context of machine translation: making use of resources in languages other than the two involved in the translation

Is triangulation possible in MT? Vauquois...
[Figure: translation triangle with the levels words, syntactic structure, semantic structure and interlingua between source text and target text; a bridge text is added as a third text.]

Triangulation in Machine Translation
Need for triangulation:
- lack of resources for the direct language pairs
- rich resources for frequent languages
- difficult language pairs
Advantages:
- resolve ambiguity
- help with word order
- increase lexical coverage
Existing approaches:
- Sentence alignment [Simard, 1999]
- Word alignment [Kumar et al., 2007]
- Translation model [Cohn and Lapata, 2007]
- Hypothesis reranking [Och and Ney, 2001]

General description
Motivation:
- Phrase tables contain a lot of noise.
- The size of a phrase table is critical for the decoder.
- Data in a third language convey extra information.
Idea: use this extra information to filter a phrase table.
- Phrases in the third language serve as linking evidence.
- Only the most probable phrase pairs are kept.

Procedure
Two additional translation models:
1. from the source language to the bridge language
2. from the target language to the bridge language
Examine the phrase table entry by entry. For each phrase pair, search both additional tables for a common link in the third language, i.e. a bridge phrase that corresponds to both the source phrase and the target phrase of the pair:
- Method 1: exact phrase matching
- Method 2: word overlap
Remove the entry when no such link exists. Keep the probabilities of the remaining entries unchanged and use the reduced table in place of the original.
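The filtering procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: all three tables are held in memory as plain dictionaries (a real phrase table would be streamed from disk), the link test shown is Method 1 (exact phrase matching), and every name and toy entry is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory tables (names are illustrative, not from the paper).
PhraseTable = Dict[Tuple[str, str], List[float]]  # (src, tgt) -> scores
BridgeTable = Dict[str, Set[str]]                 # phrase -> bridge phrases


def has_exact_link(src: str, tgt: str,
                   src_to_bridge: BridgeTable,
                   tgt_to_bridge: BridgeTable) -> bool:
    """Method 1: some bridge phrase is a translation of both src and tgt."""
    return bool(src_to_bridge.get(src, set()) & tgt_to_bridge.get(tgt, set()))


def filter_phrase_table(table: PhraseTable,
                        src_to_bridge: BridgeTable,
                        tgt_to_bridge: BridgeTable) -> PhraseTable:
    """Drop entries without a bridge link; surviving scores are kept unchanged."""
    return {
        (src, tgt): scores
        for (src, tgt), scores in table.items()
        if has_exact_link(src, tgt, src_to_bridge, tgt_to_bridge)
    }


# Hypothetical toy data; the scores are placeholders.
table = {
    ("fabricantes", "manufacturers"): [0.5, 0.4, 0.3, 0.2, 2.718],
    ("fabricantes", "battalions"): [0.1, 0.1, 0.1, 0.1, 2.718],
}
es_de = {"fabricantes": {"hersteller", "autohersteller"}}
en_de = {"manufacturers": {"hersteller"}, "battalions": {"bataillone"}}

reduced = filter_phrase_table(table, es_de, en_de)
print(sorted(reduced))  # [('fabricantes', 'manufacturers')]
```

Keeping the link test as a separate function means Method 2 only has to swap in a different predicate; the surviving entries keep their original scores, so the reduced table can be passed to the decoder as is.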

Method 1: exact match
Example: phrase-table entries for Spanish "fabricantes"
English target phrases:
, manufacturers | a manufacturer | battalions | car manufacturers have | car manufacturers | makers | manufacturer | manufacturers | producers are | producers need | producers | suppliers
German bridge phrases:
auch automobilhersteller , | auch automobilhersteller | autohersteller | autoherstellern | automobilhersteller | automobilherstellern | bedeutende hersteller | computerhersteller | damit ihren schnitt | damit ihren | damit | ... | die fahrzeughersteller haben

Method 2: word overlap
Example: phrase-table entries for Spanish "fabricantes"
, manufacturers | a manufacturer | battalions | car manufacturers have | car manufacturers | makers | manufacturer | manufacturers | producers are | producers need | producers | suppliers
W_SF(s), German words linked to the source phrase s = fabricantes: automobilhersteller, hersteller, herstellern, arzneimittelhersteller, arzneimittelproduzenten, auch, automobilhersteller, autohersteller, autoherstellern, vor, gehen, haben, ist, verknüpfen, verknüpfen, zu, bedeutende, den, computerhersteller, und
W_FE(e), German words linked to the target phrase e = battalions: den, hersteller, durch, ein, einem
Score, reading min(w_SF(s), w_FE(e)) as the size of the smaller word set:
$\mathrm{overlap}(s,e) = \frac{|W_{SF}(s) \cap W_{FE}(e)|}{\min(|W_{SF}(s)|,\,|W_{FE}(e)|)}$
For (fabricantes, battalions): 2 shared words (den, hersteller) / 5 = 0.4
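The value 2/5 = 0.4 on this slide can be reproduced with a short Python sketch, assuming the score is the number of shared bridge words divided by the size of the smaller word set, |W_SF(s) ∩ W_FE(e)| / min(|W_SF(s)|, |W_FE(e)|), and that the five-word list belongs to e = battalions, as the slide layout suggests. The word lists are copied from the slide; the function name is hypothetical, and the cut-off used to decide which pairs to remove is not given here, so it is left out.

```python
from typing import Set


def overlap_score(w_sf: Set[str], w_fe: Set[str]) -> float:
    """Shared bridge words divided by the size of the smaller word set."""
    if not w_sf or not w_fe:
        return 0.0
    return len(w_sf & w_fe) / min(len(w_sf), len(w_fe))


# German words linked to s = "fabricantes" (repeats on the slide collapse in a set).
w_sf_fabricantes = {
    "automobilhersteller", "hersteller", "herstellern",
    "arzneimittelhersteller", "arzneimittelproduzenten", "auch",
    "autohersteller", "autoherstellern", "vor", "gehen", "haben", "ist",
    "verknüpfen", "zu", "bedeutende", "den", "computerhersteller", "und",
}
# German words linked to e = "battalions".
w_fe_battalions = {"den", "hersteller", "durch", "ein", "einem"}

print(overlap_score(w_sf_fabricantes, w_fe_battalions))  # 0.4 (2 shared / 5)
```

Dividing by the smaller set keeps short target phrases, which have few bridge words, from being penalized just for having a small word list.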

Experiment Setup
- Language pair: Spanish-English
- Bridge languages: German, French
- Training data: Europarl subsets
    Max. sentence length   Number of sentences
    40                     950,000
    50                     1,100,000
- Test set: Europarl test set from WMT 2008
- Baselines: built with Moses, tuned with MERT for BLEU
- Filtering: Methods 1 and 2, decoding with the baseline weights
- Evaluation: sizes of the phrase tables; translation quality

Results: size of filtered phrase tables
(1:/2: = Method 1/Method 2; French/German = bridge language; PT/RT = phrase table/reordering table size in bytes; Removed = share of entries removed)

Model         Entries   PT     RT     Removed
Europarl-40   19M       2.5G   1.9G
1:French      8M        1.1G   741M   55.21%
2:French      15M       1.9G   1.3G   23.52%
1:German      6M        725M   492M   69.16%
2:German      14M       1.8G   1.2G   29.16%
Europarl-50   54M       7.1G   5.4G
1:French      24M       3.0G   2.3G   55.77%
2:French      42M       5.5G   4.2G   24.10%
1:German      16M       1.9G   1.5G   70.70%
2:German      38M       5.0G   3.8G   30.42%

Results: translation quality (BLEU; None = unfiltered baseline)

Method 1      None    French   German
Europarl-40   31.43   28.27    31.58
Europarl-50   31.65   31.73    31.92

Method 2      None    French   German
Europarl-40   31.43   28.20    31.38
Europarl-50   31.65   31.69    31.75

Example
Source: Como ha señalado el Sr. de Soto, no esperamos que el progreso sea tarea fácil, y el éxito del proceso de las Naciones Unidas no está ni mucho menos garantizado.
Reference: As Mr de Soto noted, we do not expect progress to be easy, and the success of the UN process is far from assured.
Baseline: As has been pointed out by Mr de Soto, we hope that progress is not an easy task, and the success of the UN process is far from guaranteed.
1:French, 2:French, 1:German, 2:German (identical output): As Mr de Soto, we do not expect that progress is easy, and the success of the UN process is far from guaranteed.

Summary
- More possibilities for triangulation
- Filtering reduces the size of phrase tables
- Filtering preserves translation quality
- The approaches work better for larger models
- Different bridge languages have different effects
Future work
- More thorough experiments
- Integration with other triangulation approaches

Thank you! Any questions?

References
Cohn, T. and Lapata, M. (2007). Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic.
Kumar, S., Och, F. J., and Macherey, W. (2007). Improving word alignment with bridge languages. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 42-50, Prague, Czech Republic.
Och, F. J. and Ney, H. (2001). Statistical multi-source translation. In MT Summit VIII, Santiago de Compostela, Spain.
Simard, M. (1999). Text-translation alignment: Three languages are better than two. In Proceedings of EMNLP/VLC-99, College Park, MD.