A Comparison of Chinese Parsers for Stanford Dependencies

Wanxiang Che, Valentin I. Spitkovsky and Ting Liu
Harbin Institute of Technology; Stanford University
ACL 2012, July 11, 2012
Che, Spitkovsky, and Liu (HIT, Stanford), Comparison of Chinese Parsers, July 11, 2012

Outline
  1 Introduction
  2 Methodology
  3 Results
  4 Analysis
  5 Conclusion

1 Introduction

Stanford Dependencies
  A simple description of relations between pairs of words in a sentence
  A kind of semantically-oriented dependency representation
  Converted from constituent trees by rules
  53 binary relations for English, 46 for Chinese

  Figure: Stanford dependencies (above) vs. CoNLL style (below) for the sentence "-Root- I saw the man who loves you".
    Stanford labels: root, nsubj, dobj, det, rcmod, nsubj, dobj
    CoNLL-style labels: ROOT, SUB, VMOD, NMOD, SUB, VMOD, CLF
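To make "relations between pairs of words" concrete, the figure's example sentence can be written as a list of typed binary relations. This is only a sketch: the `Dependency` type and `format_dep` helper are illustrative names, and the head of each arc is an assumption based on the usual Stanford analysis of this sentence, since the transcript preserves only the labels.

```python
from typing import List, NamedTuple


class Dependency(NamedTuple):
    """One typed binary relation: relation(head, dependent)."""
    relation: str
    head: str
    dependent: str


# Assumed arcs for "I saw the man who loves you"; the label inventory
# (root, nsubj, dobj, det, rcmod, nsubj, dobj) matches the figure.
deps: List[Dependency] = [
    Dependency("root", "-Root-", "saw"),
    Dependency("nsubj", "saw", "I"),
    Dependency("dobj", "saw", "man"),
    Dependency("det", "man", "the"),
    Dependency("rcmod", "man", "loves"),
    Dependency("nsubj", "loves", "who"),
    Dependency("dobj", "loves", "you"),
]


def format_dep(d: Dependency) -> str:
    """Render a relation in the common relation(head, dependent) notation."""
    return f"{d.relation}({d.head}, {d.dependent})"


for d in deps:
    print(format_dep(d))
```

Every word appears exactly once as a dependent, which is what makes the structure a tree rooted at the artificial -Root- node.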

Stanford Dependencies: Applications
  Intuitive and easy to apply, requires little linguistic expertise
    Biomedical text mining (Kim et al., 2009)
    Textual entailment (Androutsopoulos and Malakasiotis, 2010)
    Information extraction (Wu and Weld, 2010; Banko et al., 2007)
    Sentiment analysis (Meena and Prabhakar, 2007; Wu et al., 2011)

Parsing Methods

Constituent Parsing (indirect)
  Stanford dependency parser's original implementation
  Figure: the sentence is parsed into a constituent tree, which is then converted to dependencies.
    (IP (NP (NR 中国)) (VP (VV 鼓励) (NP (ADJP (JJ 民营)) (NP (NN 企业家))) (IP (VP (VV 投资) (NP (NN 国家) (NN 基础) (NN 建设))))))
    中国 鼓励 民营 企业家 投资 国家 基础 建设
    China encourages private entrepreneurs invest national infrastructure construction
    Dependency labels: nsubj, root, dobj, dep, amod, dobj, nn, nn

Dependency Parsing (direct)
  Figure: the sentence is mapped directly to the same dependency structure.

Motivation
  Which method is better for Chinese Stanford Dependencies?
  Comparison for English (Cer et al., 2010)
    Constituent parsers systematically outperform direct methods
    Did not explore more sophisticated (higher-order) dependency parsers
    Did not explore more consistent (n-way jackknifing of) POS tags
    Small bug in evaluation of MSTParser

2 Methodology

Open Source Parsers

  Type         Parser      Version    Algorithm
  Constituent  Berkeley    1.1        PCFG
               Bikel       1.2        PCFG
               Charniak    Nov. 2009  PCFG
               Stanford    2.0        Factored
  Dependency   MaltParser  1.6.1      Arc-Eager
               Mate        2.0        2nd-order MST
               MSTParser   0.5        MST

Settings

Corpus
  Latest Chinese TreeBank (CTB) 7.0

  Number of   Train      Dev     Test    Total
  files       2,083      160     205     2,448
  sentences   46,572     2,079   2,796   51,447
  tokens      1,039,942  59,955  81,578  1,181,475

Software and Hardware
  Parsers: all default options
  Hardware: Intel's Xeon E5620 2.40GHz CPU and 24GB RAM

Features for Dependency Parsers

POS tags
  Stanford POS tagger
  Automatic tags for training data (via 10-way jackknifing)

Lemmas
  The last character of each Chinese word
  E.g., bicycle (自行车), car (汽车) and train (火车) are all various kinds of vehicle (车)
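The two features above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the round-robin fold assignment, the `train_most_frequent_tagger` stand-in (the slides use the Stanford POS tagger), and all function names are assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sentence = List[str]                      # word tokens
TaggedSentence = List[Tuple[str, str]]    # (word, POS tag) pairs
Tagger = Callable[[Sentence], List[str]]  # words -> predicted tags


def jackknife_tags(
    gold: Sequence[TaggedSentence],
    train_tagger: Callable[[Sequence[TaggedSentence]], Tagger],
    n_folds: int = 10,
) -> List[TaggedSentence]:
    """Re-tag each training sentence with a tagger that never saw it."""
    auto: List[TaggedSentence] = [[] for _ in gold]
    for fold in range(n_folds):
        # Assumed split: sentence i belongs to fold i mod n_folds.
        rest = [s for i, s in enumerate(gold) if i % n_folds != fold]
        tagger = train_tagger(rest)
        for i in range(fold, len(gold), n_folds):
            words = [w for w, _ in gold[i]]
            auto[i] = list(zip(words, tagger(words)))
    return auto


def train_most_frequent_tagger(data: Sequence[TaggedSentence]) -> Tagger:
    """Toy stand-in tagger: most frequent tag per word, 'NN' if unseen."""
    counts: Dict[str, Counter] = {}
    for sent in data:
        for word, tag in sent:
            counts.setdefault(word, Counter())[tag] += 1

    def tag(words: Sentence) -> List[str]:
        return [counts[w].most_common(1)[0][0] if w in counts else "NN"
                for w in words]

    return tag


def last_char_lemma(word: str) -> str:
    """Pseudo-lemma feature: the final character of a Chinese word."""
    return word[-1] if word else word


print([last_char_lemma(w) for w in ["自行车", "汽车", "火车"]])
```

Training on jackknifed tags means the parser sees tag errors at training time that resemble those it will meet at test time, which is the consistency the slides refer to.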

3 Results

Chinese Results

                                       Dev          Test
  Type         Parser                  UAS   LAS    UAS   LAS    Time
  Constituent  Berkeley                82.0  77.0   82.9  77.8   45:56
               Bikel                   79.4  74.1   80.0  74.3   6,861:31
               Charniak                77.8  71.7   78.3  72.3   128:04
               Stanford                76.9  71.2   77.3  71.4   330:50
  Dependency   MaltParser (liblinear)  76.0  71.2   76.3  71.2   0:11
               MaltParser (libsvm)     77.3  72.7   78.0  73.1   556:51
               Mate (2nd-order)        82.8  78.2   83.1  78.1   87:19
               MSTParser (1st-order)   78.8  73.4   78.9  73.1   12:17

  Bold: best results. Dark Red: worst results. Blue: best results of constituent parsers.
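For readers unfamiliar with the two accuracy columns, UAS (unlabeled attachment score) is the percentage of tokens assigned the correct head, and LAS (labeled attachment score) additionally requires the correct relation label. The sketch below uses these standard definitions; the `attachment_scores` name and the toy arcs are hypothetical, and details such as punctuation handling in the actual evaluation are not stated in the slides.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, relation label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned token sequences."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    # UAS counts tokens whose predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens where both head and label match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


# Toy example: all four heads are right, one label is wrong.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dep")]
print(attachment_scores(gold, pred))  # (100.0, 75.0)
```

Because a correct label only counts when the head is also correct, LAS can never exceed UAS, which is consistent with every row of the table.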

4 Analysis

Comparison between Mate and Berkeley parsers

Mate is slightly better than Berkeley (but not significantly, p > 0.05)

Performance (F1) comparison on different relations

  Relation  Count  Mate  Berkeley
  nn        7,783  91.3  89.3
  dep       4,651  69.4  70.3
  nsubj     4,531  87.1  85.5
  advmod    4,028  94.3  93.8
  dobj      3,990  86.0  85.0
  conj      2,159  76.0  75.8
  prep      2,091  94.3  94.1
  root      2,079  81.2  82.3
  nummod    1,614  97.4  96.7
  assmod    1,593  86.3  84.1
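A per-relation F1 breakdown like the one above can be computed by matching labeled arcs between gold and predicted parses. The slides do not say exactly how the scores were produced, so the following is a sketch under the usual definition (an arc is correct when dependent, head, and label all match); `per_relation_f1` and the arc tuple layout are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (sentence id, dependent index, head index, relation label)
LabeledArc = Tuple[int, int, int, str]


def per_relation_f1(
    gold: Iterable[LabeledArc], pred: Iterable[LabeledArc]
) -> Dict[str, float]:
    """F1 (in percent) for each relation label seen in gold or pred."""
    gold_set, pred_set = set(gold), set(pred)
    gold_n = Counter(arc[3] for arc in gold_set)
    pred_n = Counter(arc[3] for arc in pred_set)
    hits = Counter(arc[3] for arc in gold_set & pred_set)
    scores: Dict[str, float] = {}
    for rel in gold_n.keys() | pred_n.keys():
        # F1 = 2PR / (P + R) simplifies to 2*TP / (|gold| + |pred|).
        scores[rel] = 100.0 * 2 * hits[rel] / (gold_n[rel] + pred_n[rel])
    return scores


# Toy example: one dobj arc is mislabeled as dep.
gold_arcs = [(0, 1, 2, "nsubj"), (0, 2, 0, "root"), (0, 3, 2, "dobj")]
pred_arcs = [(0, 1, 2, "nsubj"), (0, 2, 0, "root"), (0, 3, 2, "dep")]
print(per_relation_f1(gold_arcs, pred_arcs))
```

In the toy example, nsubj and root score 100.0 while dobj and dep both score 0.0, since the mislabeled arc is a miss for dobj and a false positive for dep.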

More Analysis

Feature Effect
  10-way jackknifing POS tags for training data

              Gold  Jackknifing
    Mate      75.4  78.2
    Berkeley  77.0  76.5

  Lemmas for Mate: 77.8 (w/o) vs. 78.2 (with)

English vs. Chinese

                               Chinese  English
    Berkeley                   77.0     87.9
    Charniak                   71.7     87.8
    CJ (Charniak + Reranking)  -        89.1
    Mate                       78.2     88.6

5 Conclusion

Conclusion
  For Chinese, direct approach comparable to using constituents
  Which parser to use in practice?
    Most accurate: Mate parser
    Fastest: MaltParser (liblinear)
    Trade-off: Berkeley parser
  We prefer dependency parsers which more easily admit richer features
    n-way jackknifing of POS tags and lemma features can help

Thanks and Q&A