Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007)

Similar documents
Learning to rank search results

Search results fusion

Information Retrieval Evaluation

Patent Mining: Use of Data/Text Mining for Supporting Patent Retrieval and Analysis

3D-Assisted Image Feature Synthesis for Novel Views of an Object

Contents. List of Figures List of Tables. Structure of the Book How to Use this Book Online Resources Acknowledgements

Lecture 3 - Regression

Inverted Indexes: Alternative Queries

The Game-Theoretic Approach to Machine Learning and Adaptation

Outcome Forecasting in Sports. Ondřej Hubáček

Radio Deep Learning Efforts Showcase Presentation

Move Evaluation Tree System

MEASURING PRIVACY RISK IN ONLINE SOCIAL NETWORKS. Justin Becker, Hao Chen UC Davis May 2009

CLASSLESS ASSOCIATION USING NEURAL NETWORKS

Sentiment Analysis of User-Generated Contents for Pharmaceutical Product Safety

RESTRICTED PERMUTATIONS AND POLYGONS. Ghassan Firro and Toufik Mansour Department of Mathematics, University of Haifa, Haifa, Israel

Game Theory and Algorithms Lecture 19: Nim & Impartial Combinatorial Games

Kernels and Support Vector Machines

SMILe: Shuffled Multiple-Instance Learning

Olympiad Combinatorics. Pranav A. Sriram

Name that sculpture. Relja Arandjelović and Andrew Zisserman. Visual Geometry Group Department of Engineering Science University of Oxford

Greedy Algorithms and Genome Rearrangements

Huffman-Compressed Wavelet Trees for Large Alphabets

Permutations and codes:

Music Recommendation using Recurrent Neural Networks

Common Core Math Tutorial and Practice

Heuristic Search with Pre-Computed Databases

Data and Knowledge as Infrastructure. Chaitan Baru Senior Advisor for Data Science CISE Directorate National Science Foundation

Multivariate Permutation Tests: With Applications in Biostatistics

CCO Commun. Comb. Optim.

Lecture 4: Spatial Domain Processing and Image Enhancement

Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters

Global Journal of Engineering Science and Research Management

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Module 6 Lecture - 37 Divide and Conquer: Counting Inversions

Confidence-Based Multi-Robot Learning from Demonstration

Lossy Compression of Permutations

Lecture 3 Presentations and more Great Groups

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection

Session 2: 10 Year Vision session (11:00-12:20) - Tuesday. Session 3: Poster Highlights A (14:00-15:00) - Tuesday 20 posters (3minutes per poster)

Ma/CS 6a Class 16: Permutations

17. Symmetries. Thus, the example above corresponds to the matrix: We shall now look at how permutations relate to trees.

Large Scale Topic Detection using Node-Cut Partitioning on Dense Weighted-Graphs

The Effects of Entrainment in a Tutoring Dialogue System. Huy Nguyen, Jesse Thomason CS 3710 University of Pittsburgh

Predicting the movie popularity using user-identified tropes

CRF and Structured Perceptron

An improvement to the Gilbert-Varshamov bound for permutation codes

IBM SPSS Neural Networks

Permutation Groups. Every permutation can be written as a product of disjoint cycles. This factorization is unique up to the order of the factors.

Chpt 2. Frequency Distributions and Graphs. 2-3 Histograms, Frequency Polygons, Ogives / 35

FOURTEEN SPECIES OF SKEW HEXAGONS

Exercises to Chapter 2 solutions

Lecture 2.3: Symmetric and alternating groups

5.1 State-Space Search Problems

Bead Sort: A Natural Sorting Algorithm

Application of Multi Layer Perceptron (MLP) for Shower Size Prediction

DYNAMIC CONVOLUTIONAL NEURAL NETWORK FOR IMAGE SUPER- RESOLUTION

The Art of Neural Nets

Sampling distributions and the Central Limit Theorem

COMP219: Artificial Intelligence. Lecture 17: Semantic Networks

Chapter 1 INTRODUCTION TO SOURCE CODING AND CHANNEL CODING. Whether a source is analog or digital, a digital communication

LECTURE 19 - LAGRANGE MULTIPLIERS

GPU ACCELERATED DEEP LEARNING WITH CUDNN

arxiv: v2 [eess.sp] 10 Sep 2018

Some results on Su Doku

Know how to represent permutations in the two rowed notation, and how to multiply permutations using this notation.

Empirical Assessment of Classification Accuracy of Local SVM

COMP219: Artificial Intelligence. Lecture 17: Semantic Networks

Algorithms for Bioinformatics

MATH 259 FINAL EXAM. Friday, May 8, Alexandra Oleksii Reshma Stephen William Klimova Mostovyi Ramadurai Russel Boney A C D G H B F E

Dyck paths, standard Young tableaux, and pattern avoiding permutations

X = {1, 2,...,n} n 1f 2f 3f... nf

N. Papadakis, N. Reynolds, C.Ramirez-Jimenez, M.Pharaoh

arxiv: v1 [math.co] 16 Aug 2018

The Originative Statistical Regression Models: Are They Too Old and Untenable? To Fit or Not to Fit Data to a Model: That is the Question.

REINFORCEMENT LEARNING (DD3359) O-03 END-TO-END LEARNING

DECISION TREE TUTORIAL

Permutation graphs an introduction

Applications of Music Processing

Finding the Best Panoramas

Human Reconstruction of Digitized Graphical Signals

AVA: A Large-Scale Database for Aesthetic Visual Analysis

Enumeration of Two Particular Sets of Minimal Permutations

Permutation Tableaux and the Dashed Permutation Pattern 32 1

Bayesian Positioning in Wireless Networks using Angle of Arrival

Pattern Avoidance in Unimodal and V-unimodal Permutations

Spectral Transition-Based Playlist Prediction

Research Challenges in Forecasting Technical Emergence. Dewey Murdick, IARPA 25 September 2013

A Survey of Automated Hierarchical Classification of Patents

SSB Debate: Model-based Inference vs. Machine Learning

Superpatterns and Universal Point Sets

Structure and Synthesis of Robot Motion

Deep Learning for Autonomous Driving

Document Content-Based Search Using Topic Modeling

SCHEDULING Giovanni De Micheli Stanford University

Mechanism Design without Money II: House Allocation, Kidney Exchange, Stable Matching

A GRAPH THEORETICAL APPROACH TO SOLVING SCRAMBLE SQUARES PUZZLES. 1. Introduction

Practice problems from old exams for math 233

MA/CSSE 473 Day 13. Student Questions. Permutation Generation. HW 6 due Monday, HW 7 next Thursday, Tuesday s exam. Permutation generation

Searching Optimal Movements in Multi-Player Games with Imperfect Information

SketchNet: Sketch Classification with Web Images[CVPR `16]

Transcription:

Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007) Qin Huazheng 2014/10/15

Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007)

Outline Motivation Framework Experiments

Motivation Questioning the term independence assumption behind the traditional bag-of-word model, we propose a different representation of a document that captures the relationships between the terms using an unweighted directed graph of terms.

Motivation Semantically speaking, word order and word dependence do matter: "Mary is quicker than John" and "John is quicker than Mary" are clearly different phrases, yet their bag-of-word representations are the same. The impact of term order has been a popular research issue, and relationships between terms in general are claimed to play an important role in text processing. This motivated us to find a representation that captures these relationships while remaining as efficient as the traditional one at query time.

Outline Motivation Framework Experiments

Graph-of-word Framework We represent a textual document (typically a Web page) as a graph-of-word: an unweighted directed graph whose vertices represent unique terms, whose edges represent co-occurrences between the terms within a fixed-size sliding window, and whose edge direction represents term order.

Graph-of-word Example (Framework): the example graph built from a sample document is shown on the slides.

TF-IDF and TW-IDF (Framework). TW-IDF replaces the term frequency tf(t, d) used in TF-IDF with a graph-based term weight tw(t, d), the weight of the vertex associated with the term t in the graph-of-word representation of the document d. In the experiments of Section 6 of the paper, the weight is the indegree.
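The following is a minimal Python sketch of the graph-of-word construction and TW-IDF scoring described above. It assumes indegree term weights, a 4-term sliding window, a pivoted document-length normalization of the form 1 − b + b·|d|/avdl with IDF = log((N+1)/df(t)), and a toy whitespace-tokenized corpus; the corpus, window size and b value are illustrative assumptions, not the paper's experimental setup.

```python
from collections import defaultdict
from math import log

def graph_of_word(terms, window=4):
    """Unweighted directed graph: edge u -> v whenever v follows u within a
    sliding window of `window` terms (edge direction encodes term order)."""
    edges = set()
    for i, u in enumerate(terms):
        for v in terms[i + 1:i + window]:
            if u != v:
                edges.add((u, v))
    return edges

def indegree_tw(edges):
    """Term weight tw(t, d): indegree of the vertex of t in the graph-of-word."""
    tw = defaultdict(int)
    for _, v in edges:
        tw[v] += 1
    return tw

def tw_idf_score(query_terms, doc_terms, corpus, b=0.003, window=4):
    """Sum over query terms of tw(t, d) / (1 - b + b*|d|/avdl) * log((N+1)/df(t))."""
    N = len(corpus)
    avdl = sum(len(d) for d in corpus) / N
    df = defaultdict(int)
    for d in corpus:
        for t in set(d):
            df[t] += 1
    tw = indegree_tw(graph_of_word(doc_terms, window))
    norm = 1.0 - b + b * len(doc_terms) / avdl
    return sum(tw[t] / norm * log((N + 1) / df[t])
               for t in query_terms if tw[t] > 0)

corpus = [
    "information retrieval with a graph of word representation".split(),
    "the bag of word representation ignores word order".split(),
    "learning to rank documents for information retrieval".split(),
]
print(tw_idf_score("graph word retrieval".split(), corpus[0], corpus))
```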

Outline Motivation Framework Experiments

Experiments (result tables shown on the slides)

Thank you

Graph-of-word and TW-IDF: New Approach to Ad Hoc IR (CIKM 2013) Learning to Rank: From Pairwise Approach to Listwise Approach (ICML 2007)

Outline What is Learning to Rank? (L2R) Three Approaches of L2R The Listwise Approach: ListNet

What is learning to rank? Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item. The ranking model's purpose is to rank, i.e. produce a permutation of items in new, unseen lists in a way which is "similar" to rankings in the training data in some sense.

What is learning to rank? Learning to Rank Framework

Outline What is Learning to Rank? (L2R) Three Approaches of L2R The Listwise Approach: ListNet

Three approaches of L2R Pointwise Approach In this case it is assumed that each query-document pair in the training data has a numerical or ordinal score. The learning-to-rank problem can then be approximated by a regression problem: given a single query-document pair, predict its score. This approach does not consider the relationships between documents.

Three approaches of L2R Pairwise Approach In this case the learning-to-rank problem is approximated by a classification problem: learning a binary classifier that can tell which document is better in a given pair of documents. The goal is to minimize the average number of inversions in the ranking.

Three approaches of L2R Pairwise Approach The objective of learning is formalized as minimizing errors in the classification of document pairs, rather than minimizing errors in the ranking of documents. The assumption that the document pairs are generated i.i.d. is also too strong. Moreover, the number of generated document pairs varies largely from query to query, which will result in a trained model biased toward queries with more document pairs.

Three approaches of L2R Listwise Approach In the listwise approach, the document lists instead of document pairs are used as instances in learning.

Outline What is Learning to Rank? (L2R) Three Approaches of L2R The Listwise Approach: ListNet

The major question then is how to define a listwise loss function, representing the difference between the ranking list output by a ranking model and the ranking list given as ground truth. For example:
T_i = ((d_1, s_1), (d_2, s_2), (d_3, s_3), (d_4, s_4), (d_5, s_5))
O_i = ((d_2, s_2), (d_1, s_1), (d_3, s_3), (d_5, s_5), (d_4, s_4))

Notations A set of queries: Q = {q^(1), q^(2), ..., q^(m)}. Each query q^(i) is associated with a list of documents d^(i) = (d^(i)_1, d^(i)_2, ..., d^(i)_{n^(i)}), where n^(i) denotes the size of d^(i). Each list of documents is associated with a list of judgments (scores) y^(i) = (y^(i)_1, y^(i)_2, ..., y^(i)_{n^(i)}).

Notations A feature vector: x^(i)_j = Ψ(q^(i), d^(i)_j) is created from each query-document pair. The list of feature vectors: x^(i) = (x^(i)_1, x^(i)_2, ..., x^(i)_{n^(i)}). The corresponding list of scores: y^(i) = (y^(i)_1, y^(i)_2, ..., y^(i)_{n^(i)}). The training set: T = {(x^(i), y^(i))}, i = 1, ..., m.

Notations Ranking function: f, which assigns a score f(x^(i)_j) to each feature vector. The scores of a list of feature vectors: z^(i) = (f(x^(i)_1), f(x^(i)_2), ..., f(x^(i)_{n^(i)})). The total loss with respect to the training data: Σ_{i=1}^{m} L(y^(i), z^(i)), where L is a listwise loss function.

The Loss Function How about defining the loss function as MAP or NDCG? Namely:
L(y^(i), z^(i)) = 1 − MAP(y^(i), z^(i))
or:
L(y^(i), z^(i)) = 1 − NDCG(y^(i), z^(i))

MAP (Mean Average Precision) AP: AP = (1/M) Σ_{i=1}^{m} rel(i) · P(i), where P(i) is the accepted (relevant) proportion among the top-i items, rel(i) is 1 if the item at position i is accepted and 0 otherwise, and M is the total number of accepted items among all top-m items.

MAP (Mean Average Precision) AP Example: a predicted ranking z^(i) of five documents A–E compared with the ground truth y^(i); the items at positions 1, 4 and 5 of z^(i) are accepted (M = 3):
AP = (1/3) × (1/1 × 1 + 1/2 × 0 + 1/3 × 0 + 1/2 × 1 + 3/5 × 1) = 0.7

MAP (Mean Average Precision) MAP: MAP = (1/n) Σ_{i=1}^{n} AP(y^(i), z^(i)), the average of AP over the n query lists (z^(1), y^(1)), ..., (z^(n), y^(n)).
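A short Python sketch of AP and MAP as just defined, with binary relevance down a predicted ranking. The relevance pattern [1, 0, 0, 1, 1] reproduces the worked example above (AP = 0.7); the second list is an arbitrary illustration.

```python
def average_precision(rel):
    """AP for one ranked list; rel[i] is 1 if the item at rank i+1 is accepted.
    AP = (1/M) * sum of P(i) over accepted positions i, where P(i) is the
    fraction of accepted items among the top i and M is the number accepted."""
    hits, total, m = 0, 0.0, sum(rel)
    for i, r in enumerate(rel, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / m if m else 0.0

def mean_average_precision(rel_lists):
    """MAP: the mean of AP over the n query lists."""
    return sum(average_precision(r) for r in rel_lists) / len(rel_lists)

print(average_precision([1, 0, 0, 1, 1]))                        # 0.7
print(mean_average_precision([[1, 0, 0, 1, 1], [0, 1, 1, 0, 0]]))
```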

NDCG (Normalized Discounted Cumulative Gain) DCG of a ranked list: Σ_{i=1}^{m} (2^{r(i)} − 1) / log_2(i + 1), where r(i) is the relevance grade of the item at position i; NDCG divides this by Z, the DCG of the ideal ranking, so that a perfect ranking scores 1. Example: predicted ranking z^(i) = (A, C, B, D, E), ideal ranking y^(i) = (A, B, C, D, E):
NDCG = (1/Z) × (1/log_2(1+1) + 0/log_2(2+1) + 0/log_2(3+1) + 1/log_2(4+1) + 1/log_2(5+1)),
Z = 1/log_2(1+1) + 1/log_2(2+1) + 1/log_2(3+1) + 1/log_2(4+1) + 1/log_2(5+1)
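A matching Python sketch of NDCG with the (2^r − 1) gain and log2(i + 1) discount; Z is taken as the DCG of the same grades sorted into the ideal order. The binary grade list mirrors the MAP example and is illustrative.

```python
from math import log2

def dcg(grades):
    """DCG = sum over positions i = 1..m of (2^r(i) - 1) / log2(i + 1)."""
    return sum((2 ** r - 1) / log2(i + 1) for i, r in enumerate(grades, start=1))

def ndcg(grades):
    """NDCG = DCG of the given ranking divided by Z, the DCG of the ideal ranking."""
    z = dcg(sorted(grades, reverse=True))
    return dcg(grades) / z if z else 0.0

print(ndcg([1, 0, 0, 1, 1]))   # relevant items at ranks 1, 4 and 5 of the prediction
```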

The Loss Function Why not optimize the following?
Σ_{i=1}^{n} L(y^(i), z^(i)) = Σ_{i=1}^{n} [1 − MAP(y^(i), z^(i))]
Σ_{i=1}^{n} L(y^(i), z^(i)) = Σ_{i=1}^{n} [1 − NDCG(y^(i), z^(i))]
They are not continuous! Many optimization methods (e.g. Gradient Descent, Lagrange multipliers) cannot be applied.

Probability Models We propose using two probability models to calculate the listwise loss function. We map a list of scores to a probability distribution using one of the two probability models and then take any metric between probability distributions as a loss function. The two models are referred to as permutation probability and top k probability.

Probability Models Map the first list of scores to distribution 1 and the second list of scores to distribution 2, then measure the difference between the two distributions.

Permutation Probability Permutation: π = ⟨π(1), π(2), ..., π(n)⟩ is an ordering of the n objects, with π(j) the object at position j. We use s = (s_1, s_2, ..., s_n) to denote the list of scores, where s_j is the score of the j-th object.

Permutation Probability The probability of permutation π given the list of scores s is defined as:
P_s(π) = Π_{j=1}^{n} [ φ(s_{π(j)}) / Σ_{k=j}^{n} φ(s_{π(k)}) ],
where s_{π(j)} denotes the score of the object at position j of permutation π and φ is an increasing, strictly positive function (e.g. exp).

Permutation Probability

Permutation Probability Theorem 3 indicates that, for a permutation in which an object with a larger score is ranked ahead of another object with a smaller score, if we exchange their positions, the permutation probability of the resulting permutation will be lower than that of the original permutation.

Permutation Probability Theorem 4 indicates that, given the scores of n objects, the list of objects sorted in descending order of the scores has the highest permutation probability, while the list of objects sorted in ascending order has the lowest permutation probability.

Permutation Probability

Permutation Probability Given two lists of scores, we can first calculate the two corresponding permutation probability distributions, and then take the metric between the two distributions as the listwise loss function. Since the number of permutations is of order O(n!), the calculation might be intractable in practice. To cope with the problem, we consider the use of top k probability.
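Before moving on to the top-k probability, here is a small Python sketch of the permutation probability defined above, with φ = exp as an assumed choice of the increasing function, plus a brute-force check of Theorem 4 on a three-object score list (the scores are illustrative).

```python
from itertools import permutations
from math import exp

def permutation_probability(scores, pi):
    """P_s(pi) = prod_j phi(s_pi(j)) / sum_{k>=j} phi(s_pi(k)), with phi = exp."""
    vals = [exp(scores[j]) for j in pi]
    p = 1.0
    for j in range(len(vals)):
        p *= vals[j] / sum(vals[j:])
    return p

scores = [3.0, 1.0, 2.0]                       # scores of objects 0, 1 and 2
probs = {pi: permutation_probability(scores, pi)
         for pi in permutations(range(len(scores)))}

assert abs(sum(probs.values()) - 1.0) < 1e-9   # the n! probabilities sum to 1
print(max(probs, key=probs.get))               # (0, 2, 1): descending-score order (Theorem 4)
print(min(probs, key=probs.get))               # (1, 2, 0): ascending-score order (Theorem 4)
```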

Top-k Probability Before giving the definition of the top-k probability, we first define the top-k subgroup G_k(j_1, ..., j_k): the set of all permutations whose top k positions are occupied by the objects j_1, ..., j_k, in that order. There are n!/(n − k)! (i.e., A_n^k) different top-k subgroups.

Top-k Probability (the formula and its properties were shown on the slides)
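As a hedged sketch of the top-k probability, it can be computed directly from the scores by sequentially removing the chosen objects, which avoids summing O(n!) permutation probabilities over the subgroup; φ = exp is again an assumption, and the scores are illustrative.

```python
from math import exp

def top_k_probability(scores, top):
    """P_s(G_k(j_1, ..., j_k)): probability that the objects listed in `top`
    occupy the first k positions, in that order. Each factor is phi(s_j)
    divided by the phi-sum of the objects not yet placed."""
    phi = [exp(s) for s in scores]
    remaining = sum(phi)
    p = 1.0
    for j in top:
        p *= phi[j] / remaining
        remaining -= phi[j]
    return p

scores = [3.0, 1.0, 2.0]
top1 = [top_k_probability(scores, (j,)) for j in range(3)]
print(top1)              # the top-1 probabilities form a softmax over the scores
print(sum(top1))         # 1.0: the top-1 subgroups partition all permutations
```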

The Listwise Loss Function Given two lists of scores, we can define the metric between the corresponding top-k probability distributions as the listwise loss function. For example, when we use Cross Entropy as the metric, the listwise loss function is:
L(y^(i), z^(i)) = − Σ_{g ∈ G_k} P_{y^(i)}(g) log P_{z^(i)}(g),
where the sum runs over all top-k subgroups g.
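A minimal sketch of this cross-entropy listwise loss for k = 1: both score lists are mapped to top-1 distributions, which for φ = exp is a softmax, and then compared with cross entropy. The toy score lists are illustrative.

```python
import numpy as np

def top1_distribution(scores):
    """Top-1 probabilities with phi = exp: a softmax over the list of scores."""
    e = np.exp(scores - np.max(scores))        # shift for numerical stability
    return e / e.sum()

def listwise_cross_entropy(y_scores, z_scores):
    """L(y, z) = -sum_j P_y(j) * log P_z(j) over the top-1 distributions."""
    p_y = top1_distribution(np.asarray(y_scores, dtype=float))
    p_z = top1_distribution(np.asarray(z_scores, dtype=float))
    return float(-(p_y * np.log(p_z)).sum())

print(listwise_cross_entropy([2.0, 1.0, 0.0], [1.9, 1.1, 0.2]))  # similar rankings: small loss
print(listwise_cross_entropy([2.0, 1.0, 0.0], [0.2, 1.1, 1.9]))  # reversed ranking: larger loss
```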

Learning Method: ListNet We propose a new learning method for optimizing the listwise loss function based on top k probability, with Neural Network as model and Gradient Descent as optimization algorithm. We refer to the method as ListNet.

Learning Method: ListNet (the algorithm and its gradient computation were shown on the slides)
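Below is a hedged Python sketch of ListNet training for k = 1, with a linear scoring function standing in for a neural network; it relies on the fact that the gradient of the top-1 cross entropy with respect to the model scores is softmax(z) − softmax(y). The data, learning rate and epoch count are illustrative assumptions.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def train_listnet(queries, epochs=500, lr=0.01):
    """ListNet with k = 1: linear model f(x) = w.x, per-query top-1 cross-entropy
    loss, plain gradient descent. `queries` is a list of (X, y) pairs where X is
    an (n_docs, n_features) matrix and y the ground-truth score list."""
    w = np.zeros(queries[0][0].shape[1])
    for _ in range(epochs):
        for X, y in queries:
            z = X @ w                                   # model scores for the list
            grad = X.T @ (softmax(z) - softmax(y))      # dL/dw of the listwise loss
            w -= lr * grad
    return w

rng = np.random.default_rng(0)
queries = []
for _ in range(3):                    # three toy query lists of five documents each
    X = rng.normal(size=(5, 3))
    y = 2.0 * X[:, 0]                 # ground-truth scores driven by the first feature
    queries.append((X, y))

w = train_listnet(queries)
print(w)                              # the first weight should clearly dominate
```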

Experimental Results Data Collections: TREC, OHSUMED, CSearch. Ranking Accuracy: result tables shown on the slides.

Experimental Results NDCG curves (shown on the slides)

Experimental Results The pairwise loss: Similar trends were observed on the results evaluated in terms of MAP.

Thank you