Multi-Example Search in Rich Information Graphs
  Matteo Lissandrini, Davide Mottin, Themis Palpanas, Yannis Velegrakis
  ml@disi.unitn.eu - ICDE 2018 Paris
What are you looking for? https://unsplash.com/photos/ypcy9hep6v8
Search by a list of specifications
  1. A Movie and an Actor
  2. From the Movie return the Director
  3. The Director has won an Award
  4. The Movie is adapted from a Book
  5. From the Book return the Author
  Hard to specify! Too many options! What is the schema? Which specification is important?
Search by Example
  Example: Lord of the Rings (LotR), E. Wood, P. Jackson, Tolkien
  [Figure: example graph in which LotR has an actor edge to E. Wood and a director edge to P. Jackson, with Tolkien as a further node; the results Harry Potter, James Bond, ... show the same actor and director structure]
  More intuitive! Avoids a list of specifications.
ONE EXAMPLE IS NOT ENOUGH: when known examples are only Partial Specifications
Search by Multiple Examples
  [Figure: example graphs with edge labels actor, spouse, spouse]
ONE EXAMPLE IS NOT ENOUGH: when Results have Different Structures
Search by Multiple Examples
  [Figure: example graphs with edge labels director, actor, spouse, spouse, director, spouse]
Multi-Example Search
  Multiple Simple Examples
  Each Example describes an Aspect
  Results are Combinations of Aspects
  Results may have Multiple Structures
INFORMATION GRAPHS
  Nodes, Edges
  [Figure source: Ambiverse GmbH]
Edge-labelled Multigraphs
  $G = \langle V, E, L, \ell \rangle$
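As a minimal sketch of the data model on this slide, an edge-labelled multigraph can be stored as a set of (source, label, target) triples. The class name `LabelledMultigraph`, its methods, the edge directions, and the extra "producer" edge are assumptions made for illustration; they are not taken from the talk.

```python
from collections import defaultdict


class LabelledMultigraph:
    """Directed edge-labelled multigraph stored as (source, label, target) triples."""

    def __init__(self):
        self.edges = set()            # each edge is kept together with its label
        self.out = defaultdict(set)   # node -> {(label, target)}
        self.inc = defaultdict(set)   # node -> {(label, source)}

    def add_edge(self, src, label, dst):
        self.edges.add((src, label, dst))
        self.out[src].add((label, dst))
        self.inc[dst].add((label, src))

    @property
    def nodes(self):
        return {n for s, _, d in self.edges for n in (s, d)}

    @property
    def labels(self):
        return {label for _, label, _ in self.edges}


g = LabelledMultigraph()
g.add_edge("LotR", "actor", "E. Wood")
g.add_edge("LotR", "director", "P. Jackson")
# Hypothetical parallel edge: a multigraph allows several edges between
# the same pair of nodes as long as their labels differ.
g.add_edge("LotR", "producer", "P. Jackson")
```

Storing both outgoing and incoming adjacency makes it cheap to look at the labels around a node in either direction, which the later pruning steps rely on.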
Exemplar Queries: SINGLE EXAMPLE $Q_e$
  Input: $Q_e$, an example element of interest
  Output: set of elements in the desired result set
  Exemplar Query Evaluation:
    match $Q_e$ to a sample $S$ in the graph $G$
    find the set of elements $A$ similar to $S$ given a similarity relation $\approx$
    [OPTIONAL] return only the top-k subset $A^K \subseteq A$
  $A = \{ a \in D \mid a \approx S \}$
Multi-Exemplar Queries: MULTIPLE EXAMPLES $Q_e$ (our problem formulation)
  Input: $Q_e$, a set of example elements of interest
  Output: set of elements in the desired result set
  Exemplar Query Evaluation:
    match each $q \in Q_e$ to the set of samples $S = \{ s_1, s_2, \ldots \}$ in the graph $G$
    find the set of elements $A$ similar to each element in $S$ given a similarity relation $\approx$
    [OPTIONAL] return only the top-k subset $A^K \subseteq A$
  $A = \{ a \in D \mid \forall s \in S.\ s \approx a \}$
(Multi-)Exemplar Queries on Graphs
  Single Sample: $A = \{ a \in D \mid a \approx S \}$
    Similarity ($\approx$): graph isomorphism
    $A = \{ a \text{ subgraph of } G \mid s \text{ isomorphic to } a \}$
    Challenge: find ALL isomorphic graphs
  Graph Isomorphism is Transitive and Symmetric! $A$ is an Equivalence Class
  Multiple Samples: $A = \{ a \in D \mid \forall s \in S.\ s \approx a \}$
    $\exists s_i, s_j \in S.\ s_i \not\approx s_j \Rightarrow A = \emptyset$
    Similarity ($\approx$): Subgraph-Isomorphism?
    $A = \{ a \text{ subgraph of } G \mid \forall s \in S.\ s \text{ subgraph-isomorphic to } a \}$
    Answers are subgraphs that contain structures similar to each sample
  IS THIS CHARACTERIZATION ENOUGH?
Multi-Exemplar Answers on Graphs
  Graph Similarity ($\approx$): Subgraph-Isomorphism
  $\{ a \subseteq G \mid \forall s \in S.\ s \text{ subgraph-isomorphic to } a \}$
  What constitutes a good answer?
  With no restrictions, the entire graph is accepted as an answer
Multi-Exemplar Answers on Graphs
  Each answer should be correct, complete, and non-redundant: ensure all Aspects are present and limit the Size of Answer Graphs
  Answers: WEAKLY CONNECTED SUBGRAPHS with NO SUPERFLUOUS NODES or EDGES
  1. Connectedness: $\forall n_1, n_2 \in V_A$ there is an undirected path that connects $n_1$ to $n_2$
  2. Consistency: $\forall n_A \in V_A\ \exists s \in S, n_s \in V_s$ such that $n_A$ maps to $n_s$
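The two conditions can be checked mechanically. The sketch below is one possible reading, assuming an answer is given as a set of (source, label, target) triples and each sample comes with a mapping from its nodes to answer nodes. The function names, the placeholder nodes `film_1` and `city_1`, and the mappings are hypothetical; the check covers superfluous nodes only, not superfluous edges.

```python
def weakly_connected(edges):
    """True if the edges form a single component when direction is ignored."""
    adj = {}
    for src, _label, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    if not adj:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == set(adj)


def consistent(edges, mappings):
    """True if every answer node is the image of some node of some sample."""
    answer_nodes = {n for s, _label, d in edges for n in (s, d)}
    covered = {image for mapping in mappings for image in mapping.values()}
    return answer_nodes <= covered


def is_valid_answer(edges, mappings):
    return weakly_connected(edges) and consistent(edges, mappings)


# Hypothetical answer combining a "director" sample and a "spouse" sample.
answer = {
    ("film_1", "director", "Steven Spielberg"),
    ("Steven Spielberg", "spouse", "Kate Capshaw"),
}
mappings = [
    {"x": "Steven Spielberg", "y": "Kate Capshaw"},   # nodes of the spouse sample
    {"m": "film_1", "d": "Steven Spielberg"},         # nodes of the director sample
]
# Adding a node that no sample maps to makes the answer redundant.
with_extra_node = answer | {("Kate Capshaw", "bornIn", "city_1")}
```

Here `is_valid_answer(answer, mappings)` holds, while `with_extra_node` fails the consistency check because `city_1` is not the image of any sample node.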
CHALLENGE! To find Multi-Exemplar answers:
  1. Find ALL graphs isomorphic to ALL samples: each sample requires a Subgraph-Isomorphism Search
  2. Find which samples combine into one connected answer: candidate space = Cartesian Product over all samples
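To make the size of the candidate space concrete, the sketch below enumerates one match per sample and keeps only the combinations that are linked through shared nodes. It assumes each match is already a connected subgraph, represented only by its node set; the function names and the toy node sets are hypothetical and serve only to illustrate the Cartesian-product blow-up.

```python
from itertools import product
from math import prod


def linked(combo):
    """True if the node sets in combo merge into one group via shared nodes."""
    merged = set(combo[0])
    pending = [set(match) for match in combo[1:]]
    progress = True
    while pending and progress:
        progress = False
        for match in list(pending):
            if merged & match:
                merged |= match
                pending.remove(match)
                progress = True
    return not pending


def naive_candidates(matches_per_sample):
    """Enumerate the full Cartesian product and filter out disconnected combinations."""
    for combo in product(*matches_per_sample):
        if linked(combo):
            yield combo


# Toy matches for three samples (hypothetical node names).
matches_per_sample = [
    [frozenset({"a", "b"}), frozenset({"c", "d"})],
    [frozenset({"b", "x"}), frozenset({"q", "r"})],
    [frozenset({"x", "y"})],
]
candidate_space = prod(len(matches) for matches in matches_per_sample)
answers = list(naive_candidates(matches_per_sample))
```

With these toy inputs the product contains 4 combinations but only one of them is connected, so most of the enumeration work is wasted.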
Search Framework
  [Figure: samples S1, S2, S3 located in the graph]
  Exploit Localized Search
Search Framework Optimizations
  1. Find CANDIDATE REGIONS
    a. Remove Unused Edges
    b. Identify SEEDs
    c. Expand Around each seed
  2. SEARCH within each region
    a. Avoid Cartesian Product
    b. Fast Merge of Partial Answers
  Naïve Algorithm: 1 single region; retrieves ALL isomorphic subgraphs; Hash-JOIN for fast merge of partial answers
  2 Advanced Algorithms: Fast & Fast+
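The slide names a hash join for merging partial answers but gives no details. One plausible reading, shown below as a sketch, joins two lists of partial answers on shared nodes: the right-hand side is indexed by node, and each left-hand match probes that index instead of being compared against every right-hand match. The function name `hash_join` and the toy node sets are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right):
    """Join two lists of partial answers (node sets) on any shared node."""
    # Build phase: index every right-hand match under each of its nodes.
    index = defaultdict(list)
    for r_id, r_match in enumerate(right):
        for node in r_match:
            index[node].append(r_id)

    # Probe phase: look up only the right-hand matches that share a node.
    joined = []
    for l_match in left:
        partners = {r_id for node in l_match for r_id in index.get(node, ())}
        for r_id in sorted(partners):
            joined.append(frozenset(l_match) | right[r_id])
    return joined


left = [frozenset({"a", "b"}), frozenset({"c", "d"})]
right = [frozenset({"b", "x"}), frozenset({"q", "r"})]
merged = hash_join(left, right)
```

For more than two samples the join can be applied repeatedly, feeding `merged` back in as the left-hand side.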
Find Candidate Regions - Fast
  Identify SEED among S1, S2, S3: the sample with the Min # of matches (S1: 4, S2: 6, S3: 7)
  EXPAND around each seed: retrieve candidate Regions
Find Candidate Regions - Fast
  Identify SEED among S1, S2, S3: the sample with the Min # of matches (S1: 4, S2: 6, S3: 7)
  EXPAND around each seed: retrieve candidate Regions
  Approximate Cardinality Estimation from Precomputed Statistics: Label frequency, Label Pair frequency, Star cardinality
  $I_{star}(l, c) = \{ G' \subseteq G \mid G' : \langle V', E', \ell \rangle \text{ is a star} \wedge |E'| = c \wedge \exists (v_1, v_2) \in E' \text{ s.t. } \ell(v_1, v_2) = l \}$
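As a rough illustration of the precomputed statistics, the sketch below computes label frequencies and the size of $I_{star}(l, c)$ on a toy graph. It rests on several assumptions that are not in the talk: a star is identified by a centre node plus a choice of $c$ of its incident edges, edge direction is ignored, there are no self-loops, and $c \geq 2$ so that each star has a single centre. Under those assumptions a node with $d$ incident edges, $k$ of them labelled $l$, is the centre of $\binom{d}{c} - \binom{d-k}{c}$ qualifying stars. The function names and the toy edges are hypothetical, and how the talk turns these statistics into an estimate is not shown here.

```python
from collections import Counter
from math import comb


def label_frequency(edges):
    """Number of edges carrying each label."""
    return Counter(label for _src, label, _dst in edges)


def star_cardinality(edges, label, c):
    """Count stars with exactly c edges that contain at least one `label` edge.

    Assumes c >= 2, no self-loops, and undirected incidence (see lead-in).
    """
    degree = Counter()
    with_label = Counter()
    for src, lab, dst in edges:
        for node in (src, dst):
            degree[node] += 1
            if lab == label:
                with_label[node] += 1
    # All c-subsets of incident edges minus those that avoid `label` entirely.
    return sum(comb(d, c) - comb(d - with_label[n], c) for n, d in degree.items())


# Toy graph with hypothetical node names.
edges = [
    ("m1", "actor", "p1"),
    ("m1", "actor", "p2"),
    ("m1", "director", "p3"),
    ("m2", "director", "p3"),
]
freq = label_frequency(edges)
stars = star_cardinality(edges, "director", 2)
```

On the toy graph there are three 2-edge stars containing a director edge: two centred on `m1` and one centred on `p3`.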
Find Candidate Regions - Fast
  Identify SEED among S1, S2, S3: the sample with the Min # of matches (S1: 4, S2: 6, S3: 7)
  EXPAND around each seed: retrieve candidate Regions
  Seed search requires Isomorphic-Search
  Some Regions do not contain all structures
Find Candidate Regions - Fast+
  Identify SEED among S1, S2, S3: with cardinality Estimation, select a SINGLE NODE with bitset-mapping
  EXPAND around each seed: retrieve candidate Regions
  DISCARD incomplete regions: with bitset-mapping, before ISO-search
Fast Pruning with Bit-Vectors (d=1)
  Example: Detect JOIN-Node
  [Figure: samples S1, S2, S3 with edge labels actor, director, spouse]
  Node                      actor  director  spouse  author  ...
  Barack Obama (S2)           -       -        1       -
  Quentin Tarantino (S3)      -       1        -       -
  Steven Spielberg (A1)       -       1        1       -
  Kate Capshaw (A2)           -       -        1       -
  BO ∪ QT (union)             -       1        1       -
  (BO ∪ QT) \ KC              -       1        -       -     (≠ 0)
  (BO ∪ QT) \ SS              -       -        -       -     (= 0)
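The bit-vector example can be reproduced with plain integers used as bitsets. The sketch below encodes the four labels from the slide, builds each node's signature from the values shown, and tests whether a candidate node covers the union of the sample signatures. The helper names and the use of AND-NOT for the final step are assumptions inferred from the values on the slide.

```python
LABELS = ["actor", "director", "spouse", "author"]
BIT = {label: 1 << i for i, label in enumerate(LABELS)}


def signature(labels):
    """Bit-vector with one bit set per edge label found around a node (d=1)."""
    sig = 0
    for label in labels:
        sig |= BIT[label]
    return sig


def missing(required, node_sig):
    """Bits that are required but absent from the node's signature."""
    return required & ~node_sig


barack_obama = signature(["spouse"])                   # from sample S2
quentin_tarantino = signature(["director"])            # from sample S3
steven_spielberg = signature(["director", "spouse"])   # candidate A1
kate_capshaw = signature(["spouse"])                   # candidate A2

required = barack_obama | quentin_tarantino            # BO union QT

kc_missing = missing(required, kate_capshaw)      # non-zero: cannot be the join node
ss_missing = missing(required, steven_spielberg)  # zero: covers both samples
```

A zero result means the node carries every label the samples need, so only such nodes have to be passed on to the more expensive isomorphism search.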
Top-K
  Weight Function for Nodes
  Scoring Function
  1. Sort regions (output of Partial)
  2. Estimate Upper Bound Score for each
  3. Find Top-K & prune: skip Regions that will not produce answers with a score High Enough
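The three steps can be sketched as a threshold-style loop: regions are visited in decreasing order of their upper bound, and the search stops as soon as no remaining region can beat the current k-th best score. The weight and scoring functions from the talk are not reproduced here; `upper_bound` and `search_region` are hypothetical callables supplied by the caller, and the region names and scores are made up for illustration.

```python
import heapq


def top_k(regions, k, upper_bound, search_region):
    """Return the k best (score, answer) pairs and the number of regions searched.

    upper_bound(region) must never be lower than any score in that region.
    search_region(region) yields (score, answer) pairs.
    """
    ranked = sorted(regions, key=upper_bound, reverse=True)
    best = []      # min-heap of (score, tie_breaker, answer)
    counter = 0
    searched = 0
    for region in ranked:
        if len(best) == k and upper_bound(region) <= best[0][0]:
            break  # this region and all later ones cannot improve the top-k
        searched += 1
        for score, answer in search_region(region):
            counter += 1
            entry = (score, counter, answer)
            if len(best) < k:
                heapq.heappush(best, entry)
            elif score > best[0][0]:
                heapq.heapreplace(best, entry)
    result = sorted(((s, a) for s, _, a in best), key=lambda t: t[0], reverse=True)
    return result, searched


# Hypothetical regions with made-up scores and bounds.
answers_in = {
    "r1": [(0.9, "a1"), (0.4, "a2")],
    "r2": [(0.7, "a3")],
    "r3": [(0.3, "a4")],
}
bounds = {"r1": 0.9, "r2": 0.8, "r3": 0.3}
result, searched = top_k(answers_in, 2, bounds.get, answers_in.get)
```

In this toy run the third region is never searched, because its upper bound is below the second-best score already found.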
Experimental Evaluation
  3 Algorithms: a) Naïve b) Fast (iso-graphs as seeds) c) Fast+ (nodes as seeds)
  Tests: 100 Queries, 2-5 Samples; measures: count of isomorphic computations, Running Time
  2 Large Real Datasets: a) YAGO, +16.7M Edges b) Freebase, +300M Edges
Evaluation Results: Saving Isomorphic Computations
  Datasets: Freebase (300M Edges), Yago (16.7M Edges); 100 queries, from 2 to 5 samples
  [Figure a): # Iso. subgraphs (k) vs. #Query Samples (2 to 5) for mq-naïve, mq-fast, mq-fast+]
  40-60% Reduction in Iso-Search Computations!
Evaluation Results: Running Times
  Datasets: Freebase (300M Edges), Yago (16.7M Edges); 100 queries, from 2 to 5 samples
  [Figure b): Total time (sec) vs. #Query Samples (2 to 5), AVERAGE and MEDIAN, for mq-naïve, mq-fast, mq-fast+]
  Faster for 70% of queries
  Saves up to 25 secs on AVG
  In some cases the Fast algorithm wastes computations
Conclusions
  Search via Multiple Examples: find structures similar to a SET of input examples.
  Output: Composite Results containing characteristics from each Example
  Useful When:
    a Complete Example with all the desired characteristics is not known.
    Characteristics combine in Multiple ways.
  Optimizations:
    1. Exploit Localized Search
    2. Bitset Pruning
    3. Cardinality Estimation
    4. Top-K Optimizations
  Localized Search is less effective with a Dense Graph!
  Thank You! Questions? p.s. Now you can hire me!
Thank you! Questions?
  bit.ly/icde18
There is more...
Search Framework Optimizations (bis)
  1. Find CANDIDATE REGIONS
    a. Remove Unused Edges
    b. Identify SEEDs
    c. Expand Around each seed
  2. SEARCH within each region
    a. Avoid Cartesian Product
    b. Fast Merge of Partial Answers
  Naïve Algorithm: 1 single region; retrieves ALL isomorphic subgraphs; Hash-JOIN for fast merge of partial answers
  2 Advanced Algorithms: Fast & Fast+
Cardinality Estimation
  Precomputed Statistics: Label frequency, Label Pair frequency, Star cardinality
  $I_{star}(l, c) = \{ G' \subseteq G \mid G' : \langle V', E', \ell \rangle \text{ is a star} \wedge |E'| = c \wedge \exists (v_1, v_2) \in E' \text{ s.t. } \ell(v_1, v_2) = l \}$
  [Figure: Exact Number of ISO Subgraphs (Log2) vs. Estimated Number of ISO Subgraphs (Log2); Pearson Corr. 0.77, Spearman Corr. 0.81]
Example Query/Answers
  [Figure: query and answer graphs over the edge labels sibling, child, directed, awardin. Entities shown: John Belushi, Jim Belushi, Steven Tyler, Liv Tyler, Steven Spielberg, Hook, Meryl Streep, The Iron Lady, Nick Clooney, George Clooney, Rosemary Clooney, Good Night & Good Luck, ER, Frank Sinatra, Frank Sinatra Jr., Nancy Sinatra, Fred Zinnemann, From Here to Eternity, Alfred Newman, Thomas Newman, Lionel Newman, Doctor Dolittle, Richard Fleischer]
Example Query/Answers
  Examples:
    Richard Hilton: Has Child Paris Hilton
    Harrison Ford: isMarriedTo Calista Flockhart; Acted In Return of the Jedi; Born In Chicago
    Dustin Hoffman: Won Prize David di Donatello; Acted In Rain Man
  Answers:
    [Figure: answer graphs over the edge labels Has Child, isMarriedTo, Acted In, Born In, Won Prize. Entities shown: Ronald Reagan, Nancy Reagan, Ron Reagan, George H. W. Bush, Federal Cross of Merit, Hellcats of the Navy, New York City, Robert Duvall, Golden Globe, Arnold Schwarzenegger, Maria Shriver, Gray Davis, The 6th Day, Thal, Styria, Cowboy from Brooklyn, Dick Powell, June Allyson, Arkansas]