Scalable Methods for the Analysis of Network-Based Data


Scalable Methods for the Analysis of Network-Based Data
MURI Project: University of California, Irvine
Annual Review Meeting, December 8th 2009
Principal Investigator: Padhraic Smyth

Today's Meeting
Goals
- Review our research progress
- Feedback from project sponsors (ONR)
Format
- Introduction
- Tutorial talks
- Research updates from each PI
- Poster session by graduate students
- Discussion and feedback
P. Smyth: Networks MURI Project Meeting, Dec 8th 2009: 2

Project Dates
Project Timeline
- Start date: May 1 2008
- End date: April 30 2011/2013
Meetings
- Kickoff Meeting, November 2008
- Working Meeting, April 2009
- Working Meeting, August 2009
- Annual Review, December 2009
[meeting slides online at www.datalab.uci.edu/muri]

MURI Investigators
- Padhraic Smyth, UCI
- David Eppstein, UCI
- Carter Butts, UCI
- Michael Goodrich, UCI
- Mark Handcock, U Washington
- Dave Mount, U Maryland
- Dave Hunter, Penn State

Collaboration Network
[Figure: collaboration graph among the investigators. Node labels: Mike Goodrich, David Eppstein, Carter Butts, Dave Hunter, Dave Mount, Padhraic Smyth, Mark Handcock]

Collaboration Network
[Figure: collaboration graph including students and postdocs. Node labels: Lowell Trott, Maarten Loffler, Darren Strash, Emma Spiro, Chris Marcum, Lorien Jasny, Zack Almquist, Sean Fitzhugh, Ryan Acton, Mike Goodrich, David Eppstein, Carter Butts, Dave Hunter, Duy Vu, Michael Schweinberger, Dave Mount, Padhraic Smyth, Ruth Hummel, Mark Handcock, Eunhui Park, Minkyoung Cho, Arthur Asuncion, Romain Thibaux, Chris DuBois, Drew Frank, Miruna Petrescu-Prahova]

[Diagram of project components: Data; Statistical Models; Scalable Algorithms; Evaluation; Software and Applications]

Limitations of Existing Methods
- Computational intractability: current statistical network modeling algorithms can scale exponentially in the number of nodes N
- Network data over time: relatively little work on statistical models for dynamic network data
- Heterogeneous data: e.g., few techniques for incorporating text, spatial information, etc., into network models

Example
G = {V, E}
- V = set of N nodes
- E = set of directed binary edges
Exponential random graph (ERG) model
- $P(G \mid \theta) = f(G; \theta) / \text{normalization constant}$
- The normalization constant = sum over all possible graphs
How many graphs? $2^{N(N-1)}$
- e.g., N = 20, we have $2^{380} \approx 10^{114}$ graphs to sum over
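To make the size of that sum concrete, here is a minimal Python sketch. It assumes the simplest possible choice of f, namely f(G; θ) = exp(θ × number of edges), which is not stated on the slide and is used here only because its normalization constant has a known closed form to check against. The function names are hypothetical.

```python
import itertools
import math


def num_directed_graphs(n):
    """Number of directed binary graphs on n labeled nodes (no self-loops)."""
    return 2 ** (n * (n - 1))


def brute_force_normalizer(n, theta):
    """Sum f(G; theta) over every directed graph on n nodes.

    Assumption: f(G; theta) = exp(theta * edge_count), an edge-count-only
    ERG model chosen purely for illustration.
    """
    dyads = n * (n - 1)
    total = 0.0
    # Each graph is one 0/1 assignment to the n(n-1) ordered node pairs.
    for edges in itertools.product((0, 1), repeat=dyads):
        total += math.exp(theta * sum(edges))
    return total


def closed_form_normalizer(n, theta):
    """Closed form for the edge-count-only model: (1 + e^theta)^(n(n-1))."""
    return (1.0 + math.exp(theta)) ** (n * (n - 1))
```

Enumeration is feasible for N = 3 (64 graphs) and already slow by N = 5 (about a million graphs). Richer choices of f generally have no closed form, which is why the sum is the bottleneck.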


Key Themes of our MURI Project
- Foundational research on new statistical models and methods for social network data, e.g., decision-theoretic foundations of social networks
- Efficient estimation algorithms, e.g., efficient data structures for very large data sets
- New algorithms for heterogeneous network data: incorporating time, space, text, other covariates
- Software: make network inference software publicly available (in R)

[Diagram of project themes: Efficient Algorithms; New Statistical Methods; Richer Models; Complex Data Sets; New Applications; Software]

Complex Network Data
Data types
- Actors and ties
- Temporal events (Posters by DuBois, Almquist, Jasny, Marcum)
- Spatial information (Poster by Acton)
- Text data (Poster by Asuncion, talk by Smyth)
- Actor and tie covariates
Structure
- Hierarchies and clusters (Talk by Petrescu-Prahova, Poster by DuBois)
Measurement issues
- Sampling
- Missing data

Enron Email Data
[Figure: messages per week (total) and number of senders, 1999 to 2002; vertical axis from 0 to 350]
Poster by Chris DuBois

Spatial Network Data
Poster by Ryan Acton

Missing Data
Handcock and Gile, 2008

Statistical Models for Network Data
- Exponential random graph models (Talks by Hunter, Eppstein, Petrescu-Prahova)
- Relational event models (Posters by Marcum, Jasny)
- Latent-variable models (Talks by Mount, Smyth, Petrescu-Prahova) (Posters by Asuncion, DuBois)
- Decision-theoretic frameworks for social networks (Talk by Butts)

Estimation Algorithms
We seek P(parameters | data)
Exact algorithms are rare
Approximate search
- e.g., Markov chain Monte Carlo (talks by Hunter, poster by Hummel)
Exact solution of simpler objective function
- e.g., pseudolikelihood vs. likelihood (talks by Hunter)
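As a rough illustration of the Markov chain Monte Carlo idea, the sketch below runs a Metropolis edge-toggle sampler over directed graphs. It is not the estimation procedure from the talks. It assumes an edge-count-only model with weight exp(θ × number of edges), and the function name and parameters are hypothetical. The sampler only ever needs the change in the statistic caused by toggling one edge, so the normalization constant never has to be computed.

```python
import math
import random


def metropolis_edge_toggle(n, theta, steps, seed=0):
    """Sample directed graphs with probability proportional to
    exp(theta * edge_count) (an assumed, illustrative model).

    Returns the edge density observed after each step.
    """
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]  # start from the empty graph
    edge_count = 0
    dyads = n * (n - 1)
    densities = []
    for _ in range(steps):
        # Pick an ordered pair (i, j) with i != j uniformly at random.
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1
        # Change in edge count if this dyad is toggled: +1 or -1.
        delta = 1 - 2 * adj[i][j]
        # Metropolis acceptance probability min(1, exp(theta * delta)).
        if rng.random() < math.exp(min(0.0, theta * delta)):
            adj[i][j] ^= 1
            edge_count += delta
        densities.append(edge_count / dyads)
    return densities
```

For this particular model the edges are independent, so the long-run density should approach 1 / (1 + exp(-θ)), which gives a simple sanity check on the sampler.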

Computational Efficiency
Parameter estimation can scale from O(Ne) to $O(2^{N(N-1)})$
Data structures for efficient computation:
- h-index for change-score statistics (talk by Eppstein, posters by Spiro and by Trott)
- Nets and net-trees (talk by Mount, poster by Park)
- Priority range trees (poster by Strash)

h-index Data Structures
Eppstein and Spiro, 2009
The h-index of a graph is the maximum number h such that h nodes each have at least h neighbors
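The definition can be checked directly on small graphs. The sketch below computes the h-index of a static graph from its degree sequence. It is only an illustration of the definition, not the dynamic data structure of Eppstein and Spiro, and the adjacency-dictionary input format is an assumption.

```python
def graph_h_index(adjacency):
    """Return the largest h such that at least h nodes have degree >= h.

    `adjacency` maps each node to a collection of its neighbors
    (an assumed input format for this sketch).
    """
    degrees = sorted((len(nbrs) for nbrs in adjacency.values()), reverse=True)
    h = 0
    # With degrees in decreasing order, the rank-th node has degree >= rank
    # exactly when the top `rank` nodes all have at least `rank` neighbors.
    for rank, degree in enumerate(degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


# A star with four leaves: only the center has degree above 1.
star = {"c": {"a", "b", "d", "e"}, "a": {"c"}, "b": {"c"}, "d": {"c"}, "e": {"c"}}
# A triangle: three nodes, each with two neighbors.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
```

Here `graph_h_index(star)` is 1 and `graph_h_index(triangle)` is 2, since in the triangle two nodes each have at least two neighbors but no three nodes have three.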

Evaluation and Prediction
Evaluation on real-world data sets
- Katrina communication networks
- World Trade Center disaster response data
- Political blogs
- Facebook egonets
- Facebook UNC
- Enron email data
- and more
Metrics
- Assessment of model fit, e.g., BIC
- Predictive accuracy on test data, e.g., for temporal events
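For reference, the standard form of the BIC is k ln(n) minus twice the maximized log-likelihood, where k is the number of free parameters and n is the number of observations. The sketch below encodes that formula. What counts as an observation for network data (for example, dyads or events) is a modeling choice the slides do not specify, so `num_observations` is left to the caller.

```python
import math


def bic(log_likelihood, num_params, num_observations):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L).

    Lower values indicate a better trade-off between fit and complexity.
    """
    return num_params * math.log(num_observations) - 2.0 * log_likelihood
```

With the number of observations held fixed, a model with more parameters has to improve the log-likelihood by more than half of the added penalty (Δk × ln(n) / 2) to achieve a lower BIC.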

Poster by Almquist

Publications
- C. T. Butts, Revisiting the foundations of network analysis, Science, 325, 414-416, 2009.
- R. Hummel, M. Handcock, D. Hunter, A steplength algorithm for fitting ERGMs, winner of the American Statistical Association (Statistical Computing and Statistical Graphics Section) student paper award, presented at the ASA Joint Statistical Meeting, 2009.
- D. Eppstein and E. S. Spiro, The h-index of a graph and its application to dynamic subgraph statistics, Algorithms and Data Structures Symposium, Banff, Canada, August 2009.
- D. Newman, A. Asuncion, P. Smyth, M. Welling, Distributed algorithms for topic models, Journal of Machine Learning Research, in press, 2009.
- M. Cho, D. M. Mount, and E. Park, Maintaining nets and net trees under incremental motion, in Proceedings of the 20th International Symposium on Algorithms and Computation, 2009.
- M. Gjoka, M. Kurant, C. T. Butts, A. Markopoulou, A walk in Facebook: uniform sampling of users in online social networks, electronic preprint, IEEE Infocom, to appear.

Preprints
- R. M. Hummel, M. S. Handcock, D. R. Hunter, A steplength algorithm for fitting ERGMs, submitted, 2009.
- C. T. Butts, A behavioral micro-foundation for cross-sectional network models, preprint, 2009.
- C. T. Butts, A perfect sampling method for exponential random graph models, preprint, 2009.
- A. Asuncion and M. Goodrich, Turning privacy leaks into floods: Surreptitious discovery of Facebook friendships and other sensitive binary attribute vectors, submitted, 2009.
- A. Asuncion, Q. Liu, A. Ihler, P. Smyth, Learning with blocks: composite likelihood and contrastive divergence, submitted, 2009.

Morning Session I
- 9:00 Introduction and Overview (Padhraic Smyth, UC Irvine)
- 9:20 Principles of Statistical Network Modeling (Carter Butts, UC Irvine)
- 9:50 Estimation Methods for Statistical Network Modeling (David Hunter, Pennsylvania State University)
- 10:15 Break

Morning Session II
- 10:40 Efficient Computation of Change-Graph Scores (David Eppstein, UC Irvine)
- 11:05 Decision-Theoretic Foundations of Statistical Network Models (Carter Butts, UC Irvine)
- 11:30 Privacy Leaks and Floods in Social Networks (Michael Goodrich, UC Irvine)
- 12:00 Break for lunch
  - PIs + ONR visitors at the University Club
  - Students and postdocs, lunch in 6011

Graduate Student Poster Session (1:15 to 2:30, in this room, 6011)
- Lorien Jasny: Using Egocentric Relational Event Models to Predict Improvisation
- Chris Marcum: Complex Sequence Terms for Egocentric Relational Event Models
- Zack Almquist: Logistic Model for Network Evolution (Katrina Case)
- Sean Fitzhugh: Effects of Individual and Group-level Properties on World Trade Center Radio Network Robustness
- Ryan Acton: Geographical Models of Large-scale Social Networks
- Emma Spiro: Assessing the Degree h-index Distribution for Social Networks
- Darren Strash: Priority Range Trees
- Lowell Trott: Extended Dynamic Subgraph Statistics using the h-index
- Chris DuBois: Stochastic Blockmodels for Network-based Event Data
- Arthur Asuncion: Joint Statistical Models for Text and Social Networks
- Ruth Hummel: A Steplength Algorithm for Fitting ERGMs
- Eunhui Park: A Dynamic Data Structure for Approximate Range Searching

Afternoon Session I
- 2:30 Algorithms and Data Structures for Embedded Network Data (David Mount, University of Maryland)
- 2:55 Latent Variable Models for Text, Event, and Network Data (Padhraic Smyth, UC Irvine)
- 3:15 Coffee break

Afternoon Session II
- 3:40 Scalable Estimation Algorithms for Large Network Data Sets (David Hunter, Pennsylvania State University)
- 4:05 Statistical Inference for Latent Degree-Class Models with Applications to Disaster Networks (Miruna Petrescu-Prahova, University of Washington, and Michael Schweinberger, Pennsylvania State University)
- 4:30 Open discussion
- 5:15 Adjourn

Logistics
Meals
- Lunch at University Club, for visitors and PIs
- Refreshment breaks at 10:30 and 3:15
Wireless
- Should be able to get 24-hour guest access from UCI network
Online Slides and Schedule
- www.datalab.uci.edu/muri
Reminder to speakers: leave time for questions and discussion!

Questions?

Nets and Net Trees
Cho, Mount, Park, 2009