Agent-Based Modeling and Simulation of Collaborative Social Networks Research in Progress

Greg Madey, Yongqin Gao (Computer Science & Engineering, University of Notre Dame)
Vincent Freeh (Computer Science, North Carolina State University)
Renee Tynan, Chris Hoffman (Department of Management, University of Notre Dame)
AMCIS2003, Tampa, FL, August 2003
Supported in part by the National Science Foundation - Digital Society & Technology Program

Outline Definitions: Agents, models, simulations, collaborative social networks, computer experiments Phenomenon: Free/Open Source Software (F/OSS) Conceptual models ER model BA model BA model with constant fitness BA model with dynamic fitness Experiments and results Summary Some discussion questions

Agent-Based Modeling and Simulation Conceptual models of a phenomenon Simulations are computer implementations of the conceptual models Agents in models and simulations are distinct entities (instantiated objects) Tend to be simple, but with large numbers of them (thousands, or more) - i.e., swarm intelligence Contrasted with higher level intelligent agents Foundations in complexity theory Self-organization Emergence

Collaborative Social Networks Research-paper co-authorship, small world phenomenon, e.g., Erdős number (Barabasi 2001, Newman 2001) Movie actors, small world phenomenon, e.g., Kevin Bacon number (Watts 1999, 2003) Interlocking corporate directorships Open-source software developers (Madey et al., AMCIS 2002) Collaborators are the nodes of a graph, and collaborative relationships are the edges of the graph

Classical Scientific Method
1. Observe the world
   a) Identify a puzzling phenomenon
2. Generate a falsifiable hypothesis (K. Popper)
3. Design and conduct an experiment with the goal of disproving the hypothesis
   a) If the experiment fails, then the hypothesis is accepted (until replaced)
   b) If the experiment succeeds, then reject the hypothesis; additional insight into the phenomenon may be obtained and steps 2-3 repeated

The Computer Experiment

Agent-Based Simulation as a Component of the Scientific Method Modeling (Hypothesis) Observation Agent -Based Simulation (Experiment)

Agent-Based Simulation as a Component of the Scientific Method Modeling (Hypothesis) Social Network Model of F/OSS Observation Analysis of SourceForge Data Agent -Based Simulation (Experiment) Grow Artificial SourceForge

Open Source Software (OSS) Free: to view source, to modify, to share, of cost. Examples: Apache, Perl, GNU, Linux, Sendmail, Python, KDE, GNOME, Mozilla, thousands more

Free Open Source Software (F/OSS) Development Mostly volunteer Global teams Virtual teams Self-organized - often peer-based meritocracy Self-managed - but often a charismatic leader Often large numbers of developers, testers, support help, end user participation Rapid, frequent releases Mostly unpaid

F/OSS Developers: Larry Wall (Perl), Linus Torvalds (Linux), Eric Raymond (Cathedral and Bazaar), Richard Stallman (GNU, GNU Manifesto)

F/OSS: A Puzzling Phenomenon Contradicts traditional wisdom: Software engineering Coordination, large numbers Motivation of developers Quality Security Business strategy Almost everything is done electronically and available in digital form Opportunity for IS Research -- large amounts of online data available Research issues: Understanding motives Understanding processes Intellectual property Digital divide Self-organization Government policy Impact on innovation Ethics Economic models Cultural issues International factors

SourceForge VA Software Part of OSDN Started 12/1999 Collaboration tools 58,685 Projects 80,000 Developers 590,00 Registered Users

Savannah Uses SourceForge Software Free Software Foundation 1,508 Projects 15,265 Registered Users

F/OSS: Importance Major Component of e-technology Infrastructure with major presence in e-commerce e-science e-government e-learning Apache has over 65% market share of Internet Web servers Linux on over 7 million computers Most Internet e-mail runs on Sendmail Tens of thousands of quality products Part of product offerings of companies like IBM, Apple Apache in WebSphere, Linux on mainframe, FreeBSD in OSX Corporate employees participating on OSS projects

Free/Open Source Software Seems to challenge traditional economic assumptions Model for software engineering New business strategies Cooperation with competitors Beyond trade associations, shared industry research, and standards processes shared product development! Virtual, self-organizing and self-managing teams Social issues, e.g., digital divide, international participation Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property

Research Model Cross Validation Conceptual Explanatory Model of OSS: Agent-Based Modeling and Simulation Combined Data Mining Parameter Values Parameter Values Structural Features Understanding the Social and Task Dynamics that Predict Developer Behaviors Social Network Analysis: Longitudinal Study of Preferential Attachment and Dynamic Attachment Structural Features Parameter Values

Observations
Web mining: web crawler (scripts in Python, Perl, AWK, Sed)
Monthly since Jan 2001
Fields: ProjectID, DeveloperID
Almost 2 million records
Relational database

PROJ  DEVELOPER
8001  dev378
8001  dev8975
8001  dev9972
8002  dev27650
8005  dev31351
8006  dev12509
8007  dev19395
8007  dev4622
8007  dev35611
8008  dev8975
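The PROJ/DEVELOPER records above can be turned into a collaboration graph in which developers are nodes and shared project membership creates edges. The following is a minimal sketch, assuming the records are available as (project, developer) pairs; the function name `build_collaboration_graph` is illustrative and not taken from the study.

```python
from itertools import combinations

# Sample (project, developer) records copied from the slide above.
RECORDS = [
    (8001, "dev378"), (8001, "dev8975"), (8001, "dev9972"),
    (8002, "dev27650"), (8005, "dev31351"), (8006, "dev12509"),
    (8007, "dev19395"), (8007, "dev4622"), (8007, "dev35611"),
    (8008, "dev8975"),
]

def build_collaboration_graph(records):
    """Return {developer: set of co-developers} from (project, developer) pairs."""
    # Group developers by project.
    members = {}
    for project, developer in records:
        members.setdefault(project, set()).add(developer)
    # Link every pair of developers that share a project.
    graph = {}
    for devs in members.values():
        for dev in devs:
            graph.setdefault(dev, set())  # keep developers with no collaborators
        for a, b in combinations(sorted(devs), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

graph = build_collaboration_graph(RECORDS)
degrees = {dev: len(neighbors) for dev, neighbors in graph.items()}
```

Here `degrees` holds the number of distinct collaborators per developer, which is the quantity whose distribution the later slides compare against each model.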

Models of the F/OSS Social Network (Alternative Hypotheses) General model features Agents are nodes on a graph (developers or projects) Behaviors: Create, join, abandon and idle Edges are relationships (joint project participation) Growth of network: random or types of preferential attachment, formation of clusters Fitness Network attributes: diameter, average degree, degree distribution, clustering coefficient Four specific models ER (random graph) - (1960) BA (preferential attachment) - (1999) BA ( + constant fitness) - (2001) BA ( + dynamic fitness) - (2003)
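The difference between random growth (ER) and preferential attachment (BA) can be sketched in a few lines. This is a simplified illustration on a plain one-mode graph, not the developer/project simulation used in the study; the function names, the seed clique used to start the BA graph, and the parameters `n`, `p` and `m` are assumptions made for the example.

```python
import random

def er_graph(n, p, rng):
    """Erdos-Renyi G(n, p): every pair of nodes is linked with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def ba_graph(n, m, rng):
    """Barabasi-Albert style growth: each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    adj = {v: set() for v in range(n)}
    stubs = []  # each node appears once per incident edge
    # Assumption: start from a fully connected seed of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # Sampling a stub picks a node in proportion to its degree.
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

def edge_count(adj):
    return sum(len(neighbors) for neighbors in adj.values()) // 2

ba = ba_graph(200, 2, random.Random(1))
er = er_graph(200, 0.02, random.Random(1))
```

Sampling from the `stubs` list is a common way to implement degree-proportional choice without recomputing weights at every step.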

F/OSS Developers - Collaboration Social Network Developers are nodes / Projects are links 24 Developers 5 Projects 2 Linchpin Developers 1 Cluster [Figure: network diagram of developer nodes linked through Projects 6882, 7028, 7597, 9859 and 15850]

Computer Experiments Agent-based simulations Java programs using Swarm class library Validation (docking) exercises using Java/Repast Grow artificial SourceForges (Epstein & Axtell, 1996) Parameterized with observed data, e.g., developer behaviors: join rates, new project additions, leaving projects Evaluation of four models (hypotheses) Verification/validation
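The structure of such an agent-based experiment can be sketched as a loop in which each developer agent picks one behavior per time step. The study used Java with Swarm and Repast; the sketch below uses Python for brevity, and the class names, the `RATES` values and the uniform choice of project to join are hypothetical placeholders rather than parameters observed from SourceForge.

```python
import random

# Hypothetical per-step behavior probabilities (not values from the study).
RATES = {"create": 0.05, "join": 0.30, "abandon": 0.10, "idle": 0.55}

class World:
    """Holds the global project counter."""
    def __init__(self):
        self.project_count = 0

    def new_project(self):
        pid = self.project_count
        self.project_count += 1
        return pid

class Developer:
    """A simple agent that creates, joins, abandons projects, or idles."""
    def __init__(self, dev_id):
        self.dev_id = dev_id
        self.projects = set()

    def step(self, world, rng):
        action = rng.choices(list(RATES), weights=list(RATES.values()), k=1)[0]
        if action == "create":
            self.projects.add(world.new_project())
        elif action == "join" and world.project_count > 0:
            # Assumption: uniform random choice, i.e. no preferential attachment.
            self.projects.add(rng.randrange(world.project_count))
        elif action == "abandon" and self.projects:
            self.projects.remove(rng.choice(sorted(self.projects)))
        return action

def run(n_developers, n_steps, seed=0):
    rng = random.Random(seed)
    world = World()
    developers = [Developer(i) for i in range(n_developers)]
    for _ in range(n_steps):
        for dev in developers:
            dev.step(world, rng)
    return world, developers

world, developers = run(n_developers=20, n_steps=50)
```

Swapping the uniform choice in the `join` branch for a degree-weighted or fitness-weighted choice is what would distinguish the ER, BA and fitness variants in a simulation of this shape.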

Four Cycles of Modeling & Simulation Modeling (Hypothesis) Social Network Models ER => BA => BA+Fitness => BA+Dynamic Fitness Observation Analysis of SourceForge Data Degree Distribution Average Degree Diameter Clustering Coefficient Cluster Size Distribution Agent -Based Simulation (Experiment) Grow Artificial SourceForge

ER model degree distribution Degree distribution is binomial in the ER model, while it follows a power law in the empirical data. Fit fails
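The mismatch can be illustrated numerically: in a G(n, p) random graph each node's degree is binomial with parameters n - 1 and p, so the probability of a high degree falls off far faster than under a power law. The sketch below compares the two; the values n = 50, p = 0.1 and the exponent 2.5 are arbitrary choices for illustration, not estimates from the SourceForge data.

```python
from math import comb

def er_degree_pmf(n, p):
    """Binomial degree distribution of a G(n, p) random graph, for k = 0..n-1."""
    return [comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) for k in range(n)]

def power_law_pmf(k_max, gamma):
    """Discrete power law P(k) ~ k^(-gamma) on k = 1..k_max; index 0 is unused."""
    weights = [0.0] + [k ** -gamma for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

binomial = er_degree_pmf(50, 0.1)
power_law = power_law_pmf(49, 2.5)

mean_degree = sum(k * pk for k, pk in enumerate(binomial))
# Ratio of the two probabilities at a high degree (k = 20).
tail_ratio = power_law[20] / binomial[20]
```

A large `tail_ratio` reflects the heavy tail of a power law, which is why a purely random graph cannot reproduce the highly connected hub developers and projects seen in the observed network.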

ER model - diameter Average degree is decreasing while it is increasing in empirical data Diameter is increasing while it is decreasing in empirical data Fit fails
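Average degree and diameter can both be computed directly from an adjacency structure. The sketch below assumes the network is stored as a dict mapping each node to its set of neighbors and, as a simplifying assumption, takes the diameter over reachable pairs only; the slides do not say how disconnected components were treated.

```python
from collections import deque

def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)

def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest path between any two mutually reachable nodes."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# A four-node path graph: 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
path_average_degree = average_degree(path)
path_diameter = diameter(path)
```

Running one breadth-first search per node costs O(V * (V + E)), which is acceptable for a sketch but would likely need sampling or a faster method on a network with tens of thousands of developers.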

ER model clustering coefficient Clustering coefficient is relatively low around 0.4 while it is around 0.7 in empirical data. Clustering coefficient is decreasing while it is increasing in empirical data Fit fails
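The clustering coefficient measures how often two collaborators of the same developer are themselves collaborators. A minimal sketch, assuming the average of local clustering coefficients is meant (the slides do not state which definition was used) and that nodes with fewer than two neighbors count as zero:

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are directly linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here for nodes with fewer than 2 neighbors
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Triangle 0-1-2 with a pendant node 3 attached to node 0.
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
example_clustering = average_clustering(example)
```

Because every project turns its members into a fully connected group, a collaboration network built from project membership tends to score high on this measure, which is consistent with the value of around 0.7 reported for the empirical data.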

ER model cluster distribution The cluster size distribution in the ER model also follows a power law, with R² of 0.6667 (0.9953 without the major cluster), while R² in the empirical data is 0.7457 (0.9797 without the major cluster). The actual distribution is different from the empirical data. The later models (BA and further models) show similar behavior. Fit fails

BA model degree distribution Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data). For developer distribution: simulated data has R² of 0.9798 and empirical data has R² of 0.9712. Fit succeeds. For project distribution: simulated data has R² of 0.6650 and empirical data has R² of 0.9815. Fit fails
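One common way to obtain an R² for a power-law fit is ordinary least squares on the log-log degree counts. The slides do not say how their R² values were computed, so the sketch below is an assumption about the method, and `power_law_fit` is an illustrative name.

```python
import math

def power_law_fit(degree_counts):
    """Least-squares line through (log10 k, log10 count).

    Returns (slope, intercept, r_squared). A slope of -gamma with R^2 near 1
    suggests the counts follow count ~ k^(-gamma).
    """
    points = [
        (math.log10(k), math.log10(c))
        for k, c in degree_counts.items()
        if k > 0 and c > 0  # logarithms need positive values
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Synthetic counts that follow an exact power law with exponent 2.
exact = {k: 1000.0 * k ** -2.0 for k in range(1, 21)}
slope, intercept, r_squared = power_law_fit(exact)
```

A high R² on a log-log plot is a coarse indicator only; it shows that a straight line fits the plotted points, not that a power law is the best available description of the data.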

BA model diameter and CC Small diameter and high clustering coefficient like empirical data Diameter and clustering coefficient are both decreasing like empirical data Fit succeeds

BA model with constant fitness Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data). For developer distribution: simulated data has R² of 0.9742 and empirical data has R² of 0.9712. Fit succeeds. For project distribution: simulated data has R² of 0.7253 and empirical data has R² of 0.9815. Fit fails. Diameter and CC are similar to simple BA model. Fit succeeds

Discovery: BA with dynamic fitness Problem with BA with constant fitness Intuition: Project fitness might change with time. Data mining observation: project life cycle property - fitness generally decreases with time New model not in the literature Hypothesis: BA with dynamic fitness of projects Computer experiment
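The idea of a fitness that declines over a project's life cycle can be sketched by weighting preferential attachment with an age-dependent factor. The exponential decay form, the `decay` rate and the `new_project_prob` parameter below are hypothetical choices for illustration; the slides state only that fitness generally decreases with time.

```python
import math
import random

def fitness(age, initial=1.0, decay=0.05):
    """Hypothetical project fitness that decays exponentially with age."""
    return initial * math.exp(-decay * age)

def join_weights(sizes, birth_steps, now, decay=0.05):
    """Attachment weight of each project: current size times current fitness."""
    return [
        size * fitness(now - born, decay=decay)
        for size, born in zip(sizes, birth_steps)
    ]

def simulate(steps, new_project_prob, decay, rng):
    """Each step, one developer either founds a project or joins an existing one."""
    sizes = [1]        # developers per project
    birth_steps = [0]  # step at which each project was created
    for step in range(1, steps + 1):
        if rng.random() < new_project_prob:
            sizes.append(1)
            birth_steps.append(step)
        else:
            weights = join_weights(sizes, birth_steps, step, decay)
            chosen = rng.choices(range(len(sizes)), weights=weights, k=1)[0]
            sizes[chosen] += 1
    return sizes

sizes = simulate(steps=500, new_project_prob=0.1, decay=0.05, rng=random.Random(42))
```

With `decay=0.0` the weights reduce to project size alone, which corresponds to plain preferential attachment, so the same function can serve as a baseline for comparison.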

BA model with dynamic fitness Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data). For developer distribution: simulated data has R² of 0.9695 and empirical data has R² of 0.9712. Fit succeeds (as before). For project distribution: simulated data has R² of 0.8051 and empirical data has R² of 0.9815. Fit is better, but more work needed

Agent-Based Modeling and Simulation as Components of the Scientific Method Hypothesis Observation Experiment

Summary Why Agent-Based Modeling and Simulation? Can be used as components of the Scientific Method A research approach for studying socio-technical systems Case study: F/OSS - Collaboration Social Networks SourceForge conceptual models: ER, BA, BA with constant fitness and BA with dynamic fitness. Simulations Computer experiments that tested conceptual models Provided insight into the phenomenon under study and guided data mining of collected observations

Discussion "The social sciences are, in fact, the hard sciences" (Herbert Simon, 1987) Computational social science: agent-based modeling and simulation Kuhn's periods of Normal Science punctuated by Paradigm shifts Karl Popper's theory-testing through falsification Relevant literature on the role of simulation in the process of scientific discovery

Thank you