Truthy: Enabling the Study of Online Social Networks

arXiv:1212.4565v2 [cs.SI] 20 Dec 2012

Karissa McKelvey, Filippo Menczer
Center for Complex Networks and Systems Research
Indiana University
Bloomington, IN, USA

Abstract
The broad adoption of online social networking platforms has made it possible to study communication networks at an unprecedented scale. Digital trace data can be compiled into large data sets of online discourse. However, it is a challenge to collect, store, filter, and analyze large amounts of data, even for experts in the computational sciences. Here we describe our recent extensions to Truthy, a system that collects Twitter data to analyze discourse in near real-time. We introduce several interactive visualizations and analytical tools with the goal of enabling citizens, journalists, and researchers to understand and study online social networks at multiple scales.

Author Keywords
Research Data Management; Visualization; HCID; Collective Intelligence; Online Social Networks

ACM Classification Keywords
H.5 [Human-centered computing]: Information visualization; Collaborative and social computing devices; H.3.7 [Information systems]: Digital libraries and archives; Data stream mining; Collaborative filtering.

Copyright is held by the author/owner(s). CSCW '13 Companion, Feb. 23-27, 2013, San Antonio, TX, USA. ACM 978-1-4503-1332-2/13/02.

Introduction
Researchers can study communication networks on a larger scale than has been possible before in human history, owing to the high availability and use of the Internet as a central communication platform [9]. Recent studies have demonstrated that digital trace data can be combined with sophisticated statistical tools to produce insights into the behavior and interactivity patterns of hundreds of thousands of individual actors [4, 1]. These new information and communication technologies afford researchers many opportunities to further human knowledge and understanding of community discourse and deliberation.

Despite the promise of these advances, there are various limitations to using social networks as a primary data source. It is often difficult or expensive for researchers trained in the social sciences to obtain the technological expertise required to collect, store, filter, and analyze large amounts of data. Spam and misinformation, which often originate from compromised accounts of otherwise legitimate users [2, 3, 7], add noise to results and should be flagged for removal. Research is also difficult to reproduce when performed on a wide variety of datasets gathered with custom toolkits. Thus, it is beneficial for researchers to utilize a centralized platform. It is also important that any such platform be free or inexpensive, unlike for-profit social media analytics services such as GNIP (http://gnip.com/), as researchers often operate within limited budgets.

We demonstrate the following contributions:
- interactive visualizations enabling users to better understand social networks and online trends;
- tools allowing users to freely download derived data from our large historical repository of online discourse;
- interfaces that allow users to tag content in order to improve automatic classifiers that detect behavior such as spam and misinformation.

The Truthy System
The Truthy system (http://truthy.indiana.edu) was originally designed to analyze and detect the emergence of coordinated misinformation campaigns on Twitter [7]. Now tasked with advancing the study of social networks in general, Truthy monitors a real-time feed of 140-character messages known as tweets and clusters them into groups of related messages called memes. Memes typically correspond to discussion topics, communication channels, or information resources shared among Twitter users, so that one can focus attention on understandable units of information transfer. We define each meme as the set of all tweets containing a common hashtag (e.g., #bahrain), mentioned username (e.g., @BarackObama), hyperlink, or phrase. Memes are extracted from tweets that match lists of hand-picked keywords ("themes"); users can browse memes according to these themes (Fig. 1). We refer the reader to previous work for a detailed description of the algorithms used for data collection, storage, and filtering [6].

For each meme, the user is presented with an interactive dashboard containing a crowdsourced definition from Tagdef (http://tagdef.com), a high-resolution image of the meme's information diffusion network (Fig. 2), and various interactive visualizations and statistics. Available information includes the number of users and tweets, meme diffusion network statistics such as mean degree and largest connected component size, as well as user-specific statistics such as predicted political partisanship, sentiment score, language, and activity.

Users can download the aforementioned derived data, recent tweets, and the network graphs themselves for use in a spreadsheet or an analysis application such as Network Workbench [8]. We ensure that this data download function abides by the Twitter Terms of Service. We provide various interactive visualizations allowing users to interrogate the network structure, the characteristics of geography and time, and meme-meme co-occurrence patterns [5].

Another important new contribution of the Truthy system is in enabling active improvement of our algorithms by facilitating the tagging of suspicious content. Through the meme detail interface (Fig. 3), one can tweet about a meme (Fig. 4) or user (Fig. 5) in a syntax that can be automatically parsed by our system. We collect these posts for future studies analyzing the reliability of crowdsourced data in identifying persuasion and spam campaigns.

Future Work
In future work, we would like to provide a broader set of historical data while improving our visualizations via collaboration with the public. These initiatives include:
- providing a public REST API for derived data about individual tweets, memes, and users;
- expanding the scope to include all of the tweets we have collected since September 2010;
- facilitating user-defined themes to expand visualizations to customized content.

Conclusion
Tools like Truthy stimulate the study of online social networks. In particular, we aim to support social scientists who currently find this research difficult, expensive, or impossible to reproduce. We hope that open data and open tools like ours will advance efforts to leverage large data streams from the Internet as a primary source in the social sciences. Furthermore, we hope that our interactive visualizations facilitate the navigation and understanding of online discourse, whether for research, journalism, or general use.

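The meme definition given in the Truthy System section (all tweets sharing a hashtag, mentioned username, or hyperlink) can be illustrated with a minimal Python sketch. This is not the Truthy implementation, which is described in [6]. The regular expressions, the (tweet_id, text) input shape, and the function names extract_meme_keys and group_into_memes are assumptions made for illustration, and phrase memes and theme keyword filtering are left out.

```python
import re
from collections import defaultdict

# Illustrative patterns only; these are assumptions, not Truthy's actual rules.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def extract_meme_keys(text):
    """Return the set of meme keys (hashtags, mentions, links) found in one tweet."""
    keys = set(URL.findall(text))
    # Remove links first so that a URL fragment such as "#top" is not
    # mistaken for a hashtag.
    without_urls = URL.sub(" ", text)
    # Hashtags and usernames are compared case-insensitively.
    keys.update(tag.lower() for tag in HASHTAG.findall(without_urls))
    keys.update(name.lower() for name in MENTION.findall(without_urls))
    return keys


def group_into_memes(tweets):
    """Map each meme key to the ids of the tweets that contain it.

    `tweets` is an iterable of (tweet_id, text) pairs (a hypothetical shape).
    """
    memes = defaultdict(list)
    for tweet_id, text in tweets:
        for key in extract_meme_keys(text):
            memes[key].append(tweet_id)
    return dict(memes)


if __name__ == "__main__":
    sample = [
        (1, "Protests continue #Bahrain http://example.com/a"),
        (2, "RT @BarackObama: statement on #bahrain"),
        (3, "Nothing to see here"),
    ]
    for key, ids in sorted(group_into_memes(sample).items()):
        print(key, ids)
```

In this sketch a single tweet can belong to several memes at once, which matches the definition of a meme as a set of tweets rather than a partition of the stream.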
Acknowledgements
We would like to thank Clayton Davis, Michael Conover, Jacob Ratkiewicz, Bruno Gonçalves, Mark Meiss, Alessandro Flammini, Johan Bollen, Alessandro Vespignani, and other current and past members of the Truthy group at Indiana University for helpful discussions and contributions to the Truthy Project. We gratefully acknowledge support from the National Science Foundation (grant CCF-1101743), DARPA (grant W911NF-12-1-0037), and the McDonnell Foundation.

References
[1] Conover, M. D., Ratkiewicz, J., Gonçalves, B., Flammini, A., and Menczer, F. Political polarization on Twitter. In International Conference on Weblogs and Social Media 2011 (2011).
[2] Gonçalves, B., Conover, M., and Menczer, F. Abuse of social media and political manipulation. In The Death of The Internet, M. Jakobsson, Ed. Wiley, 2012.
[3] Grier, C., Thomas, K., Paxson, V., and Zhang, M. @spam: The underground on 140 characters or less. In Proceedings of the 17th ACM Conference on Computer and Communications Security, ACM (Chicago, IL, USA, 2010), 27-37.
[4] Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., and Van Alstyne, M. Computational social science. Science 323, 5915 (2009), 721-723.
[5] McKelvey, K., Rudnick, A., Conover, M. D., and Menczer, F. Visualizing communication on social media: Making big data accessible. In Collective Intelligence as Community Discourse and Action Workshop, ACM CSCW '12 (Feb. 2012).
[6] Ratkiewicz, J., Conover, M., Meiss, M., Gonçalves, B., Patil, S., Flammini, A., and Menczer, F. Truthy: Mapping the spread of astroturf in microblog streams. In Proc. 20th Intl. World Wide Web Conf. Companion (WWW) (2011).
[7] Ratkiewicz, J., Conover, M. D., Meiss, M., Gonçalves, B., Flammini, A., and Menczer, F. Detecting and tracking political abuse in social media. In Fifth International AAAI Conference on Weblogs and Social Media (2011).
[8] Team, N. W. B. Network Workbench tool. Indiana University, Northeastern University, and University of Michigan, 2006.
[9] Vespignani, A. Predicting the behavior of techno-social systems. Science 325, 5939 (2009), 425-428.

Figures
Figure 1: Theme detail page for Social Movements. One can click a meme to reach its meme detail page.

Figure 2: Diffusion networks associated with multiple Twitter memes. Nodes represent individual users and edges show how these memes spread from user to user by way of mentions (orange) and retweets (blue). In clockwise order: @whitehouse, #nightclub, #tcot, #syria, #rsvp, @michelleobama.

Figure 3: Meme detail page for #p2, focusing on the interactive visualization depicting the most active users. Node size is a function of the number of retweets. Left-leaning users are colored blue while right-leaning are colored red. One can click any node in order to learn more about that user.

Figure 4: Push-button tool which allows one to tweet about a particular meme. Options include Truthy and Spam.

Figure 5: Interface one sees after clicking on a user in the interactive network.
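
The diffusion networks of Figure 2 (users as nodes, retweets and mentions as edges) and two of the statistics named in the Truthy System section, mean degree and largest connected component size, can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the method used by Truthy: the network is treated as undirected, repeated interactions between the same pair of users collapse into one edge, and the (source, target, kind) edge triples and function names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target, kind) triples.

    `kind` would be "retweet" or "mention"; it is ignored here because both
    edge types count equally toward the statistics below (an assumption).
    """
    adjacency = defaultdict(set)
    for source, target, _kind in edges:
        if source == target:
            continue  # self-mentions are ignored in this sketch
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def mean_degree(adjacency):
    """Average number of distinct neighbors per user (0.0 for an empty network)."""
    if not adjacency:
        return 0.0
    return sum(len(neighbors) for neighbors in adjacency.values()) / len(adjacency)


def largest_component_size(adjacency):
    """Number of users in the largest connected component, found by breadth-first search."""
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


if __name__ == "__main__":
    edges = [
        ("a", "b", "retweet"),
        ("b", "c", "mention"),
        ("d", "e", "retweet"),
        ("a", "b", "mention"),  # same pair again; collapses into one edge
    ]
    network = build_adjacency(edges)
    print(mean_degree(network), largest_component_size(network))
```

A directed or weighted variant would distinguish who retweeted whom and how often; the undirected form is used here only because it keeps both statistics simple to state.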