Social Media Intelligence in Practice: The NEREUS Experimental Platform. Dimitris Gritzalis & Vasilis Stavrou June 2015


Social Media Intelligence in Practice: The NEREUS Experimental Platform
3rd Hellenic Forum for Science, Technology & Innovation, Athens, June 2015
Dimitris Gritzalis & Vasilis Stavrou
Information Security & Critical Infrastructure Protection Laboratory, Dept. of Informatics, Athens University of Economics & Business

Presentation outline
- Web 2.0 and Online Social Networks
- Open Source and Social Media Intelligence
- The NEREUS Framework
- SOCMINT and behavior prediction capabilities
- Conclusions

Web 2.0 and Online Social Networks (OSN)
[Figure. Source: http://socialmediatoday.com/]

Open Source & Social Media Intelligence
Open Source Intelligence (OSINT) is produced from publicly available information, which is:
- Collected, exploited and disseminated in a timely manner
- Offered to an appropriate audience
- Used for the purpose of addressing a specific intelligence requirement
Publicly available information includes, but is not limited to:
- Traditional media (e.g. television, newspapers, radio, magazines)
- Web-based communities (e.g. social networking sites, blogs)
- Public data (e.g. government reports, official data, public hearings)
- Amateur observation/reporting (e.g. amateur spotters, radio monitors)
Social Media Intelligence (SOCMINT) is produced from Online Social Networks and the Web 2.0.

Revealing attitude towards law enforcement/infringement
OSINT: means utilized for the analysis
- OSN: YouTube
- Science: Computing; Sociology
- Theory: Machine Learning and Data Mining (Computing); Social Learning Theory (Sociology)
Applications:
(a) Assist in detecting attitude towards law enforcement/infringement
(b) Assist in detecting deviant behavior of minors

NEREUS: Architecture in a nutshell
[Figure: architecture diagram with two paths: the flat data path and the comments classification path]

The utmost importance of the social context
Authoritarian Regimes
- Revealing personal attitude towards law enforcement/infringement will be used by the Regime against resisting pro-civic rights movements.
- Pro-civic rights movements should prevent such platforms from being used by the Regime, using any available means.
Democratic States
- Revealing personal attitude towards law enforcement/infringement may be used to protect Democracy from its opponents.
- Democratic States may resist social changes supported by, for example, grassroots political rights movements.
- Democratic States may make use of such intrusive platforms, provided they are put under strict democratic control.

Revealing attitude towards law enforcement/infringement
Attitude towards law infringement
- Individuals tend to transfer their offline behavior online
- Study: motive, anger, frustrations, predisposition towards law enforcement/infringement
- Means: Machine Learning, comment classification, flat data classification
- Identify users' attitude towards law enforcement/infringement
- Assist in detecting delinquent behavior
- Assist in predicting deviant behavior of minors

Dataset description

Machine Learning (1/2)
Comments are classified into categories of interest:
- Process performed as text classification
- Machine trained with text examples and the category each one belongs to
- Extensive support by a field expert (Sociologist)
Test set used to evaluate the efficiency of the resulting classifier:
- Contains pre-labeled data fed to the machine, labeled by the field expert
- Check whether the initially assigned label is equal to the predicted one
- Test set labels assigned by the field expert
Most comments are written in Greek/greeklish:
- Conversion of greeklish text to Greek
Categories of content defined:
- Users with a negative attitude towards law enforcement (Predisposed negatively (P))
- Users with a not negative attitude towards law enforcement (Not-predisposed negatively (N))
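The training and test procedure on this slide can be sketched in a few lines. The following is a minimal, standard-library illustration of text classification with a multinomial Naïve Bayes model and Laplace smoothing, not the NEREUS implementation. The English toy comments, their P/N labels, and the function names are all hypothetical, and the greeklish-to-Greek conversion step is omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would first
    # convert greeklish to Greek, which this sketch does not attempt.
    return text.lower().split()


def train_nb(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical expert-labeled examples (P = predisposed negatively, N = not).
TRAIN = [
    ("the police are corrupt and should be ignored", "P"),
    ("break the rules nobody can stop us", "P"),
    ("laws are a joke ignore the police", "P"),
    ("great video thanks for sharing", "N"),
    ("the police helped us thanks", "N"),
    ("nice song great rules of the game", "N"),
]
TEST = [
    ("ignore the police they are corrupt", "P"),
    ("thanks great video", "N"),
]

model = train_nb(TRAIN)
predictions = [predict_nb(model, text) for text, _ in TEST]
# Evaluation as on the slide: compare the expert label with the predicted one.
accuracy = sum(p == y for p, (_, y) in zip(predictions, TEST)) / len(TEST)
print(predictions, accuracy)
```

With a corpus this small the result says nothing about real-world performance; the point is only the train, predict, compare-with-expert-label loop.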

Machine Learning (2/2)
Comment classification using:
- Naïve Bayes (NB)
- Support Vector Machines (SVM)
- Logistic Regression (LR)
Classifiers efficiency comparison:
- Metrics (on % basis): Precision, Recall, F-Score, Accuracy
Logistic Regression algorithm:
- LR classifies a comment with 81% accuracy
Video classification:
- Examination of a video on the basis of its comments
- Voter process to determine category classification (Video)
Lists classification:
- Voter process to determine category classification (same threshold)
Conclusions about user behavior:
- If there is at least one category P attribute, then the user is classified into category P
[Figure: a User and the associated attributes Comments, Uploads, Favorites, Playlists]

Metrics      NBM          SVM          LR
Classes      P     N      P     N      P     N
Precision    71%   70%    83%   77%    86%   76%
Recall       72%   68%    75%   82%    74%   88%
F-Score      71%   69%    79%   79.5%  80%   81%
Accuracy     70%          80%          81%

Precision: Measures the classifier's exactness. Higher precision means fewer false positive classifications; lower precision means more.
Recall: Measures the classifier's completeness. Higher recall means fewer false negative classifications; lower recall means more.
F-Score: Weighted harmonic mean of precision and recall.
Accuracy: Proportion of correct classifications, i.e. the number of correct classifications divided by the total number of classified items.
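The four metric definitions above can be written out directly. The sketch below assumes the balanced F-score (beta = 1), which the slide does not state explicitly, and uses hypothetical confusion-matrix counts chosen only to land near the LR column of the table; they are not the authors' data.

```python
def f_score(precision, recall, beta=1.0):
    # Weighted harmonic mean of precision and recall.
    # beta = 1 (equal weights) is an assumption, not stated on the slide.
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Metrics for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp)        # exactness: penalizes false positives
    recall = tp / (tp + fn)           # completeness: penalizes false negatives
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
        "accuracy": accuracy,
    }


# Hypothetical counts for class P: 100 true P comments and 100 true N comments.
metrics = classification_metrics(tp=74, fp=12, fn=26, tn=88)
print({name: round(value, 2) for name, value in metrics.items()})

# Consistency check on the (rounded) table rows: F-Score from Precision/Recall.
table = {
    "NBM-P": (0.71, 0.72, 0.71), "NBM-N": (0.70, 0.68, 0.69),
    "SVM-P": (0.83, 0.75, 0.79), "SVM-N": (0.77, 0.82, 0.795),
    "LR-P": (0.86, 0.74, 0.80), "LR-N": (0.76, 0.88, 0.81),
}
max_gap = max(abs(f_score(p, r) - f) for p, r, f in table.values())
print(round(max_gap, 3))
```

Recomputing the F-Score from the rounded Precision and Recall values agrees with the reported F-Score row to within one percentage point, which is what rounding of the inputs would explain.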

Analysis based on flat data
Addressing the problem from a different perspective:
- Assumption-free and easy-to-scale method
- Verify (or not) the results of the Machine Learning approach
- Machine trained by a set of users of categories P and N
[Figure: Connection between users of category P and confidence of accuracy of comments belonging to category P. Blue: users of category P classified on the basis of the comment-oriented tuple (Flat Data). Red: users of category P classified on the basis of their comments only (Machine Learning).]
Data transformation:
- User represented by a tuple (username, content of comment, video ID the comment refers to, country, age, genre, # of subscribers, # of video views)
- Machine trained by a user test set (Sociologist served as field expert)
- 1721 users are (almost certainly) negatively predisposed towards law enforcement/infringement

Metrics
Approach     Machine Learning       Flat Data
Classifier   Logistic Regression    Naïve Bayes
Classes      P       N              P       N
Precision    86%     76%            72%     93%
Recall       74%     88%            92%     73%
F-Score      80%     81%            81%     82%
Accuracy     81%                    81%
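The data transformation step can be illustrated as a small record type. The field list follows the tuple on the slide; the class name, field names, and the discretization thresholds in `to_features` are hypothetical choices made for this sketch, and the slide does not say how the authors encoded these fields for the Naïve Bayes classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatRecord:
    """One comment-oriented tuple per (user, comment), as listed on the slide."""
    username: str
    comment: str
    video_id: str
    country: str
    age: int
    genre: str
    subscribers: int
    video_views: int


def to_features(rec, age_bins=(18, 25, 35)):
    """Discretize a record into features a categorical classifier could use.

    The bin edges and the subscriber/view thresholds are illustrative
    assumptions, not values taken from the presentation.
    """
    return {
        "country": rec.country,
        "genre": rec.genre,
        # Index of the age bracket: 0 below 18, 1 for 18-24, and so on.
        "age_group": sum(rec.age >= edge for edge in age_bins),
        "many_subscribers": rec.subscribers >= 100,
        "many_views": rec.video_views >= 10_000,
        "tokens": tuple(rec.comment.lower().split()),
    }


# Hypothetical example record.
example = FlatRecord(
    username="user_a",
    comment="Example comment text",
    video_id="vid_001",
    country="GR",
    age=17,
    genre="unspecified",
    subscribers=12,
    video_views=340,
)
features = to_features(example)
print(features)
```

Keeping the username out of the feature dictionary is deliberate: it identifies the record but should not be something the classifier learns from.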

Selected observations
- 6% of comments (among 2,000,000 collected) express a negative attitude towards respecting the law (i.e., positive to law infringement)
- 3.5% of videos (among 200,000 collected) are classified into a specific category of interest
- 14% of users (among 13,000 collected) express a negative attitude towards respecting the law (i.e., positive to law infringement)
- Ability to assist in predicting delinquent behaviour of minors
  o Violent behaviour
  o Cyber bullying
  o Emotional or sexual harassment
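The video-level and user-level figures depend on how comment labels are aggregated. The Machine Learning (2/2) slide mentions a voter process with a threshold for videos and lists, and an "at least one category P attribute" rule for users. A minimal sketch of that aggregation follows; the threshold value of 0.5, the treatment of empty lists as N, and all function and attribute names are assumptions, since the presentation does not give them.

```python
from collections import Counter


def vote(labels, threshold=0.5):
    """Label a video (or a list of videos) from the labels of its members.

    Returns "P" when the share of P labels reaches the threshold.
    The 0.5 default and the "N" result for empty input are assumptions.
    """
    if not labels:
        return "N"
    share_p = Counter(labels)["P"] / len(labels)
    return "P" if share_p >= threshold else "N"


def classify_user(comments, uploads, favorites, playlists, threshold=0.5):
    """Apply the same voter to each user attribute, then the any-P rule."""
    attributes = {
        "comments": vote(comments, threshold),
        "uploads": vote(uploads, threshold),
        "favorites": vote(favorites, threshold),
        "playlists": vote(playlists, threshold),
    }
    # At least one attribute in category P puts the user in category P.
    user_label = "P" if "P" in attributes.values() else "N"
    return user_label, attributes


# A video is judged from the predicted labels of its comments.
video_label = vote(["P", "P", "N"])

# A user whose favorites list is half P-labeled videos.
user_label, attributes = classify_user(
    comments=["N", "N", "N"],
    uploads=[],
    favorites=["P", "N"],
    playlists=["N"],
)
print(video_label, user_label, attributes)
```

Because the any-P rule is a logical OR over four attributes, the share of users labeled P can exceed the share of comments or videos labeled P, which is the pattern the percentages above show.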

General conclusions
- SOCMINT can transform the vast amount of data produced by Web 2.0 into intelligence.
- SOCMINT is an intrusive technology and could endanger civic rights.
- SOCMINT utilization is not, and should not be considered, a solely technical issue.
- SOCMINT could assist in predicting attitude towards law infringement.
- SOCMINT could assist in predicting delinquent behavior of minors.
