What is Big Data? Jaakko Hollmén. Aalto University School of Science Helsinki Institute for Information Technology (HIIT) Espoo, Finland


What is Big Data?
Jaakko Hollmén
Aalto University School of Science
Helsinki Institute for Information Technology (HIIT)
Espoo, Finland
6.2.2014

Speaker profile
Jaakko Hollmén, senior researcher, D.Sc.(Tech.), Department of Information and Computer Science, Aalto University School of Science, and Helsinki Institute for Information Technology (HIIT), Finland.
Industrial and university research in data analysis related topics since 1995; D.Sc.(Tech.) in computer science in 2000.
Current research scope: machine learning, data mining, predictive analytics, time series analysis and prediction.
Exposure to various application areas: process industry, telecommunications, biology, medicine, environmental informatics, analysis of the built environment.
Contact information at the end of the presentation.

Speaker, short biography
Jaakko Hollmén (b. 1970) received the degrees of M.Sc. (Tech.) in 1996, Lic.Sc. (Tech.) in 1999, and D.Sc. (Tech.) in 2000, all from the Department of Computer Science and Engineering at the Helsinki University of Technology in Finland. Since 2000, he has worked at the Department of Information and Computer Science (formerly the Laboratory of Computer and Information Science) at the Aalto University School of Science in Finland, where he is currently a Chief Research Scientist. He leads the Parsimonious Modelling research group at the Helsinki Institute for Information Technology. The group develops computational methods for data analysis and applies them to two particular application fields: cancer genomics and environmental informatics. Jaakko Hollmén's research interests include the theory and practice of machine learning and data mining, especially their applications in bioinformatics and environmental time series analysis. He has served on the program committees of conferences such as SIGKDD, ICDM, ECML/PKDD, PAKDD, UAI, DS, and IDA. In 2011, he was the program chair of the 14th International Conference on Discovery Science (DS 2011) in Porto, Portugal, and of the Tenth International Symposium on Intelligent Data Analysis (IDA 2011), held in Espoo, Finland. In 2012, he was General Chair of the Eleventh International Symposium on Intelligent Data Analysis (IDA 2012), held in Helsinki, Finland. He is an author of over 30 journal articles and 50 conference contributions. He is the volume editor of 3 conference proceedings and holds editorial positions in three journals in his areas of interest. He is an inventor on two patents. Jaakko Hollmén is a Senior Member of IEEE.

Before Big Data
1000+ years: Data Analysis
100+ years: Statistics
50 years: Transistor, Computers, Artificial Intelligence
40 years: Internet
30 years: microcomputers
20 years: World Wide Web
20 years: Search engines for the Web
Sensor technology, massive deployment
10 years: Social networks
2-3 years: Big Data

Of photographs, large and small
Think of a simple photo and the questions you can easily answer about it: How many people are in it? And so on. Then take the largest photograph in the world and see how the situation changes: how to answer most questions becomes non-obvious!

Of photographs, large and small
World's largest photograph, size 320 gigapixels

Of photographs, large and small
The largest photograph in the world, taken from the BT Telecom Tower in London in 2012 by 360Cities.
A single photograph: 320 gigapixels.
The printed photo would be about 25 m by 100 m.
The photo consists of 46,000 high-resolution photos, stitched together with computational techniques.
A web application with 1 million image patches; see the site: http://btlondon2012.co.uk/pano.html
Posing and answering questions is non-obvious and may be a huge amount of work!
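The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes square pixels spread evenly over the 25 m by 100 m print; the constant names are illustrative, and only the numbers come from the slide.

```python
import math

PIXELS = 320e9          # 320 gigapixels (slide figure)
PRINT_W_M = 100.0       # printed width in metres (slide figure)
PRINT_H_M = 25.0        # printed height in metres (slide figure)
SOURCE_PHOTOS = 46_000  # stitched source photos (slide figure)
M_PER_INCH = 0.0254


def pixels_per_inch(pixels: float, width_m: float, height_m: float) -> float:
    """Linear print density, assuming square pixels covering the whole print."""
    pixels_per_m2 = pixels / (width_m * height_m)
    return math.sqrt(pixels_per_m2) * M_PER_INCH


ppi = pixels_per_inch(PIXELS, PRINT_W_M, PRINT_H_M)
# Average number of final pixels each source photo accounts for.
mp_per_photo = PIXELS / SOURCE_PHOTOS / 1e6

print(f"about {ppi:.0f} pixels per inch")
print(f"about {mp_per_photo:.1f} megapixels per source photo")
```

The print would come out at roughly 290 pixels per inch, and each source photo accounts for about 7 megapixels of the final image on average. Because neighbouring photos must overlap to be stitched, the individual source images were presumably larger than that.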

Big Data definitions
First approach to defining Big Data: processing of the data becomes non-obvious, and we run into difficulties with normal tools and analytic processes.
Wikipedia definition: "Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."
A lasting definition: there will always be difficult things to solve with standard tools; however, the meaning of the definition will change over time.

3V definition of Big Data
Most often, Big Data is defined in terms of the 3 Vs:
Volume (amount of data)
Velocity (speed of information collection, processing, and analysis)
Variety (types of measurements, structured vs. unstructured data)
Other Vs have been added since (minor twists of lesser importance).

Big Data: Volume
Unit of information: one bit, one byte.
An A4 page of typewritten text: 5 kilobytes.
A pile of papers printed on both sides: 1 kilometer of paper is 100 gigabytes; 10 km of paper is 1 terabyte.
Facebook: 2.7 billion likes and 300 million photos a day, resulting in 105 terabytes (NY Times, December 2012), a pile of papers equal to the length of Finland (about 1000 km).
Storage and retrieval problem, resource allocation.
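The paper-pile comparisons follow from a few multiplications. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and a sheet thickness of about 0.1 mm; the thickness is not stated on the slide but is what its figures imply.

```python
BYTES_PER_SIDE = 5_000      # one A4 side of typewritten text, about 5 kB (slide figure)
SIDES_PER_SHEET = 2         # printed on both sides (slide figure)
SHEET_THICKNESS_M = 0.1e-3  # assumption: about 0.1 mm per sheet of paper

GB = 10**9   # decimal units assumed
TB = 10**12


def pile_length_km(n_bytes: float) -> float:
    """Height of a paper pile holding n_bytes of typewritten text, in kilometres."""
    sheets = n_bytes / (BYTES_PER_SIDE * SIDES_PER_SHEET)
    return sheets * SHEET_THICKNESS_M / 1000


for label, size in [("100 GB", 100 * GB), ("1 TB", TB), ("105 TB", 105 * TB)]:
    print(f"{label}: about {pile_length_km(size):,.0f} km")
```

Under these assumptions the sketch reproduces the slide's 1 km and 10 km figures, and 105 TB comes to about 1050 km, close to the length of Finland.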

Big Data: Velocity
The speed with which the data is created.
High sampling rates: a mobile phone samples at 30-50 Hz, that is, 30-50 samples per second, from multiple sensors.
Extremely high sampling rates in scientific measurement settings.
Smart Cities proposal by IBM: the world is intelligent, instrumented, and interconnected.
Data needs to be analyzed immediately, with predictions provided within milliseconds.
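Analyzing data "immediately" usually means processing each sample as it arrives, in constant time, instead of storing everything first. A minimal sketch, assuming a single sensor channel and a moving average as the stand-in analysis (neither is from the slides; the class name and readings are hypothetical):

```python
from collections import deque


class SlidingMean:
    """Mean of the most recent `window` samples, updated in O(1) time per sample."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self.buffer = deque()
        self.total = 0.0

    def update(self, sample: float) -> float:
        # Add the new sample, and drop the oldest one once the window is full.
        self.buffer.append(sample)
        self.total += sample
        if len(self.buffer) > self.window:
            self.total -= self.buffer.popleft()
        return self.total / len(self.buffer)


# Hypothetical example: a 50 Hz sensor with a one-second (50-sample) window.
SAMPLE_RATE_HZ = 50
smoother = SlidingMean(window=SAMPLE_RATE_HZ)
readings = [9.8, 9.7, 10.1, 9.9, 12.5, 9.8]  # made-up accelerometer magnitudes
for r in readings:
    latest = smoother.update(r)
print(round(latest, 3))
```

Keeping a running total avoids re-summing the window on every sample. Over very long streams the floating-point total can drift, so a production version might periodically recompute it from the buffer.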

Big Data: Variety
Many measurement modalities: video, numbers, text.
Structured vs. unstructured data: images, unstructured text, likes, places, names, product names.
How to structure the unstructured data? Areas: text mining, sentiment analysis.
This may be the most difficult dimension of the 3V definition: how to combine different sources of data with differing modalities?
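One simple way to "structure the unstructured" is to turn free text into a fixed set of numeric fields. The sketch below uses lexicon-based sentiment scoring, one of the simplest sentiment analysis techniques. The word lists are a hypothetical toy lexicon, not a real resource; practical systems use much larger lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def structure_text(text: str) -> dict:
    """Map free text to a structured record of counts and a sentiment score."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    positive = sum(counts[w] for w in POSITIVE)  # Counter returns 0 for absent words
    negative = sum(counts[w] for w in NEGATIVE)
    return {
        "n_tokens": len(tokens),
        "positive": positive,
        "negative": negative,
        "score": positive - negative,
    }


record = structure_text("I love this phone, but the battery is bad. Great camera!")
print(record)
```

Once every text is reduced to the same fields, it can be joined with structured sources such as sales figures or sensor readings. This is one small step toward combining modalities.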

Big Data definitions, remarks
Big Data has more to it than mere size; the name is a misnomer and oversimplifies what is important about the topic.
Volume is not the most difficult aspect to handle from the analytics point of view.
Variety may be the most challenging: how to combine the different modes of measurement?
Prediction: the name Big Data will be obsolete by 2014 (or 2015), but the topic will still be important.
Maturity of Big Data? There have been over-expectations in the past.

Maturity of Big Data
Gartner Hype Cycle: Big Data is currently in the trough of disillusionment (blogs.gartner.com).

Big Data: Need for analytics
Twitter feed @IBMSPSS, August 2012: "#BigData without #analytics is like a flashlight without batteries. Analytics shines the light on where to go next."
Analytics serves situations best when there is a lot of data and little understanding (J.H.).
Data has been called the new oil, a new kind of raw material: what you do with the data and how you refine it makes the difference!

Data analysis problems in general
Prediction
Classification
Pointing out the relevant variables
Profiling and finding natural groups
These problems are illustrated through case studies from the presenter's own research.

Time series prediction
Data: daily electricity consumption
Model: predict the future with knowledge of the past
Benefits: capacity planning, pricing
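The slide does not say which model was used. As an illustration of "predict the future with knowledge of the past", here is a minimal sketch of a first-order autoregressive model, y[t] = a + b * y[t-1], fitted by ordinary least squares. The daily figures are made up.

```python
def fit_ar1(series: list) -> tuple:
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast(series: list, a: float, b: float, steps: int) -> list:
    """Iterate the fitted recursion forward from the last observed value."""
    preds = []
    last = series[-1]
    for _ in range(steps):
        last = a + b * last
        preds.append(last)
    return preds


# Hypothetical daily consumption values (MWh), for illustration only.
daily = [310.0, 305.0, 320.0, 315.0, 300.0, 298.0, 312.0, 318.0]
a, b = fit_ar1(daily)
print(forecast(daily, a, b, steps=3))
```

A single lag is the simplest possible choice. Daily electricity consumption typically has a weekly rhythm, so a more realistic model would at least add a lag of seven days.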

Classifying patients
Data: patient profile, genetic markers
Model: probability (risk) model for disease classification
Benefits: diagnostics, prioritizing care, personalized medicine
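The slide names a probability (risk) model without specifying it. Logistic regression is one standard choice for a binary diagnosis and is used below purely as an assumption. It is fitted by batch gradient ascent on the mean log-likelihood. The single feature (a count of risk markers) and the six training rows are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: list, y: list, lr: float = 0.5, n_iter: int = 5000) -> tuple:
    """Batch gradient ascent on the mean log-likelihood; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    bias = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(bias + sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
        bias += lr * grad_b / n
    return w, bias


def risk(x: list, w: list, bias: float) -> float:
    """Predicted probability of disease for one patient profile."""
    return sigmoid(bias + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical data: number of risk markers present, and diagnosis (1 = disease).
X = [[0], [1], [1], [2], [3], [3]]
y = [0, 0, 1, 0, 1, 1]
w, bias = fit_logistic(X, y)
print(risk([0], w, bias), risk([3], w, bias))
```

The output is a probability rather than a hard label. That is what allows care to be prioritized by risk, and the decision threshold can be chosen separately for each clinical use.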

Selection of important variables
Data: nutrients monitored over time
Model: a prediction model that uses only the relevant information
Benefits: improve understanding, pinpoint important variables
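Variable selection can be done in many ways. The sketch below uses the simplest one, a filter method that ranks candidate variables by the absolute Pearson correlation with the target and keeps the top k. This is a swapped-in illustration and not necessarily the method behind the slide. The nutrient names and values are hypothetical.

```python
import math


def pearson(a: list, b: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant variable carries no linear information
    return cov / math.sqrt(va * vb)


def select_top_k(features: dict, target: list, k: int) -> list:
    """Names of the k variables most strongly (linearly) related to the target."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)), reverse=True)
    return ranked[:k]


# Hypothetical monitored series and target, for illustration only.
features = {
    "nutrient_a": [1, 2, 3, 4, 5],
    "nutrient_b": [2, 1, 2, 1, 2],
    "nutrient_c": [5, 4, 3, 2, 1],
}
target = [2, 4, 6, 8, 10]
print(select_top_k(features, target, k=2))
```

A filter like this ignores interactions and redundancy between variables. Wrapper methods, which refit the prediction model for each candidate subset, address that at a higher computational cost. For time series, lagged copies of each variable can be added as extra candidates.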

Profiling a patient database
Data: DNA amplification data from cancer patients
Model: profile the patients with probability models
Benefits: improve understanding, classify patients
[Figure: the data, unstructured]

Profiling a patient database
Data: DNA amplification data from cancer patients
Model: profile the patients with probability models
Benefits: improve understanding, classify patients
[Figure: the data, structured]
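The slides say only that the patients are profiled with probability models. If the amplification data are coded 0/1 (a chromosomal region is amplified or not), one common choice is a finite mixture of multivariate Bernoulli distributions, fitted with the EM algorithm. The sketch below assumes that model and uses a made-up data set of six patients and six regions; it is not necessarily the model used in the presenter's work.

```python
import math


def fit_bernoulli_mixture(data: list, k: int, n_iter: int = 50, eps: float = 1e-6):
    """EM for a mixture of k multivariate Bernoulli distributions over 0/1 rows.

    Returns (mixing weights, per-component probabilities, responsibilities).
    """
    n, d = len(data), len(data[0])
    # Deterministic initialisation from k spread-out rows, pulled towards 0.5.
    weights = [1.0 / k] * k
    probs = [[0.25 + 0.5 * data[(j * n) // k][i] for i in range(d)] for j in range(k)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row (in log space).
        resp = []
        for row in data:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for xi, p in zip(row, probs[j]):
                    lp += math.log(p) if xi else math.log(1.0 - p)
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights and Bernoulli parameters (clipped away from 0 and 1).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), eps)
            weights[j] = nj / n
            probs[j] = [
                min(max(sum(r[j] * row[i] for r, row in zip(resp, data)) / nj, eps), 1.0 - eps)
                for i in range(d)
            ]
    return weights, probs, resp


def assign(resp: list) -> list:
    """Most probable component (profile) for each row."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Hypothetical 0/1 amplification patterns: rows are patients, columns are regions.
patients = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
weights, probs, resp = fit_bernoulli_mixture(patients, k=2)
print(assign(resp))
```

Each fitted component is a profile: a vector of amplification probabilities per region. Sorting patients by their most probable component is one way to turn an unstructured-looking data matrix into a structured one.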

Summary of the illustrated analyses
Benefits come from the combination of data and analysis work.
Illustrated analyses: prediction, classification, pointing out the relevant variables, profiling and finding natural groups.
There are plenty of challenges when applying these analyses to Big Data scenarios.

Big Data application areas
Smarter healthcare
Finance, trading analysis
Telecom, log analysis
Traffic control
Search quality
Fraud detection, risk analysis
Retail, churn detection
Process industry
Natural sciences

Summary and Conclusions
Big Data will inevitably change many business practices.
Many will take two steps: first to data, then to Big Data.
Inflated expectations and hype, yet still an important area for a longer period of time; there is no turning back.
Solutions with standard tools are non-obvious, but divide-and-conquer solutions for Big Data problems exist.
How would Big Data support your business goals? Businesses must know the important questions; analysis and modelling help provide the answers.
Weigh the benefits from Big Data against the investments in capabilities and resources.
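The divide-and-conquer idea mentioned above works like this: split the data into chunks, process each chunk independently, and merge the partial results. A minimal single-machine sketch in the map/reduce style, using word counting as the example task. The chunking and function names are illustrative; in a real deployment each chunk would be processed on a different machine.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines: list) -> Counter:
    """Process one chunk independently: count the words in its lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(a: Counter, b: Counter) -> Counter:
    """Combine two partial results; word counts simply add up."""
    return a + b


def word_count(lines: list, chunk_size: int) -> Counter:
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    partials = map(map_chunk, chunks)  # each call could run on a separate machine
    return reduce(merge, partials, Counter())


# Hypothetical input, for illustration only.
log_lines = ["big data", "data analysis", "Big Data analytics"]
print(word_count(log_lines, chunk_size=2))
```

This pattern works only when partial results can be merged. Counts and sums can be combined this way, and a mean can be too if each chunk reports its (sum, count) pair. A median cannot be computed exactly from per-chunk medians.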

Contact information
Jaakko Hollmén, D.Sc.(Tech.), Chief Research Scientist
Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland
Helsinki Institute for Information Technology (HIIT)
Web: http://users.ics.aalto.fi/jhollmen/
Twitter: @jhollmen
E-mail: Jaakko.Hollmen@aalto.fi
Telephone: +358-50-3260110