CptS 483:04 Introduction to Data Science

What Is Data Science?
Assefaw Gebremedhin
Fall 2017

What is Data Science?
- Big Data and Data Science hype, and getting past the hype
- Why now?
- Current landscape of perspectives
- Skill sets needed

Big Data and Data Science Hype
What might be eyebrow-raising about Big Data and Data Science?
- Lack of definitions for basic terminology
- Lack of recognition for researchers in academia and industry who have been working on this kind of stuff for years
- The hype is crazy
- Statisticians might perceive this whole movement as identity theft
- Some say anything that has to call itself a science isn't one
Source: Doing Data Science (O'Neil & Schutt, 2013).

Getting past the hype
- Around all the hype, there is a ring of truth: data science is something new. It has access to a larger body of knowledge and methodology, as well as a process with foundations in both statistics and computer science. [DDS, O'Neil and Schutt]
- We are here in this course to understand this better and to contribute to the ongoing pursuit of a sharper definition.

Quote from the introduction of the Foundations of Data Science manuscript by Avrim Blum, John Hopcroft, and Ravindran Kannan (2015):
"Computer science as an academic discipline began in the 60s. Emphasis was on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Courses in theoretical computer science covered finite automata, regular expressions, context free languages, and computability. In the 70s, algorithms was added as an important component of theory. The emphasis was on making computers useful. Today, a fundamental change is taking place and the focus is more on applications. There are many reasons for this change. The merging of computing and communications has played an important role. The enhanced ability to observe, collect and store data in the natural sciences, in commerce, and in other fields calls for a change in our understanding of data and how to handle it in modern setting. The emergence of the web and social networks, which are by far the largest such structures, presents both opportunities and challenges for theory."
John Hopcroft

Quote from the introduction of the Foundations of Data Science manuscript by Avrim Blum, John Hopcroft, and Ravindran Kannan (2015):
"While traditional areas of computer science are still important and highly skilled individuals are needed in these areas, the majority of researchers will be involved with using computers to understand and make usable massive data arising in applications, not just how to make computers useful on specific well-defined problems. With this in mind we have written this book to cover the theory likely to be useful in the next 40 years, just as automata theory, algorithms and related topics gave students an advantage in the last 40 years. One of the major changes is the switch from discrete mathematics to more of an emphasis on probability, statistics, and numerical methods."
John Hopcroft

Why Now? Enablers of today's big data revolution
- Proliferation of sensors
- Creation of almost all information in digital form
  - Datafication
- Dramatic cost reduction in storage
  - You can afford to keep all the data
- Dramatic increases in network bandwidth
  - You can move the data to where it is needed
- Dramatic cost reduction and scalability improvements in computation
- Dramatic algorithmic breakthroughs
  - Machine learning, data mining, fundamental advances in CS and statistics
- Ever more powerful models producing ever-increasing volumes of data that must be analyzed

Current landscape (of perspectives)
Example 1. Metamarkets CEO Mike Driscoll (in a 2010 Quora discussion on "What is Data Science?"):
"Data science, as practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics. But data science is not merely hacking, because when hackers finish debugging their Bash one-liners and Pig scripts, few of them care about non-Euclidean distance metrics. And data science is not merely statistics, because when statisticians finish theorizing the perfect model, few could read a tab-delimited file into R if their job depended on it. Data science is the civil engineering of data. Its acolytes possess a practical knowledge of tools and materials, coupled with a theoretical understanding of what's possible."
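Driscoll names two concrete tasks that sit on either side of the hacking/statistics divide: reading a tab-delimited file, and caring about non-Euclidean distance metrics. A minimal sketch of both, under stated assumptions: the tiny table below is hypothetical, Python's standard `csv` module stands in for R, and Manhattan (L1) and Chebyshev (L-infinity) distances are chosen here as two common non-Euclidean metrics.

```python
import csv
import io
import math

# Hypothetical tab-delimited data; in practice this might come from open("points.tsv").
TSV_TEXT = "id\tx\ty\na\t0\t0\nb\t3\t4\n"

def read_points(text):
    """Parse a tab-delimited table with columns id, x, y into {id: (x, y)}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["id"]: (float(row["x"]), float(row["y"])) for row in reader}

def euclidean(p, q):
    # L2 distance: straight-line length between the two points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    # L-infinity distance: largest absolute coordinate difference
    return max(abs(a - b) for a, b in zip(p, q))

points = read_points(TSV_TEXT)
a, b = points["a"], points["b"]
print(euclidean(a, b), manhattan(a, b), chebyshev(a, b))  # 5.0 7.0 4.0
```

The same pair of points is 5, 7, or 4 units apart depending on the metric, which is why the choice of distance matters for anything built on it (nearest neighbours, clustering).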

Current landscape (of perspectives)
Example 2. Drew Conway's Venn diagram of data science (2010)

Current landscape (of perspectives)
Example 3. Vasant Dhar, in the article "Data Science and Prediction" (Communications of the ACM, Dec 2013), makes three big points:
http://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/fulltext
1. Data science is the study of the generalizable extraction of knowledge from data.
2. A common requirement in assessing whether new knowledge is actionable for decision making is its predictive power, not just its ability to explain the past.
3. A data scientist requires an integrated skill set spanning math, machine learning, statistics, and computer science, along with a deep understanding of the craft of problem formulation to engineer effective solutions.
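Dhar's second point, judging knowledge by its predictive power rather than by how well it explains the past, is usually put into practice by scoring a model on held-out data it never saw during fitting. A minimal sketch under stated assumptions: the data are synthetic (a hypothetical line y = 2x + 1 plus Gaussian noise), and the two models (an ordinary least-squares line and a 1-nearest-neighbour "memorizer"), the 150/50 split, and mean squared error as the score are all choices made here for illustration, not taken from Dhar's article.

```python
import random

def make_data(n, seed):
    """Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise (sd 0.3)."""
    rng = random.Random(seed)
    xs = [i / n for i in range(n)]  # distinct x values
    return [(x, 2 * x + 1 + rng.gauss(0, 0.3)) for x in xs]

def fit_line(data):
    """Ordinary least squares for y = a*x + b; returns a prediction function."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(data):
    """1-nearest-neighbour lookup: returns the y of the closest training x."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of model predictions on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

data = make_data(200, seed=0)
random.Random(1).shuffle(data)          # shuffle before splitting
train, test = data[:150], data[150:]    # fit on train, score on unseen test

for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(name, "train MSE", round(mse(model, train), 3),
          "test MSE", round(mse(model, test), 3))
```

The memorizer explains the past perfectly (its training MSE is exactly zero, since every training x is its own nearest neighbour), yet on the held-out points it will typically do worse than the line, because it reproduces the noise along with the signal. Only the held-out score reveals that difference.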

A Data Science Profile
- Computer science
- Math
- Statistics
- Machine learning
- Domain expertise
- Communication and presentation skills
- Data visualization

Author Schutt's data science profile