CptS 475/575: Data Science. What is Data Science? Fall 2018

First, some good news
- Starting Friday, August 24, and for the remainder of the semester, the meeting location for the class has changed to CUE 319
- CUE 319 is a bigger (and nicer) room
- Everyone on the waiting list will be enrolled!

Next, a couple of leftover slides from last time

Learning Outcomes
- Describe what Data Science is and the skill sets needed
- Describe the Data Science Process
- Use R to carry out statistical modeling and analysis
- Carry out exploratory data analysis (to gain insight)
- Apply machine learning algorithms for predictive modeling
- Correctly apply cross-validation to assess model performance
- Apply unsupervised learning methods to discover patterns, trends and anomalies in data
- Use effective data wrangling approaches to manipulate data
- Create effective visualizations of data (to communicate or persuade)
- Reason about ethical and privacy issues in data science, and apply ethical practices
- Work effectively in teams on data science projects
- Apply knowledge gained in the course to carry out a project and write a technical report

Weekly Schedule

Books
- No required textbook
- Lecture notes (slides) and reading material will be made available on the OSBLE+ page

References
- Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, 2013. (Freely available online)
- Cathy O'Neil and Rachel Schutt. Doing Data Science: Straight Talk From The Frontline. O'Reilly, 2014.
- Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1. Cambridge University Press, 2014. (Freely available online)
- Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science. Draft of a book, latest version, 2018. (Freely available online)
- Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012.
- Ethem Alpaydin. Introduction to Machine Learning. Third Edition. MIT Press, 2014.
- Nathan Yau. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Wiley Publications, 2011.
- Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press, 2016. (Freely available online)

Policies
- Conduct in class
  - Silence personal electronics
  - Arrive on time and remain throughout the class
- Correspondence
  - Happens via OSBLE+
- Attendance
  - Required. Make sure absences are cleared with me
- Missing or late work
  - Max 48 hrs with 10% penalty per 24 hrs
- Academic Integrity
  - Strongly enforced
- Consult syllabus for more details
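As a rough illustration of the late-work arithmetic, here is a minimal Python sketch. It rests on assumptions the slide does not spell out: that each started 24-hour period costs 10% of the earned score, and that work more than 48 hours late earns nothing. The function name `late_score` is hypothetical; the syllabus remains the authority on the actual policy.

```python
import math


def late_score(raw_score: float, hours_late: float) -> float:
    """Score after the late penalty, under one reading of the policy.

    Assumptions (not confirmed by the slide):
    - each started 24-hour period deducts 10% of the earned score
    - work submitted more than 48 hours late receives no credit
    """
    if hours_late <= 0:
        return float(raw_score)  # on time: no penalty
    if hours_late > 48:
        return 0.0  # past the 48-hour window
    periods = math.ceil(hours_late / 24)  # 1 or 2 started periods
    # Integer percentages avoid floating-point drift in the multiplier.
    return raw_score * (100 - 10 * periods) / 100


if __name__ == "__main__":
    for hours in (0, 5, 24, 30, 48, 50):
        print(hours, late_score(90, hours))
```

Under these assumptions, a submission 30 hours late falls into the second 24-hour period and so loses 20% of its score.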

Now to today's topic

What is Data Science?
Outline:
- Big Data and Data Science hype, and getting past the hype
- Why now?
- Landscape of perspectives
- Skill set needed

Big Data and Data Science Hype
What might be eyebrow-raising about Big Data and Data Science?
- Lack of definition around basic terminology
- Lack of recognition for researchers in academia and industry who have been working on this kind of stuff for years
- The hype can be crazy
Source: Doing Data Science (O'Neil & Schutt, 2013).

Getting past the hype
- Around all the hype, there is a ring of truth
- Data Science is something new: it has access to a larger body of knowledge and methodology, as well as a process that has foundations in both statistics and computer science. [DDS, O'Neil and Schutt]
- We are here in this course to understand this better and contribute to the pursuit of a sharper definition.

Quote from the Introduction of the Foundations of Data Science book by Avrim Blum, John Hopcroft and Ravindran Kannan (2018) (https://www.cs.cornell.edu/jeh/book.pdf)

"Computer science as an academic discipline began in the 60's. Emphasis was on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Courses in theoretical computer science covered finite automata, regular expressions, context free languages, and computability. In the 70's, algorithms was added as an important component of theory. The emphasis was on making computers useful. Today, a fundamental change is taking place and the focus is more on applications. There are many reasons for this change. The merging of computing and communications has played an important role. The enhanced ability to observe, collect and store data in the natural sciences, in commerce, and in other fields calls for a change in our understanding of data and how to handle it in the modern setting. The emergence of the web and social networks, which are by far the largest such structures, presents both opportunities and challenges for theory."

John Hopcroft

Quote from the Introduction of the Foundations of Data Science book by Avrim Blum, John Hopcroft and Ravindran Kannan (2018)

"While traditional areas of computer science remain highly important, increasingly researchers of the future will be involved with using computers to understand and extract usable information from massive data arising in applications, not just how to make computers useful on specific well-defined problems. With this in mind we have written this book to cover the theory likely to be useful in the next 40 years, just as an understanding of automata theory, algorithms and related topics gave students an advantage in the last 40 years. One of the major changes is an increase in emphasis on probability, statistics, and numerical methods."

John Hopcroft

Why Now?
Enablers of today's big data revolution
- Proliferation of sensors
- Creation of almost all information in digital form
- Datafication
- Dramatic cost reduction in storage
  - You can afford to keep all the data
- Dramatic increases in network bandwidth
  - You can move the data to where it is needed
- Dramatic cost reduction and scalability improvements in computation
- Dramatic algorithmic breakthroughs
  - Machine Learning, Data Mining, fundamental advances in CS and Statistics
- Ever more powerful models producing ever increasing volumes of data that must be analyzed