What is the UC Irvine Data Science Initiative?

Padhraic Smyth
Director of the UCI Data Science Initiative
Department of Computer Science, University of California, Irvine

A Revolution in the Technology of Data
(Graphic from Ray Kurzweil, singularity.com)
Padhraic Smyth, UC Irvine Data Science Initiative, Oct 2014: 2

A Paradigm Shift in Data Analysis
Technological drivers:
- Sensors (cheap and ubiquitous)
- Data storage (everyone is a "data owner")
- Computational power
- Data analysis methods
- Data access via the internet
Convergence: tremendous demand for data analysis in the sciences, in medicine, in engineering, in business, and more.
In the past, this demand was met by statistics, but statistics alone:
- does not scale up: there are too few statisticians
- even statisticians need computers

Human Computers
The historical meaning of the term "computer": one who computes (i.e., a person).
Statisticians have been using computers for centuries, e.g., Karl Pearson's team of human computers around 1900, but human computers could only work on relatively small problems.

Statistics and Computing
- Post World War II: increasing use of computing to solve the algorithmic aspects of statistical analyses
- 1960s: development of statistical computing and exploratory data analysis
- 1980s: computing allowed statisticians to explore more flexible models; increased use of non-parametric techniques and simulation methods
- 1990s: development of machine learning, i.e., very flexible predictive modeling techniques
- Today: the distinctions between statistics and computer science are often blurred, and the interface is a very active and exciting research area

"The role of theory in research is being dangerously ignored in favor of purely empirical work that proceeds without so much as a hypothesis." (Public Opinion Quarterly, 1972)

From http://www.tylervigen.com/
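
The charts credited to tylervigen.com are presumably drawn from Tyler Vigen's collection of spurious correlations: pairs of unrelated time series that happen to move together. The sketch below shows one way such coincidences arise and uses only the Python standard library. Two independently generated random walks have no causal link, yet both tend to trend, so their Pearson correlation is often far from zero. The function names and default parameters are illustrative assumptions and do not come from the talk.

```python
import math
import random


def random_walk(n_steps: int, rng: random.Random) -> list[float]:
    """Cumulative sum of independent +/-1 steps: a trending series with no real signal."""
    position = 0.0
    walk = []
    for _ in range(n_steps):
        position += rng.choice((-1.0, 1.0))
        walk.append(position)
    return walk


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fraction_strongly_correlated(trials: int = 1000, n_steps: int = 50,
                                 threshold: float = 0.5, seed: int = 0) -> float:
    """Share of independent random-walk pairs whose |r| exceeds `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = pearson(random_walk(n_steps, rng), random_walk(n_steps, rng))
        if abs(r) > threshold:
            hits += 1
    return hits / trials


print(f"{fraction_strongly_correlated():.0%} of unrelated pairs have |r| > 0.5")
```

Swapping the random walks for independent white noise (i.e., skipping the cumulative sum) makes large correlations rare. That contrast suggests the shared trend, not any real relationship, produces the apparent association.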

How Much Climate Data Do We Actually Have?
(Images from http://cimss.ssec.wisc.edu/ and ipcc.ch)

What is Data Science?
Algorithms, Databases, Systems
+ Statistics, Optimization, Machine Learning
+ Decisions, Policy, Privacy
(Diagram: Data, Models and Predictions, Humans)

What is Data Science?
Algorithms, Databases, Systems
+ Statistics, Optimization, Machine Learning
+ Decisions, Policy, Privacy
Applications of Data Analysis: Science, Medicine, Engineering, Humanities, Business

Challenges in Data Science
Statistical
- Data is often observational, not a random sample
- How can we better combine theory and data-driven modeling?
Algorithmic
- Scalability: how do we work with an N^3 algorithm when N = 100 million?
- Can the models be updated automatically?
Human and Socio-Cultural
- Balancing privacy and data usage
- Better tools to allow data users to "see inside the black box"
Educational
- Shortage of people with skills in both statistics and computer science
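
A back-of-the-envelope estimate shows why an N^3 algorithm is hopeless at N = 100 million. The sketch below assumes, purely for illustration, a machine that sustains 10^9 basic operations per second, and it ignores constant factors. Neither assumption comes from the talk.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # one Julian year


def runtime_seconds(n: int, exponent: int, ops_per_second: float = 1e9) -> float:
    """Rough runtime of an O(n**exponent) algorithm, constant factors ignored."""
    return float(n) ** exponent / ops_per_second


def runtime_years(n: int, exponent: int, ops_per_second: float = 1e9) -> float:
    """Same estimate expressed in years."""
    return runtime_seconds(n, exponent, ops_per_second) / SECONDS_PER_YEAR


N = 100_000_000  # N = 100 million, as on the slide
for k in (1, 2, 3):
    print(f"N^{k}: {runtime_seconds(N, k):.3g} s = {runtime_years(N, k):.3g} years")
```

Under these assumptions, a linear pass takes about 0.1 seconds. A quadratic algorithm needs about 10^7 seconds (roughly four months), and a cubic one needs about 10^15 seconds, on the order of 30 million years. Faster hardware shifts these numbers by constant factors only. Reducing the exponent is what makes the problem tractable.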

The Shape of Data
(Diagram of a data set with N rows and d columns: d = number of variables, N = number of samples)

The Shape of Data over Time
- Pre-1990: small N and small d
- Post-1990: large d
- Post-2005: large N
Large N is good (many algorithms are linear in N), but large d is a challenge, both statistically and computationally.
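
A minimal sketch of why N and d behave so differently. It assumes the task is estimating the mean vector and covariance matrix of an N × d data set; that task is chosen for illustration and is not taken from the talk. A one-pass, Welford-style update visits each sample once, so time grows linearly in N and memory does not grow with N at all. Each update, however, costs on the order of d^2 operations, and the covariance matrix itself needs d^2 storage. The function name `streaming_covariance` is hypothetical.

```python
def streaming_covariance(rows):
    """One pass over N rows of length d: O(N * d^2) time, O(d^2) memory.

    Uses the online update C_n = C_{n-1} + (x - mean_old)(x - mean_new)^T
    and returns the mean vector and the sample covariance C_N / (N - 1).
    """
    n = 0
    mean: list[float] = []
    comoment: list[list[float]] = []  # running sum of deviation products (d x d)
    for x in rows:
        if n == 0:
            mean = [0.0] * len(x)
            comoment = [[0.0] * len(x) for _ in x]
        n += 1
        delta_old = [xi - mi for xi, mi in zip(x, mean)]         # deviation from old mean
        mean = [mi + di / n for mi, di in zip(mean, delta_old)]  # update mean in place of storing rows
        delta_new = [xi - mi for xi, mi in zip(x, mean)]         # deviation from new mean
        for i in range(len(mean)):          # d^2 work per sample
            for j in range(len(mean)):
                comoment[i][j] += delta_old[i] * delta_new[j]
    if n < 2:
        raise ValueError("need at least two samples")
    cov = [[c / (n - 1) for c in row] for row in comoment]
    return mean, cov


mean, cov = streaming_covariance([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(mean, cov)
```

Doubling N doubles the work but leaves memory unchanged. Doubling d quadruples both the work per sample and the size of the result, which is one concrete sense in which large d is the harder axis.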

Computer Systems 101: CPU, RAM, Disk

How Far Away is the Data? Random Access Times
- CPU to RAM: 10^-8 seconds
- CPU to Disk: 10^-3 seconds

How Far Away is the Data? Effective Distances
- CPU to RAM: 1 meter
- CPU to Disk: 100 kilometers
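
The two "How Far Away is the Data?" slides express the same ratio in different units. Since 10^-3 / 10^-8 = 10^5, scaling a RAM access to 1 meter puts a disk access at 10^5 meters, i.e., 100 kilometers. The sketch below reproduces that arithmetic using the slides' round-number latencies. It also shows what the gap means for an algorithm that makes many random accesses. The one-million-access workload is a hypothetical example, and the model ignores caching, prefetching, and parallel I/O.

```python
RAM_ACCESS_S = 1e-8   # random access time for RAM (slide's round number)
DISK_ACCESS_S = 1e-3  # random access time for disk (slide's round number)


def effective_distance_m(access_s: float, ram_distance_m: float = 1.0) -> float:
    """Map an access time to a distance, taking one RAM access as `ram_distance_m`."""
    return access_s / RAM_ACCESS_S * ram_distance_m


def total_access_time_s(n_accesses: int, access_s: float) -> float:
    """Total latency of n sequential random accesses (no caching or parallelism)."""
    return n_accesses * access_s


ratio = DISK_ACCESS_S / RAM_ACCESS_S
print(f"disk/RAM latency ratio: {ratio:.0e}")
print(f"disk 'distance': {effective_distance_m(DISK_ACCESS_S) / 1000:.0f} km")

n = 1_000_000  # hypothetical workload: one million random lookups
print(f"RAM:  {total_access_time_s(n, RAM_ACCESS_S):.2f} s")
print(f"disk: {total_access_time_s(n, DISK_ACCESS_S):.0f} s")
```

Under these assumptions, one million random reads take about a hundredth of a second from RAM but about 1,000 seconds (roughly 17 minutes) from disk. This is why data-intensive algorithms are designed to keep their working set in memory or to read disk sequentially.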

Legislation on Restrictions on Data Collection

What is the UCI Data Science Initiative?
- Campus-wide initiative supported by the UCI Office of Academic Initiatives
- Started July 1, funded for 3 years
- One of 5 currently funded initiatives
From the Office of Academic Initiatives: "Initiatives are expected to encompass projects that involve research, undergraduate and graduate education programs, outreach to public and private organizations, and philanthropic potential... the intent of this program is to support initiatives with a wider range of activities than can be accommodated by ORUs or campus and school research centers." "...teams of engaged faculty are expected to develop new programs of interschool excellence."

From the Initiative Website: "Data Science encompasses the full spectrum of theories and methods that use data to understand and make predictions about the world around us. This includes fundamental research on statistical methods, prediction algorithms, data management techniques, and policy issues; as well as a broad range of domain-specific data-driven research problems in the sciences, engineering, humanities, education, medicine, and business."
Website: http://datascience.uci.edu

Faculty Advisory Board
- Engineering: Anima Anandkumar
- Information and Computer Sciences: Jessica Utts, Pierre Baldi, Geof Bowker, Michael Carey
- Humanities: Peter Krapp
- Physical Sciences: Jim Randerson
- Medicine: Suzanne Sandmeyer
- Business: Vijay Gurbaxani
- Education: Mark Warschauer
- Biological Sciences: Kevin Thornton
- Social Ecology: George Tita
- Social Sciences: Mark Steyvers, Tom Boellstorff

Current Activities of the Data Science Initiative
- Mini-symposia
- Short courses
- Data Science undergraduate major proposal
  - Emphasis on statistics and computer science
  - A minor option planned for later
- Other
  - Support for large proposal efforts
  - Clearing house for data-science-related information via mailing list and website
  - More activities being planned

Data Science Website: http://datascience.uci.edu

Short Courses
Fall 2014
- Introduction to R (Nov 14th)
- Introduction to Linux and HPC (Nov 17th)
- Analyzing Data in Linux (Nov 18th)
- Application deadline: November 1st (on website)
2015
- Advanced R (early 2015)
- Exploratory Data Analysis in Python
- Software Carpentry (early 2015)
- Big Data Management
- Predictive Modeling in Python
- Repeats of the R and Linux/HPC courses

Research Mini-Symposia
- Half-day to full-day mini-symposia on topics of relevance to data science
- Interdisciplinary in nature
- Roughly once per quarter
Currently in planning for 2015:
- Statistical and Algorithmic Modeling of Social Network Data (Social Science, Statistics, Computer Science), March 2015 (tentative)
- Data Analysis in Education: Learning Analytics (Education, Machine Learning), Spring 2015 (tentative)
- Business and Informatics, Spring 2015 (tentative)
- Additional topics under consideration

How Can You Participate?
- Visit the website, sign up for the mailing list, and attend events
- Participate in short courses
  - Suggest new short courses
  - Volunteer to teach a short course
- Propose or organize a mini-symposium
  - Organize and chair a half-day or full-day mini-symposium
  - Emphasis on emerging research topics: data-centric and interdisciplinary
  - To start, contact the faculty advisory board member in your school (names on the website)
- Grant proposals in the data science area
  - Want to write a joint proposal with a data science angle and need collaborators? Let us know and we will try to make things happen
- If you have an idea, let us know

What the Initiative Does Not Have
- Open faculty positions
- Hardware
- Direct funding for research projects
- Consulting support for projects

Today's Kickoff Event
Algorithms, Databases, Systems
+ Statistics, Optimization, Machine Learning
+ Decisions, Policy, Privacy
Applications of Data Analysis: Science, Medicine, Engineering, Humanities, Business

Session 1: Foundations
Algorithms, Databases, Systems
+ Statistics, Optimization, Machine Learning
+ Decisions, Policy, Privacy
Speakers: Michael Carey, Hal Stern, Pierre Baldi, Geof Bowker
Applications of Data Analysis: Science, Medicine, Engineering, Humanities, Business

Session 2: Applications
Algorithms, Databases, Systems
+ Statistics, Optimization, Machine Learning
+ Decisions, Policy, Privacy
Short application talks on: Text Analysis, Particle Physics, Engineering, Genomics, the Environment, and Business
