Practical Big Data Science


Max Berrendorf, Felix Borutta, Evgeniy Faerman, Prof. Dr. Thomas Seidl
Lehrstuhl für Datenbanksysteme und Data Mining
Ludwig-Maximilians-Universität München (LMU)
PBDS, 12.04.2018

Agenda
- Organisation
- Goals
- Schedule
- Topics
- Gitlab Introduction
- Group Assignment

Organisation

Lab Organisation: General Information
- Offered as part of the ZD.B Innovation Lab Big Data Science (https://zentrum-digitalisierung.bayern/massnahmen-alt/innovationslabore-fuer-studierende/), coordinated by the chairs of
  - Prof. Dr. Thomas Seidl (http://www.dbs.ifi.lmu.de)
  - Prof. Dr. Bernd Bischl (http://www.compstat.statistik.uni-muenchen.de/)
  - Prof. Dr. Dieter Kranzlmüller (http://www.nm.ifi.lmu.de)
- Hosted alternately at the chairs of Prof. Seidl (summer term) and Prof. Bischl (winter term)
- Open to Master students in Informatics and Statistics programmes
- Technical infrastructure for the lab is provided and maintained by the chair of Prof. Kranzlmüller and the Leibniz-Rechenzentrum (LRZ)

Lab Organisation: Contact

Supervisors
Name              Mail                        Room
Max Berrendorf    berrendorf@dbs.ifi.lmu.de   F110
Felix Borutta     borutta@dbs.ifi.lmu.de      156
Evgeniy Faerman   faerman@dbs.ifi.lmu.de      F109

Further contacts
Dave Chen         davech2y@outlook.com
Robert Müller     robert.mueller@campus.lmu.de

Website
http://www.dbs.ifi.lmu.de/cms/studium_lehre/lehre_master/pbds18/index.html
- Time schedule and material
- Check regularly for updates and announcements

Lab Organisation: Process
- We assign students to groups of 5-6 students
- Each group can specify preferences for 5 different topics
- We assign the groups to the topics

Lab Organisation: Process
[Figure: sprint cycle with the labels "every 2 weeks", "2 per week", Sprint Planning, Daily Sprint, Sprint Review, Retrospective, Short Report]
- Each group will work on its topic following an agile, Scrum-like process
- The lab is divided into sprints
- At the end of each sprint, groups report on the last sprint and on their plans for the next one
- During the last plenum session, all groups will present their results and demonstrate the systems they developed

Infrastructure
- Project Management
- Compute Cloud
- Room
  - Room U 151, Thursday, 14:00-18:00, exclusive usage
  - The room is equipped with CIP-terminals, beamers and whiteboards

Goals

Lab Goals: What will you do in this lab?
- Literature study and familiarization with an active research direction in data science and related approaches
- Implementation of state-of-the-art approaches in TensorFlow
- Application of these approaches to a use case on real data
- Evaluation of the approaches w.r.t.
  - Result quality
  - Efficiency
  - Scalability

Lab Goals: What will you learn?
- Hands-on experience with a Data Science topic:
  - Familiarization with a research direction
  - Application of the Data Science process
- In-depth experience with the machine learning platform TensorFlow
- Working with a cloud computing system: OpenNebula
- Agile development in a team using Scrum: GitLab

Lab Goals: Successful Participation
In order to successfully complete the lab, you have to
- Attend all meetings
- Contribute actively in your group
  - Guideline: 25h/week
- Implement the backlog items specified by your topic according to their respective definitions of done
- Maintain your group documentation and provide regular reports
- Present your final results and your developed system
- Participate in the discussions of other presentations

Schedule

Time Schedule

Fixed Dates
Kickoff  S1      S2      S3      S4      S5      S6      Final Presentations
12.04.   19.04.  03.05.  17.05.  31.05.  14.06.  28.06.  12.07.

Times
- Thur., 14:00-16:00: Scrum Meetings
- Thur., 16:00-18:00: Plenum Session
- Stand-up meetings by appointment with your supervisor
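The dates from 19.04. to 12.07. are spaced exactly two weeks apart, which matches the two-week sprint rhythm. The following is a minimal sketch that regenerates them with Python's standard library. The function name `fortnightly_dates` is hypothetical, and the year 2018 is taken from the slide date.

```python
from datetime import date, timedelta


def fortnightly_dates(first, last):
    """Return all dates from `first` to `last` (inclusive) in two-week steps."""
    step = timedelta(weeks=2)
    dates = []
    current = first
    while current <= last:
        dates.append(current)
        current += step
    return dates


if __name__ == "__main__":
    # 19.04.2018 is the first date after the kickoff; 12.07.2018 is the last date listed.
    for day in fortnightly_dates(date(2018, 4, 19), date(2018, 7, 12)):
        print(day.strftime("%d.%m."))
```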

Topics

Conditions for Industry Projects

Company
- Signs contract with the university
- Pays for the project execution first
- Optionally acquires rights of use (exclusive or non-exclusive)

Students
- Sign contract with the university
- If necessary sign NDA (and take it seriously)
- Execute project
- Get money if the company acquires rights of use
  - x for the team for non-exclusive rights of use
  - y for the team for exclusive rights of use

1. Company X (industry)

Spatio-temporal signal interpolation
[Figure: two panels, "Historic Only" and "Historic + Future"]

Spatio-temporal signal interpolation

Problem
- Measuring stations are spatially distributed
- Input:
  - Historic data for each station
  - Future prediction for a few stations
- Output: Predictions for all other stations

What will you learn
- Work on a real-life project
- Experience with state-of-the-art Deep Learning methods:
  - Recurrent networks
  - Graph Neural Networks (Attention)
- Integration of different information sources
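To make the input/output setting concrete, here is a minimal sketch of a simple non-learned baseline: inverse distance weighting (IDW). It is not one of the Deep Learning methods named above, only a reference point such methods could be compared against. The function name, the planar coordinates and the values are hypothetical.

```python
import math


def idw_interpolate(known, targets, power=2.0):
    """Estimate a value at each target station by inverse distance weighting.

    known:   dict mapping (x, y) coordinates to a value (e.g. a future prediction)
    targets: iterable of (x, y) coordinates of stations without a value
    """
    if not known:
        raise ValueError("at least one station with a known value is required")
    estimates = {}
    for tx, ty in targets:
        weighted_sum = 0.0
        weight_total = 0.0
        exact = None
        for (kx, ky), value in known.items():
            dist = math.hypot(tx - kx, ty - ky)
            if dist == 0.0:
                # Target coincides with a known station: reuse its value directly.
                exact = value
                break
            weight = 1.0 / dist ** power
            weighted_sum += weight * value
            weight_total += weight
        estimates[(tx, ty)] = exact if exact is not None else weighted_sum / weight_total
    return estimates


if __name__ == "__main__":
    # Hypothetical stations: two with a future prediction, one without.
    stations_with_prediction = {(0.0, 0.0): 10.0, (2.0, 0.0): 20.0}
    print(idw_interpolate(stations_with_prediction, [(1.0, 0.0)]))
```

Closer stations get larger weights, so the estimate for a station halfway between two known stations is their average.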

2. Harman (industry)

Active Learning for Object Detection (industry)
[Figure: Street Scenes Data. Image source: http://cbcl.mit.edu/software-datasets/streetscenes/]

Active Learning for Object Detection (industry)
- Basic Idea: Creating a support system for labeling
- Data: Street scenes images
- Problem: The set of labels is going to be very sparse
- Goal: Integrating user expertise into a semi-automated labeling process
- Active Learning approaches to solve two problems
  1. Object Detection
  2. Object Labeling
- Tasks:
  - Identification and implementation of suitable algorithms
  - Join two active learning steps within one framework
  - Integration into existing UI
- Profit: Learn fundamental AI concepts that are already established in the area of ML
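One common active learning strategy is uncertainty sampling: the model asks the user to label the samples it is least sure about. The sketch below illustrates that idea with prediction entropy as the uncertainty measure. It is an assumption for illustration, not the approach prescribed for this topic, and the function names and image identifiers are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy (natural log) of a class probability distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def select_for_labeling(predictions, budget):
    """Pick the `budget` most uncertain samples.

    predictions: dict mapping a sample id to its predicted class probabilities
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical model outputs for three unlabeled images.
    model_outputs = {
        "img_a": [0.98, 0.01, 0.01],  # confident prediction
        "img_b": [0.40, 0.30, 0.30],  # very uncertain prediction
        "img_c": [0.70, 0.20, 0.10],
    }
    print(select_for_labeling(model_outputs, budget=2))
```

The selected samples would be shown to the user, and the newly labeled data would be added to the training set before the next round.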

3. Movie Rating Prediction

Movie Rating Prediction

Task
- Predict the average IMDb rating for new movies based on meta data (e.g., actors, directors, posters, ...)
- As data sources, you may use all freely available resources (e.g., IMDb, Wikipedia, OMDB, ...)

Goal
- Develop a website where the user can input meta information concerning a specific movie
- AI backend should provide an accurate prediction of the average IMDb rating

Movie Rating Prediction

Challenges
- Heterogeneous data sources
- Cope with missing meta-data

Profit
- Choose data sources by yourself
- Evaluate ML algorithms w.r.t. heterogeneous data sources
- Find out if a new movie is worth watching
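As an illustration of coping with missing meta-data, the following sketch shows a very simple baseline: predict the mean rating of the known people involved, and fall back to the global mean when no usable meta-data is available. All names, fields and ratings are hypothetical, and the baseline is an assumption for illustration rather than a recommended model.

```python
from collections import defaultdict
from statistics import mean


def fit_baseline(movies):
    """Compute the global mean rating and a mean rating per person.

    movies: list of dicts with a 'rating' and optional 'director' / 'actors' entries
    """
    global_mean = mean(m["rating"] for m in movies)
    ratings_by_person = defaultdict(list)
    for m in movies:
        people = [m.get("director")] + list(m.get("actors") or [])
        for person in people:
            if person:  # skip missing meta-data
                ratings_by_person[person].append(m["rating"])
    person_mean = {p: mean(r) for p, r in ratings_by_person.items()}
    return global_mean, person_mean


def predict_rating(model, director=None, actors=None):
    """Average the means of all known people; use the global mean otherwise."""
    global_mean, person_mean = model
    known = [person_mean[p] for p in [director, *(actors or [])] if p in person_mean]
    return mean(known) if known else global_mean


if __name__ == "__main__":
    # Hypothetical training data; the third movie has no usable meta-data.
    training = [
        {"rating": 8.0, "director": "D1", "actors": ["A1"]},
        {"rating": 6.0, "director": "D2", "actors": ["A1", "A2"]},
        {"rating": 7.0, "director": None},
    ]
    model = fit_baseline(training)
    print(predict_rating(model, director="D1", actors=["A1"]))
    print(predict_rating(model))  # no meta-data at all
```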

4. Air pollution prediction (KDD CUP of Fresh Air)

Air pollution prediction

Task
- Predict future air pollutant concentrations
- Data: historical pollution and weather data from different sources
  - 35 stations in Beijing and 13 in London
  - Data from KDD Cup 2018

Goal
- Develop a system for air pollutant prediction
- Include additional information (e.g. distance between stations, etc.)
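Distances between stations can be derived from their coordinates. The sketch below uses the haversine formula for great-circle distances. It assumes that station coordinates are available as latitude and longitude in degrees; the station names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distance_matrix(stations):
    """Pairwise distances for a dict mapping station name to (lat, lon)."""
    return {
        a: {b: haversine_km(*pos_a, *pos_b) for b, pos_b in stations.items()}
        for a, pos_a in stations.items()
    }


if __name__ == "__main__":
    # Hypothetical station coordinates (lat, lon) in degrees.
    example_stations = {"station_1": (39.90, 116.40), "station_2": (39.95, 116.30)}
    print(distance_matrix(example_stations))
```

Such a matrix could serve as an additional input feature, for example as edge weights between stations.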

5. Explainable AI

Explainable AI for CNNs
[Figure: Inception activations (3rd layer, Inception v3), with the labels Image, Colour, Texture, Shape]

Explainable AI for CNNs
- Goal: Open the black box of CNNs
[Figure: Activation Maximisation, with the labels "Data Space" and "Data Set". Image source: https://distill.pub/2017/feature-visualization]

Explainable AI for CNNs

Task
- Explorative analysis of CNN activations for full Imagenet

Goal
- Determine role of neurons ("Explanation by Example")
- Identify important neurons
- Similarity search based upon different feature representations
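"Explanation by Example" can be read as showing the images that activate a given neuron most strongly. The following is a minimal sketch of such a top-k query over a stream of activation vectors, using only the standard library. The function name, the data layout and the image identifiers are hypothetical.

```python
import heapq


def top_k_images(activation_stream, neuron, k):
    """Return the k (image_id, activation) pairs with the highest activation.

    activation_stream: iterable of (image_id, activation_vector) pairs; it is
    consumed once, and only k candidates are kept in memory at a time.
    """
    scored = ((vector[neuron], image_id) for image_id, vector in activation_stream)
    return [(image_id, score) for score, image_id in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    # Hypothetical activations of a two-neuron layer for three images.
    stream = [
        ("img_1", [0.1, 0.9]),
        ("img_2", [0.8, 0.2]),
        ("img_3", [0.5, 0.7]),
    ]
    print(top_k_images(stream, neuron=0, k=2))
```

Because `heapq.nlargest` only keeps k candidates, the same function would also work on a generator that reads activations from disk in chunks.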

Explainable AI for CNNs

Challenges
- Huge data (for 1.2M images approx. 16 TiB raw data)
- Many possible queries (top-k retrieval, correlations, clustering, ...)
- For explorative analysis: near-realtime processing

Profit
- Develop a system for big data analysis (backend + frontend)
- Deepen understanding of the inner workings of CNNs
- Improve CNN structure?
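To get a feeling for the data volume, the figures above can be broken down per image. The sketch below does this arithmetic. The storage format is not stated on the slide, so the 4 bytes per value (float32) is an assumption.

```python
TIB = 2 ** 40
MIB = 2 ** 20


def per_image_footprint(total_bytes, num_images, bytes_per_value=4):
    """Return (MiB per image, activation values per image).

    bytes_per_value=4 assumes float32 storage, which the slide does not state.
    """
    bytes_per_image = total_bytes / num_images
    return bytes_per_image / MIB, bytes_per_image / bytes_per_value


if __name__ == "__main__":
    mib, values = per_image_footprint(16 * TIB, 1_200_000)
    print(f"approx. {mib:.1f} MiB and {values / 1e6:.2f} million values per image")
```

Under this assumption, each image corresponds to roughly 14 MiB of raw activations, or about 3.7 million stored values.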

Gitlab Introduction

GitLab
- https://gitlab.lrz.de
- Sign in with LRZ-ID (the LRZ-ID can be found at https://www.portal.uni-muenchen.de/benutzerkonto/index.html)
- How to create a group?
- How to create a project?
- Issues & Milestones

Group Assignment
(removed for privacy reasons)

Homework (until tomorrow)
- Get together with your group
- Decide on a group name
- Decide on a ranking for the topics with your group
- Send us an e-mail by Friday, 13.04., 15:00
- We will match the groups to the topics based upon these rankings
- In LRZ-Gitlab (https://gitlab.lrz.de/)
  - Create a group named after your group; invite all three supervisors and both Hiwis
  - Create a project within this group (more information about Gitlab later)
- Time estimates given on the slide: 1h, 1h, 1h

Homework (until next week)
Get familiar with:
- Python
- numpy
- TensorFlow
- OpenNebula
- Git
- Scrum
- GitLab Issues/Milestones
Time estimate given on the slide: 22h

Useful References

Related Lectures
- Knowledge Discovery in Databases I (KDD I)
- Knowledge Discovery in Databases II (KDD II)
- Big Data Management and Analytics
- Machine Learning

OpenNebula
- Info
- LRZ Tutorials

Useful References

TensorFlow
- Get Started With TensorFlow

Git
- Basics
- Branching
- Feature/Development/Master Branch (by Atlassian)

Useful References

GitLab
- LRZ GitLab
- Workflow Overview

SCRUM
- Scrum Overview (Atlassian)