From Morphological Box to Multidimensional Datascapes


From Morphological Box to Multidimensional Datascapes S. George Center for Data-Driven Discovery and Dept. of Astronomy, Caltech AstroInformatics 2016, Sorrento, Italy, October 2016

"Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it..." (Dan Ariely)

What is Fundamentally New Here?
- The information volumes and rates grow exponentially
- Most data will never be seen by humans
- A great increase in the data information content
- Data-driven vs. hypothesis-driven science
- A great increase in the information complexity
- There are patterns in the data that cannot be comprehended by humans directly

From Morphological Box to the Observable Parameter Spaces
- Fritz Zwicky's concept: explore all possible combinations of the relevant parameters in a given problem; these correspond to the individual cells in a Morphological Box
- Example: Zwicky's discovery of the compact dwarfs
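As a rough illustration of the Morphological Box idea, the sketch below enumerates every combination of parameter values and lists the cells that no known object occupies yet. The parameter names, their values, and the "known" objects are all made up for the example; they are not taken from the talk.

```python
from itertools import product

# Hypothetical axes of the box; names and values are illustrative only.
box_axes = {
    "luminosity": ["low", "high"],
    "size": ["compact", "extended"],
    "color": ["blue", "red"],
}

def morphological_box(axes):
    """Return every cell of the box: one dict per combination of values."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in product(*(axes[name] for name in names))
    ]

cells = morphological_box(box_axes)

# Cells already populated by known objects (made-up examples).
known = [
    {"luminosity": "high", "size": "extended", "color": "red"},
    {"luminosity": "low", "size": "extended", "color": "blue"},
]

# The empty cells are where a systematic search would look next.
unexplored = [cell for cell in cells if cell not in known]
print(len(cells), "cells in the box,", len(unexplored), "not yet populated")
```

The number of cells is the product of the number of values on each axis, which is why the same bookkeeping becomes hard to do by hand as axes are added.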

Systematic Exploration of the Observable Parameter Space (OPS)
- Its axes are defined by the observable quantities
- Every observation, surveys included, carves out a hypervolume in the OPS
- Technology opens new domains of the OPS, leading to new discoveries

- Measurements Parameter Space: colors of stars and quasars (SDSS)
- Physical Parameter Space: Fundamental Plane of hot stellar systems [figure labels: E, dSph, GC]
- Dimensionality = the number of observed quantities
- Both are populated by objects or events

- Measurements Parameter Space: color-magnitude diagram
- Physical Parameter Space: H-R diagram
- Theory + other data
- Not filled uniformly: clustering indicates different families
- Clustering + dimensionality reduction = correlations
- High dimensionality poses analysis challenges

Exploration of Parameter Spaces is the Central Problem of Data Science
- Clustering, classification, correlation and outlier searches
- Machine Learning is the key methodology
- Challenges, especially with the data dimensionality: algorithm and data model choices; data incompleteness; feature selection and dimensionality reduction; uncertainty estimation; scalability; visualization; etc.

Pattern or Structure (Correlations, Clustering, Outliers, etc.) Discovery in High-Dimensional Parameter Spaces
- D >> 3 parameter space hypercube
- High-D data cloud: mostly noise, of an arbitrary distribution
- But in some corner of some sub-D projection of this data space, there is something that is not noise
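One simple way to look for structure hidden in a sub-D projection is a brute-force scan of all 2-D projections, scoring each by the strength of its linear correlation. The sketch below assumes a synthetic 8-D Gaussian cloud with a correlation planted between two axes; the data, the choice of Pearson correlation as the score, and all function names are assumptions made for illustration.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def make_cloud(n_points=500, n_dims=8, seed=0):
    """Mostly-noise cloud with a correlation planted between axes 2 and 5."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_dims)] for _ in range(n_points)]
    for row in data:
        row[5] = 0.9 * row[2] + 0.1 * rng.gauss(0, 1)
    return data

def strongest_projection(data):
    """Scan every pair of axes and return the most correlated pair."""
    n_dims = len(data[0])
    columns = [[row[d] for row in data] for d in range(n_dims)]
    scores = {
        (i, j): abs(pearson(columns[i], columns[j]))
        for i, j in combinations(range(n_dims), 2)
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

cloud = make_cloud()
best_pair, best_score = strongest_projection(cloud)
print("most correlated axes:", best_pair, "score:", round(best_score, 3))
```

The number of pairs grows as D(D-1)/2, and higher-order projections grow faster still, so an exhaustive scan like this is only practical for modest D.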

From Light Curves to Feature Vectors
- We compute ~70 parameters and statistical measures for each light curve: amplitudes, moments, periodicity, etc.
- This turns heterogeneous light curves into homogeneous feature vectors in the parameter space
- Apply a variety of automated classification methods
[Figure: example light curve, Mag vs. MJD (x10^4)]
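The step from heterogeneous light curves to homogeneous feature vectors can be sketched as follows. This is a minimal illustration with six features, not the ~70 used in the talk; the particular features, their names, and the sample light curves are assumptions, and periodicity measures are left out for brevity.

```python
from statistics import mean, median, pstdev

# Hypothetical feature set; the talk's actual list is not given.
FEATURE_NAMES = ["mean_mag", "std_mag", "amplitude", "skewness", "mad", "max_slope"]

def light_curve_features(times, mags):
    """Map one light curve (any number of epochs) to a fixed-length vector.

    Assumes at least two epochs and strictly increasing times.
    """
    mu = mean(mags)
    sigma = pstdev(mags)
    amplitude = (max(mags) - min(mags)) / 2.0
    # Skewness: standardized third moment (0 for a constant curve).
    skewness = mean([((m - mu) / sigma) ** 3 for m in mags]) if sigma > 0 else 0.0
    # Median absolute deviation: a robust measure of scatter.
    med = median(mags)
    mad = median([abs(m - med) for m in mags])
    # Largest magnitude change per unit time between consecutive epochs.
    max_slope = max(
        abs((m2 - m1) / (t2 - t1))
        for (t1, m1), (t2, m2) in zip(zip(times, mags), zip(times[1:], mags[1:]))
    )
    return [mu, sigma, amplitude, skewness, mad, max_slope]

# Two made-up light curves with different sampling and length.
curve_a = ([0.0, 1.3, 2.1, 4.8, 5.0], [16.5, 16.9, 16.2, 16.7, 16.4])
curve_b = (
    [0.0, 0.4, 0.9, 1.5, 2.2, 3.0, 3.1, 4.7],
    [17.1, 17.0, 16.3, 16.8, 17.2, 16.4, 16.9, 17.0],
)

vec_a = light_curve_features(*curve_a)
vec_b = light_curve_features(*curve_b)
print(len(vec_a), len(vec_b))  # both vectors have the same length
```

Because every light curve maps to a vector of the same length, objects observed with different cadences can be fed to the same classifier.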

Optimizing Feature Selection
- Rank features in the order of classification quality for a given classification problem, e.g., RR Lyrae vs. W UMa
[Figure panels: RR Lyrae; Eclipsing binary (W UMa)]
(Lead: C. Donalek)
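One simple way to rank features by classification quality is to score each feature on its own by how well a single threshold separates the two classes. The sketch below does this on synthetic two-class data; the scoring rule, the feature names, and the class distributions are assumptions for illustration and are not the method used in the talk.

```python
import random

def best_threshold_accuracy(values, labels):
    """Best accuracy of a one-feature threshold classifier (either polarity).

    Labels are 0 or 1. Assumes the feature values are distinct.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    best = max(total_pos, n - total_pos) / n  # always predict the majority class
    pos_below = 0
    for k, (_, label) in enumerate(pairs, start=1):
        pos_below += label
        neg_below = k - pos_below
        # Predict class 0 below the cut and class 1 above it.
        correct = neg_below + (total_pos - pos_below)
        best = max(best, correct / n, (n - correct) / n)
    return best

def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least discriminating."""
    scores = []
    for d, name in enumerate(names):
        column = [row[d] for row in rows]
        scores.append((name, best_threshold_accuracy(column, labels)))
    return sorted(scores, key=lambda item: item[1], reverse=True)

def make_sample(n_per_class=200, seed=1):
    """Synthetic two-class sample: columns are amplitude, noise, period."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, (period_mu, amp_mu) in enumerate([(0.35, 0.5), (0.60, 1.0)]):
        for _ in range(n_per_class):
            rows.append([
                rng.gauss(amp_mu, 0.4),      # partly separates the classes
                rng.gauss(0.0, 1.0),         # carries no class information
                rng.gauss(period_mu, 0.05),  # separates the classes well
            ])
            labels.append(label)
    return rows, labels

names = ["amplitude", "noise", "period"]
rows, labels = make_sample()
ranking = rank_features(rows, labels, names)
for name, score in ranking:
    print(f"{name:10s} {score:.3f}")
```

Scoring features one at a time ignores combinations that are only informative jointly, so a multivariate method is a natural next step once the ranking has narrowed the candidates.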

Quasar Selection in a Combined Parameter Space of Variability and WISE Colors
- Initial results from the Kepler field: a 100% success rate!
[Figure label: QSO region]
(Leads: M. Graham, D. Stern)

Looking for Outliers in the QSO Variability Parameter Space
Spectra for the outliers in this parameter space (with anomalous/unusual variability patterns) show:
- BAL QSOs with evolving spectra
- Type-changing quasars (between Type 1 and Type 2)
- Double-peak emitters
- Correlated photometric/spectroscopic variability
(Lead: M. Graham)
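An outlier search in a parameter space can be sketched with a distance-based score: each point is scored by the distance to its k-th nearest neighbor, so isolated points score high. The talk does not say which outlier method was used; the k-nearest-neighbor score, the synthetic data, and the planted outlier below are assumptions for illustration.

```python
import math
import random

def knn_outlier_scores(points, k=5):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Synthetic 3-D parameter space: a Gaussian cloud plus one planted outlier.
rng = random.Random(42)
points = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
points.append([8.0, 8.0, 8.0])  # index 200, far from the cloud

scores = knn_outlier_scores(points, k=5)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print("most anomalous point index:", most_anomalous)
```

In practice the axes would usually be standardized first, since a Euclidean distance is dominated by whichever feature has the largest numerical range.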

From the Information Technology to the Cognition Technology: Towards a Human-Computer Collaborative Discovery Vannevar Bush (1945) (1960)

(Lead: M. Graham)

The Rise of the Machines: Science on the Carbon-Silicon Interface
- Data processing: automated data quality control (anomaly/fault detection/repair)
- Data mining and analysis: clustering, classification, outlier or anomaly detection; pattern recognition, multivariate correlation search; machine discovery of analytical relationships; assisted dimensionality reduction for visualization
- Code design and implementation: from art to science?

A Key Challenge: Visualizing Multidimensional Data Spaces
- Hyperdimensional structures (clusters, correlations, etc.) may be present in many complex data sets, whose dimensionality may be D ~ 10^2 to 10^4, or higher
- It is a matter of data understanding, choosing the right data mining algorithms, and interpreting the results
- We are biologically limited to perceiving up to ~3-12(?) dimensions
- What good are the data if we cannot effectively extract knowledge from them?

Traditional 2D Visualization
- Quasar colors in an 8-dimensional parameter space: typical 2-D projections

Diving Into Multidimensional Datascapes
- New interactive and collaborative data visualization tools using immersive or augmentative Virtual Reality

Effective Navigation and Interaction in VR
- Beyond a keyboard and a mouse: gesture-based interfaces and control devices
- Developing optimal user interaction tools and methods for the new VR/AR platforms

Telepresence and Holoportation
- Scientific collaboration in shared virtual spaces, collaborative visual data exploration
- Virtual Mars at JPL (S. Davidoff, J. Norris, et al.)
- Holoportation with Microsoft HoloLens(TM)

Why Virtual Reality?
- Multi > 3; multi-D is not multiple 1-D
- VR/AR is the next computing platform, following the mainframe, desktop, and mobile
- VR solves the problems that traditionally plagued 3-D visualization: occlusion, perspective, navigation, etc.
- The key concepts are proprioception (sense of relative position) and kinesthesia (sense of movement)
- VR is a natural platform for collaborative visual exploration
- Leverages a multi-$z investment by the games industry

Data Science Methodology Transfer
- There are common challenges and a common underlying methodology to much of the data science (computing, IT, ML, statistics...)
- How can we transfer the cyberinfrastructure developments, experience, and solutions from one scientific domain to others?

Center for Data-Driven Discovery
- A new research center at Caltech
- Serves research efforts Institute-wide
- A part of a new, Caltech-JPL joint initiative for data science and technology
- The goals are to assist faculty in formulation and execution of data-intensive projects, and facilitate interdisciplinary sharing of methods, ideas, novel projects, etc.

From Sky Surveys to Neurobiology
- Using the data analytics tools based on ML, developed for the analysis of sky surveys, to design better diagnostics for autism
- Feature importance using random forests
- Next: correlate with MRI scans (with R. Adolphs et al.)
(J. Bunn, CD3)

From Sky Surveys to Neurobiology
- Feature importance => 6-dimensional parameter space
- [Figure labels: Mixed, Control, Outlier]
- Legend: Cylinders = Autistic, Cubes = Control; Striped = Male, Solid = Female
(C. Donalek, CD3)

From Sky Surveys to Neurobiology
- Symbolic regression finds best-fitting mathematical description of a sample of data via evolutionary algorithm
- Cast binary classification as: $\mathrm{class} = g(f(x_1, x_2, x_3, \ldots, x_n))$, where f(x) is the equation of the discriminating hyperplane
- Dependent features: "I find it easy to put myself in somebody else's shoes"; "I can tell if someone is masking their true emotion"; "I feel at upset"
- Accuracies of ~90%, but small sample data set and feature degeneracy
(M. Graham, CD3)
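The form class = g(f(x)) can be sketched with a much simpler stand-in for symbolic regression: here f is fixed to be linear (a hyperplane), g is a step function, and only the coefficients of f are evolved by mutation and selection. Genuine symbolic regression also evolves the structure of the expression, which this sketch does not attempt. The data, population size, and mutation scale are assumptions chosen for illustration.

```python
import random

def f(weights, x):
    """Linear discriminant: f(x) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

def g(value):
    """Step function that turns the discriminant into a class label."""
    return 1 if value > 0 else 0

def accuracy(weights, X, y):
    """Fraction of samples for which g(f(x)) matches the label."""
    return sum(g(f(weights, x)) == label for x, label in zip(X, y)) / len(y)

def evolve_hyperplane(X, y, generations=60, pop_size=30, sigma=0.3, seed=0):
    """Elitist evolutionary search over hyperplane coefficients."""
    rng = random.Random(seed)
    n_weights = len(X[0]) + 1
    population = [
        [rng.gauss(0, 1) for _ in range(n_weights)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the fittest third, refill with mutated copies of them.
        population.sort(key=lambda w: accuracy(w, X, y), reverse=True)
        parents = population[: pop_size // 3]
        children = [
            [w + rng.gauss(0, sigma) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda w: accuracy(w, X, y))

def make_blobs(n_per_class=100, seed=3):
    """Two synthetic 3-D Gaussian classes centered at 0 and at 2 on each axis."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in enumerate([0.0, 2.0]):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0) for _ in range(3)])
            y.append(label)
    return X, y

X, y = make_blobs()
best_weights = evolve_hyperplane(X, y)
print("training accuracy:", accuracy(best_weights, X, y))
```

The accuracy printed here is measured on the training sample itself, so, as the slide cautions for small samples, it should not be read as an estimate of performance on new data.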

The Key Points
- A systematic exploration of high-dimensionality data spaces is a key arena for any data-rich science, astronomy included
- Machine learning and computational statistics tools are essential, and many challenges remain
- Uses of machine intelligence will lead to a collaborative human-computer discovery, and a cognition technology
- Multidimensional visualization is a key bottleneck
- Virtual reality will be a powerful platform for both data visualization and scientific collaboration
- Many data science challenges are common to all fields; their solutions constitute the rise of a new scientific methodology, and methodology transfer can and should be done