Data Visualisation. Jingpeng Li.


Data Visualisation

Our eyes are very good at data mining.
We can spot patterns, trends and clusters instantly in plotted data.
Problems begin when data covers more than a few dimensions.
Visualisation provides a good way to choose a more powerful data mining technique.

When to Use It

Before starting a data mining project, to understand the problem.
To guide the data mining project and the choice of technique.
To improve the use of data mining techniques, e.g. choosing the number of clusters.
To show the results of a data mining analysis.

Scatter Plots

Perfect for seeing how one variable changes with another.
Can be used to see how well one variable predicts another.
Can be used to see how two variables combine to form clusters or a state space.

A Word on Graphs

Always give your graph a title.
Always label both axes with variable names and, if appropriate, units (e.g. "Spend in pounds" or "Number of products sold").
Always show the scale of both axes.
Bar charts are for frequencies (counts of things).
Line graphs are for continuous variables.

Scatter Plots: Insurance Claims

Here is an example from a previous lecture.
It is easy to see that younger males and older females make claims.
[Figure: scatter plot of Age for Male and Female customers; legend: Claim, No claim.]

Class Labels

Notice how the plot uses colour to represent the outcome class.
[Figure: scatter plot of Age for Male and Female customers; legend: Claim, No claim.]

Scatter Plots: Machine Monitoring

Another previous example: machine health monitoring.
This plot shows the operating relationship between temperature and pressure in a machine.
[Figure: scatter plot of temperature and pressure.]

Overlap Problems

Look at this plot, which shows the number of marriages a person has had against the number of children they have.
We cannot tell whether there are 1 or 100 examples at each point.

Jitter

This is the same data, but with small random amounts added to each value.
Notice how the distribution of points is revealed.
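As a minimal sketch of the jitter idea above, the following adds a small uniform random offset to each value so that coincident points separate. The data, the function name add_jitter and the offset size of 0.2 are all illustrative assumptions, not taken from the lecture.

```python
import random

def add_jitter(values, amount=0.2, seed=0):
    """Return a copy of values with a uniform offset in [-amount, amount] added to each."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]

# Hypothetical (marriages, children) records; many people share the same pair.
people = [(1, 2)] * 40 + [(1, 0)] * 15 + [(2, 3)] * 5 + [(0, 0)] * 30
marriages = [m for m, _ in people]
children = [c for _, c in people]

# Before jitter, 90 people collapse onto only 4 plotted positions.
distinct_before = len(set(people))

# After jitter, each person gets a slightly different position.
jittered = list(zip(add_jitter(marriages, seed=1), add_jitter(children, seed=2)))
distinct_after = len(set(jittered))

print(distinct_before, distinct_after)
```

The jittered pairs can be passed to any scatter-plot routine. The offset should stay well below the spacing between real values (here, 1) so that points are not mistaken for a neighbouring value.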

Colour as Frequency

By using a colour scale (red, orange, yellow, blue in this example), the number of times a data point occurs can be shown.
Size can also be used in place of colour.

Problems With Dimensions

Plotting two things against each other is fine.
But what about looking at 3, 4, 5 or more variables?
We have already seen one way of adding a third dimension: colour.
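The colour-as-frequency idea can be sketched by counting how often each point occurs and mapping the count onto a colour scale. The ordering of the scale (blue for the fewest, red for the most) is an assumption, as are the data and the function name colour_by_frequency.

```python
from collections import Counter

# Assumed ordering from least to most frequent; the slides do not say which end is which.
COLOUR_SCALE = ["blue", "yellow", "orange", "red"]

def colour_by_frequency(points, scale=COLOUR_SCALE):
    """Count repeats of each point and assign a colour from scale by count."""
    counts = Counter(points)
    max_count = max(counts.values())
    colours = {}
    for point, n in counts.items():
        # Map counts 1..max_count linearly onto indices 0..len(scale)-1.
        index = (n - 1) * (len(scale) - 1) // max(max_count - 1, 1)
        colours[point] = scale[index]
    return counts, colours

# Hypothetical (marriages, children) records with many repeats.
people = [(1, 2)] * 40 + [(1, 0)] * 15 + [(2, 3)] * 5 + [(0, 0)] * 30
counts, colours = colour_by_frequency(people)

for point in sorted(colours):
    print(point, counts[point], colours[point])
```

The same counts could drive marker size instead of colour, as the slide notes.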

Colour or Shape As a Dimension

Category values can have their own colour or shape, or even a word or picture.
[Figure: scatter plot of Weight against Legs, with points labelled Elephant, Boa, Ostrich, Fish, Fly and Mouse.]

Projection

If your data comes from a system that has more dimensions than you can plot, you will probably suffer problems with projection.
Imagine a cloud of moths flying in front of a projector.
They occupy 3D space, but the shadow they project onto a wall is in 2D.
The third dimension (distance from the wall) is lost.

Projection

The same happens with plotting data.
Plotting data in fewer dimensions than it contains means that you see the shadow of the higher dimensions.
That spoils your plot.

Example

Column C is determined by A and B, but plotting B against C suggests only a weak relationship.
[Figure: scatter plot of B against C.]
If your plot could show A and B against C, the true shape of the relationship would appear.
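The effect in the example above can be reproduced with made-up data. The slides do not give the formula linking the columns, so this sketch assumes C = A x B with A and B drawn uniformly from 0 to 100. C is then fully determined by A and B, yet B alone explains well under half of its variance.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
a = [rng.uniform(0, 100) for _ in range(500)]
b = [rng.uniform(0, 100) for _ in range(500)]
c = [x * y for x, y in zip(a, b)]  # assumed relationship: C = A * B

# Correlation seen when only B and C are plotted (A is projected away).
r_bc = pearson(b, c)

# Spread of C for points whose B is nearly the same: the "shadow" of A.
band = [cv for bv, cv in zip(b, c) if 45 <= bv <= 55]
spread = max(band) - min(band)

print(round(r_bc, 2), round(spread))
```

The wide spread of C within a narrow band of B is what makes the 2D plot look like a loose cloud, even though there is no noise in the data at all.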

The Same Data in 3D

Software that can rotate 3D views helps you see that extra dimension.
http://www.math.uri.edu/~bkaskosz/flashmo/graph3d/
[Figure: rotatable 3D plot of the same data, where x is in (0, 1000) and y is in (0, 10).]

Solving Projection Problems

Represent all the dimensions in some way:
Colour, shape etc., as we have seen.
Size to show the third dimension, with larger things being closer.
Use software that is able to rotate and fly through data, switching dimensions to allow you to search for patterns.
Reduce the dimensionality.

Dimensionality Reduction

If two or more dimensions are related, they can be reduced to a single, new dimension without losing too much information.
This new single dimension can be plotted against others to allow deeper relationships to be found.
There is always a loss of information.
[Figure: scatter plot of two related variables.]

Dimensionality Reduction

Example techniques are:
Principal components reduction
Non-linear principal components reduction
Auto-associative neural networks
The disadvantage is that the new dimensions are combinations of the original ones and might not make as much sense.
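As an illustration of principal components reduction in the simplest case, the sketch below reduces two related variables to one new dimension. It uses the closed-form direction of the leading eigenvector of a 2 x 2 covariance matrix; the data (y roughly equal to x plus small noise) and all names are assumptions for illustration only.

```python
import math
import random

def first_principal_component(xs, ys):
    """Project 2D data onto its direction of greatest variance.

    Returns the unit direction, the 1D scores, and the fraction of
    total variance that the single new dimension retains.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxx = sum(d * d for d in dx) / (n - 1)
    syy = sum(d * d for d in dy) / (n - 1)
    sxy = sum(p * q for p, q in zip(dx, dy)) / (n - 1)
    # Angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    scores = [p * ux + q * uy for p, q in zip(dx, dy)]
    var_pc1 = sum(s * s for s in scores) / (n - 1)
    return (ux, uy), scores, var_pc1 / (sxx + syy)

rng = random.Random(0)
xs = [rng.uniform(0, 1.2) for _ in range(300)]
ys = [x + rng.gauss(0, 0.05) for x in xs]  # hypothetical related variable

direction, scores, explained = first_principal_component(xs, ys)
print(direction, round(explained, 4))
```

The retained fraction is high but below 1, which matches the point that some information is always lost. The new dimension is a blend of x and y, so it no longer has a direct physical meaning.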

Keep Some Constant

Here is an example with 3 inputs a, b, c and one output d, which is affected by all 3 inputs.
[Figure: scatter plot of input c against output d, all points.]
Here is a plot of input c against output d. The other variables are projected down onto the chart to show a mess of values.

Keep Some Constant

[Figure: scatter plot of input c against output d, for one fixed combination of a and b.]
Now we keep a and b constant and just plot c against d. In other words, we choose a combination of a and b that appears several times and plot c against d for just those points.
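The filtering step described above can be sketched as follows. The formula d = a * c + 5 * b, the value ranges and the variable names are assumptions made up for this example; the slides do not say how d depends on the inputs.

```python
import random
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
rows = []
for _ in range(600):
    a = rng.randint(1, 10)    # discrete input, so combinations repeat
    b = rng.randint(0, 5)     # discrete input
    c = rng.uniform(0, 5)     # the input we want to study
    d = a * c + 5 * b         # assumed output formula
    rows.append((a, b, c, d))

# All points: a and b are projected onto the c/d chart and blur it.
r_all = pearson([r[2] for r in rows], [r[3] for r in rows])

# Pick the (a, b) combination that appears most often and keep only those rows.
(a0, b0), _ = Counter((r[0], r[1]) for r in rows).most_common(1)[0]
subset = [r for r in rows if r[0] == a0 and r[1] == b0]
r_subset = pearson([r[2] for r in subset], [r[3] for r in subset])

print(round(r_all, 2), len(subset), round(r_subset, 4))
```

With a and b held fixed, the relationship between c and d becomes a clean line. The cost is that only a small share of the data is used, and the picture may differ for other combinations of a and b.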

Visualising Data for Users

Scientific charts might not always be the best way to represent data to users or to the press.
Other visualisations can be more appropriate in the right setting.

Infographics

Methods for displaying summaries of data in an attractive way.
Less of an analysis tool.
More of a presentation tool.
Static or interactive.

Recent Example

The project is to build a system that predicts the side effects that chemotherapy patients are likely to suffer on a daily basis throughout their treatment.
Here is a traditional time-series plot of a set of predictions:
[Figure: Probability of Nausea Over Time.]
Easy enough for us to understand: the higher the line, the larger the risk of suffering from the symptom, i.e. nausea.

Recent Example

For people who are not used to looking at charts, there might be a better way of presenting the same information.
In this example, we tried to use the familiar concept of a diary to present the same data.
It looks less like a scientific chart, but makes it much easier to see that the 7th and 8th of March might not be the best time to plan a weekend away.

WhereDoesMyMoneyGo

www.wheredoesmymoneygo.org/bubbletree-map.html#/~/total

Hans Rosling's Famous Video

It combines enormous quantities of public data to reveal the story of the world's past, present and future development.
https://www.youtube.com/watch?v=jbksrlysojo
How many dimensions of the data are used?
Income (x), life span (y), population (circle size), country region (circle colour), time, country name.