Nomograms for visualising relationships between three variables


Jonathan Rougier (1), Kate Milner (2)
(1) Dept Mathematics, Univ. Bristol
(2) Crossroads Veterinary Centre, Buckinghamshire
UseR! 2011, August 2011, Warwick

Background
A donkey drawn by my housemate Caroline (in the pub).
This donkey is not enjoying being weighed.
A happy baby donkey being measured.

Usual practice
The standard practice is to fit a relationship
$\log(\text{Weight}) = a + b \log(\text{HeartGirth}) + c \log(\text{Height})$
to adult donkeys in good condition, and possibly other relationships for juveniles and donkeys in poor condition.
What value can we statisticians add?
1. Explicit inclusion of factors for Age, Gender, and BCS (Body Condition Score);
2. Box-Cox assessment of the appropriate transformation of the left-hand side (boxcox in the MASS package);
3. Initial model to include interactions, then stepwise reduction to minimise AIC (stepAIC in the MASS package).

Building the statistical model
The Box-Cox plot for transformations of the response favours the square root.
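The Box-Cox assessment can be sketched as a profile log-likelihood over the power lambda applied to the response in a linear model. The sketch below is an illustration in Python with numpy, not the MASS boxcox code used for the talk; the data are simulated stand-ins (not the donkey data), built so that the square root of the response is linear in the logged covariates, and the function name `boxcox_profile_loglik` is hypothetical.

```python
import numpy as np

def boxcox_profile_loglik(y, X, lambdas):
    """Profile log-likelihood of the Box-Cox power for the linear model y^(lambda) ~ X."""
    n = len(y)
    sum_log_y = np.log(y).sum()
    values = []
    for lam in lambdas:
        # Box-Cox transform; the limit lambda -> 0 is the log transform.
        z = np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        rss = float(np.sum((z - X @ beta) ** 2))
        # Gaussian log-likelihood maximised over beta and sigma, plus the Jacobian term.
        values.append(-0.5 * n * np.log(rss / n) + (lam - 1.0) * sum_log_y)
    return np.array(values)

# Simulated stand-in data (an assumption for illustration, NOT the donkey data).
rng = np.random.default_rng(1)
n = 2000
girth = rng.uniform(90, 130, n)
height = rng.uniform(85, 115, n)
sqrt_w = -58.9 + 10.2 * np.log(girth) + 4.8 * np.log(height) + rng.normal(0, 0.39, n)
weight = sqrt_w**2

X = np.column_stack([np.ones(n), np.log(girth), np.log(height)])
lambdas = np.round(np.arange(-1.0, 2.0001, 0.05), 2)
loglik = boxcox_profile_loglik(weight, X, lambdas)
best_lambda = float(lambdas[np.argmax(loglik)])
print("lambda with highest profile log-likelihood:", best_lambda)
```

A peak near 0.5 corresponds to the square root, near 0 to the log, and near 1 to no transformation.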

Building the statistical model
Backwards stepwise deletion removes all interaction terms :) and Gender completely.
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
sqrt(Weight) ~ BCSis + Gender + Age + log(HeartGirth) + log(Height) +
    log(HeartGirth):log(Height) + BCSis:log(HeartGirth) + Gender:log(HeartGirth) +
    Age:log(HeartGirth) + BCSis:log(Height) + Gender:log(Height) + Age:log(Height)
Final Model:
sqrt(Weight) ~ BCSis + Age + log(HeartGirth) + log(Height)
  Step                             Df    Deviance  Resid. Df  Resid. Dev         AIC
1                                                        504    78.14041   -972.7873
2 - Age:log(HeartGirth)             5  0.37630656        509    78.51672   -980.1883
3 - BCSis:log(HeartGirth)           4  0.49082973        513    79.00755   -984.8168
4 - BCSis:log(Height)               4  0.41453445        517    79.42208   -989.9858
5 - Age:log(Height)                 5  0.91895494        522    80.34104   -993.7620
6 - Gender:log(Height)              2  0.13986420        524    80.48090   -996.8210
7 - log(HeartGirth):log(Height)     1  0.00927524        525    80.49018   -998.7587
8 - Gender:log(HeartGirth)          2  0.31844543        527    80.80862  -1000.6226
9 - Gender                          2  0.06633122        529    80.87496  -1004.1787
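The idea behind backwards stepwise deletion can be sketched as follows: at each step, drop the one term whose removal lowers AIC the most, and stop when no removal lowers it. This is a simplified illustration in Python, not stepAIC itself: it treats every design-matrix column as a separate term, so it ignores factor grouping and the marginality rules that stepAIC respects. The data are simulated and the names `aic` and `backward_eliminate` are hypothetical. The AIC is computed as n log(RSS/n) + 2p, the form R reports for lm fits up to an additive constant.

```python
import numpy as np

def aic(y, X):
    """AIC (up to an additive constant) of a Gaussian linear model: n*log(RSS/n) + 2*p."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * p

def backward_eliminate(y, columns, keep=("intercept",)):
    """Repeatedly drop the single term whose removal lowers AIC the most."""
    def design(terms):
        return np.column_stack([columns[t] for t in terms])

    terms = list(columns)
    current = aic(y, design(terms))
    path = [(None, current)]  # (dropped term, AIC after dropping it)
    while True:
        candidates = [
            (aic(y, design([t for t in terms if t != d])), d)
            for d in terms
            if d not in keep
        ]
        if not candidates:
            break
        best_aic, drop = min(candidates)
        if best_aic >= current:
            break  # no single deletion improves AIC any further
        terms.remove(drop)
        current = best_aic
        path.append((drop, current))
    return terms, path

# Simulated stand-in data (an assumption for illustration, NOT the donkey data).
rng = np.random.default_rng(0)
n = 500
log_girth = np.log(rng.uniform(90, 130, n))
log_height = np.log(rng.uniform(85, 115, n))
y = -58.9 + 10.2 * log_girth + 4.8 * log_height + rng.normal(0, 0.39, n)
columns = {
    "intercept": np.ones(n),
    "log_girth": log_girth,
    "log_height": log_height,
    "noise_1": rng.normal(size=n),  # unrelated to y by construction
    "noise_2": rng.normal(size=n),  # unrelated to y by construction
}
final_terms, path = backward_eliminate(y, columns)
for dropped, value in path:
    print(dropped, round(value, 2))
print("retained:", final_terms)
```

As in the table above, each accepted deletion makes the AIC smaller (more negative), which is why the reduction is a minimisation.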

Building the statistical model
The resulting model has additive adjustments for BCS and Age.
Call:
lm(formula = sqrt(Weight) ~ BCSis + Ageis + log(HeartGirth) + log(Height), data = donk, subset = subset)
Residuals:
      Min        1Q    Median        3Q       Max
-1.016797 -0.275575 -0.005298  0.255089  1.519246
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     -58.89411    2.42162 -24.320  < 2e-16 ***
BCSis1.5         -0.49820    0.17939  -2.777  0.00568 **
BCSis2           -0.24978    0.08253  -3.026  0.00260 **
BCSis3.5          0.37485    0.05833   6.426 2.91e-10 ***
BCSis4            0.57031    0.11024   5.173 3.27e-07 ***
Ageis<2yo        -0.35353    0.07676  -4.605 5.16e-06 ***
Ageis5-10yo       0.19782    0.06255   3.162  0.00165 **
Ageis>10yo        0.27681    0.05070   5.459 7.35e-08 ***
log(HeartGirth)  10.22732    0.50604  20.211  < 2e-16 ***
log(Height)       4.84926    0.60029   8.078 4.45e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.392 on 531 degrees of freedom
Multiple R-squared: 0.8724, Adjusted R-squared: 0.8703
F-statistic: 403.5 on 9 and 531 DF, p-value: < 2.2e-16

Nomogram for our donkeys
Our statistical estimate of Weight is
$\text{Weight} = (-58.9 + 10.2 \log \text{HeartGirth} + 4.8 \log \text{Height} + \text{adjustments})^2$
where the adjustments are those to be made for BCS and Age.
How do we turn this into something that can be used in the field?
Most statisticians would immediately think of a contour plot, which would work for any relationship of the form $f(u, v) = w$. Reading one requires two straight lines and an interpolation.
For a large subset of such relationships, though, we can construct a nomogram, which needs one straight line and no interpolation.

Nomogram for our donkeys
Additive corrections:
BCS: 1.5, -11kg; 2, -6kg; 3.5, +10kg; 4, +16kg
Age: <2yo, -7kg; 5-10yo, +5kg; >10yo, +7kg
A healthy (BCS 2.5 or 3) 2-5yo donkey with a HeartGirth of 117cm and a Height of 102cm has a predicted weight of about 150kg.
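The worked example can be checked directly from the fitted coefficients. The sketch below is an illustration in Python; it assumes that log is the natural logarithm (as in R) and that the baseline levels of the fitted model are BCS 2.5 or 3 and Age 2-5yo, which is inferred from the example on the slide rather than stated in the model output. The function name `predict_weight` and the level labels are hypothetical.

```python
import math

# Coefficients copied from the lm summary (square-root scale).
INTERCEPT = -58.89411
B_GIRTH = 10.22732
B_HEIGHT = 4.84926
# Baseline levels (adjustment 0.0) are an assumption inferred from the slide's example.
BCS_ADJ = {"1.5": -0.49820, "2": -0.24978, "2.5-3": 0.0, "3.5": 0.37485, "4": 0.57031}
AGE_ADJ = {"<2yo": -0.35353, "2-5yo": 0.0, "5-10yo": 0.19782, ">10yo": 0.27681}

def predict_weight(girth_cm, height_cm, bcs="2.5-3", age="2-5yo"):
    """Predicted weight in kg: square of the linear predictor on the sqrt scale."""
    s = INTERCEPT + B_GIRTH * math.log(girth_cm) + B_HEIGHT * math.log(height_cm)
    s += BCS_ADJ[bcs] + AGE_ADJ[age]
    return s**2

base = predict_weight(117, 102)
thin = predict_weight(117, 102, bcs="1.5")
print(f"baseline donkey: {base:.1f} kg")          # about 150 kg
print(f"BCS 1.5 correction: {thin - base:+.1f} kg")
```

Because the adjustments act on the square-root scale, the exact correction in kg depends on the size of the donkey, so the computed correction need not match the rounded figures on the slide exactly.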

Digression on nomograms
Nomograms are visual tools for representing the relationship between three or more variables, in such a way that the value of one variable can be inferred from the values of the others by drawing a straight line.
$f_1(u) + f_2(v) = f_3(w)$ gives a parallel-scale nomogram, like ours;
We could also have used an N chart, used for $f_1(u) / f_2(v) = f_3(w)$;
Proportional nomograms can handle more than three variables, e.g. in two stages using a pivot;
An entire theory based around determinants allows the construction of nomograms for much more general relationships; typically these are curved-scale nomograms.
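The geometry of a parallel-scale nomogram can be sketched in a few lines. Put the u scale on a vertical axis at x = 0 with heights m1 f1(u), and the v scale on a vertical axis at x = 1 with heights m2 f2(v). A straight line joining the two readings crosses a third vertical axis at x = m1/(m1 + m2) at height m3 (f1(u) + f2(v)), where m3 = m1 m2/(m1 + m2), so that axis can be graduated in w. The Python sketch below checks this for the donkey relationship; the scale factors M1 and M2 are arbitrary illustrative choices, the function names are hypothetical, and this is not the code used to draw the nomogram in the talk.

```python
import math

# Fitted coefficients for the baseline donkey (no BCS or Age adjustment).
A, B, C = -58.89411, 10.22732, 4.84926

def f1(girth_cm):
    return B * math.log(girth_cm)

def f2(height_cm):
    return C * math.log(height_cm)

def f3_inverse(s):
    # f3(w) = sqrt(w) - A, so w = (s + A)^2.
    return (s + A) ** 2

# Hypothetical scale factors for the two outer axes (any positive values work).
M1, M2 = 2.0, 3.0
T = M1 / (M1 + M2)        # x-position of the weight axis between x = 0 and x = 1
M3 = M1 * M2 / (M1 + M2)  # scale factor of the weight axis

def read_nomogram(girth_cm, height_cm):
    """Weight read where the straight line between the two readings crosses the middle axis."""
    y_left = M1 * f1(girth_cm)     # mark on the HeartGirth axis (x = 0)
    y_right = M2 * f2(height_cm)   # mark on the Height axis (x = 1)
    y_mid = (1 - T) * y_left + T * y_right  # height of the line at x = T
    return f3_inverse(y_mid / M3)

direct = (A + B * math.log(117) + C * math.log(102)) ** 2
from_nomogram = read_nomogram(117, 102)
print(round(direct, 3), round(from_nomogram, 3))
```

The two numbers agree up to floating-point rounding, which is the sense in which one straight line replaces the two lines and the interpolation needed on a contour plot.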

Digression on nomograms All figures from Ron Doerfler, 2009, The Lost Art of Nomography, The UMAP Journal, 30(4), pp. 457-493.


Back to the donkeys!
What is the effect of replacing sqrt(Weight) with log(Weight), which would be the more usual transformation?
It gives slightly higher weights (about 5kg) for small and large donkeys. This difference is smaller than the residual standard deviation, which is 10kg.

Back to the donkeys!
Things are a lot less clear if we try to visualise this using a contour plot.

Different relationships on one plot
Height and Length seem to be interchangeable, so we could estimate Weight with either.
The estimate using Length can be added to the existing nomogram, to give vets the choice of which measurement to make.

Different types of donkey
Different types of donkey can be displayed on the same plot.
Here are our Kenyan donkeys, shown with a Length covariate.
This is for Moroccan donkeys. They tend to be a bit lighter for the same size.

Summary
Visualisation is an important part of both data analysis and statistical communication.
For relating three variables, contour plots will always work, but where they are available, nomograms might be clearer and simpler to use.
Our donkey nomogram will be used by practising vets in Kenya, but it has also been a useful tool for us in model choice and model comparison.
Nomograms are also available for some relationships between four or more variables.
One catch: contour plots can be overlaid on a field showing predictive uncertainties. Unfortunately, it is not as easy to visualise predictive uncertainty with a nomogram.

Resources
Ron Doerfler, 2009, The Lost Art of Nomography, The UMAP Journal, 30(4), pp. 457-493. http://myreckonings.com/wordpress/wp-content/uploads/JournalArticle/The Lost Art of Nomography.pdf
Ron Doerfler, Creating Nomograms with the PyNomo Software, Version 1.1 for PyNomo Release 0.2.2. http://www.myreckonings.com/pynomo/CreatingNomogramsWithPynomo.pdf
Leif Roschier, 2009, http://www.pynomo.org/