Nessie is alive!
Role of statistics, bias and reproducibility in scientific research
Gerco Onderwater, c.j.g.onderwater@rug.nl
4/23/15 2 Loch Ness, Scotland
Legendary monster
Saint Adomnán of Iona describes the wonders worked by Saint Columba; in 565 AD, "[At] the river Ness a poor unfortunate little fellow, whom some water monster had a little before snatched at as he was swimming, and bitten with a most savage bite..."
The surgeon's photo, April 19, 1934
Sightings aplenty!
Controversy: legend, fact or fake?
Some opinions
"I think it is just a lot of tripe!"
"I don't quite believe it."
"I'm wondering if it's a stunt."
"If I stay here much longer... I shall see it."
"It may possibly exist."
"I don't know what to think."
Getting beyond opinions: time for research!
Scientific method
1. Define a question
2. Gather information and resources (observe)
3. Form an explanatory hypothesis
4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner
5. Analyze the data
6. Interpret the data and draw conclusions that serve as a starting point for a new hypothesis
7. Publish results
8. Retest (frequently done by other scientists)
Experimentation
We cannot deal with the full range of possibilities, so we:
1. select a representative sample,
2. perform the measurement,
3. infer properties of the full population.
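A minimal sketch of sample, measure, infer, assuming a hypothetical population of 100,000 normally distributed values (mean 170, standard deviation 10). Only a random sample of 400 is "measured"; the population mean is then inferred from it together with its standard error. The function name `infer_mean` is illustrative, not from the talk.

```python
import math
import random
import statistics


def infer_mean(sample):
    """Estimate the population mean from a sample, with its standard error."""
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, std_error


# Hypothetical population: 100,000 values from a normal distribution
# (mean 170, standard deviation 10), standing in for "the full range of possibilities".
rng = random.Random(42)
population = [rng.gauss(170.0, 10.0) for _ in range(100_000)]

# Select a random (hence representative) sample and measure only that.
sample = rng.sample(population, 400)
estimate, std_error = infer_mean(sample)
true_mean = statistics.fmean(population)
print(f"estimate = {estimate:.2f} +/- {std_error:.2f}, true mean = {true_mean:.2f}")
```

The standard error shrinks as 1/√(sample size), which is why a modest but random sample already pins down the population mean well.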
Probability
Using the properties of the population and of the selection process, the probability of an outcome can be predicted with certainty; the outcome itself is subject to chance.
[Diagram: one truth giving rise to many different observations]
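Demo: the probability that no two of n people share a birthday is
$$p_0(n) = \frac{365}{365}\cdot\frac{364}{365}\cdot\frac{363}{365}\cdot\frac{362}{365}\cdots\frac{365-n+1}{365}, \qquad p_0(50) \approx 0.03$$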
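The demo's point, an exactly predictable probability but a chance outcome, can be sketched by computing p0(n) from the product above and comparing it with simulated groups of people. This assumes, as the formula does, 365 equally likely birthdays (no leap days); the function names are illustrative.

```python
import random


def p_no_shared_birthday(n, days=365):
    """Exact probability that n people all have different birthdays."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days  # factors 365/365, 364/365, ..., (365-n+1)/365
    return p


def simulate_no_shared_birthday(n, trials=20_000, days=365, seed=0):
    """Fraction of simulated groups of n people without a shared birthday."""
    rng = random.Random(seed)
    no_match = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) == n:
            no_match += 1
    return no_match / trials


exact = p_no_shared_birthday(50)
observed = simulate_no_shared_birthday(50)
print(f"exact p0(50) = {exact:.4f}, simulated = {observed:.4f}")
```

Changing `seed` changes the simulated fraction but never the exact value: the probability is certain, each individual group is not.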
Likelihood
In research we have to do the reverse: which explanation is most likely, given this observation?
[Diagram: a single observation with many candidate explanations: the truth, speculation, a wild guess, and many lies]
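Working in reverse can be sketched as a scan over candidate explanations, keeping the one under which the observation is most likely. The observation here (7 heads in 10 coin flips) and the grid of candidate heads-probabilities are hypothetical.

```python
import math


def log_likelihood(p, heads, flips):
    """Log-likelihood of heads-probability p given `heads` out of `flips`
    (the binomial coefficient is dropped: it does not depend on p)."""
    return heads * math.log(p) + (flips - heads) * math.log(1 - p)


def most_likely_p(heads, flips, steps=100):
    """Scan candidate explanations p = 0.01 ... 0.99 and return the most likely one."""
    candidates = [i / steps for i in range(1, steps)]
    return max(candidates, key=lambda p: log_likelihood(p, heads, flips))


# Hypothetical observation: 7 heads in 10 flips.
best = most_likely_p(7, 10)
print(f"most likely p = {best}")
```

"Most likely" is not "certainly true": a fair coin (p = 0.5) also produces 7 heads in 10 flips quite often.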
Statistics: the branch of mathematics dealing with the collection, analysis, interpretation, and presentation of numerical data.
Descriptive statistics summarize data for a concise overview, e.g. "the mean grade of HC students is..."
Inferential statistics make claims about a population based on a sample, e.g. "my HC students were OK, so most likely they all are."
Descriptive statistics
[Slide: a full page listing the raw outcomes of 1000 rolls of a die, one by one]
Outcome n : Count N
1 : 171
2 : 153
3 : 169
4 : 176
5 : 174
6 : 157
μ = 3.50, σ = 1.69, χ²/ndf = 2.75/5, p = 0.74
Expected count per face: 1000 = 6 × 166 + 4, i.e. about 166.7
Deviations can and must be there!
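The summary numbers can be recomputed from the count table. This sketch uses the population standard deviation, Pearson's χ² against a fair die, and the standard recurrence for the χ² survival function with integer degrees of freedom; the helper names are illustrative.

```python
import math

# Counts per face, as listed on the slide (1000 rolls in total).
counts = {1: 171, 2: 153, 3: 169, 4: 176, 5: 174, 6: 157}


def describe(counts):
    """Number of rolls, mean and (population) standard deviation of the outcomes."""
    n = sum(counts.values())
    mean = sum(face * c for face, c in counts.items()) / n
    variance = sum(c * (face - mean) ** 2 for face, c in counts.items()) / n
    return n, mean, math.sqrt(variance)


def pearson_chi2(counts):
    """Pearson chi^2 against a fair die: sum of (observed - expected)^2 / expected."""
    n = sum(counts.values())
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts.values())


def chi2_sf(x, k):
    """P(chi^2 >= x) for k (positive integer) degrees of freedom, using the
    recurrence Q(x; m + 2) = Q(x; m) + (x/2)^(m/2) e^(-x/2) / Gamma(m/2 + 1)."""
    if k % 2 == 0:
        q, m = math.exp(-x / 2), 2
    else:
        q, m = math.erfc(math.sqrt(x / 2)), 1
    while m < k:
        q += (x / 2) ** (m / 2) * math.exp(-x / 2) / math.gamma(m / 2 + 1)
        m += 2
    return q


n, mean, sigma = describe(counts)
chi2 = pearson_chi2(counts)
ndf = len(counts) - 1  # 6 bins, 1 constraint (the total number of rolls)
p = chi2_sf(chi2, ndf)
print(f"N = {n}, mu = {mean:.2f}, sigma = {sigma:.2f}")
print(f"chi2/ndf = {chi2:.2f}/{ndf}, p = {p:.2f}")
```

Pearson's convention gives χ² ≈ 2.67 here, slightly below the 2.75 on the slide. χ² definitions differ (for instance in whether the expected or the observed count is used as the variance), so the exact value depends on that choice; the conclusion, p ≈ 0.75 and a perfectly acceptable fit, does not.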
Descriptive statistics of a sample includes fitting:
- parameters are calculated from observations,
- the parameters (thus) have an uncertainty,
- the functional form is assumed...
My first publication...
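A minimal sketch of fitting as descriptive statistics: a weighted least-squares straight line, where the straight-line form is assumed and the two parameters come out of the observations with their own uncertainties. The data and per-point uncertainty are hypothetical; the formulas are the standard closed-form weighted linear fit.

```python
import math


def fit_line(xs, ys, sigmas):
    """Weighted least-squares fit of y = a + b*x.

    Returns (a, b, sigma_a, sigma_b). The straight-line form is an assumption
    we impose; the fit only finds the best parameters within that form.
    """
    w = [1 / s**2 for s in sigmas]
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    return a, b, math.sqrt(Sxx / delta), math.sqrt(S / delta)


# Hypothetical measurements, each with an uncertainty of 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.9, 8.2, 10.9, 14.1]
a, b, sa, sb = fit_line(xs, ys, [0.5] * len(xs))
print(f"a = {a:.2f} +/- {sa:.2f}, b = {b:.2f} +/- {sb:.2f}")
```

Note that the parameter uncertainties depend only on the measurement errors and the x positions, not on how well the line actually describes the data; checking that is a separate question.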
Inferential statistics
Check whether your assumptions are correct: the best match doesn't mean a good match (just that nothing else was better).
Statistical fluctuations are predictable, so the goodness-of-fit can be tested, e.g. with χ².
χ² also has fluctuations: it follows the χ² distribution.
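That χ² itself fluctuates can be sketched by repeating a fair-die experiment many times (here a hypothetical 1000 experiments of 600 rolls each) and collecting the Pearson χ² of each. For 6 bins (5 degrees of freedom) the values should average about 5 and exceed 11.07, the 95% point of the χ² distribution, in about 5% of experiments.

```python
import random
import statistics


def chi2_fair_die(rng, rolls=600, faces=6):
    """Pearson chi^2 of one simulated experiment with a truly fair die."""
    counts = [0] * faces
    for _ in range(rolls):
        counts[rng.randrange(faces)] += 1
    expected = rolls / faces
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(3)
values = [chi2_fair_die(rng) for _ in range(1000)]
# For 6 bins there are 5 degrees of freedom: the chi^2 distribution has
# mean 5 and variance 10, and exceeds 11.07 in about 5% of experiments.
mean_chi2 = statistics.fmean(values)
tail = sum(v > 11.07 for v in values) / len(values)
print(f"mean chi2 = {mean_chi2:.2f}, fraction above 11.07 = {tail:.3f}")
```

Even with a perfect die, roughly one experiment in twenty gives a "bad-looking" χ²; that is exactly what the p-value quantifies.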
Getting a good fit
Great challenge: getting a good fit with 10¹⁰ events
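One reason this is hard: statistical uncertainties shrink like 1/√N, so with 10¹⁰ events even a tiny imperfection in the assumed model dominates χ². A minimal sketch, assuming a hypothetical die whose six comes up 17% of the time instead of 1/6, fitted with the fair-die model; it uses the exact expectation of Pearson's χ² for multinomial counts.

```python
def expected_pearson_chi2(n, p_true, p_model):
    """Expected Pearson chi^2 when n events follow p_true but are fitted with p_model.

    For each bin, E[(N_i - n q_i)^2] = n p_i (1 - p_i) + n^2 (p_i - q_i)^2,
    which is then divided by the model expectation n q_i.
    """
    return sum(p * (1 - p) / q + n * (p - q) ** 2 / q for p, q in zip(p_true, p_model))


# Hypothetical: a die whose six comes up 17% of the time instead of 1/6,
# fitted with the "fair die" model.
p_model = [1 / 6] * 6
p_true = [0.166] * 5 + [0.17]
for n in (10**3, 10**6, 10**10):
    print(f"N = {n:.0e}: expected chi2 = {expected_pearson_chi2(n, p_true, p_model):.1f} (ndf = 5)")
```

At N = 1000 the imperfection is invisible (χ² close to 5), while at 10¹⁰ events it produces an enormous χ², even though the model is off by only a fraction of a percent.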
Quality testing
Put X's in a grid, giving each square a 50% chance of getting an X, and count the number of X's.
Number of X's : Count
0–6, 19–25 : 1
7, 8, 17, 18 : 5
9, 10, 15, 16 : 16
11–14 : 29
Humans are ill suited for randomness.
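If the bottom row is read as the expected number of grids in each class (an assumption; the slide does not say), the binomial distribution reproduces it for 50 grids of 25 squares each, where 25 is inferred from the largest count shown. The constants `SQUARES` and `GRIDS` are those assumptions made explicit.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k X's among n squares, each X with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


SQUARES = 25  # assumption: the largest count on the slide is 25
GRIDS = 50    # assumption: number of grids filled in
groups = {
    "0-6, 19-25": [*range(0, 7), *range(19, 26)],
    "7, 8, 17, 18": [7, 8, 17, 18],
    "9, 10, 15, 16": [9, 10, 15, 16],
    "11-14": list(range(11, 15)),
}
expected = {
    label: GRIDS * sum(binomial_pmf(k, SQUARES, 0.5) for k in ks)
    for label, ks in groups.items()
}
for label, value in expected.items():
    print(f"{label:>14}: {value:5.2f}")
```

Comparing hand-made grids with this expectation is the quality test: a truly random process fills the classes in these proportions.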
Inferential statistics
Decide between multiple truths (hypotheses) by matching the observation with the expectation of each, using the likelihood.
The likelihood, too, can be calculated with certainty.
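Deciding between two truths can be sketched with a log-likelihood ratio on the die counts from the descriptive-statistics slide. The fair die is one hypothesis; the loaded die (six with probability 0.25) is a hypothetical alternative chosen only for illustration.

```python
import math

# Observed counts per face (1 ... 6), as on the descriptive-statistics slide.
counts = [171, 153, 169, 176, 174, 157]


def log_likelihood(counts, probs):
    """Multinomial log-likelihood, up to a constant that is the same for every
    hypothesis (the multinomial coefficient) and therefore cancels in ratios."""
    return sum(c * math.log(p) for c, p in zip(counts, probs))


fair = [1 / 6] * 6
loaded = [0.15] * 5 + [0.25]  # hypothetical alternative: a die loaded towards six
log_ratio = log_likelihood(counts, fair) - log_likelihood(counts, loaded)
print(f"ln L(fair) - ln L(loaded) = {log_ratio:.1f}")
```

A positive log-ratio of about 25 means the data are roughly e²⁵ times more probable under the fair die than under this particular loaded one.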
Einstein
"Many experiments may prove me right, but it takes only one to prove me wrong!"
Make sure you pick the right one! Risk of bias.
Types of bias
- Intellectual phase locking
- Experimental imperfections
- Correlations
Find what you want to find:
- stop looking as soon as you have positive 'proof'
- keep looking until you find positive 'proof'
- fix problems until you get positive 'proof'
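"Keep looking until positive 'proof'" can be made quantitative with a small simulation. A minimal sketch, assuming a hypothetical fair coin tested for bias with a normal-approximation z-test at α = 0.05: testing once after 1000 flips gives about 5% false "discoveries", while peeking after every 10 flips and stopping at the first significant result gives far more, although the coin is fair in every trial.

```python
import math
import random


def p_value(heads, n):
    """Two-sided p-value for H0: fair coin, normal approximation to the binomial."""
    z = (heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek, trials=1000, n_max=1000, step=10, alpha=0.05, seed=1):
    """Fraction of trials with a fair coin that end in a 'significant' result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        heads = 0
        for n in range(1, n_max + 1):
            if rng.random() < 0.5:
                heads += 1
            # Peeking: test every `step` flips; honest: test once at the end.
            checkpoint = (n % step == 0) if peek else (n == n_max)
            if checkpoint and p_value(heads, n) < alpha:
                hits += 1
                break  # stop looking once positive 'proof' appears
    return hits / trials


fixed = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"test once: {fixed:.3f}, keep looking: {peeking:.3f}")
```

Each individual test is correct; the bias comes entirely from the stopping rule, which is why the decision on when to stop must be fixed before looking at the data.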
Reproduce independently
- Support a claim of discovery
- Expose unfortunate mistakes
- Avoid fraud!
So what about Nessie?
Wishful thinking or historically founded?
Systematic observation
Deep-scan: systematic scan with sonar
Hoax?
Nessie in Queensland, AUS: she's on vacation!
Thank you for your attention!