How can it be right when it feels so wrong? Outliers, diagnostics, non-constant variance
D. Alex Hughes
November 19, 2014

Outline
1. Outliers
   Generally
   Residual Plots
   Assessing Leverage
   Hat-Values
   Studentized Residuals
   Measuring Influence
   DFBETA(S)
2. Non-normality & Nonconstant Error Variance
   Non-normal Errors

Thanks to Christina and Simeon
Agreement with our Best Practices
What did they do that we know we should do?
What did they do that we haven't talked about?
What, if anything, was unclear in their presentations?

Outliers, generally
Definition: An outlier is an observation whose response-variable value is conditionally unusual given the value of the explanatory variable.
Recall: What are we minimizing? What will be the effect of an outlier on our estimated regression coefficients?
Influence on Coef = Leverage × Discrepancy

Outliers, generally: Problems
May unduly influence estimation results; or,
May indicate that the model is missing important features of the data

An Example
[Figures: a sequence of scatterplots of y vals against x vals; the first is labeled y[ c(5, 6, 7)] against x[ c(5, 6, 7)]]

An Example
Definition: An outlier is an observation whose response-variable value is conditionally unusual given the value of the explanatory variable.
So, in these cases, how can we talk about the leverage of the outlying data point?

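The question above can be explored with a small sketch. The data here are made up for illustration (they are not the data behind the slides' figures), and the helper `slope()` is a hypothetical convenience function. The idea is to add one extra point to a clean line in three ways and see when the slope actually moves.

```r
# Hypothetical data: ten points lying exactly on y = 2 + 0.5 x
x <- 1:10
y <- 2 + 0.5 * x

# Hypothetical helper: OLS slope of y on x
slope <- function(x, y) unname(coef(lm(y ~ x))[2])

# (a) Discrepant y at the mean of x: discrepancy, but essentially no leverage
s_a <- slope(c(x, 5.5), c(y, 20))

# (b) Far out in x but on the line: leverage, but no discrepancy
s_b <- slope(c(x, 30), c(y, 2 + 0.5 * 30))

# (c) Far out in x and off the line: leverage and discrepancy together
s_c <- slope(c(x, 30), c(y, 0))

round(c(a = s_a, b = s_b, c = s_c), 3)
```

Only case (c) should move the slope away from 0.5, which is one way to read "Influence = Leverage × Discrepancy": either factor alone is not enough.
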
An Example
[Figure: four scatterplot panels; one labeled y[ c(5, 6, 7)] against x[ c(5, 6, 7)], the others y vals against x vals]

Another Example
[Figure: Reported and Measured Weight; axes weight and reportedweight]

Another Example
Coefficients: (1 not defined because of singularities)
                     Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)
weight                                                 e-11 ***
I(sex == "F")TRUE                                      < 2e-16 ***
weight:sexM                                            < 2e-16 ***

Another Example
Coefficients: (1 not defined because of singularities)
                     Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)
weight                                                 <2e-16 ***
I(sex == "F")TRUE
weight:sexM

Regression Diagnostics
Goal: Assess whether the regression results are meaningful, stable, and comply with the assumptions underlying the regression model.
1. Are regression assumptions met?
2. Are there any influential or outlying values messing things up?

Recall Regression Assumptions
Assumptions:
1. There is a linear relationship between X and Y
2. $\epsilon \sim N(0, \sigma^2)$
3. X are fixed, (OR!) if X are random, X is orthogonal to $\epsilon$
Keep these in mind as we move through tools for regression diagnostics. The first two are sometimes easy to see visually. The third is more complicated.

Overview
Residual Plots
Regular residuals
Studentized residuals
Externally studentized residuals
Leverages
DFBetas
Cook's Distance
DFITS
Partial Regression Plots

Residual Plots
Recall: Residuals are the variation in Y left unexplained by the X's
$e_i = Y_i - \hat{Y}_i$

Residual Plots
Calculate residuals:
$e_i = Y_i - \hat{Y}_i = Y_i - X_i\hat{\beta} = Y_i - (\hat{a} + \hat{b} x_i)$
There are (at least) two plots that we'll use to diagnose outliers using residuals:
1. A Fitted vs. Residual plot puts $\hat{y}$ on the x-axis and $e$ on the y-axis. Are there places in the prediction of y where we're more off?
2. An X vs. Residual plot puts $X^{(k)}$, the k-th explanatory variable, on the x-axis and $e$ on the y-axis.
Why would we choose one or the other?

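Both plots can be sketched in a few lines of base R. The data below are simulated purely for illustration (an assumption; the slides' own models m3 to m6 are not shown), so the object names `fit`, `e`, and `yhat` are hypothetical.

```r
# Simulated data, for illustration only
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)

fit  <- lm(y ~ x)
e    <- resid(fit)     # e_i = Y_i - Yhat_i
yhat <- fitted(fit)

# The same residuals by hand: e_i = Y_i - (a_hat + b_hat * x_i)
e_manual <- y - (coef(fit)[1] + coef(fit)[2] * x)
stopifnot(all.equal(unname(e), unname(e_manual)))

op <- par(mfrow = c(1, 2))
plot(yhat, e, xlab = "Fitted values", ylab = "Residuals",
     main = "Fitted vs. residual")
abline(h = 0, lty = 2)
plot(x, e, xlab = "x", ylab = "Residuals", main = "X vs. residual")
abline(h = 0, lty = 2)
par(op)
```

With a single regressor the two plots carry the same information, since the fitted values are just a linear rescaling of x. With several regressors, the plot against fitted values summarizes the whole model while the plots against each X can show which variable is responsible for a pattern.
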
Residual Plots: What to Watch For
1. Ideal: cloud, normal distribution
2. The Wedge
3. Nonlinearity
4. Outliers

Residual Plots: Ideal
[Figures: scatterplot of x vs. y; residuals of regression; residuals vs. x (resid(m3)); fitted values vs. residuals (predict(m3) vs. resid(m3))]

Residual Plots: Wedge
[Figures: scatterplot of x vs. y; x vs. residuals (resid(m4))]

Residual Plots: Curve
[Figures: "curvy?" scatterplot of x vs. y; x vs. residuals (resid(m5))]

Residual Plots: Outliers
[Figure: Reported and Measured Weight; axes weight and reportedweight]

We could think of our fitted values as coming from the statement
$\hat{Y}_j = h_{1j} Y_1 + h_{2j} Y_2 + \cdots + h_{nj} Y_n = \sum_{i=1}^{n} h_{ij} Y_i$
Definition: The hat-value $h_{ij}$ captures the contribution of $Y_i$ to the fitted value $\hat{Y}_j$. If $h_{ij}$ is large, then the i-th value can have a large impact on the j-th fitted value.
Theorem: The hat-value $h_i \equiv h_{ii}$ summarizes the impact of some $Y_i$ on all the $\hat{Y}$. And so,
$h_i \equiv h_{ii} = \sum_{j=1}^{n} h_{ij}^2$

Hat properties
Additional Properties
The hat-values are bounded: $1/n \le h_i \le 1$
The average hat-value is $\bar{h} = (k + 1)/n$, where k is the number of regressors in the model.
In multiple regression, $h_i$ measures distance from the centroid of the X's. So multivariate outliers in the X-space are high-leverage observations.

Leverages - Matrix Version
Matrix version (the diagonals of the "hat matrix"):
$\hat{Y} = X\hat{\beta} = X(X'X)^{-1}X'Y = HY$, where $H \equiv X(X'X)^{-1}X'$
Definition: The hat-matrix, $H = X(X'X)^{-1}X'$, when postmultiplied by $Y$ turns $Y$ into $\hat{Y}$.
Nothing about Y here, just about unusual combinations of independent variable values
Could be a large X, could be an odd combination of X's

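The matrix definition and the hat-value properties can be checked numerically. This is a minimal sketch on simulated data (an assumption; any model with an intercept would do), building H directly and comparing against R's `hatvalues()`.

```r
# Simulated data, for illustration only
set.seed(2)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

X <- model.matrix(fit)                    # n x (k + 1), intercept column included
H <- X %*% solve(crossprod(X)) %*% t(X)   # H = X (X'X)^{-1} X'
h <- diag(H)                              # hat-values h_i = h_ii
k <- ncol(X) - 1                          # number of regressors

# Diagonals of H are the hat-values R reports
stopifnot(all.equal(unname(h), unname(hatvalues(fit))))
# H turns y into y-hat
stopifnot(all.equal(as.vector(H %*% y), unname(fitted(fit))))
# h_i equals the sum of squared entries in row i of H
stopifnot(all.equal(unname(h), unname(rowSums(H^2))))
# The average hat-value is (k + 1) / n
stopifnot(all.equal(mean(h), (k + 1) / n))

range(h)   # should sit between 1/n and 1
```

Forming H explicitly is fine for a demonstration, but it is an n-by-n matrix, so for real data `hatvalues()` is the more practical route.
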
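    library(car)    # also loads carData, which provides the Duncan data
    data(Duncan)
    # Regress prestige on income and education
    lm.out <- lm(prestige ~ income + education, data = Duncan)
    plot(hatvalues(lm.out))
    # Reference lines at 2 and 3 times the average hat-value
    abline(h = c(2, 3) * mean(hatvalues(lm.out)))
    # Click on points to label them by occupation (interactive sessions only)
    identify(1:45, hatvalues(lm.out), row.names(Duncan))
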
Studentized Residuals
The errors fed into regression, $\epsilon_i$, may have constant variance, but the residuals do not. In particular,
$V(E_i) = \sigma_\epsilon^2 (1 - h_i)$
Data points with high leverage (hat-values) tend to have smaller residuals. Intuitive? They pull the regression line toward them.
We could do a pretty simple standardization to figure out if a residual is strange by scaling it by its estimated standard deviation:
$E_i' \equiv \frac{E_i}{S_E \sqrt{1 - h_i}}$
But this is a drag because the numerator and denominator are not independent, so $E_i'$ doesn't follow a t-distribution...

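The variance result follows in a few lines from the hat matrix $H = X(X'X)^{-1}X'$. The sketch below assumes the errors are uncorrelated with common variance, $V(\epsilon) = \sigma_\epsilon^2 I$, and writes $E$ for the whole vector of residuals.

```latex
\begin{aligned}
E &= Y - \hat{Y} = (I - H)Y = (I - H)(X\beta + \epsilon) = (I - H)\epsilon
  && \text{since } HX = X \\
V(E) &= (I - H)\,V(\epsilon)\,(I - H)' = \sigma_\epsilon^2 (I - H)(I - H)'
      = \sigma_\epsilon^2 (I - H)
  && \text{since } I - H \text{ is symmetric and idempotent} \\
V(E_i) &= \sigma_\epsilon^2 (1 - h_i)
  && \text{the } i\text{th diagonal element}
\end{aligned}
```
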
Studentized Residuals
Solution: Calculate the i-th residual, standardizing with a regression that excludes observation i:
$E_i^* \equiv \frac{E_i}{S_{E(-i)} \sqrt{1 - h_i}}$
which follows a t-distribution with $df = n - k - 2$.
$S_{E(-i)}$: our estimate of the standard deviation of the unobserved errors, computed without observation i
$h_i$: the hat-value, a measure of each observation's leverage

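As a rough check on the formula, the deleted estimate $S_{E(-i)}$ can be computed by brute force, refitting the model n times, and the result compared with R's `rstudent()`. The data are simulated and the planted outlier is an assumption made for illustration.

```r
# Simulated data with one planted discrepant observation (illustration only)
set.seed(3)
n <- 25
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[n] <- y[n] + 6

fit <- lm(y ~ x)
e <- resid(fit)
h <- hatvalues(fit)
k <- length(coef(fit)) - 1   # number of regressors

# S_E(-i): residual standard deviation from the fit that drops observation i
s_del <- sapply(seq_len(n), function(i) summary(lm(y[-i] ~ x[-i]))$sigma)

# Externally studentized residuals, E*_i = E_i / (S_E(-i) * sqrt(1 - h_i))
e_star <- e / (s_del * sqrt(1 - h))
stopifnot(all.equal(unname(e_star), unname(rstudent(fit))))

# Two-sided p-values against a t-distribution with n - k - 2 degrees of freedom
p <- 2 * pt(abs(e_star), df = n - k - 2, lower.tail = FALSE)
head(sort(p), 3)
```

Because the p-values are computed for every observation, picking the largest $|E_i^*|$ and testing it is a multiple-comparison problem; a Bonferroni adjustment (multiplying the smallest p-value by n) is one common, conservative fix.
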
Studentized Residuals Example
[Figure: x against resid(m6) and rstudent(m6)]

Warning: Language is Inconsistent
Externally studentized residuals
Internally studentized residuals
Deleted studentized residuals
Studentized residuals...

Leverage
Remember how we defined influence on the regression coefficients?
Definition: Influence on Coef = Leverage × Discrepancy
We've built mechanisms to identify leverage (hat-values) and discrepancy (the residual plots and studentized residuals from the last section). Here, we assess influence.

DFBETA(S)
What is the impact on each coefficient of a regression ($D_{ij}$) as a result of deleting some observation i?
$D_{ij} = B_j - B_{j(-i)}, \quad \forall\, i, j$
where $B_j$ is the least-squares coefficient, and $B_{j(-i)}$ is that same coefficient estimated without observation i.
Like most things... it's nice to scale this by the SE. And so...
$D_{ij}^* = \frac{D_{ij}}{SE_{(-i)}(B_j)}$

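The definition can be verified by literally deleting each observation in turn. This sketch uses simulated data (an assumption) and compares the brute-force result with R's `dfbeta()` and `dfbetas()`. For the scaled version it assumes the scaling R uses, namely the deleted residual standard deviation times the square root of the diagonal of the full-data $(X'X)^{-1}$.

```r
# Simulated data, for illustration only
set.seed(4)
n <- 20
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)
b <- coef(fit)

# D_ij = B_j - B_j(-i): refit without observation i, one row per observation
D <- t(sapply(seq_len(n), function(i) b - coef(lm(y[-i] ~ x[-i]))))
stopifnot(all.equal(unname(D), unname(dfbeta(fit))))

# Scaled version: divide by S_E(-i) * sqrt of the j-th diagonal of (X'X)^{-1}
s_del   <- sapply(seq_len(n), function(i) summary(lm(y[-i] ~ x[-i]))$sigma)
xtx_inv <- solve(crossprod(model.matrix(fit)))
D_star  <- D / outer(s_del, sqrt(diag(xtx_inv)))
stopifnot(all.equal(unname(D_star), unname(dfbetas(fit))))

round(head(D_star, 3), 3)
```

Refitting n times is only for exposition; `dfbeta()` and `dfbetas()` get the same numbers from a single fit.
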
Cook's Distance
$D_i = \frac{E_i'^2}{k + 1} \times \frac{h_i}{1 - h_i}$
Provides a summary index of influence on the coefficients
Distance between the vectors of $\hat{\beta}$'s including and excluding observation i
Combines all the coefficients into a single measure
Kind of like an F-test: the first factor is a squared standardized residual $E_i'$ (discrepancy), and the second grows with the hat-value $h_i$ (leverage)

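A short sketch of the formula in R, checked against the built-in `cooks.distance()`. It uses the `cars` dataset that ships with R purely as a convenient example (not a dataset from the slides), and `rstandard()` for the internally studentized residuals $E_i'$.

```r
# Built-in dataset, used only for illustration
fit <- lm(dist ~ speed, data = cars)

h     <- hatvalues(fit)
k     <- length(coef(fit)) - 1   # number of regressors
e_std <- rstandard(fit)          # E'_i = E_i / (S_E * sqrt(1 - h_i))

# Cook's distance: squared discrepancy, scaled, times a leverage term
D <- (e_std^2 / (k + 1)) * (h / (1 - h))
stopifnot(all.equal(D, cooks.distance(fit)))

# Observations with the largest summary influence
head(sort(D, decreasing = TRUE), 3)
```
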
What's the big deal?
In previous examples, outliers were very obvious
All these, however, were bivariate regressions
With multiple regression, strange combinations of x values can make hidden values influential.
A regular residual plot won't pick these up!

Not always so obvious
[Figures: three plots]

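Go to the code!
    library(MASS)                         # provides studres()
    model1 <- lm(y ~ x)
    sresm1 <- studres(model1)             # externally studentized residuals
    cooksdm1 <- cooks.distance(model1)    # Cook's distance
    leveragesm1 <- hatvalues(model1)      # hat-values (leverage)
    dfbetasm1 <- dfbetas(model1)          # standardized DFBETAs
    rstudent(model1)                      # same quantity as studres(), from base R
    plot(model1)                          # built-in diagnostic plots
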
Cutoffs?
Generally, they're a poor idea, and observations should be handled on a case-by-case basis. Buuut...
Look for $h_i > 2\bar{h}$.
Look for $|E_i^*| > 2$.
Look for $|D_{ij}^*| > 1$ or $2$, unless in a large sample.
In a large sample, scale that by the root of the number of observations: $|D_{ij}^*| > 2/\sqrt{n}$.

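With the usual caveat that these are rules of thumb, the cutoffs are easy to apply mechanically as a first screen. The sketch below again uses R's built-in `cars` data as a stand-in, and the column names in `flags` are hypothetical labels.

```r
# Built-in dataset, used only for illustration
fit <- lm(dist ~ speed, data = cars)
n <- nobs(fit)
h <- hatvalues(fit)

flags <- data.frame(
  high_leverage = h > 2 * mean(h),                                 # h_i > 2 * h-bar
  discrepant    = abs(rstudent(fit)) > 2,                          # |E*_i| > 2
  influential   = apply(abs(dfbetas(fit)) > 2 / sqrt(n), 1, any)   # any |D*_ij| > 2 / sqrt(n)
)

# Observations tripping at least one rule; inspect these case by case
flags[rowSums(flags) > 0, ]
```

A flag is a prompt to go look at the observation, not a verdict that it should be dropped.
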
Practical Diagnostics
Computation with large n?
Effectiveness and sample size
Publishing
More than one outlier?

What to do with outliers?
Outliers might be random
Learn more
Make a subjective decision
Full disclosure

Non-normal Errors
The assumption of normal errors is nearly always arbitrary. BUT! The CLT says that under hella broad conditions inference is good (unless we're in small samples).
So, why do we care about non-normal errors? Let me count the ways.
1. The levels of tests will be approximately correct in large samples; but without normal errors, the estimates will not be maximally efficient. Indeed, for distributions with heavy tails, OLS will be pretty inefficient.
2. OLS is a statement about conditional means. The mean isn't a sensible summary of the center of a heavily skewed distribution.
3. A multimodal error distribution suggests that a discrete explanatory variable may be missing.

Graphical Diagnosis
To diagnose non-normality, we typically use a quantile-comparison plot (also called a Q-Q plot).
Theoretical quantiles of the t-distribution (or z-distribution) on the x-axis
Studentized residuals on the y-axis
Especially good at identifying problems in the tail behavior of the distribution, visible as deviations from the 45-degree line.
Supplements:
Histogram of studentized residuals; or,
Kernel density plot of studentized residuals.

Interpreting a Q-Q plot
Pattern                                  Interpretation
All but a few points fall on a line      Outliers in the data
Left end below line                      Long tail at low side
Right end above line                     Long tail at high side
Left end above line                      Short tail at low side
Right end below line                     Short tail at high side
Curved pattern: slope increasing         Skew right
Curved pattern: slope decreasing         Skew left
Steps                                    Discrete data

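The quantile-comparison plot and its two supplements can be drawn with base R alone. This is a sketch on R's built-in `cars` data (a stand-in, not the slides' data), comparing sorted studentized residuals with t quantiles on $n - k - 2$ degrees of freedom.

```r
# Built-in dataset, used only for illustration
fit <- lm(dist ~ speed, data = cars)

r <- sort(rstudent(fit))             # studentized residuals, ordered
n <- length(r)
k <- length(coef(fit)) - 1           # number of regressors
q <- qt(ppoints(n), df = n - k - 2)  # matching theoretical t quantiles

op <- par(mfrow = c(1, 3))
plot(q, r, xlab = "t quantiles", ylab = "Studentized residuals",
     main = "Quantile-comparison plot")
abline(0, 1, lty = 2)                # 45-degree reference line
hist(r, main = "Histogram", xlab = "Studentized residuals")
plot(density(r), main = "Kernel density")
par(op)
```

Points peeling away from the reference line at either end are read using the pattern table above.
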
Assumptions about Errors
$\epsilon_i \sim N(0, \sigma^2) \quad \forall i$
How do things go wrong?
1. The mean isn't zero
2. The errors are not normally distributed
3. The variance of errors is different across observations
4. The covariance of errors across observations is not zero

What if the errors are not normally distributed?
Normality: we either assume it, or we rely on the central limit theorem and big datasets.
Even without normality, OLS regression is the best linear unbiased estimator (BLUE) of the model parameters $\beta$. This comes from the Gauss-Markov Theorem.
However, the normality assumption is what gives us t statistics for hypothesis testing, so inference is suspect without normality (at least in small samples).
If you don't feel comfortable being normal... you could specify an alternative distribution and come up with your own standard errors for hypothesis testing, or you could use a nonparametric method to run the regression, depending on your beliefs about the errors.
Check for normality: formal test, or quantile plots of residuals.
What's a quantile plot?

What if the mean of the error distribution isn't zero?
This is the easy case. Why?
$Y = X\beta + \epsilon, \qquad \epsilon \sim N(\theta, \sigma^2)$

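One way to see why, assuming the model includes an intercept (the slide leaves the answer as a question, so this is a sketch of the standard argument): recenter the error and let the intercept soak up the constant. The symbol $u_i$ is introduced here for the recentered error.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \epsilon_i,
  \qquad \epsilon_i \sim N(\theta, \sigma^2) \\
\text{Let } u_i &\equiv \epsilon_i - \theta, \qquad u_i \sim N(0, \sigma^2) \\
Y_i &= (\beta_0 + \theta) + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + u_i
\end{aligned}
```

The rewritten model has mean-zero errors, so the slope estimates are unaffected. Only the intercept changes meaning: the data identify $\beta_0 + \theta$, not $\beta_0$ and $\theta$ separately.
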
More informationThe Statistical Cracks in the Foundation of the Popular Gauge R&R Approach
The Statistical Cracks in the Foundation of the Popular Gauge R&R Approach 10 parts, 3 repeats and 3 operators to calculate the measurement error as a % of the tolerance Repeatability: size matters The
More informationStatistical Process Control and Computer Integrated Manufacturing. The Equipment Controller
Statistical Process Control and Computer Integrated Manufacturing Run to Run Control, Real-Time SPC, Computer Integrated Manufacturing. 1 The Equipment Controller Today, the operation of individual pieces
More informationEfficiency and detectability of random reactive jamming in wireless networks
Efficiency and detectability of random reactive jamming in wireless networks Ni An, Steven Weber Modeling & Analysis of Networks Laboratory Drexel University Department of Electrical and Computer Engineering
More informationSampling distributions and the Central Limit Theorem
Sampling distributions and the Central Limit Theorem Johan A. Elkink University College Dublin 14 October 2013 Johan A. Elkink (UCD) Central Limit Theorem 14 October 2013 1 / 29 Outline 1 Sampling 2 Statistical
More informationA COMPARATIVE ANALYSIS OF ALTERNATIVE ECONOMETRIC PACKAGES FOR THE UNBALANCED TWO-WAY ERROR COMPONENT MODEL. by Giuseppe Bruno 1
A COMPARATIVE ANALYSIS OF ALTERNATIVE ECONOMETRIC PACKAGES FOR THE UNBALANCED TWO-WAY ERROR COMPONENT MODEL by Giuseppe Bruno 1 Notwithstanding it was originally proposed to estimate Error Component Models
More informationSection 3 Correlation and Regression - Worksheet
The data are from the paper: Exploring Relationships in Body Dimensions Grete Heinz and Louis J. Peterson San José State University Roger W. Johnson and Carter J. Kerk South Dakota School of Mines and
More informationTable 1. List of NFL divisions that have won the Superbowl over the past 52 years.
MA 2113 Homework #1 Table 1. List of NFL divisions that have won the Superbowl over the past 52 years. NFC North AFC West NFC East NFC North AFC South NFC North NFC East NFC East AFC West NFC East AFC
More informationSECTION 7: FREQUENCY DOMAIN ANALYSIS. MAE 3401 Modeling and Simulation
SECTION 7: FREQUENCY DOMAIN ANALYSIS MAE 3401 Modeling and Simulation 2 Response to Sinusoidal Inputs Frequency Domain Analysis Introduction 3 We ve looked at system impulse and step responses Also interested
More informationAP Statistics Composition Book Review Chapters 1 2
AP Statistics Composition Book Review Chapters 1 2 Terms/vocabulary: Explain each term with in the STATISTICAL context. Bar Graph Bimodal Categorical Variable Density Curve Deviation Distribution Dotplot
More information1. Section 1 Exercises (all) Appendix A.1 of Vardeman and Jobe (pages ).
Stat 40B Homework/Fall 05 Please see the HW policy on the course syllabus. Every student must write up his or her own solutions using his or her own words, symbols, calculations, etc. Copying of the work
More informationContents. List of Figures List of Tables. Structure of the Book How to Use this Book Online Resources Acknowledgements
Contents List of Figures List of Tables Preface Notation Structure of the Book How to Use this Book Online Resources Acknowledgements Notational Conventions Notational Conventions for Probabilities xiii
More informationPhysics 3 Lab 5 Normal Modes and Resonance
Physics 3 Lab 5 Normal Modes and Resonance 1 Physics 3 Lab 5 Normal Modes and Resonance INTRODUCTION Earlier in the semester you did an experiment with the simplest possible vibrating object, the simple
More informationLane Detection in Automotive
Lane Detection in Automotive Contents Introduction... 2 Image Processing... 2 Reading an image... 3 RGB to Gray... 3 Mean and Gaussian filtering... 5 Defining our Region of Interest... 6 BirdsEyeView Transformation...
More informationClassification of Road Images for Lane Detection
Classification of Road Images for Lane Detection Mingyu Kim minkyu89@stanford.edu Insun Jang insunj@stanford.edu Eunmo Yang eyang89@stanford.edu 1. Introduction In the research on autonomous car, it is
More informationConstruction of SARIMAXmodels
SYSTEMS ANALYSIS LABORATORY Construction of SARIMAXmodels using MATLAB Mat-2.4108 Independent research projects in applied mathematics Antti Savelainen, 63220J 9/25/2009 Contents 1 Introduction...3 2 Existing
More informationSection 1: Data (Major Concept Review)
Section 1: Data (Major Concept Review) Individuals = the objects described by a set of data variable = characteristic of an individual weight height age IQ hair color eye color major social security #
More informationProjecting Fantasy Football Points
Projecting Fantasy Football Points Brian Becker Gary Ramirez Carlos Zambrano MATH 503 A/B October 12, 2015 1 1 Abstract Fantasy Football has been increasing in popularity throughout the years and becoming
More informationExploring Data Patterns. Run Charts, Frequency Tables, Histograms, Box Plots
Exploring Data Patterns Run Charts, Frequency Tables, Histograms, Box Plots 1 Topics I. Exploring Data Patterns - Tools A. Run Chart B. Dot Plot C. Frequency Table and Histogram D. Box Plot II. III. IV.
More informationIonospheric Estimation using Extended Kriging for a low latitude SBAS
Ionospheric Estimation using Extended Kriging for a low latitude SBAS Juan Blanch, odd Walter, Per Enge, Stanford University ABSRAC he ionosphere causes the most difficult error to mitigate in Satellite
More informationA Boxcar Kernel Filter for Assimilation of Discrete Structures (and Other Stuff)
A Boxcar Kernel Filter for Assimilation of Discrete Structures (and Other Stuff) Jeffrey Anderson NCAR Data Assimilation Research Section (DAReS) Anderson: NWP/WAF 27: Park City 1 6/18/7 Background: 1.
More informationImage analysis. CS/CME/BioE/Biophys/BMI 279 Oct. 31 and Nov. 2, 2017 Ron Dror
Image analysis CS/CME/BioE/Biophys/BMI 279 Oct. 31 and Nov. 2, 2017 Ron Dror 1 Outline Images in molecular and cellular biology Reducing image noise Mean and Gaussian filters Frequency domain interpretation
More informationMATRIX TECHNICAL NOTES MTN-109
200 WOOD AVENUE, MIDDLESEX, NJ 08846 PHONE (732) 469-9510 E-mail sales@matrixtest.com MATRIX TECHNICAL NOTES MTN-109 THE RELATIONSHIP OF INTERCEPT POINTS COMPOSITE DISTORTIONS AND NOISE POWER RATIOS Amplifiers,
More informationImage Denoising using Dark Frames
Image Denoising using Dark Frames Rahul Garg December 18, 2009 1 Introduction In digital images there are multiple sources of noise. Typically, the noise increases on increasing ths ISO but some noise
More informationCorrelation of Model Simulations and Measurements
Correlation of Model Simulations and Measurements Roy Leventhal Leventhal Design & Communications Presented June 5, 2007 IBIS Summit Meeting, San Diego, California Correlation of Model Simulations and
More informationPhysics 2310 Lab #6: Multiple Thin Lenses Dr. Michael Pierce (Univ. of Wyoming)
Physics 2310 Lab #6: Multiple Thin Lenses Dr. Michael Pierce (Univ. of Wyoming) Purpose: The purpose of this lab is to investigate the properties of multiple thin lenses. The primary goals are to understand
More informationChapter 4 SPEECH ENHANCEMENT
44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or
More informationPASS Sample Size Software
Chapter 945 Introduction This section describes the options that are available for the appearance of a histogram. A set of all these options can be stored as a template file which can be retrieved later.
More informationComparing the State Estimates of a Kalman Filter to a Perfect IMM Against a Maneuvering Target
14th International Conference on Information Fusion Chicago, Illinois, USA, July -8, 11 Comparing the State Estimates of a Kalman Filter to a Perfect IMM Against a Maneuvering Target Mark Silbert and Core
More informationRegression: Tree Rings and Measuring Things
Objectives: Measure biological data Use biological measurements to calculate means, slope and intercept Determine best linear fit of data Interpret fit using correlation Materials: Ruler (in millimeters)
More informationCommunication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi
Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 23 The Phase Locked Loop (Contd.) We will now continue our discussion
More informationCH 54 SPECIAL LINES. Ch 54 Special Lines. Introduction
479 CH 54 SPECIAL LINES Introduction Y ou may have noticed that all the lines we ve seen so far in this course have had slopes that were either positive or negative. You may also have observed that every
More information