Comparing Means. Chapter 24. Case Study: Gas Mileage for Classes of Vehicles.


Chapter 24. One-Way Analysis of Variance: Comparing Several Means
BPS - 5th Ed., Chapter 24

Comparing Means
- Chapter 18 compared the means of two populations, or the mean responses to two treatments in an experiment, using two-sample t tests.
- This chapter compares any number of means using analysis of variance.
- Remember: we are comparing means, even though the procedure is called analysis of variance.

Case Study: Gas Mileage for Classes of Vehicles
Data from the Environmental Protection Agency's Model Year 2003 Fuel Economy Guide, www.fueleconomy.gov.
Do SUVs and trucks have lower gas mileage than midsize cars?

Data collection
- Response variable: gas mileage (mpg)
- Groups: vehicle classification
  - 31 midsize cars
  - 31 SUVs
  - 14 standard-size pickup trucks

Data
Sample means ($\bar{x}$):
- Midsize: 27.903
- SUV: 22.677
- Pickup: 21.286

Data analysis
- Mean gas mileage for SUVs and pickups appears lower than for midsize cars.
- Are these differences statistically significant?
- Null hypothesis: the true mean gas mileage is the same for all groups (the three vehicle classifications).
- For example, we could run separate t tests to compare each pair of means: 27.903 vs. 22.677, 27.903 vs. 21.286, and 22.677 vs. 21.286. The corresponding null hypotheses are $H_0: \mu_1 = \mu_2$, $H_0: \mu_1 = \mu_3$, and $H_0: \mu_2 = \mu_3$.
- This raises the problem of multiple comparisons.

Multiple Comparisons
The problem is how to do many comparisons at the same time with some overall measure of confidence in all the conclusions. There are two steps:
- an overall test for any differences
- a follow-up analysis to decide which groups differ and how large the differences are
Follow-up analyses can be quite complex; we will look at only the overall test for a difference in several means, and examine the data to make follow-up conclusions.

Analysis of Variance F Test
- $H_0: \mu_1 = \mu_2 = \mu_3$
- $H_a$: not all of the means are the same
- To test $H_0$, compare how much variation exists among the sample means (how much the $\bar{x}$'s differ) with how much variation exists within the samples from each group.
- This procedure is called the analysis of variance F test.
- The test statistic is an F statistic.
- Use the F distribution (F table) to find the P-value.
- Analysis of variance is abbreviated ANOVA.

Data analysis
- F = 31.61
- P-value = 0.000 (rounded), that is, less than 0.001
- The P-value is below .05, so the differences are significant: there is significant evidence that the three types of vehicle do not all have the same gas mileage.

Follow-up analysis
- From the confidence intervals (and looking at the original data), we see that SUVs and pickups have similar fuel economy and both are distinctly poorer than midsize cars.
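The multiple-comparisons problem described above can be made concrete with a small sketch. It assumes, purely for illustration, that the pairwise tests are independent and each uses a 5% significance level; the three t tests in the case study share data, so the figure is only a rough guide. The function name `familywise_error_rate` is hypothetical and not from the slides.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false rejection across several tests.

    Assumption (not from the slides): the tests are independent and
    every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


# Three pairwise comparisons: mu1 vs mu2, mu1 vs mu3, mu2 vs mu3.
pairwise_tests = 3
fwer = familywise_error_rate(alpha=0.05, num_tests=pairwise_tests)
print(f"Chance of at least one false rejection: {fwer:.4f}")
```

Under these assumptions the chance of at least one false rejection is noticeably larger than the 5% used for each individual test, which is one way to see why a single overall test is run first.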

ANOVA Idea
- ANOVA tests whether several populations have the same mean by comparing how much variation exists among the sample means (how much the $\bar{x}$'s differ) with how much variation exists within the samples from each group.
- The decision is not based only on how far apart the sample means are, but on how far apart they are relative to the variability of the individual observations within each group.

ANOVA Idea: boxplots (a) and (b)
- The sample means for the three samples are the same in sets (a) and (b) of boxplots (shown by the center of the boxplots), so the variation among sample means for (a) is identical to (b).
- The boxplots for (b) show less spread: the variation among the individuals within the three samples is much less for (b).

ANOVA Idea: conclusion
- The samples in (b) contain a larger amount of variation among the sample means relative to the amount of variation within the samples, so ANOVA will find more significant differences among the means in (b).
- This assumes equal sample sizes for (a) and (b); larger samples will find more significant differences.

ANOVA F Statistic
- To determine statistical significance, we need a test statistic that we can calculate.
- The ANOVA F statistic compares the variation among the sample means (how much the $\bar{x}$'s differ from each other) with the variation within the individual samples:

$$F = \frac{\text{variation among the sample means}}{\text{variation within the individual samples}}$$

- F must be zero or positive.
- F is zero only when all sample means are identical.
- F gets larger as the means move farther apart.
- Large values of F are evidence against $H_0$: equal means.
- The F test is upper one-sided (like the chi-square test).
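The comparison of boxplot sets (a) and (b) can be imitated with a small numerical sketch. The two data sets below are hypothetical (they are not the data behind the slides' boxplots): both have group means 10, 12, and 14, but set (b) has much less spread within each group. The helper `anova_f` is an illustrative name and uses the standard mean-square form of the F statistic.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic computed from raw observations."""
    n_total = sum(len(g) for g in groups)
    num_groups = len(groups)
    overall_mean = sum(sum(g) for g in groups) / n_total
    # Variation among the sample means (sum of squares for groups).
    ssg = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)
    # Variation within the individual samples (sum of squares for error).
    sse = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ssg / (num_groups - 1)) / (sse / (n_total - num_groups))


# Hypothetical data: same group means, different within-group spread.
set_a = [[6, 10, 14], [8, 12, 16], [10, 14, 18]]   # wide spread
set_b = [[9, 10, 11], [11, 12, 13], [13, 14, 15]]  # narrow spread

f_a = anova_f(set_a)
f_b = anova_f(set_b)
print(f"F for set (a): {f_a:.2f}")
print(f"F for set (b): {f_b:.2f}")
```

With identical sample means, the set with less within-group variation produces the larger F, which matches the conclusion drawn from the boxplots.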

ANOVA F Test
- Calculate the value of the F statistic:
  - by hand (cumbersome)
  - using technology (computer software, etc.)
- Find the P-value in order to reject or fail to reject $H_0$:
  - from an F table (not provided in the book; will be provided on the website)
  - from computer output
- If a significant relationship exists (small P-value), do a follow-up analysis:
  - observe differences in sample means in the original data
  - formal multiple comparison procedures (not covered here)

ANOVA F Test
- The F test for comparing I populations, with an SRS of size $n_i$ from the $i$th population (thus giving $N = n_1 + n_2 + \cdots + n_I$ total observations), uses critical values from an F distribution with the following degrees of freedom:
  - numerator df = I - 1
  - denominator df = N - I
- The P-value is the area to the right of F under the density curve of the F distribution.

Case study
- F = 31.61
- I = 3 classes of vehicle
- $n_1$ = 31 midsize, $n_2$ = 31 SUVs, $n_3$ = 14 trucks
- N = 31 + 31 + 14 = 76
- numerator df = (I - 1) = (3 - 1) = 2
- denominator df = (N - I) = (76 - 3) = 73
- The P-value from technology output is 0.000. This probability is not 0, but it is very close to 0 and is smaller than 0.001, the smallest value the technology can record.
- P-value < .05, so we conclude there are significant differences.

ANOVA Model, Assumptions
Conditions required for using the ANOVA F test to compare population means:
1) We have I independent SRSs, one from each population.
2) The $i$th population has a Normal distribution with unknown mean $\mu_i$ (the means may be different).
3) All of the populations have the same standard deviation $\sigma$, whose value is unknown.

Robustness
- The ANOVA F test is not very sensitive to lack of Normality (it is robust).
- What matters is Normality of the sample means.
- ANOVA becomes safer as the sample sizes get larger, due to the Central Limit Theorem.
- If there are no outliers and the distributions are roughly symmetric, we can safely use ANOVA for sample sizes as small as 4 or 5.
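The degrees of freedom and the "area to the right of F" can be checked with a short sketch. It relies on a closed form for the upper tail of the F distribution that holds only when the numerator df is 2, which happens to be the case here; this shortcut is not from the slides, and for other numerator df a statistics library would be needed. The function name `f_pvalue_df2` is hypothetical.

```python
def f_pvalue_df2(f_stat: float, df_den: int) -> float:
    """Upper-tail area P(F > f_stat) for numerator df = 2.

    Closed form valid ONLY when the numerator degrees of freedom are 2:
    P(F > x) = (1 + 2x / df_den) ** (-df_den / 2).
    """
    return (1.0 + 2.0 * f_stat / df_den) ** (-df_den / 2.0)


# Values from the case study.
group_sizes = [31, 31, 14]          # midsize, SUV, pickup
num_groups = len(group_sizes)       # I
n_total = sum(group_sizes)          # N
df_num = num_groups - 1             # I - 1
df_den = n_total - num_groups       # N - I

p_value = f_pvalue_df2(31.61, df_den)
print(f"df = ({df_num}, {df_den}), P-value = {p_value:.3g}")
```

The result is a tiny positive number, consistent with the slide's remark that the reported 0.000 is not exactly zero but is smaller than 0.001.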

Robustness
- The ANOVA F test is not too sensitive to violations of the assumption of equal standard deviations, especially when all samples have the same or similar sizes and no sample is very small.
- Statistical tests for equal standard deviations are very sensitive to lack of Normality (not practical).
- Instead, check that the sample standard deviations are similar to each other.

Checking Standard Deviations
- The results of ANOVA F tests are approximately correct when the largest sample standard deviation (s) is no more than twice as large as the smallest sample standard deviation.
- Case study: $s_1$ = 2.561, $s_2$ = 3.673, $s_3$ = 2.758. The ratio of largest to smallest is 3.673 / 2.561, about 1.43, which is less than 2, so it is safe to use the ANOVA F test.

ANOVA F statistic
- The measures of variation in the numerator and denominator are mean squares.
- A mean square has the general form of a sample variance.
- The ordinary $s^2$ is an average (or mean) of the squared deviations of observations from their mean.

Numerator: Mean Square for Groups (MSG)
- An average of the I squared deviations of the means of the samples from the overall mean $\bar{x}$:

$$\text{MSG} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_I(\bar{x}_I - \bar{x})^2}{I - 1}$$

- $n_i$ is the number of observations in the $i$th group.

Denominator: Mean Square for Error (MSE)
- An average of the individual sample variances ($s_i^2$) within each of the I groups:

$$\text{MSE} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_I - 1)s_I^2}{N - I}$$

- MSE is also called the pooled sample variance, written as $s_p^2$ ($s_p$ is the pooled standard deviation).
- $s_p^2$ estimates the common variance $\sigma^2$.
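As a check on the mean-square definitions, the sketch below recomputes the case-study F statistic from the summary statistics quoted in the slides (group sizes, sample means, and sample standard deviations). It uses the standard MSG and MSE formulas; because the inputs are rounded to three decimals, the result should be expected to agree with the reported F = 31.61 only approximately. Variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the slides (midsize, SUV, pickup).
sizes = [31, 31, 14]
means = [27.903, 22.677, 21.286]
sds = [2.561, 3.673, 2.758]

num_groups = len(sizes)   # I
n_total = sum(sizes)      # N

# Rule of thumb: largest s should be no more than twice the smallest s.
sd_ratio = max(sds) / min(sds)

# The overall mean is the size-weighted average of the group means.
overall_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

# Sums of squares for groups and for error.
ssg = sum(n * (m - overall_mean) ** 2 for n, m in zip(sizes, means))
sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))

msg = ssg / (num_groups - 1)      # mean square for groups
mse = sse / (n_total - num_groups)  # mean square for error (pooled variance)
f_stat = msg / mse
pooled_sd = sqrt(mse)

print(f"largest s / smallest s = {sd_ratio:.2f}")
print(f"MSG = {msg:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print(f"pooled standard deviation = {pooled_sd:.3f}")
```

Working from summary statistics rather than raw data is a convenience here, since the slides do not list the individual mileage values.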

ANOVA F statistic
- The numerators of the mean squares are called the sums of squares (SSG and SSE).
- The denominators of the mean squares are the two degrees of freedom for the F test, (I - 1) and (N - I).
- Results of ANOVA are usually presented in an ANOVA table, which gives the source of variation, df, SS, MS, and F statistic.
- ANOVA F statistic:

$$F = \frac{\text{MSG}}{\text{MSE}} = \frac{\text{SSG}/(I - 1)}{\text{SSE}/(N - I)}$$

- For detailed calculations, see Examples 24.7 and 24.8 on pages 652-654 of the textbook.

Summary

ANOVA Confidence Intervals
- Confidence interval for the mean $\mu_i$ of any group:

$$\bar{x}_i \pm t^* \frac{s_p}{\sqrt{n_i}}$$

- $t^*$ is the critical value from the t distribution with N - I degrees of freedom.
- $s_p$ (the pooled standard deviation) is used to estimate $\sigma$ because it is better than any individual $s_i$.
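A confidence interval of this kind can be sketched for the midsize group using the summary statistics from the slides. The critical value `T_STAR_95 = 1.993` is an assumption: it is an approximate table value for a 95% interval with 73 degrees of freedom and does not appear in the slides. The helper names `pooled_sd` and `mean_ci` are hypothetical.

```python
from math import sqrt


def pooled_sd(sizes, sds):
    """Pooled standard deviation s_p = sqrt(MSE) from group sizes and SDs."""
    sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    return sqrt(sse / (sum(sizes) - len(sizes)))


def mean_ci(group_mean, group_size, s_p, t_star):
    """Interval group_mean +/- t_star * s_p / sqrt(group_size)."""
    margin = t_star * s_p / sqrt(group_size)
    return group_mean - margin, group_mean + margin


# Summary statistics from the slides (midsize, SUV, pickup).
sizes = [31, 31, 14]
sds = [2.561, 3.673, 2.758]

# ASSUMPTION: approximate t critical value for 95% confidence, 73 df.
T_STAR_95 = 1.993

s_p = pooled_sd(sizes, sds)
low, high = mean_ci(group_mean=27.903, group_size=31, s_p=s_p, t_star=T_STAR_95)
print(f"Midsize mean mpg, approximate 95% CI: ({low:.2f}, {high:.2f})")
```

The same call with the SUV or pickup mean and group size gives the other two intervals; the pickup interval is wider because that group has only 14 observations.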