Sampling distributions and the Central Limit Theorem

Johan A. Elkink, University College Dublin (UCD), 14 October 2013

Outline
1 Sampling
2 Statistical inference
3 Central Limit Theorem

1 Sampling

Sampling. Statistical inference (or inductive statistics) concerns drawing conclusions about a population of cases on the basis of a sample, a subset of that population. Sampling refers to the selection of an appropriate subset of the population.

Sampling frame. The sampling frame is the identifiable list of members of the population from which the sample can be selected.

Simple random sampling. Each subject in the population has exactly the same chance of being selected into the sample, i.e. the sampling probability is the same for every subject.
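A minimal Python sketch of simple random sampling, assuming the sampling frame is available as a list. The frame of 1,000 ID numbers, the sample size and the helper name `simple_random_sample` are invented for illustration and do not come from the slides.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from the frame, each with the same chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: ID numbers of 1,000 population members.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, n=50, seed=42)

# Every member has inclusion probability n / N = 50 / 1000 = 0.05.
print(len(sample), len(set(sample)))
```

Sampling without replacement from the full frame is what gives every subject the same inclusion probability; a frame that misses part of the population would break this.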

Sampling bias. When the sampling probability correlates with a variable of interest, we are likely to get biased results. Other causes of bias:
- Misreporting by respondents
- Characteristics of the interviewer
- Question-ordering effects
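The effect of a sampling probability that correlates with the variable of interest can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population values are uniform between 0 and 100, and the chance of being sampled is taken to be proportional to the value itself.

```python
import random

rng = random.Random(3)

# Hypothetical population: a variable of interest between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
population_mean = sum(population) / len(population)

# Biased design: the chance of ending up in the sample rises with the value.
biased_sample = [x for x in population if rng.random() < x / 100]
biased_mean = sum(biased_sample) / len(biased_sample)

# Simple random sample of the same size, for comparison.
srs = rng.sample(population, len(biased_sample))
srs_mean = sum(srs) / len(srs)

print(f"population mean:    {population_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
print(f"SRS mean:           {srs_mean:.1f}")
```

Under these assumptions the biased sample overstates the mean by a wide margin, while the simple random sample stays close to it.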

Exercise. What is wrong with the following scenarios?
- Students in a class are asked to raise their hands if they have cheated on an exam one or more times within the past year.
- To get information on opinions among students, 100 students are surveyed at the start of a 9 am class.
- To get information on public opinion, you stand at the entrance of the Apple Store in a shopping street and interview passers-by randomly.

Weighting. Other types of sampling procedures exist, such as stratified or cluster sampling, whereby subsequent weighting of the data can recover the unbiasedness needed for statistical inference. Generally, the weight is the inverse of the probability of inclusion in the sample.
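A hedged Python sketch of inverse-probability weighting for a stratified sample. The two strata, their sizes (800 and 200), the sample of 50 per stratum and the normally distributed values are all invented for illustration.

```python
import random

rng = random.Random(1)

# Hypothetical population with two strata of unequal size.
stratum_a = [rng.gauss(10, 2) for _ in range(800)]
stratum_b = [rng.gauss(20, 2) for _ in range(200)]
population_mean = sum(stratum_a + stratum_b) / 1000

# Sample 50 cases from each stratum, so stratum B is over-represented.
sample_a = rng.sample(stratum_a, 50)
sample_b = rng.sample(stratum_b, 50)

# Weight = inverse of the inclusion probability (50/800 and 50/200).
weight_a = 800 / 50
weight_b = 200 / 50

values = sample_a + sample_b
weights = [weight_a] * 50 + [weight_b] * 50

unweighted_mean = sum(values) / len(values)
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

print(f"population mean: {population_mean:.2f}")
print(f"unweighted mean: {unweighted_mean:.2f}")
print(f"weighted mean:   {weighted_mean:.2f}")
```

The unweighted mean is pulled towards the over-sampled stratum; weighting each case by the inverse of its inclusion probability brings the estimate back towards the population mean.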

2 Statistical inference

Parameters. A parameter is a number that describes a feature of the population. A parameter is generally fixed and not observable. A statistic is a number that describes a feature of a sample; it is fixed for a given sample, but varies across samples. We can use statistics to estimate parameters. (Moore, McCabe & Craig 2012: 198)

From probability to statistics. Using probability theory, we can understand how samples behave on average, given some assumptions. By comparing the sample at hand to samples on average, we can draw probabilistic conclusions about the population parameters.

Sampling distribution. The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. (Moore, McCabe & Craig 2012: 201)

Example. Take 10 samples of size n = 4 from the class. Calculate the average length in each sample. Draw a histogram of these averages.
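The classroom exercise can be mimicked in Python. This is a sketch only: the class of 30 students and their lengths in centimetres are invented, and the histogram is a crude text version with 5 cm bins.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical class of 30 students with invented lengths in cm.
class_lengths = [round(rng.gauss(172, 9)) for _ in range(30)]


def sample_means(population, n, n_samples, rng):
    """Mean of each of n_samples random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(n_samples)]


means = sample_means(class_lengths, n=4, n_samples=10, rng=rng)

# Crude text histogram of the sample means, using 5 cm bins.
bins = {}
for m in means:
    lower = int(m // 5) * 5
    bins[lower] = bins.get(lower, 0) + 1

for lower in sorted(bins):
    print(f"{lower}-{lower + 5} cm: {'*' * bins[lower]}")
```

Ten samples give only a rough picture; the sampling distribution itself is what this histogram would approach if all possible samples of size 4 were drawn.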

Sampling error. The amount of error that arises when a population parameter is estimated or predicted by a sample estimate. The bigger the sample, the lower the sampling error tends to be.

Estimates and uncertainty. When we estimate a parameter, we are uncertain what the true value is. Besides an estimate of the parameter, we also need an estimate of how certain we are of this estimate. The typical indicator of this is the standard error.

3 Central Limit Theorem

i.i.d. We make three assumptions about our data to proceed:
- The observations are independent
- The observations are identically distributed
- The population has a finite mean and a finite variance
A variable for which the first two assumptions hold is called i.i.d. (independent and identically distributed).

Independent observations. Intuitively: the value for one case does not affect the value for another case on the same variable. More formally: $P(x_1 \cap x_2) = P(x_1)\,P(x_2)$. Examples of dependent observations: grades of students in different classes; stock values over time; economic growth in neighbouring countries.

Identically distributed. All the observations are drawn from the same random variable with the same probability distribution. An example where this is generally not the case is panel data: e.g. larger firms will have larger variations in profits, so their variances differ and the observations do not come from the same probability distribution.

Random sample. A proper random sample is i.i.d. The law of large numbers and the Central Limit Theorem help us to predict the behaviour of our sample data.

Law of large numbers. The law of large numbers (LLN) states that, if these three assumptions are satisfied, the sample mean converges to the population mean with probability one as the sample size grows towards infinity.
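The law of large numbers can be made visible with a short simulation. The sketch below assumes rolls of a fair six-sided die (population mean 3.5) as the i.i.d. variable; the die, the seed and the sample sizes are illustrative choices, not part of the slides.

```python
import random

rng = random.Random(2013)

POPULATION_MEAN = 3.5  # expected value of a fair six-sided die


def mean_of_rolls(n, rng):
    """Sample mean of n independent rolls of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


results = {n: mean_of_rolls(n, rng) for n in (10, 100, 10_000, 100_000)}

for n, mean in results.items():
    print(f"n = {n:>7}: sample mean = {mean:.3f}, "
          f"distance from 3.5 = {abs(mean - POPULATION_MEAN):.3f}")
```

The distance from 3.5 need not shrink at every step, but it tends towards zero as n grows.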

Central Limit Theorem. If these three assumptions are satisfied:
- The sample mean is approximately normally distributed when the sample is large, regardless of the distribution of the original variable.
- The sample mean has the same expected value as the population mean (LLN).
- The standard deviation (standard error) of the sample mean is: $\mathrm{S.E.}(\bar{x}) = \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$.
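These three statements can be checked by simulation. The following sketch assumes a strongly skewed population, an exponential distribution with mean 1 and standard deviation 1, and repeatedly draws samples of size n = 50; the distribution, n and the number of replications are illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(29)

n = 50                # size of each sample
replications = 5_000  # number of samples drawn
mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(1)

# One sample mean per replication.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replications)
]

theoretical_se = sigma / math.sqrt(n)
observed_se = statistics.pstdev(sample_means)
mean_of_means = statistics.fmean(sample_means)

# Under normality about 95% of sample means fall within 1.96 standard errors.
share_within = sum(
    abs(m - mu) <= 1.96 * theoretical_se for m in sample_means
) / replications

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"observed S.E.:        {observed_se:.4f}")
print(f"sigma / sqrt(n):      {theoretical_se:.4f}")
print(f"share within 1.96 SE: {share_within:.3f}")
```

Even though the underlying variable is far from normal, the spread of the sample means should sit close to σ/√n and roughly 95% of them should fall within 1.96 standard errors of the population mean.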

Sample and population size. Note that the standard error depends on the sample size (and on the population standard deviation σ), not on the population size.

Central Limit Theorem: unknown σ. When the population standard deviation, σ, is unknown, we can use the sample estimate: $\hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}_x}{\sqrt{n}}$, where $\hat{\sigma}_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$.
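A small Python sketch of the estimated standard error, assuming an invented sample of ten grades. It computes the formula by hand and compares it with `statistics.stdev`, which also uses the n − 1 denominator.

```python
import math
import statistics

# Hypothetical sample of ten exam grades (invented for illustration).
grades = [55, 62, 48, 70, 65, 58, 60, 72, 50, 60]

n = len(grades)
mean = sum(grades) / n

# Sample standard deviation with the n - 1 denominator.
sd_hat = math.sqrt(sum((x - mean) ** 2 for x in grades) / (n - 1))

# Estimated standard error of the sample mean.
se_hat = sd_hat / math.sqrt(n)

# statistics.stdev uses the same n - 1 denominator.
se_check = statistics.stdev(grades) / math.sqrt(n)

print(f"mean = {mean:.1f}, sd = {sd_hat:.3f}, S.E. = {se_hat:.3f}")
```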

Aside: variance of a proportion. Note that the variance of a variable x for which a proportion p of the cases are 1 and all others are 0 can be calculated as: $\sigma_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = p(1 - p)$.
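The step to p(1 − p) can be spelled out. The derivation below is a worked sketch and uses only the fact that the mean of a 0/1 variable equals the proportion of ones, so that np cases equal 1 and n(1 − p) cases equal 0.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i = p \\
\sigma_x^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}
  = \frac{np\,(1 - p)^2 + n(1 - p)\,(0 - p)^2}{n} \\
  &= p(1 - p)^2 + (1 - p)\,p^2
  = p(1 - p)\bigl[(1 - p) + p\bigr]
  = p(1 - p)
\end{align*}
```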

Central Limit Theorem: example. Suppose we have a random sample of 100 individuals and ask each what their first preference vote would be if there were elections today. If 30 of them say they would vote Fianna Fail, what is the standard error of the estimate that the proportion is $\hat{p} = 0.3$?
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.21}{100}} = 0.0458$
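The same calculation in Python, as a minimal sketch. The helper name `proportion_se` is hypothetical; it plugs the sample proportion into the formula above.

```python
import math


def proportion_se(successes, n):
    """Standard error of a sample proportion, sqrt(p(1 - p) / n)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)


# 30 out of 100 respondents give this party as their first preference.
se = proportion_se(30, 100)
print(round(se, 4))  # 0.0458
```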

Exercises. Calculate the standard errors:
- A sample of 20 students has an average grade of 60.
- Out of a sample of 100 road accidents, 10 were fatal.
- Of the 1300 respondents in a survey, 48% voted Yes on the Lisbon Treaty referendum.
- The average score on a 5-point political knowledge scale in the same survey is 2.34.

Regression. Open demdev.dta and look at the standard errors for:
- The mean of laggdppc and polity2.
- The correlation between laggdppc and polity2.
- The regression coefficients for regressing polity2 on laggdppc.
- The regression coefficients for regressing polity2 on log(laggdppc).