Chapter 12: Sampling


In all of the discussions so far, the data were given, and little mention was made of how they were collected. This chapter and the next discuss data collection techniques. These methods depend strongly on randomization to ensure that the data are as representative as possible of a population of interest.

A population is a collection of units or individuals. In statistical applications, a researcher attempts to describe or draw conclusions about a population without examining all units in the population.

Key ideas

1. Sampling: Draw some units from the population, and then measure the variables of interest on the sampled units. A sampling approach is pursued when the population is too large to allow measurement of all units. A sample is a subset of units drawn from the population. The principal objective is to choose the units so that the sample is representative of the population [1].

Some questions that might be answered by sampling:

    What is the average lead content of water in University of Montana drinking fountains?
    Has the survival time of cancer patients increased in the past 40 years?
    What is the ratio of elk calves to cows in Ravalli county?

The procedure by which the sample is collected is called the sampling design. Polls are known as sample surveys.

The most important attribute of a sampling design is the extent to which the sample is representative of the population from which it is drawn. If the sample is not representative of the population, it is biased. A biased sample is useless, or nearly so.

Bias arises in many ways. Each of the following examples of sample surveys illustrates a particular source of bias.

[1] Even when things are done right, a small sample might not be representative by random chance.

(a) Voluntary response: KPAX conducts a poll in which viewers are asked to call in and state whether or not they are happy in their marriage. 84% of the callers say they are unhappy in their marriage. Calls come from motivated callers.

(b) Interviewer bias: A university instructor wants to know how students feel about their statistics course. As students come to her office hours, she asks them a few questions about the course. Most students will be reluctant to express negative feelings towards the course.

(c) Convenience sampling: A survey was given to students regarding their opinions on a possible new business to open in the UC. Questionnaires were filled out by any student walking into the UC and willing to take the time to complete the questionnaire. The individuals sampled were selected because it was convenient for the researcher.

(d) Non-response bias: A questionnaire about hunting wolves is mailed to a random sample of households in Montana. Not all households will return the completed questionnaire. If the non-respondents would have answered differently from the respondents, then the responses are biased.

(e) Leading questions: "There are now more wolves in Montana than at any time in the past 100 years. Don't you agree that an open season on wolves is necessary to restore elk populations?" Leading questions, intentional or not, tend to prompt a particular answer from the respondent.

2. Randomization: Sample units are often selected from a list of all population units, and the selected units are chosen randomly [2]. Randomization minimizes bias related to the selection of sampling units.

Examples of biased sampling designs:

(a) 1936 Presidential election: Alf Landon v. Franklin Delano Roosevelt. 2.4 million people were polled by Literary Digest. 57% of the respondents said they would vote for Landon, yet Landon received 37% of the votes and Roosevelt received 62%. Researchers drew the sample from telephone books, magazine subscribers and club rosters. Why was the sampling design biased? [3]

(b) 1948 Presidential election: Thomas Dewey v. Harry S. Truman. A phone survey conducted for The Chicago Tribune found that Dewey would beat Truman. The morning after the election, the Tribune announced that Dewey had won. Yet Truman won easily. Why was the sampling design biased? [4]

[2] Some designs do not require a list.
[3] The respondents were wealthier and more conservative than the general population.
[4] People with phones were wealthier and more conservative than the general population.

(c) Would a telephone survey regarding the 2012 presidential election be unbiased if those called were randomly sampled from a phone book? [5]

In summary, collecting a sample by randomly selecting the sample units protects against selection bias. Selection bias is the result of important but perhaps unrecognized differences between population and sample units. Introducing randomness to the selection process usually is necessary for accurate inferences to be drawn about the population. Random sampling avoids bias because each population unit has an equal chance of being sampled [6].

3. Sample size: A fundamental question when planning a study is how many units are necessary to ensure that the sample is representative of the population. Intuition suggests that the sample should be a certain fraction or percentage of the population. This is incorrect. In fact, the size of the population (as long as it's large) does not affect the accuracy of the results; accuracy depends on the number of units sampled, not on the fraction of the population sampled. For example, a random sample of 100 Montana residents will be about as representative of Montana incomes as a random sample of 100 U.S. residents will be representative of U.S. incomes.

If the sample consists of the entire population, it is called a census. Collecting a census is usually not practical because of

(a) Expense.

(b) Change over time. Most populations change over time, so if it takes a long time to census the population, the population at the beginning of the survey may differ from the population at the end of the survey.

(c) Complexity. Very large sampling efforts encounter problems such as undercounting, double-counting, and recording errors committed by the samplers.

Parameters and Statistics

The purpose of sampling is to obtain information about some aspect of the population. Often interest lies in estimating the mean or standard deviation of some variable, or the proportion of population units with some characteristic.

[5] Most land-line phone numbers are listed, but few cell phone numbers are.
[6] This statement is not strictly true; there are unbiased designs with unequal but known positive probabilities of including each population unit.

Examples:

    The average December heating bill for Missoula residences,
    The proportion of UM students that are vaccinated against meningitis,
    The total number of cow elk between 6 and 12 years of age in the Ruby River area of southwest Montana.

These unknown population quantities are parameters. A sample is used to estimate parameters using statistics computed from the sample.

Example: The average December heating bill ȳ from a sample of 30 Missoula residences estimates the average December heating bill of all Missoula residences.

Notation:

    Name                     Statistic (sample)   Parameter (population)
    Mean                     ȳ                    µ
    Standard deviation       s                    σ
    Proportion               p̂                    p
    Correlation              r                    ρ
    Regression coefficient   b                    β

More sampling terminology:

Sampling unit: The sampling unit is an object on which the variables of interest are measured; sampling units might be people, households, mice, or plots of land.

Sampling frame: The sampling frame is a list of units from which the sample is chosen. This will not always be the same as the population of interest. Example: if the population of interest is registered voters, then a phone book might be used as a sampling frame.

Sampling variability: Sampling variability is variability in a statistic that is introduced by the random selection of units. Example: take three random samples of size n = 5 to estimate the average score on Exam I. The averages will be different because the samples are different. Variability among averages is sampling variability.

               Sample 1   Sample 2   Sample 3
    Average    95.7       97.3       93.3

If many samples of 5 scores were collected, then the sample averages would be distributed around the population mean µ. Variability of the sample means about µ is sampling variability. The sampling variability of a statistic quantifies the precision of the statistic.

Identify each of the following as a parameter or a statistic, and give the symbol used to represent it.

1. From a sample of 2290 U.S. voters, 65% report they will be voting for a certain Presidential candidate on election day. [7]
2. The proportion of all U.S. voters that voted for Barack Obama in the 2008 presidential election. [8]
3. The proportion of all U.S. births that are boys. [9]
4. The proportion of women serving in the 112th Congress. [10]
5. The standard deviation of monthly incomes of 50 Missoula residents. [11]
6. The average GPA of students at the University of Montana. [12]

Sampling designs:

1. Simple random sample (SRS): A sample of size n is collected. If this sample is drawn so that every possible sample of size n has the same chance of being selected, it is a simple random sample.

Example: Estimate the volume of timber or the number of woodpecker nests within a forest stand. Divide the area into equal-sized blocks. The blocks should be small enough to inventory reliably. The stand is divided into 36 blocks (see the grid below). Nine blocks will be selected and inventoried.

To select an SRS, label the blocks in any order. Starting at row 7 of the random number table, write down the first nine two-digit numbers that fall between 01 and 36 [13]. These numbers identify the sample units.

    01 02 03 04 05 06
    07 08 09 10 11 12
    13 14 15 16 17 18
    19 20 21 22 23 24
    25 26 27 28 29 30
    31 32 33 34 35 36

[7] Statistic: p̂ = .65.
[8] Parameter: p = .52.
[9] Parameter: p = .51.
[10] Parameter: p = .189.
[11] Statistic: s.
[12] Parameter: µ.
[13] If the same number appears twice, replace one copy with another random number.

Go to the random number table and select a row at random to generate the sample. For example, row 7 is

    73184 95907 05179 51002 83374 52297 07769 99792 78365 93487

The sample consists of the first 9 two-digit numbers between 01 and 36: 18, 07, 05, ....

In an SRS design, every combination of 9 blocks has the same probability of being selected. Unavoidably, some combinations of blocks will not be uniformly distributed in space.

Selecting an SRS does not guarantee that a particular sample is perfectly representative of the population. It is not the sample that is biased or unbiased; it's the sampling design that is biased or unbiased. An unbiased sampling method is one that, on average, produces estimates that are not biased [14]. Regrettably, a particular sample collected according to an unbiased sampling design may produce an inaccurate estimate of the parameter(s) of interest [15].

2. Stratified Random Sample: In the SRS example, suppose that trees within the stand vary predictably because of an environmental gradient (e.g., aspect). This extra information can be exploited to ensure a more representative sample if stratified random sampling is used.

Stratified random sampling is used when the population can be partitioned into a few sub-populations, or strata, whose units are more alike within a stratum than between strata. Each stratum is randomly sampled and a parameter estimate is computed for each stratum. Afterward, the estimates (or sub-samples) are pooled (combined), since it's usually of interest to compute estimates for the combined population as well as for each stratum.

In the forest stand example, the stand might be divided into three bands or strata (from left to right with the elevation gradient), each containing 12 plots. The sample size is 9 as before, but under this design, the sample better represents each of the elevational strata.

[14] That is, neither consistently too large nor too small.
[15] By random chance.
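As a rough sketch in Python, the two selections described so far can be written out. The first part reproduces the table-based SRS draw from row 7, skipping numbers outside 01 to 36 and numbers already chosen. The second part draws three plots from each of three strata using Python's random module in place of the printed table. The function names, the stratum labels, and the seed are illustrative assumptions and are not part of the notes.

```python
import random

def two_digit_numbers(row):
    """Split a row of the random number table into consecutive two-digit numbers."""
    digits = row.replace(" ", "")
    return [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]

def srs_from_table(row, n_units, n_sample):
    """Keep the first n_sample distinct numbers between 1 and n_units."""
    chosen = []
    for number in two_digit_numbers(row):
        # Skip numbers that are not block labels and numbers already chosen.
        if 1 <= number <= n_units and number not in chosen:
            chosen.append(number)
        if len(chosen) == n_sample:
            break
    return chosen

def stratified_sample(strata, per_stratum, rng):
    """Draw an SRS of per_stratum units independently within each stratum."""
    return {name: sorted(rng.sample(units, per_stratum))
            for name, units in strata.items()}

# Row 7 of the random number table, as printed in the notes.
ROW_7 = "73184 95907 05179 51002 83374 52297 07769 99792 78365 93487"
srs_blocks = srs_from_table(ROW_7, n_units=36, n_sample=9)
print(srs_blocks)  # [18, 7, 5, 17, 10, 2, 22, 36, 34]

# Three strata of 12 plots each; the stratum labels are hypothetical.
strata = {name: list(range(1, 13)) for name in ("left", "middle", "right")}
rng = random.Random(2024)  # arbitrary seed so the sketch is repeatable
stratified = stratified_sample(strata, per_stratum=3, rng=rng)
print(stratified)
```

If a row runs out before enough valid numbers are found, srs_from_table returns fewer than n_sample labels; by hand, one would simply continue on the next row of the table.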

Use row 29 from the random number table to select a random sample from the left stratum. Row 29 is:

    72042 12287 21081 48426 44321 58765

Identify the sample plots by determining the intervals containing the two-digit random numbers. For example, the random number 72 means that plot 10 is in the sample.

    Plot   Random numbers      Plot   Random numbers
    1      00-07               7      48-55
    2      08-15               8      56-63
    3      16-23               9      64-71
    4      24-31               10     72-79
    5      32-39               11     80-87
    6      40-47               12     88-95

Ignore numbers 96, 97, 98, and 99.

Plot labels within the three strata (elevation changes from left to right):

    Left      Middle    Right
    01 07     01 07     01 07
    02 08     02 08     02 08
    03 09     03 09     03 09
    04 10     04 10     04 10
    05 11     05 11     05 11
    06 12     06 12     06 12

In what situations is stratified random sampling better than simple random sampling? [16]

3. Systematic random sampling: An alternative to stratified random sampling that works well when sampling a geographic area is systematic random sampling. Systematic sampling is easiest when the population size is a multiple of the sample size n, as it is here.

The sampling interval is the population size divided by the sample size. In this case, it is 36/9 = 4. Next, choose one of the first 4 plots at random using the random number table. The sample consists of this first plot, and then every fourth plot after that.

    01 02 03 04 05 06
    07 08 09 10 11 12
    13 14 15 16 17 18
    19 20 21 22 23 24
    25 26 27 28 29 30
    31 32 33 34 35 36

To take a systematic sample here, use row 14 of the random number table. Take the first number between 1 and 4 as a starting point: the row begins 87736, so the starting point is 3. The sample consists of plots 3, 7, 11, ..., 35.

[16] When units are homogeneous within strata and heterogeneous between strata. Furthermore, when the population can be partitioned into a few meaningful strata, and the strata are straightforward to sample using simple random sampling.
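The systematic selection above can be sketched in a few lines of Python. This is a minimal illustration that assumes the population size is an exact multiple of the sample size, as in the 36-plot example; the function name and its parameters are hypothetical. When no starting plot is supplied, one is drawn with Python's random module instead of the printed table.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Return plot labels: a starting plot, then every interval-th plot after it."""
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes population_size is a multiple of sample_size")
    interval = population_size // sample_size  # sampling interval, e.g. 36 / 9 = 4
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, interval)  # random start among the first `interval` plots
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the sampling interval")
    return list(range(start, population_size + 1, interval))

# Starting point 3, as read from row 14 of the random number table in the notes.
plots = systematic_sample(36, 9, start=3)
print(plots)  # [3, 7, 11, 15, 19, 23, 27, 31, 35]
```

Only the starting plot is random; once it is fixed, the rest of the sample is determined, which is why patterns in the ordering of the units matter for this design.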

Systematic sampling also works well for sampling plots along a transect, names from a list, and more generally, whenever there is a list that does not have systematic or repeating patterns. One advantage of systematic sampling is that a label does not have to be assigned to every population unit, provided the units are already organized in some way. A phone book is an example of such a list. Other advantages: it is easy to use, and the sample usually is uniformly distributed. Caution: be wary of patterns in the list that may lead to bias.

4. Cluster Sampling: Cluster sampling is conducted in two stages. In the first stage, a random sample of clusters (a cluster is a small group of units) is selected. Then, every unit in each selected cluster is sampled.

Examples:

(a) To survey households in Missoula, an SRS of 20 street blocks may be drawn. Then every household on the selected blocks is included in the sample.

(b) To estimate the average height of trees in an area, randomly select 5 plots and measure every tree in each plot.

Cluster sampling is preferable to stratified random sampling when the population can be organized as many small groups (clusters) [17].

5. Multistage Designs: All of the sampling methods discussed above require a list of every unit in the population. Sometimes lists are unavailable. In this case, multistage sample designs often are used. A multistage design successively breaks down the population into layers. The layers are successively sampled using the methods discussed above.

For example, a demographic study of the U.S. population might define the top layer as strata given by geographic region. Within each stratum (a geographic region), an SRS of counties (the second layer) is selected. Within each selected county, an SRS of blocks (a geographic unit defined by the U.S. Census Bureau) is sampled. Blocks form the third layer. In this case, a list of every census block in the U.S. is not necessary. A list of all blocks within the sampled counties is necessary, however.

[17] Clusters tend not to be substantially different from one another, though the units within clusters may be different. It should be easy to sample all units within a cluster. An example is sampling demographic variables, where it's convenient to define a cluster as a family. Stratified random sampling is used when the population can be organized as a few large distinct strata (by geographic region, for example) and there is interest in each stratum as a distinct sub-population.
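The two-stage cluster design in item 4 can be sketched as follows. This is an illustration under stated assumptions: the frame of 100 street blocks, the household labels, the household counts, and the seed are all made up for the example, and the function name is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Stage 1: draw an SRS of clusters. Stage 2: take every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: 100 street blocks, each holding between 5 and 15 households.
rng = random.Random(7)  # arbitrary seed so the sketch is repeatable
blocks = {
    f"block_{b:03d}": [f"block_{b:03d}/house_{h}"
                       for h in range(1, rng.randint(5, 15) + 1)]
    for b in range(1, 101)
}

chosen_blocks, households = cluster_sample(blocks, n_clusters=20, rng=rng)
print(len(chosen_blocks), "blocks selected,", len(households), "households sampled")
```

Only the list of blocks is needed to draw the sample; households have to be enumerated only on the 20 selected blocks. The number of sampled households is not fixed in advance, because it depends on which blocks happen to be chosen.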

Survey methods: The table below summarizes the advantages and disadvantages of some common survey methods.

    Strategy               Advantages                 Disadvantages
    Personal interview     High response rate         Interviewer bias
                                                      Leading questions
                                                      Cost/time
    Telephone interview    Less expensive             Good lists unavailable
                           Easy to monitor            (undercoverage)
                           Should be fast
    Questionnaires         Inexpensive                Low response rate
                           No interviewer bias        Possible non-response bias
    Direct observation     Generally very accurate    Time consuming
                                                      Observer error