

Chapter 12 Notes (Sample Surveys)

In everything we have done thus far, the data were given, and the subsequent analysis was exploratory in nature. This type of statistical analysis is known as exploratory data analysis (EDA). Here, and in the next chapter, we will study techniques for producing or collecting data to answer specific questions. We will see that methods of collecting a sample of data in an unbiased fashion hinge on the idea of randomness.

Three Keys to a Good Sample

1. Sampling: In most problems, we are interested in learning something about a population of individuals. However, the population will often be too large or too difficult to examine completely, so we take a sample of individuals from the population that we hope is representative of the population as a whole.

Examples:
What is the average size of Ponderosa pines in a certain area?
What proportion of Missoula residents have served on a jury?
What proportion of American adults approve of the recently approved bail-out plan?
What proportion are satisfied with the state of the country?

Polls, such as this last example, are known as sample surveys.

The most important aspect of a sample, no matter how the sample is taken, is that it is representative of the population from which it comes. If the sample is not representative of the population, we say it is biased. A biased sample is a useless sample! Bias arises in many ways. Consider the following examples, which illustrate poor sampling techniques leading to biased samples. What's wrong with these samples?

Examples:
(a) Suppose CNN takes a poll in which it asks viewers to call in and state whether or not they are happy in their marriage. 90% of the callers say they are unhappy in their marriage. Do we conclude that, among the US population, 90% of all married people are unhappy in their marriage?
(b) A university instructor wants to know how students feel about their statistics course. As students come to her office hours, she asks them to answer a few questions about the course. How accurate will the information gathered be?
(c) A survey was given to UM students regarding their opinions on possible new businesses to open in the UC. The survey was administered to any student willing to fill it out. Would the responses received be accurate, or somehow biased?

(d) Historical mishaps (1936, 1948): Each case illustrates a poor sampling technique that led to a sample that was not representative of the intended population.

In addition to the problems raised in the examples above (voluntary response, interviewer bias, convenience sampling), another common problem is nonresponse. People are sometimes difficult to locate or simply refuse to cooperate. How many of you have thrown out a mail survey or hung up the phone on someone who wanted to ask you a few questions? If nonrespondents would have answered differently from the people who did respond, nonresponse can introduce bias into your results. How can we protect against unseen sources of bias?

(e) Wording of questions: "Don't you agree that social workers should earn more money than they currently earn?"
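To see how nonresponse alone can bias a result, here is a small worked calculation with hypothetical numbers (not from any real survey): suppose 40% of a population supports some proposal, supporters return the survey with probability 0.6, and everyone else with probability 0.3. By Bayes' rule, the share of supporters among respondents is 0.4(0.6) / (0.4(0.6) + 0.6(0.3)), which is well above 40%. The function name and all rates below are illustrative assumptions.

```python
def respondent_support(p_support: float, r_support: float, r_other: float) -> float:
    """Expected share of supporters among the people who respond.

    p_support: true population proportion of supporters (hypothetical)
    r_support: probability that a supporter responds (hypothetical)
    r_other:   probability that a non-supporter responds (hypothetical)
    """
    responding_supporters = p_support * r_support
    responding_others = (1 - p_support) * r_other
    return responding_supporters / (responding_supporters + responding_others)


true_p = 0.40
observed = respondent_support(true_p, r_support=0.6, r_other=0.3)
print(f"true proportion: {true_p:.3f}, expected among respondents: {observed:.3f}")
```

With these made-up rates the respondents would show about 57% support. If both groups responded at the same rate, the respondent share would equal the true 40%: nonresponse causes bias only when nonrespondents differ from respondents.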

2. Randomization: The key to avoiding bias in a sample is to use randomization when selecting which population units will make up the sample.

Examples:
(a) Reconsider the UC survey on new businesses. Although we might think that simply having students fill out the survey voluntarily is just as good as sampling students at random, can you think of sources of bias that might result?
(b) Suppose a biologist captures and radio-tags 50 cutthroat trout in the Rock Creek drainage to study the types of habitats in which these fish live. Do you think these 50 fish are representative of all cutthroat in terms of their habitat?

Random selection of the units that make up a sample protects against a particular type of bias known as selection bias. Such bias results from important but unrecognized differences among population units that are related to what you are measuring. This is one of the startling truths about sampling: introducing randomness into the selection actually allows us to draw accurate conclusions about the population.

3. Sample Size: The fundamental question when planning a study is: how large a sample do we need for the sample to be representative of the population? Although you might be tempted to think we should take a certain fraction or percentage of the population, it turns out that the size of the population (as long as it is much larger than the sample) is unimportant. In other words, a sample of 100 Missoula residents will be about as representative of the Missoula population as a sample of 100 US residents is of the entire US.

If the sample consists of the entire population, it is called a census. What problems might we encounter in trying to take a census?
(a)
(b)
(c)
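One way to see why population size barely matters is the standard error of a sample proportion for an SRS drawn without replacement, which includes the finite population correction: SE(p̂) = sqrt(p(1 - p)/n) × sqrt((N - n)/(N - 1)). The sketch below assumes p = 0.5 and uses hypothetical round population sizes (70,000 for a city-sized population and 300,000,000 for a country-sized one); these are illustrative, not census figures.

```python
import math


def se_proportion(p: float, n: int, N: int) -> float:
    """Standard error of a sample proportion for an SRS of size n drawn
    without replacement from a population of size N (includes the
    finite population correction)."""
    fpc = math.sqrt((N - n) / (N - 1))
    return math.sqrt(p * (1 - p) / n) * fpc


n = 100
for label, N in [("city-sized population (hypothetical)", 70_000),
                 ("country-sized population (hypothetical)", 300_000_000)]:
    print(f"{label}: N = {N:,}, SE = {se_proportion(0.5, n, N):.4f}")
```

Both standard errors come out at about 0.050: precision is driven by n, not N. The correction only matters when the sample is a sizable fraction of the population (for example, n = 100 out of N = 200).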

Parameters and Statistics: Typically, the purpose behind taking a sample is to gain information about some aspect of the population as a whole. In particular, we are often interested in estimating the mean or standard deviation of some variable, or the proportion of population units with some characteristic. For example, we might want to estimate the average energy bill for Missoula residents, the proportion of Americans currently unemployed, or the mean annual income of Montana residents. These unknown population quantities are called parameters. The point of taking a sample is to estimate these unknown parameters with statistics computed from the sample. For example, we might use the sample mean energy bill ȳ from a sample of 30 Missoula households to estimate the true but unknown average energy bill µ of Missoula residents.

Notation: Common notation is summarized below.

Name                      Statistic (sample)   Parameter (population)
Mean                      ȳ                    µ (pronounced "mu")
Standard deviation        s                    σ (pronounced "sigma")
Proportion                p̂                    p
Correlation               r                    ρ (pronounced "rho")
Regression coefficient    b                    β (pronounced "beta")
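A minimal sketch of the parameter/statistic distinction, assuming a tiny hypothetical population of 10 monthly energy bills (made-up numbers): the parameters µ, σ, and p are computed from every unit, while ȳ, s, and p̂ are computed from an SRS. The seed and the $100 threshold are arbitrary choices for illustration.

```python
import random
import statistics

# Hypothetical population of 10 monthly energy bills (dollars); made-up numbers.
population = [88, 95, 101, 112, 130, 76, 140, 99, 105, 124]

# Parameters: computed from every unit in the population (sigma divides by N).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)
p = sum(bill > 100 for bill in population) / len(population)

# Statistics: computed from a simple random sample of 4 bills (s divides by n - 1).
rng = random.Random(12)  # fixed seed so the example is repeatable
sample = rng.sample(population, 4)
y_bar = statistics.mean(sample)
s = statistics.stdev(sample)
p_hat = sum(bill > 100 for bill in sample) / len(sample)

print(f"parameters: mu = {mu:.1f}, sigma = {sigma:.1f}, p = {p:.2f}")
print(f"statistics: y_bar = {y_bar:.1f}, s = {s:.1f}, p_hat = {p_hat:.2f}")
```

Rerunning with a different seed changes the statistics but never the parameters, which is exactly why the parameters are the targets and the statistics are the estimates.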

Other Important Sampling Terminology

Sampling Unit: The sampling unit is the basic unit on which we measure the variables of interest (one value per unit); units might be people, households, animals, plots of land, etc.

Sampling Frame: The sampling frame is the list of individual units from which the sample is chosen. This will not always be the same as the population of interest. Examples?

Sampling Variability: This is the notion that every time we take a sample from some population, we will generally get a different answer. Consider taking a sample of size n = 10 from this class to estimate the average maximum distance traveled on foot in a day by a 241 student.

Sample      Average distance
Sample 1    ________
Sample 2    ________
Sample 3    ________

If we were to repeat this many times, we would have many distance averages, hopefully distributed around the true average. The variability in these averages is what we mean by sampling variability. The distribution of these averages is known as the sampling distribution of the mean maximum distance traveled for samples of size 10.
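Sampling variability can be explored by simulation. The sketch below invents a hypothetical class of 40 students with made-up maximum daily walking distances (in miles, drawn uniformly between 1 and 12), repeatedly draws SRSs of size n = 10, and records each sample average. Class size, distances, and seed are all assumptions; nothing here comes from real class data.

```python
import random
import statistics

rng = random.Random(241)

# Hypothetical class: 40 made-up maximum daily walking distances in miles.
class_distances = [round(rng.uniform(1, 12), 1) for _ in range(40)]
true_mean = statistics.mean(class_distances)

# Draw many SRSs of size 10 and record each sample mean.
sample_means = [statistics.mean(rng.sample(class_distances, 10))
                for _ in range(5000)]

print(f"true class average:       {true_mean:.2f}")
print(f"first three sample means: {[round(m, 2) for m in sample_means[:3]]}")
print(f"mean of sample means:     {statistics.mean(sample_means):.2f}")
print(f"SD of sample means:       {statistics.stdev(sample_means):.2f}")
```

The individual sample means differ from one another, but they center on the true class average and vary much less than the individual distances do. Their histogram approximates the sampling distribution described above.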

Identify each of the following as a parameter or a statistic, and give the symbol used to represent it.
(a) In a sample of 2290 U.S. voters, 65% claim they will vote for a certain presidential candidate on election day.
(b) The proportion of all U.S. voters in the last election who voted for Barack Obama.
(c) 51% of all U.S. babies are boys.
(d) The proportion of all UM campus computers with working versions of SPSS.
(e) 15% of the members of the US Senate are women.
(f) The standard deviation of monthly incomes for 50 Missoula residents.
(g) The proportion of students at the University of Montana who participate in school-sponsored athletics.

Now that we know the terminology, let's consider some basic sampling methods that rely on randomness to select a representative sample from the population.

1. Simple Random Sample (SRS): Consider selecting a sample of size n. If the sample is drawn so that every possible sample of size n has the same chance of being selected, it is called a simple random sample (SRS).

Example: Suppose we have a piece of land and we want to estimate the volume of timber or the number of woodpecker nests on it. A census might be too costly. One simple way to take a sample is to divide the area into equal-sized blocks, as shown below. The blocks should be small enough to survey reliably. Suppose the area is divided into 36 blocks and we've decided to survey a sample of 9 blocks.

01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33 34 35 36

To select an SRS, label the blocks in any order. Then go to the random number table and select a row at random to generate the sample. For example, suppose we choose row 7:

73184 95907 05179 51002 83374 52297 07769 99792 78365 93487

Starting at row 7 and reading the digits two at a time (skipping 00, numbers greater than 36, and repeats), select an SRS of 9 plots, and mark them on the picture above.
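The table-reading procedure can be sketched in a few lines. This assumes the convention stated above (two-digit labels 01-36; skip 00, values above 36, and repeats); the function name `srs_from_digits` is hypothetical, and it returns fewer labels if the row runs out.

```python
def srs_from_digits(digit_row: str, population_size: int, sample_size: int) -> list[int]:
    """Select an SRS by reading a random digit row two digits at a time.

    Labels run from 01 to population_size; 00, labels above population_size,
    and repeats are skipped.
    """
    digits = digit_row.replace(" ", "")
    chosen: list[int] = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


row_7 = "73184 95907 05179 51002 83374 52297 07769 99792 78365 93487"
print(srs_from_digits(row_7, population_size=36, sample_size=9))
# [18, 7, 5, 17, 10, 2, 22, 36, 34]
```

With software, Python's standard `random.sample(range(1, 37), 9)` plays the role of the printed table.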

Do the plots selected look random? In an SRS, every combination of 9 blocks has the same probability of being selected. Selecting an SRS does not guarantee that the particular sample selected is perfectly representative of the population. It is not the sample you select that is unbiased; it's the procedure by which the sample is selected that is unbiased.

If we were selecting an SRS from an alphabetical list of 36 people, we probably wouldn't worry that the names weren't evenly distributed through the list, since we have no reason to believe that the variable being measured (e.g., their opinion on some issue) is associated with their position on the list. In this example, however, we might know that there is geographic variation across the area (perhaps the left side is at a higher elevation than the right side). If so, we can use this extra information to ensure a more geographically representative sample by taking a stratified random sample of plots.
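The statement that "the procedure is unbiased" can be checked numerically. The sketch below counts the possible samples of 9 blocks out of 36 and then simulates many SRSs (the seed and number of repetitions are arbitrary choices): any single sample may look lopsided, but across repetitions each block is included about n/N = 9/36 = 25% of the time.

```python
import math
import random
from collections import Counter

N, n = 36, 9

# Number of distinct samples of 9 blocks; an SRS gives each one probability 1/num_samples.
num_samples = math.comb(N, n)
print(f"possible samples: {num_samples:,}")  # possible samples: 94,143,280

# Simulate many SRSs and count how often each block is included.
rng = random.Random(2024)
reps = 20_000
counts = Counter()
for _ in range(reps):
    counts.update(rng.sample(range(1, N + 1), n))

inclusion = {block: counts[block] / reps for block in range(1, N + 1)}
print(f"lowest inclusion rate:  {min(inclusion.values()):.3f}")
print(f"highest inclusion rate: {max(inclusion.values()):.3f}")
```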

2. Stratified Random Sample: Suppose we divide the area into 3 rectangular subareas (strata), running from left to right along the elevation gradient, each containing 12 plots. We then take a separate SRS of size 3 within each stratum (using different random numbers for each stratum). This still gives a sample of size 9, as before, but under this plan the sample is more evenly representative of the varying elevations in the area. Note that we only need to label the individuals within each stratum.

Left stratum   Middle stratum   Right stratum
01 07          01 07            01 07
02 08          02 08            02 08
03 09          03 09            03 09
04 10          04 10            04 10
05 11          05 11            05 11
06 12          06 12            06 12
(elevation gradient runs from left to right)

Use row 29 of the random number table to select a stratified random sample, starting in the left stratum and proceeding to the right:

72042 12287 21081 48426 44321 58765

Read the digits two at a time and select plots 01-12 according to:

Plot   Random numbers      Plot   Random numbers
1      00-07               7      48-55
2      08-15               8      56-63
3      16-23               9      64-71
4      24-31               10     72-79
5      32-39               11     80-87
6      40-47               12     88-95

Ignore the values 96, 97, 98, 99. Why do we do this?

In what situations does stratified random sampling work best compared with simple random sampling?
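Here is a sketch of the same procedure in code, under stated assumptions: each two-digit code 00-95 maps to a plot via the table (8 codes per plot, so plot = code // 8 + 1), 96-99 are skipped, repeats are disallowed only within a stratum (label 03 in the left stratum is a different physical plot from 03 in the middle), and once a stratum has 3 plots we keep reading the same row for the next stratum. Function names are hypothetical.

```python
from typing import Optional


def plot_from_code(code: int) -> Optional[int]:
    """Map a two-digit code 00-95 to plot 1-12 (8 codes per plot);
    codes 96-99 return None and are ignored."""
    if code > 95:
        return None
    return code // 8 + 1


def stratified_from_digits(digit_row: str, num_strata: int, per_stratum: int) -> list[list[int]]:
    """Take a separate SRS within each stratum, left to right,
    continuing along the same row of random digits."""
    digits = digit_row.replace(" ", "")
    codes = [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]
    strata: list[list[int]] = []
    current: list[int] = []
    for code in codes:
        plot = plot_from_code(code)
        if plot is None or plot in current:
            continue  # skip 96-99 and repeats within the current stratum
        current.append(plot)
        if len(current) == per_stratum:  # stratum filled; move to the next one
            strata.append(current)
            current = []
            if len(strata) == num_strata:
                break
    return strata


row_29 = "72042 12287 21081 48426 44321 58765"
print(stratified_from_digits(row_29, num_strata=3, per_stratum=3))
# [[10, 1, 3], [3, 11, 2], [2, 11, 4]]
```

Giving each plot 8 codes uses 96 of the 100 two-digit values, so fewer digits are wasted than if only 01-12 were accepted; dropping 96-99 keeps all 12 plots equally likely.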

3. Systematic Random Sample: An alternative to a stratified random sample, which works quite well when sampling over a geographic area, is a systematic random sample. Systematic sampling is easiest when the population size is a multiple of the sample size n, as it is here. We first calculate the population size divided by the sample size, 36/9 = 4. Next, we randomly choose one of the first 4 plots using the random number table. The sample consists of this first plot chosen and then every fourth plot after that.

01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33 34 35 36

To take a systematic sample here, use row 14 of the random number table:

87736

Systematic sampling also works well for sampling plots along a transect, names from a list, etc. One advantage of systematic sampling is that we don't have to know exactly how many individuals there are in the population, because we don't have to assign a label to every individual beforehand; we only need to fix the interval between selected units. Other advantages? Issues?
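A minimal sketch of the systematic procedure, assuming the random start is read one digit at a time from the row (only digits 1-4 are usable, since k = 36/9 = 4) and that N is a multiple of n. The function name is hypothetical.

```python
def systematic_sample(population_size: int, sample_size: int, digit_row: str) -> list[int]:
    """Systematic sample: pick a random start in 1..k from single digits,
    then take every k-th label after it."""
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = population_size // sample_size
    for ch in digit_row.replace(" ", ""):
        start = int(ch)
        if 1 <= start <= k:
            return list(range(start, population_size + 1, k))
    raise ValueError("no usable digit in this row")


print(systematic_sample(36, 9, "87736"))
# [3, 7, 11, 15, 19, 23, 27, 31, 35]
```

Only the single random start is chosen at random; every other plot follows from it. That is why a hidden pattern in the list or landscape that repeats every k units is the main issue to watch for.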

4. Cluster Sampling: In cluster sampling, we first select a random sample of clusters of individuals and then survey every individual in the selected clusters.

Examples:
(a) To survey households in Missoula, I might select an SRS of 20 street blocks and include every household on the selected blocks.
(b) To estimate the average height of trees in an area, we might randomly select 5 plots and measure every tree in each plot.

When would we use a cluster sample rather than a stratified random sample?

5. Multistage Designs: All of the sampling methods discussed above require a complete list of the units in the population (or, for cluster sampling, of the clusters). Often, such a list is not available (for example, when sampling US households), so multistage sample designs are used. For example, we might take a stratified random sample of counties in the US (with geographic regions as strata), then an SRS of blocks within each selected county, and an SRS of households within each selected block. Here, we need a list of households only for the selected blocks, not for every household in the US.
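The contrast between a one-stage cluster sample and a multistage design can be sketched with a hypothetical frame of 50 street blocks, each holding a made-up list of household IDs (block sizes of 5-15 and the seed are assumptions). For brevity, the multistage function shows only two stages (blocks, then households), not the three-stage county/block/household design described above.

```python
import random

rng = random.Random(7)

# Hypothetical frame: 50 street blocks, each with a made-up list of household IDs.
blocks = {b: [f"block{b}-house{h}" for h in range(1, rng.randint(5, 15) + 1)]
          for b in range(1, 51)}


def cluster_sample(clusters: dict[int, list[str]], num_clusters: int) -> list[str]:
    """One-stage cluster sample: SRS of clusters, then every unit in them."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [unit for c in chosen for unit in clusters[c]]


def two_stage_sample(clusters: dict[int, list[str]], num_clusters: int,
                     per_cluster: int) -> list[str]:
    """Two-stage design: SRS of clusters, then an SRS of units within each.
    Only the selected clusters need a unit-level list."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample: list[str] = []
    for c in chosen:
        units = clusters[c]
        sample.extend(rng.sample(units, min(per_cluster, len(units))))
    return sample


households = cluster_sample(blocks, 20)
print(f"cluster sample: {len(households)} households from 20 blocks")
print(f"two-stage sample: {len(two_stage_sample(blocks, 20, 3))} households")
```

In the cluster sample, the number of households depends on how big the chosen blocks happen to be. The two-stage version fixes it at 20 × 3 = 60 here, because every made-up block has at least 5 households.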

Data Collection Strategies: Having discussed some of the details involved in taking good samples, we summarize the advantages and disadvantages of some common data collection methods in the table below.

Strategy              Advantages                 Disadvantages
Personal interview    High response rate         Interviewer bias; leading questions; cost/time
Telephone interview   Less expensive;            Good lists unavailable (undercoverage);
                      easy to monitor            must be shorter
Questionnaires        Inexpensive;               Low response rate; bias?
                      no interviewer bias
Direct observation    Generally very accurate    Time consuming; observer error