Stat472/572 Sampling: Theory and Practice Instructor: Yan Lu Albuquerque, UNM


Chapter 1: Introduction

Three Elements of a Statistical Study:
- Collecting data: observational data, experimental data, survey data
- Describing and presenting data: graphical and numerical descriptions
- Drawing conclusions from data: point estimation and inference

Survey sampling: we want to use sample information to make inferences about a finite population.
- In the rest of statistics: $Y_1, Y_2, \ldots, Y_n$ are random variables with a distribution, say the normal distribution $N(\mu, \sigma^2)$. The observed values of the random variables are $y_1, y_2, \ldots, y_n$.
- General probability sampling (design-based analysis): $Y_1, Y_2, \ldots, Y_N$ is the population. We sample $n$ of the $N$ units, say $y_1, y_2, \ldots, y_n$, according to a pre-specified design that assigns a probability of selection to each possible subset of the population of size $n$. Neither $Y_1, Y_2, \ldots, Y_N$ nor $y_1, y_2, \ldots, y_n$ are random variables. The random variables are the indicators $Z_i$:

$$Z_i = \begin{cases} 1 & \text{if unit } i \in S, \\ 0 & \text{otherwise,} \end{cases}$$

where $S$ denotes the selected sample.
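The design-based view can be checked by brute force on a tiny population. The sketch below is only an illustration: the four population values are made up, and the design is assumed to be a simple random sample of size 2. It enumerates every possible sample, builds the indicators Z_i, and averages the sample mean over the design. The only randomness is in which sample is drawn, not in the y-values.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population of N = 4 fixed (non-random) values.
population = [3, 7, 8, 10]
N, n = len(population), 2

# Assumed design: every subset of size n is equally likely to be the sample.
all_samples = list(combinations(range(N), n))
prob_each = Fraction(1, len(all_samples))

def indicators(sample):
    """Return Z_1..Z_N, where Z_i = 1 if unit i is in the sample, else 0."""
    return [1 if i in sample else 0 for i in range(N)]

def sample_mean(sample):
    """Sample mean written in terms of the indicators Z_i."""
    z = indicators(sample)
    return Fraction(sum(z_i * y_i for z_i, y_i in zip(z, population)), n)

# Expectation of the sample mean over the design (all possible samples).
expected_mean = sum(prob_each * sample_mean(s) for s in all_samples)
population_mean = Fraction(sum(population), N)

# Inclusion probability of unit 0, i.e. P(Z_0 = 1).
pi_0 = sum(prob_each for s in all_samples if 0 in s)

print(expected_mean, population_mean, pi_0)
```

Under this assumed design the design-expectation of the sample mean equals the population mean, and each unit's inclusion probability is n/N.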

Some definitions:

Observation unit: an object on which a measurement is taken. Sometimes called an element.

Target population: the complete collection of observations we want to study.
- Defining the target population is an important and often difficult part of the study.
- For example, in a political poll, should the target population be all adults eligible to vote? All registered voters? All persons who voted in the last election? The choice of target population will profoundly affect the statistics that result.

Sampled population: the population from which the sample was taken.

Note: In an ideal survey, the sampled population will be identical to the target population, but this ideal is rarely met exactly.

Sample: a subset of a population.

Sampling unit: the unit we actually sample.
Example: we want to study individuals but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.

Sampling frame: the list of sampling units.
Example: for telephone surveys, the sampling frame might be a list of all residential telephone numbers in the city; for personal interviews, the sampling frame might be a list of all street addresses.

Census: when data are collected on every unit of the population, it is called a census.

Population parameter: a number that results from measuring all the units in the population.

Statistic: a number that results from measuring all the units in the sample.

Statistics derived from samples are used to estimate population parameters.

Example: Telephone survey of likely voters
Target population: all likely voters
Sampling frame: a list of telephone numbers
(1) Not all households have telephones
(2) Some people with phones are not registered to vote and hence are ineligible for the survey
(3) Some eligible people with phones cannot be contacted, refuse to respond, or are incapable of responding
Sampling unit: a phone number
Observation unit: the individual associated with the phone number

Figure 1: Telephone survey of likely voters

Why Sampling?
- Cost: a census is expensive.
- Time: a census is very time consuming.
- Impracticality: in some applications a census is impractical.
Example: The government requires automakers who want to sell cars in the U.S. to demonstrate that their cars can survive certain crash tests. Obviously, the company can't be expected to crash every car to see if it survives! So the company crashes only a sample of cars.

Types of Samples:

1. Non-probability (non-random) samples: these samples focus on volunteers, easily available units, or those that just happen to be present when the research is done. Non-probability samples are useful for quick and cheap studies, for case studies, for qualitative research, for pilot studies, and for developing hypotheses for future research. However, non-probability samples are often biased.

Convenience sample: also called an accidental sample or man-in-the-street sample. The researcher selects units that are convenient, close at hand, easy to reach, etc.

Purposive sample: the researcher selects the units with some purpose in mind, for example, students who live in dorms on campus, or females.

Quota sample: the researcher constructs quotas for different types of units. For example, the researcher interviews a fixed number of shoppers at a mall, half of whom are male and half of whom are female.

2. Probability-based (random) samples: These samples are based on probability theory. Every unit of the population of interest must be identified, and all units must have a known, non-zero chance of being selected into the sample.

Simple random sample (SRS): Randomly select a sample of size n from a population of size N. Each unit in the population is identified.
- a) The sampling unit and the observation unit are the same
- b) Each subset of size n has the same probability of being the sample
- c) Each unit has an equal chance of being selected in the sample
- Random number generators
- Lottery method
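As a minimal sketch of drawing an SRS with a random number generator, the snippet below uses Python's standard `random.sample`, which selects without replacement. The frame of 100 labelled units, the sample size, and the seed are all made up for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each subset of size n equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: N = 100 identified units.
frame = [f"unit_{i}" for i in range(1, 101)]
srs = simple_random_sample(frame, n=10, seed=472)
print(srs)
```

The lottery method corresponds to the same idea done physically: every unit gets one ticket and n tickets are drawn without replacement.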

Systematic random sampling: First, randomly pick the first item or subject from the population. Then select every kth subject from the list, where k is the selection interval.
- The results are representative of the population unless certain characteristics of the population repeat at every kth individual, which is highly unlikely.
- Systematic sampling is useful for selecting large samples, say 100 or more. It is less cumbersome than drawing a simple random sample with either a table of random numbers or the lottery method.
- If the selection interval matches some pattern in the list, you will introduce systematic bias into your sample. For example, if the list is ordered male, female, male, female, ..., and you select observations No. 1, No. 3, No. 5, ..., every selected unit has the same sex.
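The procedure and its failure mode can be sketched in a few lines. This is an illustration under assumptions: the function name, the roster, and the seed are hypothetical, and the random start is taken among the first `interval` positions of the list.

```python
import random

def systematic_sample(units, interval, seed=None):
    """Pick a random start among the first `interval` units, then take every interval-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start in 0 .. interval-1
    return units[start::interval]

# Hypothetical list of 100 numbered units, selection interval 5.
numbered = list(range(100))
picked = systematic_sample(numbered, interval=5, seed=3)

# Periodic list: male, female, male, female, ...
roster = ["male", "female"] * 50
periodic = systematic_sample(roster, interval=2, seed=1)
print(set(periodic))  # only one sex appears, whichever start was drawn
```

With an interval of 2 on the alternating roster, the sample contains a single sex regardless of the random start, which is the systematic bias described above.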

Stratified random sampling: Divide the population into H strata and take an SRS of size n_h from stratum h, h = 1, ..., H, with the samples selected independently across strata.

Example: You want to find out the attitudes of students on your campus about immigration. There are 27,000 students: 22,000 West; 3,000 East; 1,000 Midwest; 600 South; 400 Foreign.
- If you select a simple random sample of 1,500 students, you might get few or no students from the Midwest, South, or Foreign groups.
- Divide the students into these five groups (strata), and then select the same percentage of students from each group using a simple random sampling method. This is proportional stratified random sampling.
- Divide the students into the five groups and then select the same number of students from each group using a simple random sampling method. This is disproportionate stratified random sampling.
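To make the two allocations concrete, the sketch below computes stratum sample sizes for the campus example, assuming a total sample of 1,500 students. The function names are illustrative, and simple rounding is used for the proportional case; in general, rounded sizes may need a small adjustment to sum exactly to n.

```python
# Stratum sizes N_h from the campus example (27,000 students in total).
strata_sizes = {"West": 22000, "East": 3000, "Midwest": 1000, "South": 600, "Foreign": 400}

def proportional_allocation(sizes, n):
    """Same sampling fraction n/N in every stratum (rounded to whole students)."""
    total = sum(sizes.values())
    return {h: round(n * size / total) for h, size in sizes.items()}

def equal_allocation(sizes, n):
    """Same number of students from every stratum (capped at the stratum size)."""
    per_stratum = n // len(sizes)
    return {h: min(per_stratum, size) for h, size in sizes.items()}

proportional = proportional_allocation(strata_sizes, 1500)
equal = equal_allocation(strata_sizes, 1500)
print(proportional)
print(equal)
```

Proportional allocation gives the small strata only a few dozen students each, while equal allocation samples them at a much higher rate than the West stratum, which is why it is called disproportionate.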

Cluster sampling: A cluster is a naturally occurring grouping of the members of the population. For example, city residents are also residents of neighborhoods, blocks, and housing structures. Randomly select n clusters, then observe either all the elements in the selected clusters or a subsample of the elements in the selected clusters.

Example: To obtain information about the drug habits of all high school students in New Mexico:
- Obtain a list of all the high schools in NM
- Select an SRS of high schools
- Within each selected high school, list all classes, and select an SRS of classes
- The students in the selected classes are the observations in your sample
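The steps of this example can be sketched as a two-stage selection. Everything concrete here is hypothetical: the frame of 6 schools with 4 classes of 5 students each, the function name, the stage sizes, and the seed.

```python
import random

def two_stage_sample(schools, n_schools, n_classes, seed=None):
    """Stage 1: SRS of schools. Stage 2: SRS of classes within each selected school.
    All students in the selected classes enter the sample."""
    rng = random.Random(seed)
    chosen_schools = rng.sample(sorted(schools), n_schools)
    students = []
    for school in chosen_schools:
        classes = schools[school]
        chosen_classes = rng.sample(sorted(classes), min(n_classes, len(classes)))
        for class_name in chosen_classes:
            students.extend(classes[class_name])
    return chosen_schools, students

# Hypothetical frame: school -> class -> list of student ids.
schools = {
    f"school_{s}": {
        f"class_{c}": [f"s{s}_c{c}_student_{k}" for k in range(5)]
        for c in range(4)
    }
    for s in range(6)
}

chosen, sampled_students = two_stage_sample(schools, n_schools=3, n_classes=2, seed=7)
print(chosen, len(sampled_students))
```

Note that only a list of schools is needed up front; class lists are required only for the schools actually selected, which is the practical appeal of cluster designs.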

Biases

Selection bias: If some part of the target population is not in the sampled population, a bias called selection bias occurs.
Example: In a survey to estimate per capita income, selection bias occurs if transient people are ignored.
- Mis-specification of the target population
- Failure to include all of the target population in the sampling frame, also called undercoverage
- Substituting a convenient member of a population for a designated member who is not readily available

- Nonresponse: failure to obtain responses from all those chosen in the sample
- Allowing a sample to consist entirely of volunteers (radio, TV, or call-in polls)

Note that large samples are generally considered good, but if the sample is unrepresentative, a large sample can still be quite bad. The design of the survey is far more important than the absolute size of the sample.
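A small simulation can illustrate why design matters more than size, using the per capita income example. All numbers below are invented for illustration: a population of 9,000 settled residents and 1,000 transient people with lower incomes, a frame that omits the transient group, and arbitrary sample sizes and seed.

```python
import random
from statistics import mean

rng = random.Random(2024)

# Hypothetical target population: incomes of settled and transient people.
settled = [rng.gauss(50_000, 10_000) for _ in range(9_000)]
transient = [rng.gauss(15_000, 3_000) for _ in range(1_000)]
target_population = settled + transient
true_mean = mean(target_population)

# Large sample from a frame that ignores transient people (undercoverage).
big_biased = rng.sample(settled, 5_000)

# Much smaller SRS from the full target population.
small_srs = rng.sample(target_population, 500)

bias_error = abs(mean(big_biased) - true_mean)
srs_error = abs(mean(small_srs) - true_mean)
print(round(bias_error), round(srs_error))
```

In this setup the sample that is ten times larger misses the true mean by more than the small SRS does, because taking more units from an incomplete frame does not remove the undercoverage.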

Measurement bias: Measurement bias occurs when the measuring instrument has a tendency to record in one direction more often than the other. Measurement biases are more common when dealing with people.
- People may not tell the truth
- Lack of understanding of questions
- Lack of proper recall of events
- Variations in responses due to the interviewer
- Misreading questions or misrecording responses
- Desire to impress the interviewer
- Ordering and wording of questions have effects on responses
Many of these problems can be avoided by proper questionnaire design.

Questionnaire Design:
- Decide what you want to find out; this is the most important step in writing a questionnaire
- Pilot study: test the questions before sending out the questionnaire
- Keep the questions simple and clear: questions should be neither too lengthy nor too technical, and should be easily understood by non-experts
- Questions should be specific rather than general
- Decide whether to use open or closed questions
  - Open question: the respondent is not prompted with categories for responses. This allows respondents to form their own response categories.
  - Closed question: a question is closed when specific response categories are provided.

- Closed questions with well-thought-out and well-researched categories elicit more accurate responses
- Avoid questions that prompt or motivate the respondent to say what the investigator wants to hear
- Use choices rather than Agree/Disagree type questions
- Ask about only one concept in each question
- Pay attention to the question-order effect. Ask general questions first, then follow with specific questions.

Sampling and Non-Sampling Errors:

Sampling errors: Sampling errors result from the inherent variability in the sampling process. They arise because the results vary from sample to sample. Reported margins of error are a result of sampling error. Sampling errors can be reduced by increasing the sample size, but they cannot be eliminated.

Non-sampling errors: These result from selection bias, measurement error, and inaccurate responses. They cannot be attributed to sample-to-sample variability. Such errors can be eliminated by proper precautions. Selection bias can be reduced by using a probability sample. Accurate responses can be achieved through proper and careful design of the survey instrument and training of interviewers.
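Sample-to-sample variability can be seen directly by drawing many simple random samples from one fixed population and looking at how much the sample mean moves around. The population, the sample sizes (25 and 400), the number of repeats, and the seed below are all arbitrary choices made for illustration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(572)

# Hypothetical fixed finite population of 10,000 values.
population = [rng.gauss(100, 20) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean across repeated SRS draws of size n."""
    means = [mean(rng.sample(population, n)) for _ in range(repeats)]
    return pstdev(means)

spread_small = spread_of_sample_means(25)
spread_large = spread_of_sample_means(400)
print(round(spread_small, 2), round(spread_large, 2))
```

The spread is clearly smaller for the larger sample size but is not zero, matching the point that sampling error shrinks with sample size without disappearing. Non-sampling errors such as undercoverage are untouched by this kind of increase in n.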