INTRODUCTION TO SURVEY SAMPLING October 6, 2010 Linda Owens University of Illinois at Chicago www.srl.uic.edu 1 of 22
Census or sample? Census: Gathering information about every individual in a population Sample: Selection of a small subset of a population 2 of 22
Why sample instead of taking a census? Less expensive Less time-consuming More accurate Samples can lead to statistical inference about the entire population 3 of 22
Probability Sample Generalize to the entire population Unbiased results Known, non-zero probability of selection Non-probability Sample Exploratory research Convenience Probability of selection is unknown 4 of 22
Target population Definition: The population to which we want to generalize our findings. Unit of analysis: Individual/Household/City Geography: State of Illinois/Cook County/ Chicago Age/Gender Other variables 5 of 22
Examples of target populations Population of adults (18+) in Cook County UIC faculty, staff, students Youth age 5 to 18 in Cook County 6 of 22
Sampling frame A complete list of all units, at the first stage of sampling, from which a sample is drawn For example, Lists Phone numbers in specific area codes Maps of geographic areas 7 of 22
Sampling frames Example 1: Population: Adults (18+) in Cook County Possible Frame: list of phone numbers, list of block maps, list of addresses Example 2: Population: Females age 40 to 60 in Chicago Possible Frame: list of phone numbers, list of block maps Example 3: Population: Youth age 5 to 18 in Cook County Possible Frame: List of schools 8 of 22
Sample designs for probability samples Simple random samples Systematic samples Stratified samples Cluster Multi-stage 9 of 22
Simple random sampling Definition: Every element has the same probability of selection and every combination of elements has the same probability of selection. Probability of selection: n/N, where n = sample size; N = population size Use random number tables or software packages to generate random numbers Most precision estimates assume SRS 10 of 22
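A simple random sample can be sketched in a few lines of Python. This is only an illustration: the frame of 1,000 unit IDs, the sample size of 50, and the function name `simple_random_sample` are assumptions, not part of the slides.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has probability n/N."""
    if n > len(frame):
        raise ValueError("sample size n cannot exceed population size N")
    rng = random.Random(seed)  # seeded only so the draw can be reproduced
    return rng.sample(frame, n)


# Hypothetical sampling frame: unit IDs 1..1000 (N = 1000).
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=1)

# Probability of selection for each unit: n / N.
selection_probability = len(sample) / len(frame)
```

Because `random.sample` draws without replacement, every combination of 50 units is equally likely, which is the property that separates SRS from the other designs.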
Systematic sampling Definition: Every element has the same probability of selection, but not every combination can be selected. Use when drawing SRS is difficult List of elements is long & not computerized Procedure Determine population size N and sample size n Calculate sampling interval i = N/n Pick random start between 1 & sampling interval Take every i-th case Problem of periodicity 11 of 22
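The procedure above can be sketched as follows. The function name, the frame, and the choice to use the integer part of N/n as the interval are assumptions made for illustration; when N is not a multiple of n, other conventions (such as fractional intervals) are also used in practice.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every interval-th case from the list."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    interval = N // n                 # sampling interval (integer part of N/n)
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # random start between 1 and the interval
    # Positions are 1-based, as on the slide; stop once n cases are taken.
    positions = range(start, N + 1, interval)
    return [frame[pos - 1] for pos in positions][:n]


# Hypothetical list of N = 1000 elements, sample size n = 50 (interval = 20).
frame = list(range(1, 1001))
sample = systematic_sample(frame, 50, seed=3)
```

If the list has a cyclical pattern whose period matches the interval, every selected case falls at the same point in the cycle. That is the periodicity problem noted on the slide.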
Stratified sampling: Proportionate To ensure sample resembles some aspect of population Population is divided into subgroups (strata) Students by year in school Faculty by gender Simple Random Sample (with same probability of selection) taken from each stratum. 12 of 22
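A minimal sketch of proportionate stratification, assuming a made-up population of 400 freshmen and 200 seniors and a common sampling fraction of 10%. All names and numbers here are illustrative only.

```python
import random
from collections import defaultdict


def proportionate_stratified_sample(units, stratum_of, fraction, seed=None):
    """Take an SRS from each stratum using the same sampling fraction."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for name, members in strata.items():
        # Same probability of selection in every stratum; rounding may shift
        # a stratum's sample size by one unit when sizes do not divide evenly.
        k = round(len(members) * fraction)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: (student_id, year in school).
students = [(i, "freshman") for i in range(400)]
students += [(i, "senior") for i in range(400, 600)]
result = proportionate_stratified_sample(students, lambda s: s[1], 0.10, seed=7)
```

The sample then mirrors the population on the stratifying variable: two thirds of the selected students are freshmen, as in the population.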
Stratified sampling: Disproportionate Major use is comparison of subgroups Population is divided into subgroups (strata) Compare girls & boys who play Little League Compare seniors & freshmen who live in dorms Probability of selection needs to be higher for smaller stratum (girls & seniors) to be able to compare subgroups. Post-stratification weights 13 of 22
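A sketch of disproportionate stratification with weights, assuming a hypothetical league of 900 boys and 100 girls and equal sample sizes of 50 per stratum. The weight used here is the inverse of each stratum's selection probability, a common choice; the slides do not specify how the weights are computed.

```python
import random


def disproportionate_sample(strata, sample_sizes, seed=None):
    """Draw an SRS of a chosen size from each stratum and compute weights."""
    rng = random.Random(seed)
    sample, weights = {}, {}
    for name, members in strata.items():
        n_h = sample_sizes[name]
        sample[name] = rng.sample(members, n_h)
        # Weight = N_h / n_h, the inverse of the selection probability n_h / N_h.
        weights[name] = len(members) / n_h
    return sample, weights


# Hypothetical strata: the smaller stratum (girls) is sampled at a higher rate.
strata = {"boys": list(range(900)), "girls": list(range(900, 1000))}
sample, weights = disproportionate_sample(
    strata, {"boys": 50, "girls": 50}, seed=11
)
```

With these numbers each sampled boy represents 18 players and each sampled girl represents 2, so weighted totals still add up to the full population of 1,000.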
Cluster sampling Typically used in face-to-face surveys Population divided into clusters Schools (earlier example) Blocks Reasons for cluster sampling Reduction in cost No satisfactory sampling frame available 14 of 22
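One simple form of cluster sampling selects whole clusters at random and includes every unit inside them. The sketch below assumes 20 hypothetical schools of 30 students each. It does not cover multi-stage designs, where units are sampled again within the selected clusters.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters at random and keep all units in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: clusters[name] for name in chosen}


# Hypothetical frame: a list of schools, each holding its own students.
schools = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(30)] for i in range(20)
}
selected = cluster_sample(schools, 4, seed=5)
```

Only the list of schools is needed as a frame, which is why clustering helps when no list of individual students is available.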
Determining sample size: SRS Need to consider Precision Variation in subject of interest Formula Sample size n_0 = CI^2 * (pq) / Precision^2 For example: n_0 = 1.96^2 * (.5 * .5) / .05^2 Sample size not dependent on population size. 15 of 22
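The worked example on this slide can be checked with a short calculation. The function and parameter names are assumptions; the defaults are the slide's values (1.96 for a 95% confidence level, p = q = .5, precision of .05).

```python
import math


def srs_sample_size(z=1.96, p=0.5, precision=0.05):
    """Initial SRS sample size: n_0 = z^2 * p * q / precision^2."""
    q = 1 - p
    return (z ** 2) * p * q / precision ** 2


n0 = srs_sample_size()     # about 384.16 for the slide's example
required = math.ceil(n0)   # round up to whole respondents
```

Setting p = .5 maximizes p * q, so this gives the most conservative sample size for a proportion at a given precision.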
Sample size: Other issues Finite Population Correction n = n_0 / (1 + n_0 / N) Design effects Analysis of subgroups Increase size to accommodate nonresponse Cost 16 of 22
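A sketch of the finite population correction, starting from the initial size of 384.16 in the earlier SRS example. The population sizes below are hypothetical, chosen only to show how the correction behaves.

```python
def finite_population_correction(n0, N):
    """Adjusted sample size: n = n_0 / (1 + n_0 / N)."""
    return n0 / (1 + n0 / N)


n0 = 384.16  # initial SRS sample size from the slide's worked example

# Hypothetical small population: the correction is substantial.
n_small = finite_population_correction(n0, 2_000)       # roughly 322

# Hypothetical large population: the correction is negligible.
n_large = finite_population_correction(n0, 5_000_000)   # close to n0
```

The correction matters only when the sample is a sizable fraction of the population. For large populations the required size barely depends on N.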
Cell Phones 24.5% of US Households are cell phone only (Blumberg & Luke, 2010) Cell phone only households: Unrelated adults Non-white Young (<=29) Poor RDD sample frames often do not include cell phones and can lead to bias 17 of 22
Cell Phones, cont. Cell phone frames harder to target geographically than landline frames Frame overlap with RDD Cell phone surveys expensive and have low rates of participation Public Opinion Quarterly, 2007 Special Issue, Vol. 71, Num. 5 18 of 22
Address Based Sampling Subject of many papers at 2010 AAPOR Sampling addresses from a near universal listing of residential mail delivery locations (Michael Link) Post-office Delivery Sequence Files (DSF) 19 of 22
Address Based Sampling Advantages Can be matched to name (85%) and listed telephone numbers (65%) Can be used for multiple modes of administration Includes non-telephone households and cell-only households More efficient than traditional blocklisting 20 of 22
Address Based Sampling Disadvantages Incomplete in rural areas (although improving with 9-1-1 address conversion) Difficulties with multidrop addresses Incomplete coverage for mail only or telephone only administration Best when used as part of multi-mode administration 21 of 22
Before taking questions Slides available at www.srl.uic.edu; click on Seminar Series Next seminar: Introduction to Web Surveys, Thursday, Oct. 14 Evaluation 22 of 22