INTRODUCTION TO SURVEY SAMPLING
October 18, 2012
Linda Owens
University of Illinois at Chicago
www.srl.uic.edu

Census or sample?
  Census: Gathering information about every individual in a population
  Sample: Selection of a small subset of a population

Why sample instead of taking a census?
  Less expensive
  Less time-consuming
  More accurate
  Samples can lead to statistical inference about the entire population

Probability Sample
  Generalize to the entire population
  Unbiased results
  Known, non-zero probability of selection

Non-probability Sample
  Exploratory research
  Convenience
  Probability of selection is unknown

Target population
  Definition: The population to which we want to generalize our findings.
  Unit of analysis: Individual/Household/City
  Geography: State of Illinois/Champaign County/City of Urbana
  Age/Gender
  Other variables

Examples of target populations
  Population of adults (18+) in Champaign County
  UIUC faculty, staff, students
  Youth age 5 to 18 in Champaign County
  Homeless people
Sampling frame
  A complete list of all units, at the first stage of sampling, from which a sample is drawn
  For example:
    Lists of addresses
    Phone numbers in specific area codes
    Maps of geographic areas
    List of members of a professional organization
    Cell phone numbers

Sampling frames
  Example 1:
    Population: Adults (18+) in Champaign County
    Possible Frame: list of phone numbers, list of block maps, list of addresses
  Example 2:
    Population: Females age 40-60 in Chicago
    Possible Frame: list of phone numbers, list of block maps
  Example 3:
    Population: Youth age 5 to 18 in Cook County
    Possible Frame: List of schools
  Example 4:
    Population: Homeless People
    Possible Frame: ??

Modes of Data Collection
  Face to face
  Landline telephone
  Cellular telephone
  Web

Mode/Frame Correspondence
  Mode consistent with information in frame
    Frame for Web survey should contain email addresses
  Frame information inconsistent with mode of data collection
    General population survey using Web

Sample designs for probability samples
  Simple random samples
  Systematic samples
  Stratified samples
  Cluster
  Multi-stage

Simple random sampling
  Definition: Every element has the same probability of selection and every combination of elements has the same probability of selection.
  Probability of selection: n/N, where n = sample size; N = population size
  Use random number tables or software packages to generate random numbers
  Most precision estimates assume SRS
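
As an illustration of the simple random sampling slide, here is a minimal Python sketch of drawing an SRS with a software random number generator. The frame contents, the sample size of 50, and the function names are hypothetical, chosen only for this example; the sketch assumes the frame is a complete list held in memory.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, without replacement.

    random.Random.sample gives every unit, and every combination of
    n units, the same chance of selection.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


def selection_probability(n, N):
    """Probability of selection under SRS: n / N."""
    return n / N


# Hypothetical sampling frame of N = 1000 units.
frame = [f"unit-{i}" for i in range(1, 1001)]

sample = simple_random_sample(frame, 50, seed=2012)
prob = selection_probability(len(sample), len(frame))
print(len(sample), prob)
```

Fixing the seed is a convenience for documenting a draw so it can be repeated; it does not change the selection probabilities.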
Systematic sampling
  Definition: Every element has the same probability of selection, but not every combination can be selected.
  Use when drawing SRS is difficult
    List of elements is long & not computerized
  Procedure
    Determine population size N and sample size n
    Calculate sampling interval (N/n)
    Pick random start between 1 & sampling interval
    Take every ith case
  Problem of periodicity

Stratified sampling: Proportionate
  To ensure sample resembles some aspect of population
  Population is divided into subgroups (strata)
    Students by year in school
    Faculty by gender
  Simple Random Sample (with same probability of selection) taken from each stratum.

Stratified sampling: Disproportionate
  Major use is comparison of subgroups
  Population is divided into subgroups (strata)
    Compare girls & boys who play Little League
    Compare seniors & freshmen who live in dorms
  Probability of selection needs to be higher for smaller stratum (girls & seniors) to be able to compare subgroups.
  Post-stratification weights

Cluster sampling
  Typically used in face-to-face surveys
  Population divided into clusters
    Schools (earlier example)
    Blocks
  Reasons for cluster sampling
    Reduction in cost
    No satisfactory sampling frame available

Determining sample size: SRS
  Need to consider
    Precision
    Variation in subject of interest
  Formula
    Sample size $n_0 = \frac{\text{CI}^2 \cdot (pq)}{\text{Precision}^2}$
    For example: $n_0 = \frac{1.96^2 \cdot (.5 \cdot .5)}{.05^2}$
  Sample size not dependent on population size.

Sample size: Other issues
  Finite Population Correction
    $n = n_0 / (1 + n_0/N)$
  Design effects
  Analysis of subgroups
  Increase size to accommodate nonresponse
  Cost
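
The systematic sampling procedure and the sample size formulas on the slides above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed implementation: the function names, the frame of 1000 case IDs, and the population size of 2000 are made up for the example, the sampling interval is rounded down when N/n is not a whole number, and the slide's "CI" term is treated as the z-value for the confidence level (1.96 for 95%).

```python
import math
import random


def systematic_sample(frame, n, seed=None):
    """Select n units by taking every i-th case after a random start."""
    N = len(frame)
    interval = N // n  # sampling interval N/n, rounded down (an assumption)
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # random start between 1 and the interval, inclusive
    positions = range(start, start + interval * n, interval)  # 1-based list positions
    return [frame[pos - 1] for pos in positions]


def srs_sample_size(z=1.96, p=0.5, precision=0.05):
    """Initial SRS sample size n_0 = z^2 * p * q / precision^2, with q = 1 - p."""
    return z ** 2 * p * (1 - p) / precision ** 2


def finite_population_correction(n0, N):
    """Adjust n_0 for a finite population: n = n_0 / (1 + n_0 / N)."""
    return n0 / (1 + n0 / N)


frame = list(range(1, 1001))  # hypothetical list of N = 1000 case IDs
picked = systematic_sample(frame, n=50, seed=2012)

n0 = srs_sample_size()  # the slide's example values: 1.96, p = .5, precision = .05
n_adjusted = finite_population_correction(n0, N=2000)  # N = 2000 is a made-up population size
print(len(picked), math.ceil(n0), math.ceil(n_adjusted))
```

The results are rounded up with `math.ceil` because a fractional case cannot be interviewed. The sketch does not guard against periodicity in the list, which the systematic sampling slide flags as a problem.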
Changes in Field of Survey Research
  Cellular Phones

Cell Phones
  32.3% of US households are cell phone only (Blumberg & Luke, 2011)
  Cell phone only households tend to be:
    Unrelated adults
    Non-white
    Young (<=29)
    Lower SES
  RDD sample frames tend not to include cell phones and can lead to bias

Cell Phones, cont.
  Cell phone frames harder to target geographically than landline frame
  Frame overlap with RDD
  Public Opinion Quarterly, 2007 Special Issue, Vol. 71, Num. 5

Sampling addresses from a near-universal listing of residential mail delivery locations (Michael Link)
  Post-office Delivery Sequence Files (DSF)

Advantages
  Coverage of target population is very high
  Can be matched to name (~85%) and listed telephone numbers (~65%)
  Includes non-telephone households and cell-only households
  More efficient than traditional block listing

Disadvantages
  Incomplete in rural areas (although improving with 9-1-1 address conversion)
  Difficulties with multidrop addresses
Thank You!

Evaluations