INTRODUCTION TO SURVEY SAMPLING
October 28, 2015
Karen Foote Retzer
www.srl.uic.edu

General information
- Please hold questions until the end of the presentation
- Slides available at http://www.srl.uic.edu/seminars/semnotes.htm
- Please raise your hand so that I can see that you can hear me

Outline
- Introduction
- Target Populations
- Sample Frames
- Sample Designs
- Determining Sample Sizes
- Modes of Data Collection
- Questions

Introduction
- Census: Gathering information about every individual in a population
- Sample: Selection of a small subset of a population

Why sample instead of taking a census?
- Less expensive
- Less time-consuming
- More accurate
- Samples can lead to statistical inference about the entire population

Probability vs. non-probability
- Probability Sample
  - Generalize to the entire population
  - Unbiased results
  - Known, non-zero probability of selection
- Non-probability Sample
  - Exploratory research
  - Convenience
  - Probability of selection is unknown
Target population
- Definition: The population to which we want to generalize our findings
- Unit of analysis: Individual/Household/City
- Geography: State of Illinois/Champaign County/City of Urbana
- Age/Gender
- Other variables

Examples of target populations
- Population of adults in Champaign County
- Faculty, staff, or students at the University of Illinois
- Youth age 5 to 18 in Champaign County

Sampling frame
- A complete list of all units, at the first stage of sampling, from which a sample is drawn
- For example, lists of...
  - addresses
  - landline phone numbers in specific area codes
  - blocks or census tracts in specified geographic areas
  - members of a professional organization
  - schools
  - cell phone numbers

Target populations, sample frames, and coverage
- Example 1:
  - Population: Adults in Champaign County, IL
  - Frames: List of landline numbers, list of census blocks, list of addresses
- Example 2:
  - Population: Youth age 5 to 18 in Cook County
  - Frame: List of schools
- Example 3:
  - Population: Adults age 18-34 in United States
  - Frame: ??
- Coverage: What part of the target population is not included in these sample frames?

Sample designs for probability samples
- Simple random samples
- Systematic samples
- Stratified samples
- Cluster
- Multi-stage

Simple random sampling
- Definition: Every element has the same probability of selection and every combination of elements has the same probability of selection.
- Probability of selection: n/N, where n = sample size; N = population size
- Use random number tables or software packages to generate random numbers
- Most precision estimates assume SRS
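As a rough illustration of the simple random sampling slide, the Python sketch below draws an SRS without replacement using the standard library's `random.sample`. The frame of 1,000 numbered units, the sample size of 50, and the fixed seed are assumptions made up for this example, not values from the presentation.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement.

    Every unit has selection probability n/N, and every
    combination of n units is equally likely to be drawn.
    """
    rng = random.Random(seed)  # the seed only makes the example repeatable
    return rng.sample(frame, n)


# Hypothetical frame: unit IDs 1..1000 (illustrative only)
frame = list(range(1, 1001))
sample = simple_random_sample(frame, n=50, seed=2015)

# Probability of selection for each unit: n/N
selection_probability = len(sample) / len(frame)
print(len(sample), selection_probability)
```

In practice the frame would be the actual list of addresses, phone numbers, or other units, and the seed would usually be recorded so the draw can be reproduced.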
Systematic sampling
- Definition: Every element has the same probability of selection, but not every combination can be selected.
- Use when drawing SRS is difficult
  - List of elements is long & not computerized
- Procedure
  - Determine population size N and sample size n
  - Calculate sampling interval i = N/n
  - Pick random start between 1 & sampling interval
  - Take every ith case
- Problem of periodicity

Stratified sampling: Proportionate
- To ensure sample resembles some aspect of population
- Population is divided into subgroups (strata)
  - Students by year in school
  - Faculty by gender
- Simple Random Sample (with same probability of selection) taken from each stratum.

Stratified sampling: Disproportionate
- Major use is comparison of subgroups
- Population is divided into subgroups (strata)
  - Compare girls & boys who play Little League
  - Compare seniors & freshmen who live in dorms
- Probability of selection needs to be higher for smaller stratum (girls & seniors) to be able to compare subgroups.
- Post-stratification weights

Cluster sampling
- Typically used in face-to-face surveys
- Population divided into clusters
  - Schools (earlier example)
  - Blocks
- Reasons for cluster sampling
  - Reduction in cost
  - No satisfactory sampling frame available

Determining sample size: SRS
- Need to consider
  - Precision
  - Variation in subject of interest
- Formula
  - Sample size: $n_0 = CI^2 \cdot (pq) / \mathrm{Precision}^2$, where CI is the z-value for the chosen confidence level (1.96 for 95%) and q = 1 - p
  - For example: $n_0 = 1.96^2 \cdot (.5 \cdot .5) / .05^2$
- Sample size not dependent on population size.

Sample size: Other issues
- Finite Population Correction: $n = n_0 / (1 + n_0 / N)$
- Design effects
- Analysis of subgroups
- Increase size to accommodate nonresponse
- Cost
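The sample size formula, the finite population correction, and the systematic sampling procedure on this page can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `z` stands for what the slide calls CI, the population of 2,000 and the systematic sample of 200 are hypothetical values chosen so the interval is a whole number, and rounding up to whole respondents is a common convention rather than something the slides specify.

```python
import math
import random


def srs_sample_size(z, p, precision):
    """Initial sample size n0 = z^2 * p * q / precision^2, with q = 1 - p."""
    return z ** 2 * p * (1 - p) / precision ** 2


def finite_population_correction(n0, N):
    """Adjusted sample size n = n0 / (1 + n0 / N)."""
    return n0 / (1 + n0 / N)


def systematic_sample(N, n, rng):
    """Return 1-based list positions selected by systematic sampling.

    Assumes N is a multiple of n so the sampling interval is a whole number.
    """
    interval = N // n                  # sampling interval N/n
    start = rng.randint(1, interval)   # random start between 1 and the interval
    return list(range(start, N + 1, interval))  # every interval-th case


# Slide example: 95% confidence (z = 1.96), p = q = .5, precision = .05
n0 = srs_sample_size(z=1.96, p=0.5, precision=0.05)
print(math.ceil(n0))  # rounded up to whole respondents

# Hypothetical population of N = 2,000 (not from the slides)
n = finite_population_correction(n0, N=2000)
print(math.ceil(n))

# Hypothetical list of 2,000 units and a systematic sample of 200
positions = systematic_sample(N=2000, n=200, rng=random.Random(2015))
print(len(positions), positions[:3])
```

The slide's example works out to roughly 384 respondents before correction, and the correction lowers that figure noticeably only when the population is small relative to the initial sample size. When N is not a multiple of n, the whole-number interval used here yields a sample slightly larger than n.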
Modes of data collection
- Face to face
- Phone
- Web
- Mail

Target population/frame/mode correspondence
- Mode needs to be consistent with information in sample frame
- Mode needs to be consistent with target population

Cell phone and landline frames
- Increasing proportion of US households are cell phone only
- Cell phone only households tend to be
  - Unrelated adults
  - Hispanic adults
  - Younger
  - Lower SES
- Landline sample frames can lead to bias

Cell phone and landline frames, cont.
- Cell phone frames harder to target geographically than landline frames
- Survey researchers are combining landline and cell phone frames

Address-based sampling
- Sampling addresses from a near universal listing of residential mail delivery locations
- Post Office Delivery Sequence Files (DSF)

Address-based sampling: advantages
- Coverage of households is very high
- Can be matched to name and listed telephone numbers
- Includes non-telephone households
- More efficient than traditional block-listing
Address-based sampling: disadvantages
- Incomplete in rural areas (although improving with 9-1-1 address conversion)
- Difficulties with multidrop addresses

Thank you!
Future noontime webinars
- Introduction to Web Surveys, Wednesday, November 4
- Introduction to Questionnaire Design, Tuesday, November 10
- Introduction to Survey Data Analysis: Addressing Survey Design and Data Quality, Wednesday, November 18

Evaluation

Questions