Sampling Subpopulations in Multi-Stage Surveys

Robert Clark, Angela Forbes, Robert Templeton

This research was funded by the Statistics NZ Official Statistics Research Fund 2007/2008, and builds on the NZ Health Survey 2006/2007 sample design conducted for the NZ Ministry of Health.

Outline
- Surveying Rare Populations
- Snowball Sampling and Intercept Point Surveys
- Screening: proxy screening of households; accuracy of proxy screening
- Disproportionate Sampling: optimal one-stage and two-stage allocations; intercensal mobility
- Dual Frame using the Maori Electoral Roll
- ABS Findings on Sampling Indigenous Australians
- Conclusions

Surveying Subpopulations
- The group of interest is a relatively small subset of the population.
- There is no reliable list of the subpopulation.
Common problems:
- not highly geographically clustered;
- over-surveyed?
- mobile population;
- frequent identification errors / variability.

Example: Maori Population
- Maori comprise about 12% of the adult population.
- 60% of Maori live in meshblocks (MBs; primary sampling units containing about 50 dwellings) where the proportion of Maori is 20% or less.
- New Zealand Health Survey 07-08: equal-probability sampling would give approximately 1,500 Maori in the sample; more like 3,000 are needed.
- Best possible outcome for the Maori sample from disproportionate allocation according to MB density, relative to simple random sampling (SRS): +15.9%.

[Figure: Distribution of Proportion of Maori in Meshblocks. Horizontal axis: proportion of Maori in MB (0.0 to 1.0). Vertical axis: number of Maori (0 to 60,000).]

Snowball Sampling
- Select a sample of people.
- Ask subpopulation members to identify others among their acquaintances.
Advantage: fewer people need to be contacted to achieve the same number of subpopulation members in the sample.
Disadvantages:
- Can be biased towards people with more friends? Unbiased estimation is possible provided that everyone is linked to others.
- Subpopulation members need to know each other.
- Image problem for government?

Intercept Point Survey
- Example: sample the homeless population by selecting individuals visiting selected soup kitchens ("aggregation points") at selected times.
- At each location, individuals are asked how often they visit this and other aggregation points.
- Very cost-efficient, but biased, possibly extremely so.

McKenzie and Mistiaen, "Surveying Migrant Populations: A Comparison of Census-Based, Snowball and Intercept Point Surveys" (2008; Journal of the Royal Statistical Society Series A, conditionally accepted)
- Group of interest: Japanese-Brazilian families (about 0.9% of the population).
- Snowball survey: poor response rate; most respondents did not want to provide referrals.
- Snowball and intercept point surveys selected individuals more tied to the Nikkei community.
- Intercept point surveys can be useful for exploratory investigation, but snowball sampling was not much cheaper than probability sampling.

Screening
- Not so much a method as the absence of a method.
- Select a large sample of people and identify whether they belong to the subpopulation.
- Conduct the survey on all identified members.
- If the initial identification is subject to error, take a subsample of the (apparent) non-members.
- It is important to make screening as cheap as possible per household or person!

Two-Phase Screening
- Use a relatively cheap method of screening that is subject to error.
- Select all those passing the screen, and a subsample of the others.
- Kalton and Anderson (1986), quoting Deming (1977): the initial screen needs to be much cheaper than the second-phase costs (6:1 or better), and screening needs to be quite accurate (at least 75% of the subpopulation classified to stratum a).

Proxy Screening
A number of NZ surveys, including the NZ Health Survey, have improved the efficiency of screening as follows:
- Each PSU has a main sample and an oversample.
- Household information, including ethnicity and age, is collected from any contacted adult in selected households.
- In the main sample, one adult is selected at random.
- In the oversample, one (apparently) eligible adult is selected at random.

Incidentally, this creates a challenge in weighting:
- Probabilities of selection cannot be calculated for the main sample unless the screening tool is applied.
- Wells (ANZJS 1998) has an alternative, approximately unbiased weighting method.


Accuracy of Proxy Screening
- Misclassification is apparently not due to the use of proxy reporting, as errors are about the same for single-person and multi-person households.
- The main misclassification is that about 20% of Maori are missed. (Deming, 1977, recommended 25% or less.)
- The use of proxy reporting apparently increases the effective sample size of the subpopulation by around 3-4% for a hypothetical future design, for fixed cost.
- If there were no errors in the screener, gains of 30% in effective sample size would be made.

Could we ever just omit the main sample?

K&A (1986): Optimal One-Stage Allocation (Subpopulation Mean)
- Let $N_k$ be the population in stratum $k$.
- Let $\phi_k$ be the proportion of stratum $k$ who are in the subpopulation.
- Let $\pi_k$ be the probability of selection for people in stratum $k$.
Then:

$$E[n(\text{subpop})] = \sum_k \pi_k N_k \phi_k$$

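BUT, there is a penalty from using unequal $\pi_k$. This leads to the variance being multiplied by:

$$deff = 1 + RV\left(1/\pi_i : i \in \text{subpop sample}\right) = \frac{\left(\sum_k N_k \phi_k \pi_k\right)\left(\sum_k N_k \phi_k \pi_k^{-1}\right)}{\left(\sum_k N_k \phi_k\right)^2}$$

where $RV$ denotes the relative variance of the selection weights $1/\pi_i$ over the subpopulation sample.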
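The two quantities above, the expected subpopulation sample size $\sum_k \pi_k N_k \phi_k$ and the design effect $(\sum_k N_k\phi_k\pi_k)(\sum_k N_k\phi_k/\pi_k)/(\sum_k N_k\phi_k)^2$, can be sketched in a few lines. This is only an illustration: the function names and the two-stratum population are hypothetical and are not taken from the NZ Health Survey design.

```python
def expected_subpop_sample(N, phi, pi):
    """E[n(subpop)] = sum_k pi_k * N_k * phi_k."""
    return sum(p * n * f for n, f, p in zip(N, phi, pi))


def design_effect(N, phi, pi):
    """Variance inflation from unequal selection probabilities.

    deff = (sum_k N_k phi_k pi_k) * (sum_k N_k phi_k / pi_k) / (sum_k N_k phi_k)^2
    """
    members = [n * f for n, f in zip(N, phi)]  # subpopulation size per stratum
    total = sum(members)
    sampled = sum(m * p for m, p in zip(members, pi))
    inverse_weighted = sum(m / p for m, p in zip(members, pi))
    return sampled * inverse_weighted / total ** 2


# Hypothetical population: two strata of 1000 people, 10% and 50% subpopulation.
N = [1000, 1000]
phi = [0.1, 0.5]
equal = [0.1, 0.1]      # same selection probability everywhere
targeted = [0.05, 0.2]  # over-sample the high-density stratum

for name, pi in [("equal", equal), ("targeted", targeted)]:
    n_sub = expected_subpop_sample(N, phi, pi)
    deff = design_effect(N, phi, pi)
    print(f"{name}: n_sub={n_sub:.1f}, deff={deff:.4f}, effective={n_sub / deff:.1f}")
```

In this made-up case both designs screen 200 people in expectation. Targeting raises the expected subpopulation sample from 60 to 105, but the design effect of about 1.31 brings the effective sample size back down to about 80.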

Cost = $C_1 n + C_2 n_{sub}$, where $C_1$ = cost per screen and $C_2$ = cost per interview.
Variance is proportional to $deff / n_{sub}$.
Minimizing variance for fixed cost gives:

$$\pi_k \propto \sqrt{\frac{\phi_k}{(C_1/C_2) + \phi_k}}$$
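The optimal allocation can be recovered with a standard constrained-minimization argument. The following is a sketch under the assumptions that costs are taken in expectation and that the variance is proportional to $deff / n_{sub}$; the shorthand $a_k$ and $b_k$ is introduced here for illustration only.

```latex
% Expected cost and variance as functions of the selection probabilities
\begin{align*}
\text{Cost} &= \sum_k \pi_k N_k \,(C_1 + C_2 \phi_k) = \sum_k b_k \pi_k,
  & b_k &= N_k (C_1 + C_2 \phi_k), \\
\text{Var} &\propto \frac{deff}{n_{sub}}
  = \frac{\sum_k N_k \phi_k / \pi_k}{\left(\sum_k N_k \phi_k\right)^2}
  \propto \sum_k \frac{a_k}{\pi_k},
  & a_k &= N_k \phi_k .
\end{align*}
% Minimizing \sum_k a_k/\pi_k subject to \sum_k b_k \pi_k fixed
% (Lagrange multipliers or Cauchy-Schwarz) gives \pi_k \propto \sqrt{a_k / b_k}:
\[
\pi_k \propto \sqrt{\frac{N_k \phi_k}{N_k (C_1 + C_2 \phi_k)}}
      \propto \sqrt{\frac{\phi_k}{(C_1/C_2) + \phi_k}} .
\]
```

The stratum sizes $N_k$ cancel, so the optimal selection probability depends only on the subpopulation density $\phi_k$ and the cost ratio $C_1/C_2$.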

Two extremes
- $C_1 = 0$ (no screening cost): $\pi_k$ = constant.
- $C_1 \gg C_2$ (interview much shorter than screen; would not occur in reality, but useful to give an upper bound): $\pi_k \propto \sqrt{\phi_k}$.

Alternative Disproportionate Sampling Regimes for NZ
I will compare results from setting $\pi_k$ proportional to different powers of $\phi_k$. All designs have equivalent cost, assuming $C_1/C_2 = 0.4$. Poisson sampling is assumed.

π_k proportional to           # screened   Sample size   Deff    Effective sample
                                           (eligible)            size (eligible)
Constant                      14,514       1,695          1.00   1,695
sqrt(ϕ_k)                     13,187       2,225          1.19   1,867
sqrt(ϕ_k / (C_1/C_2 + ϕ_k))   13,566       2,073          1.09   1,895
ϕ_k                           11,848       2,761          2.00   1,383
ϕ_k^2                          9,710       3,616         13.78     262
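The figures in this comparison can be cross-checked with a short script. It assumes that the effective sample size is the eligible sample size divided by the design effect, and that relative cost is $0.4 \times$ (number screened) $+$ (number of eligibles interviewed), in units of $C_2$. The variable names are illustrative. Small discrepancies are expected because the reported design effects are rounded.

```python
C1_OVER_C2 = 0.4  # cost ratio stated for the comparison

# (allocation rule, screened, eligible sample, deff, reported effective sample)
rows = [
    ("constant",                  14514, 1695, 1.00, 1695),
    ("sqrt(phi)",                 13187, 2225, 1.19, 1867),
    ("sqrt(phi/(C1/C2 + phi))",   13566, 2073, 1.09, 1895),
    ("phi",                       11848, 2761, 2.00, 1383),
    ("phi^2",                      9710, 3616, 13.78, 262),
]

checks = []
for rule, screened, eligible, deff, reported in rows:
    effective = eligible / deff                       # eligible sample deflated by deff
    rel_cost = C1_OVER_C2 * screened + eligible       # cost in units of C2
    checks.append((rule, effective, reported, rel_cost))
    print(f"{rule:26s} effective={effective:7.0f} (reported {reported:5d}) "
          f"relative cost={rel_cost:7.1f}")

# Gain of the cost-optimal rule over equal-probability sampling.
gain = rows[2][4] / rows[0][4] - 1
print(f"gain in effective sample size, optimal vs constant: {gain:.1%}")
```

Every design comes out at a relative cost of about 7,500, which agrees with the statement that the designs are cost-equivalent. The recomputed effective sample sizes match the reported ones to within rounding.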

Optimal Two-Stage Allocation
- Select a sample of primary sampling units (PSUs) with some probabilities.
- Select a sample of people from the selected PSUs and screen them.
- Select a subsample of eligibles and a subsample of ineligibles.
Cost = $C_1 \times$ #PSUs $+ C_2 \times$ #approached $+ C_3 \times$ #interviewed
There is a trade-off between cost and variance.

If the screen is perfectly accurate, and subpopulation means are the only objective, then:
- Select PSUs with probability proportional to density times population.
- The sampling fraction within PSU $g$ for screening is proportional to

$$\frac{1}{\sqrt{\phi_g \left( (C_2/C_3) + \phi_g \right)}}$$

i.e. over-target high-concentration PSUs, but then under-sample within them!

Example
- PSUs 1 and 2 each contain 40 people; PSU 1 is 6% Maori and PSU 2 is 24% Maori.
- $C_1 = C_2 = 0.4$, $C_3 = 1$, $\rho = 0.02$.
- We give PSU 1 a probability of selection of 1/20 and approach 27 people in this PSU.
- We would then give PSU 2 a probability of 1/5, and approach only 11 of the people in the PSU!
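The direction of this example can be checked numerically. The sketch below assumes the within-PSU screening fraction is proportional to $1/\sqrt{\phi_g\,(C_2/C_3 + \phi_g)}$ and the PSU selection probability is proportional to density times population. That reading of the allocation rule is a reconstruction, and the function names are hypothetical. Only ratios between the two PSUs are computed, because the absolute values depend on the budget and on $\rho$, which enter through constants not given here.

```python
from math import sqrt


def psu_selection_weight(phi_g, size_g):
    """Relative PSU selection probability: density times population."""
    return phi_g * size_g


def within_psu_weight(phi_g, c2, c3):
    """Relative within-PSU screening fraction (assumed form of the rule)."""
    return 1.0 / sqrt(phi_g * (c2 / c3 + phi_g))


C2, C3 = 0.4, 1.0
psus = {"PSU 1": (0.06, 40), "PSU 2": (0.24, 40)}  # (density, population)

sel = {k: psu_selection_weight(phi, n) for k, (phi, n) in psus.items()}
frac = {k: within_psu_weight(phi, C2, C3) for k, (phi, n) in psus.items()}

# PSU 2 should be four times as likely to be selected (1/5 versus 1/20).
psu_ratio = sel["PSU 2"] / sel["PSU 1"]
# PSU 1 should be screened more intensively once selected (27 versus 11 people).
frac_ratio = frac["PSU 1"] / frac["PSU 2"]

print(f"selection probability ratio (PSU 2 / PSU 1): {psu_ratio:.2f}")
print(f"within-PSU fraction ratio (PSU 1 / PSU 2):   {frac_ratio:.2f}")
print(f"ratio implied by the example (27 / 11):      {27 / 11:.2f}")
```

The selection ratio is exactly 4, matching 1/5 against 1/20. The within-PSU ratio comes out near 2.4, close to the 27:11 split in the example once rounding to whole people is allowed for.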

Intercensal Mobility
- The optimal designs assume that the concentration of subpopulation members is known exactly for every PSU.
- In practice, out-of-date census data are used.
- Designs are therefore less efficient than they appear.
- A less targeted design would be appropriate: use E[density | census data] rather than the census density.
- Over 50% of New Zealanders change address over a five-year period.

Correlations between 2001 and 2006 densities:
- Meshblocks: 0.911
- PSUs: 0.939
- Territorial Authorities: 0.997
For small MB counts, there is more uncertainty than the correlation suggests. For example, of the MBs where there were no Maori in 2001, 56% had one or more Maori in 2006, with a median of 1 and quartiles of 0, 1 and 3.

Comparison of Designs Based on 2001 Census Data
Cost fixed at 12,500; $C_1 = 2$, $C_2 = 0.3$, $C_3 = 1$, $\rho = 0.05$.

Design                                        SE(%) in 2001   SE(%) in 2006   Undercoverage in 2006 (%)
                                                              (simulated)     (simulated)
Using 2001 MB densities unadjusted            1.022           1.046           1.77
Assume >=1 Maori per MB                       1.046           1.087           0.00
Shrinkage estimate of MB density              1.040           1.091           0.00
Shrinkage estimates of MB density and         1.042           1.058           0.00
  total population

Dual Frame using the Maori Electoral Roll?
- Available addresses from the NZ Health Survey sample were matched to the Maori electoral roll.
- Thus we had a sample of addresses, and for each address:
  - Did a Maori adult live at the address (Y/N), as measured by the NZHS?
  - Did a Maori adult live at the address according to the Maori electoral roll?

Results:
- In urban areas, approximately 85% of Maori in the matched sample lived at an address found on the electoral roll.
- Of addresses on the roll, 77% would be found to have a Maori resident by the survey.
- Results were less good for rural areas, partly due to more ambiguous addresses.

Sampling the Australian Indigenous Population: Some ABS Findings
From the working paper "Sample Design Issues for National Surveys of Aboriginal and Torres Strait Islander Populations" (Alistair Rogers and Geoffrey Brent), www.abs.gov.au
- Indigenous Australians are about 2.3% of the total population of Australia (less at household level).
- 24% live in remote areas (vs 3% of the general population).

At regional levels, many Indigenous populations can be summarised as either geographically clustered and relatively inaccessible, or relatively accessible but geographically diverse.

Many split-meshblocks (SMBs) had zero Indigenous population in the census. Some options:
1. Exclude them. This leads to unacceptable undercoverage due to intercensal changes.
2. Give them a reduced probability of selection.
3. Make use of SMB and CD census numbers to exclude some size-0 SMBs, such that a conservative estimate of undercoverage is less than 5% in each region. This led to very substantial savings (>30% reduction in screening in some areas).
A combination of (2) and (3) was used.

Conclusions
- The main thing is to avoid over-targeting.
- Multiplicity sampling and intercept point surveys are tempting but generally only good for indicative information.
- A rough cost-variance approach can lead to improved efficiency of the order for Maori sampling.
- New two-stage allocations can yield modest further gains.
- Proxy screening gives some gains, but under-identification can quickly degrade these.
- Intercensal mobility needs to be considered.
- The Maori electoral roll shows promise.
- For rarer populations such as Indigenous Australians, an ABS study suggests that a combination of (roughly) optimal allocation and limited undercoverage looks promising.

www.cssm.uow.edu.au
www.uow.edu.au/~rclark/talks.html