How to conduct a network scale-up survey


Christopher McCarty and H. Russell Bernard
University of Florida
February 2009

© 2009 Christopher McCarty and H. Russell Bernard

Suggested citation: C. McCarty and H. R. Bernard. 2009. How to conduct a network scale-up survey. +URL where found.

Network scale-up begins like most surveys
1. Define the respondent population
2. Choose a sample frame
3. Choose a survey mode
4. Choose a sample size
5. Design the questionnaire (this is the part that's different)

Selecting the respondent population
The respondent population is not the same as the population to be estimated (the target population). For example, U.S. respondents can be used to estimate the homeless population, or an urban population can be used to estimate heroin users. You must know the size of the respondent population. Do transmission and barrier errors suggest using a respondent population with more ties to the target population? The opportunity to do this research in multiple countries could help solve this problem.

Choosing the sample frame
The sample frame represents the respondent population. For our work we used random-digit-dial telephone numbers. For face-to-face interviews, a general population survey may rely on census or voter registration data.

Choosing the mode
There are five survey modes:
1. Face-to-face
2. Telephone (this is what we used)
3. Mail
4. Drop and collect
5. Web
There is a large literature on mode effects in surveys. For the populations of interest to UNAIDS, a face-to-face or mixed-mode survey makes sense.

Choosing the sample size
As with any survey, the sample size should be based on the expected margins of error. For this survey, the margins of error are associated with network size. Although estimates of network size are remarkably reliable, they have large standard deviations. Our data suggest that a survey of 400 respondents would generate a margin of error of ±26 alters, and a survey of 1,000 would generate a margin of error of ±16 alters. Keep in mind that these figures are based on the variance for U.S. respondents.
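The two margins quoted above appear to match the standard normal-approximation formula for a mean, z × SD / √n. This holds if you assume 95% confidence (z = 1.96) and a network-size standard deviation of about 264, which is the figure reported later in these slides for the known-population method. The sketch below illustrates that calculation under those assumptions and also assumes simple random sampling. The function names are hypothetical and are not part of the authors' procedure.

```python
import math

# Assumption: 95% confidence, normal approximation, simple random sampling.
Z_95 = 1.96


def margin_of_error(sd: float, n: int, z: float = Z_95) -> float:
    """Half-width of the confidence interval for mean network size: z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)


def sample_size_for_margin(sd: float, margin: float, z: float = Z_95) -> int:
    """Smallest n whose margin of error does not exceed `margin`."""
    return math.ceil((z * sd / margin) ** 2)


# SD of network size from the known-population method, as reported in these slides.
SD_NETWORK = 264.4

for n in (400, 1000):
    print(n, round(margin_of_error(SD_NETWORK, n)))

print("n for ±20 alters:", sample_size_for_margin(SD_NETWORK, 20))
```

With these assumptions the loop prints a margin of 26 for 400 respondents and 16 for 1,000. The same formula can be inverted to choose n for a target margin.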

Designing the questionnaire
The network scale-up questionnaire has three parts:
1. Demographics, used to estimate bias
2. A question to estimate the number of alters each respondent knows in the target population
3. Questions to estimate network size (c)
Parts 2 and 3 require a boundary definition of who is counted as a network alter.

Alter boundary
The definition of who counts as an alter can have enormous effects on the estimate. Defining the alter boundary as 12 months will generate different network sizes than a boundary of two years. Our definition: "You know them and they know you by sight or by name. You have had some form of contact with them in the past two years, and you could contact them if you had to." Question: should respondents be instructed to exclude people they met on social networking sites such as Facebook?

There are two ways to estimate c: scaling from known populations, and the summation method.

Using known populations
Select a set of known populations. The more you use, the better. The populations should vary in size and type. Health-related populations are plentiful, but limiting the study to them may introduce barrier error. Using only large populations (such as men or people over age 65) introduces a lot of estimation error, while using only small populations introduces error from very few hits. Each known population should be between 0.1% and 4% of the total population (this may change as we learn more). The demographic characteristics of the known populations should match those of the population on which the known estimates are based as closely as possible. Known populations are often affected by transmission and barrier effects. In the past we assumed that these effects cancel out when populations of multiple sizes and types are used.
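The 0.1% to 4% guideline can be used as a simple screening check when assembling candidate known populations. This is a minimal sketch. The thresholds are parameters because the slide notes the range may change. The helper name, population names and sizes in the example are all made up.

```python
# Guideline from the slide: each known population should be 0.1%-4% of the total.
LOWER_SHARE = 0.001  # 0.1%
UPPER_SHARE = 0.04   # 4%


def check_known_populations(
    known: dict[str, int],
    total: int,
    lower: float = LOWER_SHARE,
    upper: float = UPPER_SHARE,
) -> dict[str, str]:
    """Label each candidate known population 'too small', 'ok' or 'too large'."""
    verdicts = {}
    for name, size in known.items():
        share = size / total
        if share < lower:
            verdicts[name] = "too small"
        elif share > upper:
            verdicts[name] = "too large"
        else:
            verdicts[name] = "ok"
    return verdicts


# Hypothetical candidate populations in a total population of 300 million.
candidates = {"group A": 150_000, "group B": 1_500_000, "group C": 30_000_000}
print(check_known_populations(candidates, 300_000_000))
```

In this example the three groups make up 0.05%, 0.5% and 10% of the total. They are therefore labeled too small, ok and too large.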

Examples of populations we used
In the U.S. there are a variety of sources for known populations: the U.S. Statistical Abstract, the U.S. Census, and the FBI crime statistics. Ideally, sub-population data will be collected on a recurring basis so that they can be used in subsequent years. It is important that all the data reflect the same year (be aware that some population data lag). Known populations are very susceptible to transmission and barrier error.

Table: Relationship between number known and demographic characteristics
Demographic characteristics (columns): state, sex, race, age, education, marital status, work status, religion, political party
Known populations (rows): Native Americans; gave birth in past 12 months; adopted a child in past year; widow(er) under 65 years; on kidney dialysis; postal worker; commercial pilot; member of Jaycees; diabetic; opened a business in year; have a twin brother or sister; licensed gun dealer; came down with AIDS; males in prison; homicide victim in past year; suicide in past year; died in wreck in past year; women raped in past year; homeless; HIV positive

We experimented with names
The Census provides estimates of both first names and last names. We experimented with both types and found problems with each. The advantage of names is that they vary in size and are typically ascribed. However, countries and cultures vary in the way they use names, and names are prone to barrier error.

Table: Relationship between number known and demographic characteristics (first names)
Demographic characteristics (columns): state, sex, race, age, education, marital status, work status, religion, political party
First names (rows): Michael, Christina, Christopher, Jacqueline, James, Jennifer, Anthony, Kimberly, Robert, Stephanie, David, Nicole

Summation method
We can estimate network size (c) directly by asking respondents to tell us how many people they know. This is an unreasonable task unless it is broken into manageable subtasks. We use culturally relevant categories of relation types that are mutually exclusive and exhaustive. These categories are small enough that respondents can estimate them reliably.

Relation categories we used
1. Immediate family
2. Other birth family
3. Family of spouse or significant other
4. Co-workers
5. People at work you don't work with directly
6. Best friends/confidantes
7. People known through hobbies/recreation
8. People from religious organization
9. People from other organization
10. School relations
11. Neighbors
12. Just friends
13. People known through others
14. Childhood relations
15. People who provide a service
16. Other

Developing a protocol for discovering summation categories
We assume that the relation categories used to elicit estimates will be culturally relative. Different languages will require their own category names, and the way people organize the people they know in their minds will almost certainly vary by culture. Further research is needed to determine the best protocol for discovering these categories. Summation categories must be mutually exclusive, exhaustive, and small enough that respondents count rather than estimate.

Approaches we are studying
Our current categories emerged from a previous study about the ways people know each other, which is not ideally suited to this study. We are exploring cultural consensus analysis or personal network structure as ways to develop these categories quickly. An empirical approach is to start with very large, culturally relevant categories and use alter characteristics to split them when they are too large.

Estimates of network size from the two methods (scaling from known populations and summation) are very close
Scaling from known populations: 290.8 (SD 264.4)
Summation method: 290.7 (SD 258.8)
We checked in multiple ways whether this was an artifact of the method. It wasn't.

Advantages of the summation method
It is quicker, taking about half the time (or less) needed to estimate from known sub-populations. It should not be subject to transmission or barrier error for estimates of network size. It does not require finding known populations, which could be a problem in some countries.

Disadvantages of the summation method
It cannot be verified statistically. By contrast, network size calculated by scaling from known populations can be checked by back-estimating each known population from the other known populations. Respondents may also easily double-count network alters, because many relations are multiplex (for example, a co-worker who is also a social contact).
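The back-estimation check mentioned above can be sketched as a leave-one-out loop. First drop one known population and estimate c from the remaining ones. Then scale up the average number of people respondents report knowing in the dropped population, and compare the result with its published size. This sketch assumes the check reuses the two steps described at the end of these slides. The helper names and the response data are hypothetical.

```python
from statistics import mean


def c_for_respondent(reports: list[float], sizes: list[float], t: float) -> float:
    """c_i = (m_i * t) / e, with m_i and e summed over the known populations given."""
    return sum(reports) * t / sum(sizes)


def back_estimate(responses: list[list[float]], sizes: list[float], t: float) -> list[float]:
    """Leave-one-out back-estimates of each known population.

    responses[i][j]: number of people respondent i reports knowing in known population j.
    sizes[j]: published size of known population j.
    t: size of the population being scaled up to.
    """
    estimates = []
    for j in range(len(sizes)):
        others = [k for k in range(len(sizes)) if k != j]
        other_sizes = [sizes[k] for k in others]
        # Step 1 without population j: average c over all respondents.
        c = mean(c_for_respondent([row[k] for k in others], other_sizes, t)
                 for row in responses)
        # Step 2: scale up the average number known in population j.
        m = mean(row[j] for row in responses)
        estimates.append(m / c * t)
    return estimates


# Made-up data: three known populations in a population of 1,000,000.
sizes = [10_000, 20_000, 5_000]
responses = [[2, 4, 1], [4, 8, 2]]
for size, est in zip(sizes, back_estimate(responses, sizes, 1_000_000)):
    print(f"published {size:>7,}  back-estimated {est:>9,.0f}")
```

These invented reports are exactly proportional to the published sizes, so each population is recovered. With real data, a large gap between a back-estimate and the published size may point to a known population affected by transmission or barrier error.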

Modeling issues
At this point in our work we are convinced that our estimates of network size are relatively reliable, but not absolutely reliable. If my network size is 300, I am confident it is half as large as that of someone with a network of 600. I am not confident that my network size is actually 300. This compromises our ability to estimate the absolute size of a population. Again, the opportunity to replicate this method may yield solutions.

How to generate scale-up estimates
There are two steps:
1. Estimate network size, c.
2. Use c together with respondents' estimates of the unknown population to scale up to the size of the unknown population in the total population.
We will look at these steps separately.

Step 1: Estimating c using the summation method
With the summation method, you add up the estimates from each relation category to get a c value for each respondent. The c used in the formula is the average of these c values across all respondents.
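Here is a minimal sketch of the summation calculation. It assumes each respondent's answers are stored as a mapping from relation category to count. Only four of the sixteen categories are shown, and the counts are invented. The code also assumes the categories are mutually exclusive, so summing them does not double-count anyone.

```python
from statistics import mean

# Hypothetical answers: one count per relation category for each respondent.
respondents = [
    {"immediate family": 5, "co-workers": 20, "neighbors": 10, "just friends": 40},
    {"immediate family": 8, "co-workers": 35, "neighbors": 6, "just friends": 25},
]


def summation_c(answers: dict[str, int]) -> int:
    """Network size for one respondent: the sum of counts across relation categories."""
    return sum(answers.values())


# One c value per respondent, then the average used in the scale-up formula.
c_values = [summation_c(r) for r in respondents]
c = mean(c_values)
print(c_values, c)
```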

Step 1: Estimating c using known populations
This procedure requires three parameters:
t = the size of the population to which you are scaling up (the same for each respondent)
e = the sum of the sizes of all the known populations used in the survey (the same for each respondent)
m = the sum of the respondent's reported numbers known across all the known subpopulations (different for each respondent)
For each respondent, $c = (m \cdot t) / e$. The c used in the formula is the average of these c values across all respondents.
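The per-respondent formula and the averaging step translate directly into code. This is an illustrative sketch. The function name is hypothetical, and the population total, the known population sizes and the reports are made-up numbers chosen so that the arithmetic is easy to follow.

```python
from statistics import mean


def c_known(reported: list[int], known_sizes: list[int], t: int) -> float:
    """c for one respondent: (m * t) / e.

    m: sum of this respondent's reports across the known populations.
    e: sum of the known population sizes (the same for every respondent).
    t: size of the population being scaled up to.
    """
    m = sum(reported)
    e = sum(known_sizes)
    return m * t / e


# Made-up example: three known populations in a population of 1,000,000.
t = 1_000_000
known_sizes = [10_000, 20_000, 5_000]
reports = [[2, 4, 1], [4, 8, 2]]  # one row per respondent

c_values = [c_known(r, known_sizes, t) for r in reports]
c = mean(c_values)
print(c_values, c)
```

Here e = 35,000. The first respondent's reports (m = 7) give c = 7 × 1,000,000 / 35,000 = 200, and the second respondent's (m = 14) give 400, for an average c of 300.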

Step 2: Applying c
This step also requires three parameters:
t = the size of the population to which you are scaling up (the same for each respondent)
c = the average c value, obtained either by scaling from known populations or by the summation method
m = the average, across respondents, of the number of people they know in the unknown subpopulation
The formula for the estimated size of the unknown subpopulation is $e = (m / c) \cdot t$. (Here e denotes the size of the unknown subpopulation, not the sum of known populations from Step 1.)
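Step 2 reduces to a ratio of averages. The sketch below follows the slide's definitions of m, c and t. The function name is hypothetical, and the respondent reports and the total population are invented for illustration.

```python
from statistics import mean


def scale_up(m_bar: float, c_bar: float, t: int) -> float:
    """Estimated size of the unknown subpopulation: e = (m / c) * t."""
    return m_bar / c_bar * t


# Made-up inputs: how many people in the unknown subpopulation
# each of five respondents reports knowing.
reports = [0, 1, 0, 2, 0]
m_bar = mean(reports)   # m: average number known in the unknown subpopulation
c_bar = 300.0           # c: average network size from Step 1
t = 1_000_000           # t: size of the population being scaled up to

estimate = scale_up(m_bar, c_bar, t)
print(round(estimate))
```

The example uses an average of 0.6 people known in the unknown subpopulation, c = 300 and a population of 1,000,000. These give an estimate of about 2,000.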