Capture-recapture studies


Laura Anderson, Centre for Infections, Health Protection Agency, UK

Reiterating underlying assumptions
1) No misclassification of records (perfect record linkage).
2) Closed population (no immigration or emigration in the time period studied).
3) Homogeneous population (same chance of being observed and re-observed).
4) Independent registers (the probability of being on one is not affected by being or not being on the other).
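To make the assumptions concrete, here is a minimal sketch of the simplest two-source case, where all four assumptions are needed at once. The slides do not give this formula; it is the standard Lincoln-Petersen estimator and Chapman's small-sample variant, and the counts used are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Two-source estimate of the total number of cases.

    n1, n2: cases on register 1 and register 2
    m:      cases found on both registers after record linkage
    Relies on perfect linkage, a closed and homogeneous population,
    and independent registers.
    """
    if m <= 0:
        raise ValueError("at least one linked case is required")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant, less biased when the overlap m is small."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, for illustration only.
n1, n2, m = 200, 150, 100
estimate_lp = lincoln_petersen(n1, n2, m)
estimate_chapman = chapman(n1, n2, m)
observed = n1 + n2 - m  # distinct cases seen on either register
print(f"Lincoln-Petersen: {estimate_lp:.1f}")
print(f"Chapman: {estimate_chapman:.1f}")
print(f"Completeness of the two registers combined: {observed / estimate_lp:.1%}")
```

If the registers are positively dependent, the overlap m is inflated and the estimate is too low; with two sources alone this cannot be detected, which is why both studies below use three.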

Efforts can be made to reduce violation of assumptions
Complete data and good personal identifiers: allow linkage of patients from different registers.
Short time period to collect data: ensures there is little change in the population.
Stratification, or covariates in the model: deal with heterogeneity.
Adding an interaction term: identifies and controls interdependency.

Describe two capture-recapture studies:
1) Egypt study: middle-income setting
2) England study: resource-rich setting
Why were these studies successful?
What were the limitations?
Can they be considered representative of the population of the country as a whole?

Egypt study (Bassili et al., 2009)
Objectives:
To estimate case detection rates in Egypt in 2007 using record linkage and capture-recapture.
To explore this method as a tool for periodic evaluation of the WHO tuberculosis control strategy in a resource-limited setting.

Methodology
4 randomly selected governorates: Cairo, Dakahlia, Fayum and Matrouh (11.1% of the national population).
Longitudinal surveillance, Oct-Dec 2007, within the public and private non-NTP sectors, plus data from the NTP.
Case definition: patients registered with the NTP (new and retreatment cases), or confirmed on NTP criteria for non-NTP providers.
Three-source log-linear capture-recapture analysis on all cases and on smear-positive cases.

Data sources and linkage
1) NTP register
2) Public non-NTP
3) Private non-NTP
A register identical to the NTP register was introduced to non-NTP providers (demographic and clinical data), plus an identical NTP laboratory register.
Record linkage by name.
Misclassification was corrected by examining the NTP register for Jan 2007-April 2007.

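Results
Cases by source: NTP n = 364; public non-NTP n = 82; private non-NTP n = 82.
Venn diagram of overlap between sources (cell counts):
NTP only: 247
NTP and public non-NTP only: 76
Public non-NTP only: 5
NTP and private non-NTP only: 40
All three sources: 1
Public and private non-NTP only: 0
Private non-NTP only: 41
Total number of tuberculosis cases = 410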

Capture-recapture analysis
The log-linear model included main effects plus the following expected interactions:
a positive interaction, NTP*public non-NTP;
a negative interaction, public non-NTP*private non-NTP.
Case detection rate (CDR) of NTP surveillance = 55% (95% CI 46%-68%).
Completeness of case ascertainment = 62% (95% CI 52%-77%).
For sputum smear-positive cases both estimates were higher, at 66% (55%-75%) and 72% (60%-82%), respectively.
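The point estimates above can be approximately reproduced by hand. The sketch below is not the authors' code, and the slides do not say how the model was fitted. It assumes the Venn cell assignment whose margins give the reported source totals (364, 82, 82). It then uses the closed-form estimate that applies to a log-linear model with exactly these two interactions, under which NTP and private non-NTP are independent among cases not on the public non-NTP register. Confidence intervals are not computed.

```python
# Cell counts keyed by (on NTP, on public non-NTP, on private non-NTP).
# The assignment of counts to cells is inferred from the Venn diagram counts.
cells = {
    (1, 0, 0): 247,
    (1, 1, 0): 76,
    (0, 1, 0): 5,
    (1, 0, 1): 40,
    (1, 1, 1): 1,
    (0, 1, 1): 0,
    (0, 0, 1): 41,
}

observed = sum(cells.values())
ntp_total = sum(n for key, n in cells.items() if key[0] == 1)

# With interactions NTP*public and public*private only, the cases missed by
# all three sources are estimated from the stratum not on the public register:
# (NTP only) * (private only) / (NTP and private only).
missed = cells[(1, 0, 0)] * cells[(0, 0, 1)] / cells[(1, 0, 1)]
estimated_total = observed + missed

completeness = observed / estimated_total
ntp_cdr = ntp_total / estimated_total

print(f"Estimated cases missed by all sources: {missed:.0f}")
print(f"Completeness of case ascertainment: {completeness:.0%}")
print(f"CDR of NTP surveillance: {ntp_cdr:.0%}")
```

Because this estimate depends on only three cells, it is sensitive to small counts such as the 40 cases shared by the NTP and private registers, which fits the wide confidence intervals reported.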

Strengths and limitations
Case definition identical for all registers.
Perfect record linkage: a hierarchic combination of three Arabic names (routine in Egypt) was used to minimise overmatching.
Improved completeness through weekly visits (to minimise undermatching) and follow-up of the NTP register.
Data collected in a short period of time (but small numbers, which may limit the overlap of data collection).
Two-source interdependency controlled by the log-linear model using interaction terms.
Three-source interdependency assumed to be negligible.

Strengths and limitations (continued)
Exclusion of false positive cases: those outside the study period or not having TB.
Longitudinal surveillance was more expensive and took longer than expected, but:
1. it was required in a resource-poor setting to obtain a more accurate estimate;
2. it allowed control of possible interdependencies, which cannot be done in two-source studies.
Results were available only months after a short surveillance period (seasonal variation?).
Cluster random sampling = good representation of the whole population.

England study (van Hest et al., 2008)
Objectives:
Estimate the annual incidence of TB in England.
Assess the completeness of reporting between 1999 and 2002.

Data sources
HOSPITAL EPISODE STATISTICS (HES): cases admitted to NHS hospitals with TB as the 1st or 2nd hospital discharge code (International Classification of Diseases-10).
ENHANCED TUBERCULOSIS SURVEILLANCE (ETS): notification system which collects additional information on demographic and clinical characteristics. Includes both culture-confirmed and clinically diagnosed cases of TB.
MycobNet: culture-confirmed cases of MTB complex from reference laboratories.

Time period
Entries more than 1 year apart in the data sources were considered to be separate episodes.
Record linkage
Duplicates removed.
ETS and MycobNet are routinely linked; the linked records were then matched to HES.
Matched on core identifiers: age, sex, date of birth, postcode, and name using Soundex (name not available for HES).
Sophisticated software using a points system (human element required).
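A points-based match of this kind can be sketched as follows. This is an illustration only, not the software used in the study: the field names, weights and records are hypothetical, and the slides do not describe the actual scoring rules. The Soundex function follows the standard American Soundex rules.

```python
# Standard Soundex digit for each consonant; vowels, H, W and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


# Hypothetical weights: points awarded when an identifier agrees.
WEIGHTS = {"dob": 4, "postcode": 3, "surname_soundex": 3, "sex": 1}


def match_score(rec_a, rec_b):
    """Sum the points for identifiers that are present in both records and agree."""
    score = 0
    for field in ("dob", "postcode", "sex"):
        value = rec_a.get(field)
        if value is not None and value == rec_b.get(field):
            score += WEIGHTS[field]
    name_a, name_b = rec_a.get("surname"), rec_b.get("surname")
    if name_a and name_b and soundex(name_a) == soundex(name_b):
        score += WEIGHTS["surname_soundex"]
    return score


# Hypothetical records; the HES-style record has no name.
ets = {"surname": "Robert", "dob": "1970-01-31", "sex": "M", "postcode": "AB1 2CD"}
lab = {"surname": "Rupert", "dob": "1970-01-31", "sex": "M", "postcode": "AB1 9ZZ"}
hes = {"dob": "1970-01-31", "sex": "M", "postcode": "AB1 2CD"}

print(soundex("Robert"), soundex("Rupert"))
print(match_score(ets, lab), match_score(ets, hes))
```

In practice, pairs scoring above a high threshold would be accepted as links and borderline scores passed to a person for review, which is the human element the slide refers to.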

False positive cases
Removed from ETS using Treatment Outcome Monitoring data.
HES matched to the mycobacteria other than tuberculosis (MOTT) database.
Excess of unlinked HES cases removed using the Bernoulli parameter: the probability of being a true TB case based on the covariates days of admission, number of admissions in the TB episode, ICD code, and rank number of the TB diagnosis.

Interdependency
A three-source log-linear model allows interactions to be examined first and then added to the model as interaction terms.
This can be used to correct for interdependency between 2 sources, but not for dependency between all 3 sources.

Results
Source      % of cases captured
ETS         84.1
MycobNet    54.3
HES         41.6
Undernotification to ETS was 15.9%. This improved from 18.8% in 1999 to 13.3% in 2002.

28678 total TB cases observed.
14291 cases estimated as unobserved.
28678/42969 = 66.7% completeness of case ascertainment.
Overall undernotification = 43.8%
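The arithmetic behind these figures can be checked in a few lines. The last step is an inference rather than something the slides state: it assumes the 43.8% figure is undernotification to ETS measured against the estimated total, and that the 84.1% ETS share applies to the 28678 observed cases. Under that reading the result agrees to within rounding.

```python
observed = 28678    # cases seen in at least one of the three sources
unobserved = 14291  # cases estimated to be missed by all three sources
estimated_total = observed + unobserved

completeness = observed / estimated_total

# Assumption: ETS captured 84.1% of the observed cases.
ets_share_of_observed = 0.841
ets_share_of_total = ets_share_of_observed * observed / estimated_total
overall_undernotification = 1 - ets_share_of_total

print(f"Estimated total: {estimated_total}")
print(f"Completeness of case ascertainment: {completeness:.1%}")
print(f"Undernotification to ETS against the estimated total: "
      f"{overall_undernotification:.1%}")
```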

Strengths
Good completeness of data from cross-validation.
Excellent record linkage: removal of duplicates and false positive cases.
94.9% had an association with a high score, suggesting true links.
Short time period.
Interdependency likely controlled for using the log-linear model.

Strengths (continued)
Good database management.
Access to data.
Expertise in computing, epidemiology and statistics: important to work as a team.
Data from the whole country, so representative. (However, TB varies by region of the UK, so the results are perhaps not representative of Scotland, Wales or Northern Ireland.)

Limitations
Case definition varies between data sources, so there are variations in specificity.
Still trying to capture all cases in ETS and MycobNet.
Cases were excluded because of missing date of birth, so the data could be more complete.
Homogeneous population assumption likely violated (age, site of disease).
Couldn't stratify because identifiers were incomplete across sources.

Conclusion
ALWAYS place a capture-recapture study in the context of the limitations of the study.
It is important to estimate the undetected burden in the country. HOWEVER, it may be more useful as a tool to evaluate a surveillance system, e.g. the England study.

Usually all underlying assumptions are violated and therefore the study is only as good as the data sources.

Thank you!