It's good to share... Understanding the quality of the 2011 Census in England and Wales. SRA Conference, London, December 2012. Adriana Castaldo, Andrew Charlesworth
AGENDA. Context: 2011 Census quality assurance and the role of administrative data; data to be matched; data matching challenges and solutions; matching methods and interpretation; substantive results so far...
Context: Basic Census Methodology. The 2011 Census estimates were produced using the Census and the Census Coverage Survey (CCS). The CCS is a 1 per cent sample of the population. Dual System Estimation (DSE) is then used to estimate the total population. Our objective was to quality assure these estimates using administrative data.
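As a rough illustration of the DSE step, the simplest two-source (Lincoln-Petersen) form estimates a population from the Census count, the CCS count and the number of people found in both. This is only a sketch: the function name and the counts are hypothetical, and the slides do not describe the stratification or adjustments applied in the production estimates.

```python
def dual_system_estimate(census_count, ccs_count, matched_count):
    """Basic two-source estimate: N = (census count * CCS count) / matched count.

    Assumes both counts cover the same sample area and that the two
    collections are independent (an assumption of this sketch).
    """
    if matched_count <= 0:
        raise ValueError("at least one matched record is needed")
    return census_count * ccs_count / matched_count


# Hypothetical counts for one sample area.
estimate = dual_system_estimate(census_count=900, ccs_count=850, matched_count=800)
print(f"Estimated population: {estimate:.2f}")
```

The fewer people the two sources have in common relative to their sizes, the larger the estimated number missed by both.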
Available Administrative Data: NHS Patient Register; Higher Education Statistics Agency (HESA) data; English and Welsh School Censuses; Electoral Registers; Valuation Office Agency data.
Approach. Low-level aggregate analysis: comparison of address and person counts at postcode, Output Area (OA) and Lower Layer Super Output Area (LSOA) level. This reveals differences and similarities, but how can we examine these in more detail? Record-level matching: match record-level administrative and Census data together. Designing the matching and turning it into an operational process presented a number of challenges...
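The aggregate comparison can be pictured as a difference in counts per area. The sketch below uses made-up postcodes and a hypothetical helper name; it compares person counts only, and the same idea would apply at OA or LSOA level.

```python
from collections import Counter


def count_differences(census_areas, admin_areas):
    """Return admin count minus census count for every area seen in either source."""
    census = Counter(census_areas)
    admin = Counter(admin_areas)
    # Counter returns 0 for areas missing from one source.
    return {area: admin[area] - census[area] for area in sorted(set(census) | set(admin))}


# Hypothetical input: one postcode per person record.
census_postcodes = ["AB1 2CD", "AB1 2CD", "EF3 4GH"]
admin_postcodes = ["AB1 2CD", "EF3 4GH", "EF3 4GH", "IJ5 6KL"]

for postcode, diff in count_differences(census_postcodes, admin_postcodes).items():
    print(postcode, diff)
```

A non-zero difference shows where the sources disagree but not which records are responsible, which is the gap that record-level matching fills.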
Record Level Matching Challenges: limited time available for QA; research questions not known in advance; quality matching is time consuming; not all data available in advance; incomplete or inaccurate matching variables.
Record Matching Methodology: Design. Matching carried out in CCS clusters. Areas that would be difficult to enumerate identified in advance and matched first. Use of control areas to provide context and evaluation of matching methods. Flexible matching architecture and analytic
Record Matching Methodology: Matching. Standardisation and cleaning. Exact matching. Resolution of candidate pairs from probabilistic and TF-IDF matching by clerical matchers. Final clerical search stage using address and
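The first stages of such a pipeline might look like the sketch below: standardise the matching text, pair records whose standardised text is identical, then score the leftover records with a TF-IDF cosine similarity so that candidate pairs can be passed to clerical matchers. All names, records and the weighting formula are assumptions for illustration; the slides do not specify the matching variables or the software used.

```python
import math
import re
from collections import Counter


def standardise(text):
    """Upper-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^A-Z0-9 ]", " ", text.upper())
    return " ".join(text.split())


def exact_match(census, admin):
    """Pair records with identical standardised text; return pairs and both residuals."""
    admin_by_key = {}
    for rec_id, text in admin.items():
        admin_by_key.setdefault(standardise(text), []).append(rec_id)
    pairs, census_residual = [], []
    for rec_id, text in census.items():
        candidates = admin_by_key.get(standardise(text))
        if candidates:
            pairs.append((rec_id, candidates.pop(0)))
        else:
            census_residual.append(rec_id)
    admin_residual = [r for ids in admin_by_key.values() for r in ids]
    return pairs, census_residual, admin_residual


def tfidf_vectors(texts):
    """Token-level TF-IDF weights using a smoothed inverse document frequency."""
    docs = [Counter(standardise(t).split()) for t in texts]
    n = len(docs)
    df = Counter(tok for d in docs for tok in d)  # number of documents containing each token
    return [{tok: tf * (math.log((1 + n) / (1 + df[tok])) + 1) for tok, tf in d.items()}
            for d in docs]


def cosine(u, v):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Made-up records: name and address concatenated into one matching string.
census = {"c1": "John Smith, 12 High St.", "c2": "Ann Lee 3 Mill Lane"}
admin = {"a1": "JOHN SMITH 12 HIGH ST", "a2": "Anne Lee 3 Mill Lane"}

pairs, census_res, admin_res = exact_match(census, admin)
vec_c, vec_a = tfidf_vectors([census[census_res[0]], admin[admin_res[0]]])
print("exact pairs:", pairs)
print("candidate score for clerical review:", round(cosine(vec_c, vec_a), 3))
```

In this sketch the score only ranks candidate pairs; the decision on whether a pair is a true match is left to a clerical reviewer, as on the slide.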
Results: Census to Patient Register (PR) matching. The PR is the widest coverage source and also a source of potential challenges, so it was the focus of the initial matching effort. [Diagram labels: Patient Register Residuals; Patient Register Census/CCS matches; Census/CCS Residuals]
London: Matches and Residuals
Record Matching: Initial Results. Overall match results show we matched a large number of records (and remember, inner London is about as challenging as it gets!). However, there are still large numbers of unmatched Patient Registrations. Further investigation of the Patient Register residual is therefore required...
Further Matching. Further evidence used to assess presence: Census Dummy form returns; other administrative sources; matching to records outside the Local Authority; Census Associated address records.
Interpretation: Who is Actually Present? [Diagram labels: Non-Usual Residents; PR records unmatched to Census respondents and assessed as not present; Unaccounted for; Not Present; PR records unmatched to Census respondents and assessed present; PR/Census confirmed Usual Residents; Present]
London Churn: Female Outcomes
London Churn: Male Outcomes
High student population: Female Outcomes
High student population: Male Outcomes
For Comparison: Female Outcomes in a Control LA
Developing the Analyses: Flag 4s. The results we can see are very different for our inner London, university and control Local Authorities. Flag 4 Patient Registrations and students are two factors that drive these differences. Flag 4s are Patient Registrations that belong to individuals registering for the first time from abroad. Comparing the proportion of Flag 4s in the residual and matched data sets shows that they form a larger part of the residual.
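The Flag 4 comparison amounts to computing the same proportion in two record sets. A minimal sketch, assuming each Patient Register record carries a boolean field (here given the hypothetical name `flag4`) and using made-up records rather than the study's data:

```python
def flag4_share(records):
    """Proportion of records whose (hypothetical) 'flag4' field is True."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["flag4"]) / len(records)


# Made-up illustrative records, not results from the matching exercise.
matched = [{"flag4": False}] * 9 + [{"flag4": True}]
residual = [{"flag4": False}] * 3 + [{"flag4": True}] * 2

print(f"matched: {flag4_share(matched):.0%}, residual: {flag4_share(residual):.0%}")
```

A higher share in the residual than in the matched set is the pattern the slide describes.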
Migration LA
LA Summary: Proportion of F4s and Proportion Unresolved, within CCS Postcode Clusters
Developing the Analyses: Students. Student Local Authorities have a distinct profile. One potential cause is students' failure to re-register after leaving university (evidence does suggest, however, that universities are having some success in making students register on arrival). By comparing the acceptance dates onto the Patient Register of students in communal halls (generally occupied by first-year students), we can gauge how many registrations might be list inflation.
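One way to picture the acceptance-date comparison is to group hall registrations by the academic year in which they were accepted and flag those that predate the current intake. The sketch below rests on assumptions not stated in the slides: an academic year starting in September, a reference date of 1 April 2011, and invented acceptance dates.

```python
from collections import Counter
from datetime import date


def academic_year(d, start_month=9):
    """Calendar year in which the academic year containing `d` began (assumed September start)."""
    return d.year if d.month >= start_month else d.year - 1


def acceptance_profile(acceptance_dates):
    """Count registrations by academic year of acceptance."""
    return Counter(academic_year(d) for d in acceptance_dates)


def possible_inflation_share(acceptance_dates, reference_date):
    """Share of registrations accepted before the academic year containing the reference date."""
    if not acceptance_dates:
        return 0.0
    current = academic_year(reference_date)
    older = sum(1 for d in acceptance_dates if academic_year(d) < current)
    return older / len(acceptance_dates)


# Invented acceptance dates for registrations at one hall of residence.
hall_dates = [date(2010, 10, 5), date(2010, 9, 20), date(2009, 9, 28), date(2008, 10, 1)]

print(dict(acceptance_profile(hall_dates)))
print(possible_inflation_share(hall_dates, reference_date=date(2011, 4, 1)))
```

Because halls are generally occupied by first-year students, registrations accepted in earlier academic years are candidates for list inflation rather than proof of it.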
Female Students Living in Halls in April 2011 by NHS Authority Acceptance Date
Male Students Living in Halls in April 2011 by NHS Authority Acceptance Date
Further Investigations. Planned analysis of the PR residuals' addresses and households to identify ghost records. Longitudinal matching of the 2012 Patient Register to 2011 data to identify registrations that have been cancelled by GP practices in the year following the Census. Cluster analysis of all E&W LAs to see whether the typology of LAs identified through matching is mirrored in list inflation patterns nationally. Multi-level modelling to summarise results, with individual and area level explanatory variables.