Can a Statistician Deliver Coherent Statistics? European Conference on Quality in Official Statistics (Q2008), Rome, 8-11 July 2008 Thomas Körner, Federal Statistical Office Germany
The importance of being coherent
What is coherence?
The adequacy of statistics to be reliably combined in different ways and for various uses. (ESS Data Quality Glossary 2003)
=> Statistics referring to an identical reference period, target population and concepts should (ideally) be identical.
Data sources to be coherent:
- Coherence within one statistic (e.g. monthly vs. quarterly)
- Coherence between surveys / registers
- Coherence of surveys / registers with National Accounts
Dimensions of coherence:
- Level (e.g. of employment)
- Trend (e.g. yearly change)
- Structure (e.g. unemployment by sex)
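The three dimensions of coherence can be made concrete by comparing two sources on level, trend and structure. The sketch below is illustrative only: the `Source` class, the function names and all figures are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Source:
    name: str
    levels: Dict[int, float]   # year -> estimate (e.g. million persons)
    shares: Dict[str, float]   # category -> share of the total


def level_gap(a: Source, b: Source, year: int) -> float:
    """Level: difference between the two estimates for one year."""
    return a.levels[year] - b.levels[year]


def trend(s: Source, year: int) -> float:
    """Trend: change against the previous year, in percent."""
    return (s.levels[year] / s.levels[year - 1] - 1.0) * 100.0


def trend_gap(a: Source, b: Source, year: int) -> float:
    """Difference between the two trends, in percentage points."""
    return trend(a, year) - trend(b, year)


def structure_gap(a: Source, b: Source) -> float:
    """Structure: largest absolute difference in category shares."""
    return max(abs(a.shares[k] - b.shares[k]) for k in a.shares)


# Hypothetical figures, for illustration only.
source_a = Source("A", {2006: 100.0, 2007: 102.0}, {"men": 0.55, "women": 0.45})
source_b = Source("B", {2006: 95.0, 2007: 99.75}, {"men": 0.50, "women": 0.50})

print(f"level gap 2007:     {level_gap(source_a, source_b, 2007):.2f}")
print(f"trend gap 2007:     {trend_gap(source_a, source_b, 2007):.2f} pp")
print(f"structure gap:      {structure_gap(source_a, source_b):.2f}")
```

Two sources can agree on one dimension and still diverge on another, which is why the three checks are kept separate.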
What is coherence? (2)
[Figure: Employed persons in Germany (in million persons), 2005 to 2007, according to National Accounts, Labour Force Survey and Telephone Survey. Percentage annotations in the chart: +0.7%, +2.7%, +2.2%, +1.8%, +2.2%.]
What is coherence? (3)
[Figure: National Accounts (national concept) compared with Labour Force Survey (national concept) on a scale from 95% to 100%. The gap is split into conceptual differences and differences due to other reasons (incoherence).]
Sources of Incoherence The case of surveys and registers
The working system of surveys and registers
Reality => working system => Statistical measurement
Step of the working system | Associated error type
Specification of population, units, items | Specification discrepancy
Construction of the population frame | Coverage errors
Selection of survey units (if any) | Sampling errors
Contacting the units | Nonresponse errors
Measurement process | Measurement errors
Data entry, coding, editing etc. | Processing errors
Interpretation |
Each error type can contribute to incoherence.
(adapted from: Radermacher/Körner 2006)
Lacking coherence between two surveys
[Figure: Unemployed according to the Labour Force Concept (in million persons), 2005 and 2006, Labour Force Survey vs. Telephone Survey. Population: persons in private households at main residence, 15-74 years old. Annotations in the chart: 0.68 m, 0.82 m, -6.9%, -11.8%.]
Two different working systems: impact on coherence?
Item | Labour Force Survey | Telephone Survey
Specification | largely identical in both surveys
Sampling frame | Area sampling largely based on census 1987 | RDD sample (landline network)
Sampling unit | Household | Person (Kish selection grid)
Response rate | 95% | 52%
Data collection mode | CAPI (and PAPI) | CATI
Average interview length | 30 min / person | 4-7 min / person
Rate of proxy interviews | 27% | 0%
Calibration marginals | Age, sex, region, nationality | Age, sex, region, nationality and registered unemployment
Quantification of the sources of incoherence is extremely complex.
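Calibration to marginals, as listed in the last row of the comparison, can be sketched with raking (iterative proportional fitting). This is one common calibration technique; the presentation does not say which method the two surveys use. The `rake` function, the records, the design weights and the target totals below are all hypothetical.

```python
from typing import Dict, List


def rake(weights: List[float],
         records: List[Dict[str, str]],
         margins: Dict[str, Dict[str, float]],
         iterations: int = 50) -> List[float]:
    """Adjust weights so that weighted totals match each target margin in turn."""
    w = list(weights)
    for _ in range(iterations):
        for variable, targets in margins.items():
            # Current weighted total per category of this variable.
            totals: Dict[str, float] = {}
            for wi, rec in zip(w, records):
                totals[rec[variable]] = totals.get(rec[variable], 0.0) + wi
            # Scale each weight by target / current total of its category.
            w = [wi * targets[rec[variable]] / totals[rec[variable]]
                 for wi, rec in zip(w, records)]
    return w


def weighted_total(weights: List[float], records: List[Dict[str, str]],
                   variable: str, category: str) -> float:
    return sum(wi for wi, rec in zip(weights, records)
               if rec[variable] == category)


# Hypothetical respondents, design weights and population margins.
records = [
    {"sex": "m", "age": "15-44"},
    {"sex": "m", "age": "45-74"},
    {"sex": "f", "age": "15-44"},
    {"sex": "f", "age": "45-74"},
]
design_weights = [10.0, 20.0, 30.0, 40.0]
margins = {
    "sex": {"m": 60.0, "f": 40.0},
    "age": {"15-44": 50.0, "45-74": 50.0},
}

calibrated = rake(design_weights, records, margins)
for variable, targets in margins.items():
    for category, target in targets.items():
        total = weighted_total(calibrated, records, variable, category)
        print(f"{variable}={category}: {total:.3f} (target {target})")
```

If two surveys calibrate to different sets of marginals, as in the comparison above, their weights are pulled towards different targets, which is one possible contributor to diverging estimates.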
Reducing incoherence between surveys
Use of standard tools and approaches
- Sampling frame & design
- Data collection procedures
- Calibration marginals
Experimental research regarding measurement errors
- Questionnaire design effects
- Interviewer effects and mode effects
Clear communication of the way the data are produced
However:
- Full standardisation is neither possible nor desirable
- Incoherences help us learn about sources of errors
- Use of accounting systems for a limited number of variables and breakdowns
Lacking coherence between surveys and registers
[Figure: Registered unemployed (in million persons), 2006 and 2007, Unemployment Register vs. Labour Force Survey. Annotations in the chart: 0.47 m, 0.53 m, -13.1%, -15.9%.]
Differences in the working systems: survey vs. register
Item | Labour Force Survey | Unemployment register
Specification | Self-declared status ("In the past week, were you registered as unemployed at the employment agency?") | Status in the records of the employment agency (data generated from the administrative procedure)
Sampling frame | 1% area sampling largely based on census 1987 | Complete enumeration
Sampling unit | Household | not relevant
Response rate | 95% | n.a.
Data collection mode | CAPI (and PAPI) | Registration (analysis according to the criteria currently used)
Calibration marginals | Age, sex, region, nationality | No calibration
Differences in specification (and in the methods applied) make coherent results unlikely; no strictly comparable information is available from both sources.
Reducing incoherence between surveys and registers
- Experimental research regarding measurement errors
- Reduction of measurement error in surveys and registers
- Close cooperation with the administrations in charge of the registers
- Interviewer effects and mode effects
- Improvement of accordance in the specification of concepts
However:
- Respondents can only be asked what they know
- Statistics production is not the priority objective of administrative registers
- Clear communication of the methods of data production is vital
Sources of Incoherence The case of surveys / registers vs. National Accounts
Basic differences in the working systems
Survey / register: Reality => Specification of population, units, items => Construction of the population frame => Selection of survey units (if any) => Contacting the units => Measurement process => Data entry, coding, editing etc. => Interpretation => Statistical measurement
Accounting systems: Reality => Specification of concepts => Estimation procedures => Consistency with NA => Statistical measurement
Incoherences between surveys / registers and National Accounts
[Figure: Marginal employees in LFS, Register and NA (in m persons), shown for Labour Force Survey, Employment register and Employment Accounts. Components in the legend: estimation for groups not covered by the sources (consistent with NA), short term employees, "1-Euro-Jobs".]
Reconciling incoherent results (1)
Standardisation of statistics production
- Standards for sampling frames, weighting and data collection procedures
- However: full standardisation is neither possible nor desirable
Combination of different data sources
- Use of the source most suitable for a given variable
- Matching of data on the micro level
Outstanding problems
- Identical target population coverage is unlikely in most cases
- How to achieve consistent data sets despite differing working systems?
- How to close gaps in combined data sets (imputation?)
- Legal aspects of data protection
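Matching on the micro level, and the coverage problem that comes with it, can be illustrated with a small comparison of two sources keyed by a common identifier. This is a minimal sketch under a strong assumption: that such a shared identifier exists and may legally be used. The `compare_micro` function, the identifiers and the statuses are hypothetical.

```python
from typing import Dict, Set


def compare_micro(survey: Dict[str, bool],
                  register: Dict[str, bool]) -> Dict[str, Set[str]]:
    """Compare a status variable for persons found in two sources.

    Keys are person identifiers, values say whether the person is
    recorded as unemployed in the respective source.
    """
    matched = survey.keys() & register.keys()
    return {
        # Coverage differences: persons present in only one source.
        "only_survey": set(survey.keys() - register.keys()),
        "only_register": set(register.keys() - survey.keys()),
        # Measurement differences among the matched persons.
        "agree": {i for i in matched if survey[i] == register[i]},
        "disagree": {i for i in matched if survey[i] != register[i]},
    }


# Hypothetical micro data, for illustration only.
survey = {"p1": True, "p2": False, "p3": True}
register = {"p2": True, "p3": True, "p4": False}

result = compare_micro(survey, register)
for key in ("only_survey", "only_register", "agree", "disagree"):
    print(key, sorted(result[key]))
```

The split separates two of the outstanding problems: unmatched persons point to differing target population coverage, while matched persons with differing values point to differences in specification or measurement.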
Reconciling incoherent results (2)
Accounting systems
- Full consistency can be achieved for a limited set of variables
- Methodological background information needed
- However: information loss compared to the microdata set
Benchmarking to additional marginals
- Precondition: reliable marginals are available in the required breakdowns
- Possible bias in other variables (risk of uncontrollable effects and internal inconsistency)
Communication is everything
- Definition of priority sources for certain topics and variables
- Transparency regarding methods and reasons for incoherence
- However: many users do not care about the details of statistical concepts and methods
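The side effect of benchmarking on other variables can be shown with a simple post-stratification to one additional marginal. The sketch is illustrative only: post-stratification stands in for whatever benchmarking method is actually used, and the function names, weights, target totals and variables are hypothetical.

```python
from typing import Dict, List


def poststratify(weights: List[float],
                 strata: List[str],
                 targets: Dict[str, float]) -> List[float]:
    """Scale weights so that each stratum's weighted total hits its target."""
    totals: Dict[str, float] = {}
    for w, s in zip(weights, strata):
        totals[s] = totals.get(s, 0.0) + w
    return [w * targets[s] / totals[s] for w, s in zip(weights, strata)]


def weighted_count(weights: List[float], flags: List[bool]) -> float:
    """Weighted total of respondents for whom the flag is true."""
    return sum(w for w, f in zip(weights, flags) if f)


# Hypothetical respondents: benchmark variable and an unrelated variable.
weights = [100.0, 100.0, 100.0, 100.0]
registered = ["yes", "yes", "no", "no"]          # benchmark variable
other_variable = [True, True, True, False]       # not part of the benchmark

# Hypothetical external marginal for the benchmark variable.
targets = {"yes": 150.0, "no": 250.0}

before = weighted_count(weights, other_variable)
benchmarked = poststratify(weights, registered, targets)
after = weighted_count(benchmarked, other_variable)

print(f"other variable before benchmarking: {before:.1f}")
print(f"other variable after benchmarking:  {after:.1f}")
```

The benchmark variable now matches the external marginal exactly, but the estimate for the other variable has moved as well, without any new information about that variable having entered the data.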
Should a statistician deliver coherent statistics at all?
YES
- as users will otherwise be confused
YES and NO
- there are strict limits of data reconciliation (data availability, trade-offs, interpretability)
- users will have to live with some remaining degree of complexity
NO
- incoherence teaches us a lot about errors in statistical measurement
- reconciliation should not remain pure cosmetics
- studying incoherence requires a wider use of experimental studies of all data sources under consideration
- coherent results are always suspicious
Many thanks for your attention!
Thomas Körner, Federal Statistical Office Germany, Wiesbaden
thomas.koerner@destatis.de