Use of administrative sources and registers in the Finnish EU-SILC survey


Workshop on best practices for EU-SILC revision
Marie Reijo, Senior Researcher

Content
- Preconditions for good register utilisation
- Register use in the Finnish SILC/IDS, overview
- Register use by the Finnish SILC/IDS survey stages
  - Sampling design and sample selection
  - Weighting and unit non-response correction
  - Data collection and processing
  - Data analysis
- Integrated modules, e.g. HFCS 2013

Preconditions: comprehensive and reliable register system
- Basic registers
- Major registers (incl. statistical registers by Statistics Finland)
- Statistics production and releasing by Statistics Finland
- Efficient information system for collecting registers
- Register-based census system created with the 1970 census; from the 1987 census entirely from administrative sources
- Totally register-based statistics, e.g. Statistics on taxable income since 1969, Total statistics on income distribution (TSID) since 1995
- Unified identification codes, exact matching
- Registers used for sample-based surveys since the 1970s; the HBS originates in 1966 and the Income distribution statistics (IDS) in 1977, with SILC integrated in 2004
- Legislative basis for statistical purposes
- Public approval
- Best practices (Statistics Finland 2004; UN/ECE 2007; UN 2012; see also Wallgren & Wallgren 2014)

Register use in the Finnish SILC/IDS, overview

Stage: Sampling (1st phase)
  Sources: The Population Information System
  Linkage units: -
  Methods: Direct use
  Aim: Sampling frame, sample selection (master sample) and update

Stage: Sampling (2nd phase)
  Sources: The Population Information System, taxation register
  Linkage units: Person, household-dwelling unit
  Methods: Deterministic record linkage
  Aim: Strata construction, selection of persons from the master sample by stratum

Stage: Data collection
  Sources: Several register sources
  Linkage units: Person, enterprise, region
  Methods: Deterministic record linkage
  Aim:
  - Auxiliary data to the sample for the CATI Blaise questionnaire: data editing in interviews
  - Replaced interview and substitutive information for target variables: data collection for target variables

Stage: Data processing
  Sources: Several register sources
  Linkage units: Person, dwelling, building, enterprise, region
  Methods: Deterministic record linkage, and methods to derive, estimate, impute and code variables, e.g. regression estimation, stratification
  Aim:
  - Auxiliary information for checking and editing interviewed data, detecting and correcting errors (e.g. inconsistencies at unit level) for target variables
  - Auxiliary and substitutive information for editing and imputing missing information for target variables
  - Using information combined with interviewed or register information to derive and form target variables

Stage: Estimation, weighting
  Sources: The register data on household-dwelling population by Statistics Finland, the TSID data
  Linkage units: Person, household-dwelling unit, region
  Methods: Deterministic record linkage, several methods, e.g. regression estimation, calibration methods
  Aim: Information for unit non-response analysis, unit non-response correction, adjusting data to the target (total) population. Using data on crucial frequencies and on income and income receiver sums

Stage: Quality analysis
  Sources: Total data, e.g. TSID
  Linkage units: Person
  Methods: Direct use, deterministic record linkage
  Aim: Data comparisons, unit non-response (e.g. panel attrition) and other analysis

Register sources in sampling
- Registers (basic register):
  - Population Information System of the Population Register Centre: persons, buildings and dwellings
  - National Board of Taxes: taxation
- Data of Statistics Finland:
  - Sample frame: total data copy of persons, buildings and dwellings
  - Master sample
  - Master sample by stratum
  - SILC/IDS sample

Register use for two-phase stratified sampling
- Sample frame from the Population Information System, up to date
  - Persons residing permanently in Finland at the end of the year, ordered by domicile code (address)
  - Unified identification codes for persons
- Selected systematically for the 1st-phase master sample (about 50 000)
  - Over-coverage (persons not in the target population, sy t-1; 31.12.) excluded, checked against updated register data
- Socioeconomic strata for the 2nd-phase sample selection
  - Data linked from the taxation register (sy t-2) to the persons living in the sample person's household-dwelling unit -> 12 strata: information on taxable income type and level, defined by the highest earner in the household-dwelling unit
- SILC/IDS gross sample (about 13 500 persons) selected by simple random sampling with non-proportional allocation from strata
- Use of taxation register data for stratification ensures less biased estimates for important output measures.
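The two-phase design on this slide can be illustrated with a minimal Python sketch. It is not the Statistics Finland implementation: the function names, the toy frame and the allocation figures are assumptions made for illustration, and the stratum is assumed to have been linked to each person already.

```python
import random
from collections import defaultdict


def systematic_sample(frame, n, rng):
    """Phase 1: pick n units from an ordered frame with a random start and a fixed step."""
    step = len(frame) / n
    start = rng.random() * step  # random start in [0, step)
    last = len(frame) - 1
    return [frame[min(int(start + i * step), last)] for i in range(n)]


def stratified_srs(units, stratum_of, allocation, rng):
    """Phase 2: simple random sampling within strata, with a fixed size per stratum.

    Returns (unit, weight) pairs, where weight is the inverse of the
    within-stratum sampling fraction of this phase only.
    """
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)
    sample = []
    for stratum, members in by_stratum.items():
        n_h = min(allocation.get(stratum, 0), len(members))
        for unit in rng.sample(members, n_h):
            sample.append((unit, len(members) / n_h))
    return sample


# Toy frame (hypothetical): persons ordered by domicile code, stratum already linked.
rng = random.Random(2015)
frame = [{"pid": i, "stratum": i % 3} for i in range(1000)]
master = systematic_sample(frame, 100, rng)
# Non-proportional allocation: some strata are deliberately oversampled.
allocation = {0: 5, 1: 10, 2: 20}
gross_sample = stratified_srs(master, lambda p: p["stratum"], allocation, rng)
```

The second-phase weights returned here would still have to be multiplied by the first-phase inclusion weight to obtain a full design weight.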

Register sources in weighting and unit non-response correction
- Administrative registers:
  - Population Register Centre: persons, buildings and dwellings
  - National Board of Taxes: taxation
  - Finnish Centre for Pensions: pensions
  - Social Insurance Institution: social insurance
  - National Institute for Health and Welfare: social assistance
  - Other register sources: Education Fund, State Treasury, Financial Supervision Authority, Ministry of Agriculture and Forestry, ...
- Statistics Finland:
  - Data: population data, total statistics on income distribution data
  - Statistics: household-dwelling units
- Target: SILC/IDS

Register use for weighting and unit non-response correction
- Unit non-response analysis by register data
- Calibration of non-response-adjusted design weights by frequencies and sums from the household-dwelling unit and TSID data by Statistics Finland (register household-dwelling population and household-dwellings, sy t-1; 31.12., and their income for sy t-1):
  - Number of households
  - Sex * age (5-year) groups of the household-dwelling population, the oldest age group 85+
  - Number of members in the household-dwelling unit (1, 2, ..., 6+)
  - Region (NUTS 3, Helsinki and capital area separated)
  - Degree of urbanisation
  - Sums of the 12 income components
  - Number of receivers of the 3 income components
- Standard methods and calibration variables are used over the years
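As a rough illustration of calibrating weights to register totals, the sketch below uses raking (iterative proportional fitting) on categorical margins. Raking is only one calibration method and is an assumption here, since the slide does not name the method used. The function name, the toy margins and the totals are invented, and calibrating to income sums would need a method that handles continuous variables, which this sketch does not cover.

```python
def rake(weights, categories, targets, max_iter=200, tol=1e-9):
    """Adjust weights so that weighted category counts match the target totals.

    weights:    starting weights, e.g. non-response-adjusted design weights
    categories: margin name -> list with one category label per unit
    targets:    margin name -> {category label: known population total}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for margin, labels in categories.items():
            # Current weighted total of each category on this margin.
            totals = {}
            for wi, label in zip(w, labels):
                totals[label] = totals.get(label, 0.0) + wi
            # Scale every unit so that its category hits the target total.
            for i, label in enumerate(labels):
                factor = targets[margin][label] / totals[label]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return w


# Toy example (hypothetical totals): six respondents, two calibration margins.
start_weights = [10.0] * 6
categories = {
    "sex": ["m", "m", "f", "f", "m", "f"],
    "region": ["A", "B", "A", "B", "B", "A"],
}
targets = {
    "sex": {"m": 60.0, "f": 40.0},
    "region": {"A": 50.0, "B": 50.0},
}
calibrated = rake(start_weights, categories, targets)
```

The target totals of all margins must add up to the same population size, otherwise the iteration cannot satisfy every margin at once.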

Figure: Total disposable household income means by strata, 1st wave (1000 euros). Series: mean (sample), mean (design weight, non-response adjusted), mean (calibrated weight). Source: IDS/SILC sy2015

Figure: Total disposable household income means by strata, 4th wave (1000 euros). Series: mean (sample), mean (design weight, non-response adjusted), mean (calibrated weight). Source: IDS/SILC sy2015

Register sources in data collection and processing
- Administrative registers:
  - Population Register Centre: persons, buildings and dwellings
  - National Board of Taxes: taxation
  - Finnish Centre for Pensions: pensions
  - Social Insurance Institution: social insurance
  - National Institute for Health and Welfare: social assistance
  - Other register sources: Education Fund, State Treasury, Financial Supervision Authority, Ministry of Agriculture and Forestry, ...
- Statistics Finland:
  - Registers and data: Business register, Student register, Register on degrees, population data
  - Statistics: families, household-dwelling units, total statistics on income distribution data, incl. indebtedness
- Target: SILC/IDS

Register use in data collection and processing
- Detecting and correcting erroneous responses for target variables during the interview.
  - Auxiliary information is prefilled in the CATI/CAPI Blaise questionnaire by exact matching for the persons of the household-dwelling unit (wave I) or the housekeeping unit (waves II-IV). Household members in sy t are determined first in the interview; if there is an exact match, the information is used. Automatic coding during the interview.
- Editing and coding interviewed data for variables in the statistics database system, either automatically (programmed) or manually (loaded to the editing system display).
  - Register data linked to persons (exact matching).
- Forming target variables by record linkage, e.g. data on income, or by editing or imputing non-responded items of objective-type variables by statistical methods.
  - Exact matching. Standard editing rules, if there are no changes in sources or definitions. Consistency of data from different sources is ensured for units.
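Exact matching on a unified identification code can be sketched as follows. This is an illustrative Python example only: the field names (person_id, pension_income) and the records are hypothetical, and the actual linkage in the production system is not described at this level of detail in the slides.

```python
def link_exact(sample_persons, register_records, key="person_id"):
    """Deterministic record linkage: attach register fields to sample persons by identical key.

    Returns (linked, unmatched). The register is expected to hold at most
    one record per key; duplicates are treated as an error.
    """
    index = {}
    for record in register_records:
        if record[key] in index:
            raise ValueError(f"duplicate register key: {record[key]!r}")
        index[record[key]] = record

    linked, unmatched = [], []
    for person in sample_persons:
        record = index.get(person[key])
        if record is None:
            # No exact match: nothing is prefilled, the item must be asked or imputed.
            unmatched.append(person)
        else:
            merged = dict(person)
            merged.update({k: v for k, v in record.items() if k != key})
            linked.append(merged)
    return linked, unmatched


# Toy data (invented identifiers and values).
sample_persons = [{"person_id": "A1", "age": 40}, {"person_id": "B2", "age": 31}]
register_records = [{"person_id": "A1", "pension_income": 0}]
linked, unmatched = link_exact(sample_persons, register_records)
```

Persons left in the unmatched list correspond to the case where no exact match exists, so no register information can be prefilled for them.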

Data collection for variables from registers
- Register use has many advantages, e.g. lower response burden and costs, better accuracy
- Assessing register exploitation: what level is efficient and sufficient for SILC data quality?
- Relevance?
  - Definitions: SILC variables vs. register variables
  - Opinions and other subjective data are rarely available from registers
  - Not all factual variables are available from registers
  - Validity of the factual data that are available from registers
  - Comprehensiveness and completeness
  - Reference time periods and time points
  - Register data: no information available for the interview time point
- => Data consistency, of multipurpose survey data in particular
  - Consistency within domains
  - Consistency between domains
- Delays in statistical domain registers, SILC timeliness
- Coherence of statistics in the statistical system

Case: Income
- Almost all of the SILC/IDS income comes from registers, about 98-99 %
- Statistical data on the household-dwelling population by Statistics Finland as base data, with many comprehensive register sources:
  - Earliest register received in April, others mostly in August to November
  - The final taxation register received in November
  - TSID released in December (survey year)
- Errors are possible (e.g. missing units, missing or erroneous items); updated data are then needed from the register providers
  - Preliminary error detection first by the Data Collection Unit of Statistics Finland
- Data filled both in TSID and SILC/IDS sample database files
  - Common, consistent income classification by detailed register items; information on changes obtained beforehand for data collection and planning
  - Unified data compilation, e.g. edited and derived variables are formed for the total data and the sample, apart from register files and variables. Original register, interviewed and derived variables are kept in separate files of the statistics production database. Contents are described in the metadata system.
  - Macro and micro checks, sample for error detection at unit level
  - Early registers are used for editing interviewed data, checked against final data

Case: Main activity
- Income from registers refers to the calendar year; many main activity variables are filtered by PL031 (current = December), and definitions are based on the person's own perception.
- Interviewed IDS activity months are edited against registers during the reference year: decision rules are based on income type and level and other factual information on the person's economic position. Overlapping activities are allowed for edited IDS months: sum = 12 or > 12.
- SILC PL073-PL090 and PL211A-PL211L: PL211L = PL031 (December).
- Final IDS months: edited for 11 % of persons
- Final December (PL031): edited for 4 % of persons
- Final PL073-PL090: edited for 15 % of persons. The number of months was equal in both sources for 85 % of persons.
- PL211A-PL211L: months are the same for 86.5 %; errors corrected for about 2 %, if the same main activity (incl. PL031) lasted for the whole year. No other corrections.
- Consistency between SILC and IDS months; IDS months are used for the classification of socioeconomic groups.

Case: Housing
- Discrepancy between household definitions (housekeeping and household-dwelling units): sharing the same dwelling (i.e. rentals) with another household, dispersing across many dwellings
- Discrepancy between interviewed and register dwellings, incl. variables irrespective of household definition (HH010, HH021):
  - Definitions: household's main vs. permanent or usual residence
  - Measurement error, reference time: responded, registrations
  - Measurement error, quality: responded, registrations
- However, e.g. the dwelling municipality is the same for 99 %; + dwelling type (apartments or flats vs. others) for 96 %; + housing tenure for 88 %; but + number of rooms for only 50 % of the sample units (= S-R). The number of rooms differs in detached houses with 5 or more rooms.
- When detecting the dwelling for all persons responsible for accommodation (HB080, HB090), the dwelling municipality is the same for 99 % and the dwelling type for 96 % of persons, no changes (see above)
- Register data is used primarily for automatic editing (erroneous, missing values) of objective-type data, linked to S-R.
- More effort on exploiting registers? More effort on decision rules for validating the responded main dwelling of the housekeeping unit.
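Agreement shares of the kind quoted on this slide can be computed from linked survey and register values as in the minimal Python sketch below. The data are invented, the field names are hypothetical, and reading the "+" in the figures above as cumulative agreement (all variables listed so far agree) is an assumption, not something the slide states.

```python
def agreement_rate(units, variables):
    """Share of linked units whose survey and register values agree on every listed variable."""
    if not units:
        raise ValueError("no linked units to compare")
    agreeing = sum(
        all(unit["survey"][v] == unit["register"][v] for v in variables)
        for unit in units
    )
    return agreeing / len(units)


# Toy linked units (invented values, not SILC data).
linked_units = [
    {"survey": {"municipality": "091", "rooms": 3}, "register": {"municipality": "091", "rooms": 3}},
    {"survey": {"municipality": "091", "rooms": 5}, "register": {"municipality": "091", "rooms": 6}},
    {"survey": {"municipality": "049", "rooms": 2}, "register": {"municipality": "092", "rooms": 2}},
    {"survey": {"municipality": "837", "rooms": 4}, "register": {"municipality": "837", "rooms": 4}},
]

municipality_share = agreement_rate(linked_units, ["municipality"])
# Cumulative reading: municipality and number of rooms must both agree.
cumulative_share = agreement_rate(linked_units, ["municipality", "rooms"])
```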

Data analysis: systematic comparisons of estimates
- Comparisons with household-dwelling population and TSID data:
  - Analysing sampling and estimation effects. Variables from registers linked to SILC/IDS sample units, adjusting away household and other definitional effects: comparisons of total sums and frequencies.
  - Household definition
  - Income discrepancy due to interviewed income items
  - Other discrepancies, e.g. income classifications
- Comparisons of sums, frequencies and classifications with register statistics by Statistics Finland, e.g. NA, TSID.
- Comparisons of frequencies, sums and classifications with external register statistics, e.g. the ESSPROS statistics by the National Institute for Health and Welfare

Integration of the HFCS with the SILC 2013 survey
- The Finnish SILC sample was used for the HFCS (2nd wave) compilation.
- Clearly defined domain, related to income data
- Many register and other statistical data sources (in addition to the major registers) and many focused techniques were used for the hard-to-interview HFCS data:
  - Unit linking from registers (comprehensive sources)
  - Register-based estimation and imputation methods based on available data for statistical units from external sources, e.g. separate valuation, perpetual inventory method
  - Statistical matching from the HBS by common register variables, e.g. predictive mean matching, file concatenation
  - Some of the wealth data, e.g. opinion-type items, were interviewed
  - Additional variables in calibration
- Methods are being developed further for the next HFCS (3rd wave) in the 2017 SILC survey, combined with the SILC ad hoc module on wealth and consumption
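Predictive mean matching, mentioned above as one statistical matching technique, can be sketched in a simplified form. The Python example below is an assumption-laden illustration: it uses a single common variable, an ordinary least-squares line and the single nearest donor, whereas practical applications typically use several common register variables and often draw at random among several close donors. All names and values are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def pmm_impute(donor_x, donor_y, recipient_x):
    """Give each recipient the observed y of the donor with the closest predicted mean."""
    a, b = fit_line(donor_x, donor_y)
    donor_pred = [a + b * x for x in donor_x]
    imputed = []
    for x in recipient_x:
        pred = a + b * x
        nearest = min(range(len(donor_pred)), key=lambda j: abs(donor_pred[j] - pred))
        # An observed donor value is borrowed, not the model prediction itself.
        imputed.append(donor_y[nearest])
    return imputed


# Toy data: x is a variable common to both files, y is observed only in the donor file.
donor_x = [1.0, 2.0, 3.0, 4.0]
donor_y = [10.0, 22.0, 29.0, 41.0]
recipient_x = [2.1, 3.9]
imputed_y = pmm_impute(donor_x, donor_y, recipient_x)
```

Because observed donor values are borrowed, the imputed values always lie within the set of values actually seen in the donor file.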

Thank you for your attention