ESSnet on DATA INTEGRATION

WP5. On-the-job training applications

LIST OF CONTENTS

On-the-job training courses
  1. Introduction
  2. Ranking the applications on record linkage
Appendix A - Applications on record linkage
  1. Latvia
  2. Luxemburg
  3. UK
  4. Hungary
  5. Slovak Republic
  6. Malta
Appendix B - Applications on statistical matching
  1. Poland

On-the-job training courses

1. Introduction

The ESSnet on Data Integration will disseminate record linkage and statistical matching practices, methods and tools by means of two on-the-job training courses. A call for applications was disseminated:
1) during the DIME meeting in March 2010;
2) among the contact people of the ESSnet ISAD;
3) among the participants in the last ESTP course on Statistical methodologies for the integration of data sources (Luxemburg, 27-29 May 2009);
4) among the users and contact persons of the software tool Relais.

By the call deadline, 7 application forms had been received: 1 for statistical matching (Poland) and 6 for record linkage (Latvia, Luxemburg, UK, Hungary, Slovak Republic, Malta). The ESSnet-DI budget can support two on-the-job training courses: one on record linkage and one on statistical matching. Given that there is only one application in the area of statistical matching, the on-the-job training on statistical matching will be held in Poland. During the II ESSnet-DI meeting (Den Haag, 27-28 May 2010) the participants agreed to hold the on-the-job training course on statistical matching in the week of the III ESSnet-DI meeting, which will be held at the Regional Statistical Office in Poznan. Hence the cost of this on-the-job training is approximately covered by the budget for the III ESSnet-DI meeting. This makes it possible to hold two on-the-job training courses on record linkage.

The 6 applications for record linkage are compared according to two aspects:
1) relevance to the topic of record linkage;
2) feasibility of the record linkage problem.
When two or more proposals are equivalent according to these two criteria, they are ordered on a first-come-first-served basis. Applications are available in Appendix A (for record linkage) and Appendix B (for statistical matching).

2. Ranking the applications on record linkage

According to the first two aspects for ranking the applications, the following ordering was obtained.

1) ONS (UK). The Beyond 2011 project is suited to the large-scale application of data integration methods for social surveys. Probabilistic record linkage is suitable for tackling the project objective, and the results seem feasible. The results of this project seem useful and promising for all ESS countries in their move from traditional surveys and censuses to statistics based on the joint use of different data sources.

2) Central Statistical Bureau of Latvia and National Statistics Office of Malta. The two offices propose similar problems of linking social registers. The objective is more limited in scope than the one proposed by the ONS. It seems that the problems proposed by the two NSIs can be solved by applying probabilistic record linkage.

3) STATEC (Luxemburg) and Statistical Office of the Slovak Republic. The two NSIs propose similar problems of linking enterprise registers (the second proposal by STATEC). The nature of the registers in the problem proposed by the Slovak Republic may jeopardize a useful linkage result: the same enterprise or farm would have to be linked by means of variables that refer to different units (e.g. the owners of the farm). This is not a problem of key variables affected by errors, as in the probabilistic record linkage case.

4) KSH (Hungary). The problem proposed is not tackled by a record linkage procedure, but by a simple merge by means of unit identifiers. The proposed problem refers to the quality of the linked results, hence it mainly concerns the micro integration part of the project. This part is not covered by the on-the-job training courses.

Taking the date of application into account as well, the six proposals are ordered as follows:
1) ONS UK;
2) Central Statistical Bureau of Latvia (application sent on 16-4-2010);
3) National Statistics Office Malta (application sent on 28-4-2010);
4) STATEC - Luxemburg (application sent on 27-4-2010);
5) Statistical Office of the Slovak Republic (application sent on 30-4-2010);
6) KSH Hungary.

The 6 NSIs will be contacted according to their rank in order to agree on a training schedule. If for any reason an agreement is not reached, the next NSI in the ranking will be contacted. NSIs that are not involved in an on-the-job training course will have priority for the project course to be given in Rome, September 2011.
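The ordering rule can be written down as a two-level sort. The sketch below is only an illustration: the merit tiers (1 to 4) are read off the ranking above, the dates are the "Received on" dates given in the appendices, and the names `applications` and `rank_applications` are made up for this example.

```python
from datetime import date

# (office, merit tier from the ranking, date the application was received)
applications = [
    ("KSH (Hungary)", 4, date(2010, 4, 30)),
    ("STATEC (Luxemburg)", 3, date(2010, 4, 27)),
    ("National Statistics Office (Malta)", 2, date(2010, 4, 28)),
    ("ONS (UK)", 1, date(2010, 4, 29)),
    ("Statistical Office of the Slovak Republic", 3, date(2010, 4, 30)),
    ("Central Statistical Bureau of Latvia", 2, date(2010, 4, 16)),
]


def rank_applications(apps):
    """Order by merit tier first; ties are broken first come, first served."""
    return sorted(apps, key=lambda app: (app[1], app[2]))


ranking = [office for office, _tier, _received in rank_applications(applications)]
```

Because the tier is the first sort key, Malta (received on 28-4-2010) still precedes STATEC (received on 27-4-2010): the date only decides between proposals of the same tier.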

Appendix A - Applications on record linkage

1. LATVIA

Office: Central Statistical Bureau of Latvia
Contact persons: Ieva Slosa (Ieva.Slosa@csb.gov.lv), Martins Liberts (Martins.Liberts@csb.gov.lv)
Received on 16-4-2010.

The Central Statistical Bureau of Latvia (CSB) obtains most of its statistical data by carrying out surveys and by using data stored in administrative registers. The role of administrative registers is increasingly important, as they are used in different phases of the survey process such as sampling, data collection, record linkage, processing, estimation and quality control. The CSB has identified that a lack of knowledge of advanced statistical matching methods and probabilistic record linkage constrains broader use of registers. The main administrative registers used are the Population register (Office of Citizenship and Migration Affairs) and the registers of the State Revenue Service, the Social Insurance Fund/Agency, the State Employment Agency and others.

For example, part of the EU-SILC survey data is obtained from administrative sources: the Population register, the State Revenue Service (SRS) and the State Social Insurance Agency (SSIA). The data collected from respondents are merged with administrative registers using the personal identification number. Demographic information such as a person's name and surname, sex, date of birth and personal identification number is taken from the Population register. Practically all data on government transfers, such as pensions and state social benefits, are obtained from the SSIA. Only information about some minor benefits administered by local municipalities, pensions paid by other countries, and service pensions not administered by the SSIA is asked for in questionnaires. The exception is net employee cash or near cash income, which is also available from the SRS, but it was decided to use information from questionnaires.
Gross employee cash or near cash income was obtained by adding the taxes paid, as recorded by the SRS, to the net employee cash or near cash income from questionnaires. Information from the SRS is also used for imputation if the amount of net employee cash or near cash income is missing in the questionnaire, and in those cases where SRS information shows a higher income than reported in the questionnaire.

Structural business statistics (SBS) are also compiled by combining survey data with data from the State Revenue Service. Variables such as stocks, turnover, other income, fixed assets, expenditures, taxes, wages and salaries, social contributions paid and number of employees are linked with administrative sources for the calculation of SBS data.

Characteristics of the data sets

The identifier of a person in Latvia is the identity number (http://www.pmlp.gov.lv/en/pakalpojumi/pskd/identity.html). It is the best identifier because it is unique. We have the identity numbers of persons in most of the administrative registers we use for statistical production. We ask respondents for their identity number in sample surveys (EU-SILC, LFS, ...). Sometimes respondents refuse to report an identity number, possibly because they want to keep their identity secret. In that case we have to use other (non-unique) identifiers to link the survey data with the data from administrative registers, for example name, date of birth and sex. The name can be affected by transcription errors, especially in the case of Russian names.

The identifier of an enterprise in Latvia is the VAT number. It is a unique identifier. Respondents cannot conceal their VAT number, so matching data in business surveys is more straightforward.
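When the identity number is missing and linkage has to fall back on name, date of birth and sex, one possible approach is to score each candidate pair with agreement weights in the spirit of the Fellegi-Sunter model of probabilistic record linkage. The sketch below is an illustration only: the m- and u-probabilities, the name similarity threshold and the example records are all assumptions, not values taken from CSB data.

```python
import math
from difflib import SequenceMatcher

# Illustrative (m, u) probabilities per field: m = P(agree | same person),
# u = P(agree | different persons). These values are assumptions, not estimates.
M_U = {"name": (0.90, 0.01), "birth_date": (0.95, 0.003), "sex": (0.99, 0.50)}


def agrees(field, a, b, name_threshold=0.85):
    """Exact comparison, except names, which tolerate small spelling differences."""
    if field == "name":
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= name_threshold
    return a == b


def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over the comparison fields."""
    weight = 0.0
    for field, (m, u) in M_U.items():
        if agrees(field, rec_a[field], rec_b[field]):
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


# Made-up records: the second differs from the first only by name transcription.
survey = {"name": "Aleksandrs Ivanovs", "birth_date": "1970-03-12", "sex": "M"}
same_person = {"name": "Aleksandr Ivanov", "birth_date": "1970-03-12", "sex": "M"}
other_person = {"name": "Janis Berzins", "birth_date": "1982-11-02", "sex": "M"}
```

With these assumed values, `match_weight(survey, same_person)` is positive and `match_weight(survey, other_person)` is negative. In practice the probabilities would be estimated from the data, and thresholds on the weight would separate links, possible links and non-links.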

Sometimes we have problems matching data from two registers. One good example is the data from the Population Register and the data from the Register of Addresses (please note that I am not using the official names of these registers; I am just trying to give an example). The Population Register holds information about the declared place of residence of every person, but this information is not linkable with the Register of Addresses. The Population Register does not use the address codes that exist in the Register of Addresses (although it is required to do so by law). So the Population Register contains only the name of the place of residence, and this name is quite often affected by transcription errors. The names of places also change over time. We are trying to add address codes to the data from the Population Register ourselves. Through the address code we can link information about the building (the year of construction, number of flats, geographical coordinates, ...) with the data about persons.
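Adding address codes to free-text place names could be approached as a closest-string lookup. The following is a minimal sketch using Python's standard `difflib` module; the register extract, the address codes and the cutoff of 0.8 are hypothetical and serve only to show the idea.

```python
from difflib import get_close_matches

# Hypothetical extract of the Register of Addresses: place name -> address code.
address_register = {
    "Brivibas iela 10, Riga": "100000001",
    "Lacplesa iela 5, Riga": "100000002",
    "Raina bulvaris 19, Riga": "100000003",
}


def assign_address_code(place_name, register, cutoff=0.8):
    """Return the code of the most similar registered name, or None if none is close."""
    candidates = get_close_matches(place_name, list(register), n=1, cutoff=cutoff)
    return register[candidates[0]] if candidates else None


code = assign_address_code("Brivibas ela 10, Riga", address_register)
unmatched = assign_address_code("Jurmala, Jomas iela 1", address_register)
```

Names that fall below the cutoff are returned as `None`, so they can be set aside for clerical review rather than being linked to a wrong building.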

2. LUXEMBURG

Office: STATEC
Contact persons: Nico Weydert (nico.weydert@statec.etat.lu)
Received on 27-4-2010.

We propose two separate but related integration exercises.

In the population census we collect the name of the employer, and we match this name against the business register in order to assign an activity sector to the census respondent. We have the files of the 2001 population census and could use this exercise to prepare the matching for the next population census in February 2011.

The other exercise could be easier. Our business register holds information on legal units (name, address (street name, street number, postal code, town), legal form, administrative identifier), and we would like to match these data against the Commerce register file, which holds similar but not identical information: name, address (street name, street number, postal code, town), legal form, commerce register identifier. At the moment there is no table linking the commerce register identifier and the administrative identifier, but we need such a link, for instance for the European Group Register work and for a balance sheet data office we are setting up.

3. UK

Office: ONS
Contact persons: Dick Heasman (dick.heasman@ons.gsi.gov.uk)
Received on 29-4-2010.

My name is Dick Heasman, and on behalf of the Office for National Statistics I wish to apply for on-the-job training in record linkage. I will act as our contact person at the email address above.

To give a little background to the data sets we will offer for integration, ONS has established the Beyond 2011 Project, which aims to investigate:
- the feasibility of improving population statistics in the UK by making use of integrated data sources to replace or complement existing approaches; and
- whether alternative data sources can provide the priority statistics on the characteristics of small populations, typically provided by a Census.

The data sets we propose to offer will be an extract of Census data and one or more administrative data sets, such as patient register data from National Health Service records. Owing to the extreme sensitivity of such data sets, the records we would offer would be simulations, with fields and variable distributions similar to those of the data sets they are meant to simulate. The administrative data set(s) will have some records that do not link to the Census data set, and the Census data set will have some records that do not link to the administrative data set(s). We will simulate various errors or modifications in the administrative data sets which will jeopardise the detection of the true set of linked pairs, but we can also keep a version not subject to error or modification that can be used as a check on the success of the linkage.

The objective of the data integration will be to establish, using the Census data as a benchmark, whether the administrative data set(s) can provide sufficient coverage of small populations to be used as part of a programme to replace or complement existing approaches to population statistics.
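Because the simulated data come with a known set of true links, the success of a linkage run can be checked by comparing the declared links with the truth. The sketch below shows one common way of summarising this (precision and recall); the function name and the record identifiers are invented for the example.

```python
def linkage_quality(declared_links, true_links):
    """Compare declared (census_id, admin_id) pairs with the known true pairs."""
    declared, truth = set(declared_links), set(true_links)
    correct = declared & truth
    return {
        # Share of declared links that are true links.
        "precision": len(correct) / len(declared) if declared else 0.0,
        # Share of true links that were found.
        "recall": len(correct) / len(truth) if truth else 0.0,
        "false_links": len(declared - truth),
        "missed_links": len(truth - declared),
    }


# Made-up identifiers: four true pairs, three declared links, one of them wrong.
true_links = {("c1", "a1"), ("c2", "a2"), ("c3", "a3"), ("c4", "a4")}
declared_links = {("c1", "a1"), ("c2", "a2"), ("c3", "a9")}
quality = linkage_quality(declared_links, true_links)
```

Here two of the three declared links are correct and two of the four true links are missed, so precision is 2/3 and recall is 0.5.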

4. HUNGARY

Office: KSH
Contact persons: Gardos Éva (Eva.Gardos@ksh.hu)
Received on 30-4-2010.

With reference to your call for on-the-job training, KSH proposes the record linkage described below. The two selected data sets are the following:
1. annual investment expenditure and revenue data of the central government institutions, reported in the budget reports of each unit, then summarized and sent to us by the Treasury;
2. the HCSO annual survey on the investments of central government institutions. The survey is filled in by the government units and processed by the Statistical Office.

Both data sets have full coverage and contain individual data with a unique identifier. The objective of the data integration is to obtain the most accurate information on investment expenditures and revenues of the central government. The following steps should be carried out:
- compare the two data sets unit by unit;
- list the units missing from either set;
- investigate why those units are missing;
- list the big differences between the two values for the same unit;
- investigate the reasons for the differences; and
- calculate new investment data from the two data sets.

As to the date of the training for the data linkage above, we prefer June or September. These would be the most convenient in terms of workload. We are negotiating with our colleagues about including further data integration in the program. We will send you the result next week if it is positive.
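The mechanical part of the steps above (comparing unit by unit, listing missing units, listing large differences) can be sketched as a merge on the unit identifier. The example below is a minimal illustration with invented identifiers and amounts; the 10% relative tolerance that defines a "big difference" is an assumption.

```python
def compare_sources(treasury, survey, rel_tolerance=0.10):
    """Compare two {unit_id: investment} mappings unit by unit."""
    only_treasury = sorted(treasury.keys() - survey.keys())
    only_survey = sorted(survey.keys() - treasury.keys())
    big_differences = {}
    for unit in sorted(treasury.keys() & survey.keys()):
        a, b = treasury[unit], survey[unit]
        reference = max(abs(a), abs(b))
        # Flag the unit when the two values differ by more than the tolerance.
        if reference > 0 and abs(a - b) / reference > rel_tolerance:
            big_differences[unit] = (a, b)
    return only_treasury, only_survey, big_differences


# Made-up data: unit identifier -> reported investment expenditure.
treasury = {"U001": 120.0, "U002": 75.0, "U003": 40.0}
survey = {"U001": 118.0, "U003": 64.0, "U004": 10.0}
only_treasury, only_survey, big_differences = compare_sources(treasury, survey)
```

Investigating the reasons for missing units and for the differences, and deriving the new investment figure, remain subject-matter tasks that the code does not address.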

5. SLOVAK REPUBLIC

Office: Statistical Office of the Slovak Republic
Contact persons: Andrej Vallo (Andrej.Vallo@statistics.sk)
Received on 30-4-2010.

The task is to compile a register of farms (private households) for the next wave of the Farm Structure Survey. The basis is a register created in the 2001 census of farms, which should be updated and complemented using several administrative sources. The units of analysis are private households with agricultural production exceeding the legal thresholds.

There are 6 data files to be integrated into a register:
1. Data from the 2001 Structural Census of Farms
2. List of farms from the Agricultural Paying Agency (based on ownership of land)
3. List of vineyard owners
4. List of cattle breeders
5. List of sheep breeders
6. List of goat breeders

These private households have no IDs as enterprises. The identification variables are:
- name of one person (differently structured);
- address (differently structured);
- personal ID or date of birth (the date of birth defines the first 6 digits of the 10-digit ID; the 3rd digit defines sex).

The integration is complicated by the fact that different persons from the same household may be listed in different files.
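Since some files carry the personal ID and others only the date of birth, a comparable key can be derived from the ID. The sketch below assumes that the first six digits are YYMMDD and that 50 is added to the month for women, which is how "the 3rd digit defines sex" is read here; this should be checked against the official specification. The IDs used are invented and are not validated in any other way.

```python
from datetime import date


def parse_personal_id(personal_id):
    """Split a 10-digit personal ID into a (yy, mm, dd) birth key and sex.

    Assumes YYMMDD in the first six digits, with month + 50 for women.
    The century cannot be recovered from the two-digit year.
    """
    digits = personal_id.replace("/", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"expected a 10-digit ID, got {personal_id!r}")
    yy, mm, dd = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    sex = "F" if mm > 50 else "M"
    if mm > 50:
        mm -= 50
    return (yy, mm, dd), sex


def birth_key(birth_date):
    """Key comparable with parse_personal_id for files holding only a date of birth."""
    return (birth_date.year % 100, birth_date.month, birth_date.day)


key_f, sex_f = parse_personal_id("7556120001")   # hypothetical ID
key_m, sex_m = parse_personal_id("750612/0002")  # hypothetical ID
```

Both invented IDs yield the birth key `(75, 6, 12)`, which equals `birth_key(date(1975, 6, 12))`, so records with an ID can be compared on date of birth with records that lack one.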

6. MALTA

Office: National Statistics Office
Contact persons: Silvan Zammit (silvan.zammit@gov.mt)
Received on 28-4-2010.

We are mainly interested in linking a number of registers to retrieve demographic information and other details such as telephone numbers, updates of residential addresses and household composition. These sources normally do not have a unique identifier, except in the case of individuals, for whom we normally rely on the individual's ID card. The real problems arise when we deal with dwellings or when no unique identifier is available.

To put you in the picture, in Malta there is no official dwellings and population register. For this reason, a number of departments and ministries have their own sources, each with its own data architecture. The NSO has its own registers, which have already been linked with a number of external sources at micro level for updating purposes. However, this is not enough, and we envisage linking our registers with other reliable sources.

Our registers were compiled during the last Census of population and housing in 2005. There is a great need to update these registers, at both dwelling and (especially) personal level, in view of the next Census in 2011. The registers are also used as a sampling frame for a number of social surveys conducted locally. For this reason, we have to merge several sources to update our registers. The problems arise because dwelling names, street names etc. are stored in different string formats, and hence linking a number of registers normally involves a lot of tedious manual work. Generally, the registers contain between 5,000 and 150,000 units and contain the variables mentioned above.
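Part of the manual work caused by differing string formats can often be reduced by standardising the strings before comparison. The sketch below is a simple illustration that lower-cases the text, strips diacritics and punctuation, and expands a few abbreviations; the abbreviation table is hypothetical and would have to be built from the actual registers.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real one would be derived from the registers.
ABBREVIATIONS = {"str": "street", "rd": "road", "sq": "square", "blk": "block", "flt": "flat"}


def normalise(text):
    """Lower-case, drop diacritics and punctuation, and expand abbreviations."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"\w+", without_marks.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


same_street = normalise("Main Str., No 12") == normalise("main street no. 12")
town_road = normalise("Żebbuġ Rd.")
```

After normalisation the two spellings of the street compare as equal, and exact or approximate string comparison can then be applied to the standardised values.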

Appendix B - Applications on statistical matching

1. POLAND

Office: Central Statistical Bureau of Poland
Contact persons: Marcin Szymkowiak (M.Szymkowiak@stat.gov.pl)
Received on 30-4-2010.

Description of PGSS

The general goal of the PGSS is the systematic measurement of the trends and consequences of social change in Poland. The PGSS studies individual attitudes, values, orientations and social behavior, and measures the socio-demographic, occupational, educational and economic differentiation of representative groups and strata in Poland. The initially annual (until 1997) and subsequently biennial cycle of repeated surveys, with uniform methodological standards and identical indicators, allows for systematic analysis of social trends. In this respect the PGSS is a unique program for studying systemic change in Poland. PGSS data come from individual interviews with a nation-wide representative sample of adult household members. The 2008 PGSS data contain 1293 respondents. Of about 1500 variables, 17 were chosen for the purpose of integration, of which 7 are common with the DS survey and 10 are distinct.

Description of DS

The survey covers many aspects of the situation of households and individual citizens. The social indicators taken into account here can be divided into three general classes:
- the demographic and social structure of households;
- the living conditions of households, associated with their material conditions, access to health care services, culture, recreation, education and modern communication technologies;
- the subjective quality of life, lifestyle, beliefs, attitudes and behaviors of individual respondents.

The indices that describe the demographic and social structure of the households are not subject to separate analysis in the present report; they serve only as a means of stratifying the groups of households and individuals in order to enable a comparison of the conditions and quality of life according to various social categories, such as gender, age, education level, place of residence, social and professional status, main source of income, civil status, type of household (created on the basis of the number of families and biological family type) and other criteria. The DS set contains 73 388 units. Of about 2000 variables, 28 were chosen for the purpose of integration, of which 7 are common with the PGSS survey and 21 are distinct.

The problem

The main problem is to apply different methods of statistical matching and to evaluate the quality of the integrated data.
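One of the methods that could be tried is unconstrained distance hot deck (nearest-neighbour) matching, in which each recipient record receives the distinct variables of the donor record that is closest on the common variables. The sketch below is an illustration only: the variable names and records are invented, the distance simply counts disagreeing categorical values, and the result is meaningful only under the usual conditional independence assumption.

```python
def nearest_donor(recipient, donors, common_vars):
    """Donor with the fewest disagreements on the common (categorical) variables."""
    def distance(donor):
        return sum(recipient[var] != donor[var] for var in common_vars)
    return min(donors, key=distance)


def hot_deck_match(recipients, donors, common_vars, donor_only_vars):
    """Attach the donor-only variables of the nearest donor to each recipient."""
    fused = []
    for recipient in recipients:
        donor = nearest_donor(recipient, donors, common_vars)
        fused.append({**recipient, **{var: donor[var] for var in donor_only_vars}})
    return fused


# Hypothetical variables: three common ones, one distinct variable per file.
common_vars = ["sex", "age_group", "education"]
recipients = [
    {"sex": "F", "age_group": "30-39", "education": "tertiary", "trust_in_courts": 4},
]
donors = [
    {"sex": "M", "age_group": "60+", "education": "primary", "internet_access": False},
    {"sex": "F", "age_group": "30-39", "education": "secondary", "internet_access": True},
]
fused = hot_deck_match(recipients, donors, common_vars, ["internet_access"])
```

In this example the smaller file plays the role of recipient and the larger one the role of donor, which would correspond to PGSS and DS respectively; the choice of roles and of the distance function is itself part of the evaluation the application describes.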