Lessons learned from a mixed-mode census for the future of social statistics

Dr. Sabine BECHTOLD
Head of Department Population, Finance and Taxes, Federal Statistical Office Germany

Abstract. This paper presents some experience acquired so far in Germany with the 2011 Census, which is based on a new data collection method. The new census method is a combination of administrative register evaluation and field surveys. The biggest challenge posed to German official statistics by the new census model is that combining the data from the different data sources at the level of persons partly requires plain-text information (among other things, name and address), because neither a uniform person identification number (personal ID) nor a uniform building ID is available. The experience acquired in the 2011 Census with combining data without numerical identifiers can be used to make concrete proposals for the further development of the new German census model. The German experience may also benefit other countries which have not yet used data from administrative registers for a census because they face similarly difficult conditions regarding the combined use of administrative data and field surveys.

1. Legal and political requirements taken into account when developing the new German census model

The last census of population and housing in Germany before the 2011 Census was conducted in 1987 as a complete, Germany-wide survey, with all inhabitants being questioned by interviewers. That census was accompanied by massive opposition in relevant parts of the population, including a successful constitutional complaint which led to a prohibition on transmitting the data back to the administration.
According to that prohibition, which is still in force in Germany today, data that were provided to the statistical offices of the Federation and the Länder for the production of statistics must not be transmitted back to the administration. Also, in its judgment at the time, the Federal Constitutional Court obliged the legislator to consider a specific question when deciding on the data collection methods of future population censuses: whether, as a result of further developments in the methods of official statistics and social research as well as in data processing, it is still necessary to conduct a traditional complete count, or whether the information required for the census could be obtained through a different collection method which would place a smaller response burden on the population.

Therefore, after the 1987 population census, the legislator commissioned the statistical offices to develop a new census method. The following general requirements had to be taken into account: where possible, existing administrative data should be used for the census, and the cost had to be markedly lower than that of a traditional population census, while the quality of the major census results, in particular the number of inhabitants to be determined in the 2011 Census, had to be at least as good as in former population censuses.

As a result, a census model has been developed which combines register evaluation with field surveys. The main pillars of that combined model are as follows.

The population registers provide the main demographic and geographic data as well as information on family relationships for all individuals (about 86 million data records) that belong to the target population.

A large part of the information on the employment of the population can be taken from registers of the labour administration (for about 36 million employees subject to social insurance) and from the administrative files of the public service agencies that employ personnel (for about 3 million public officials, judges and soldiers).

To cover further compulsory census variables of the EU that are not available from registers (e.g. on educational attainment and on the employment of self-employed persons), a sample-based household survey is conducted by interviewers among some 10 percent of the population. That household survey is also used to estimate the extent of overcoverage and undercoverage in the population registers, so that this can be taken into account when determining the official number of inhabitants from the population register data.

As there are no registers of buildings and dwellings covering the whole of Germany, the compulsory EU variables of the census of housing must be obtained through a postal survey among all owners of buildings and dwellings (for the total of just under 20 million buildings with residential space, approximately 17.8 million owners have to be questioned). In addition, the census of buildings and housing covers variables (number of persons living in a dwelling and names of two persons) which, in combination with the information from the population registers on family relationships, can be used to link individual persons in order to generate (residential) households.
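
The coverage correction mentioned under the third pillar can be illustrated with a deliberately simplified calculation. The sketch below is not the estimator used in the 2011 Census, which the paper does not specify in detail; it assumes that the household survey yields, for a sample of register entries, a count of outdated entries (overcoverage) and a count of persons found but not registered (undercoverage), and that both shares are simply applied to the register count. The function name and parameters are hypothetical.

```python
def adjusted_population(register_count: int, sampled_entries: int,
                        outdated: int, missing: int) -> int:
    """Correct a register count with coverage rates estimated from a sample.

    Assumption: both rates are expressed relative to the sampled register
    entries; the real census estimator is considerably more elaborate.
    """
    over_rate = outdated / sampled_entries    # share of outdated entries
    under_rate = missing / sampled_entries    # share of missing entries
    return round(register_count * (1 - over_rate + under_rate))


# Example: a municipality with 50,000 register entries, of which 5,000 were
# checked in the household survey (illustrative figures only).
estimate = adjusted_population(50_000, 5_000, outdated=100, missing=60)
```

Because the two rates come from a sample, the resulting number of inhabitants carries a sampling error, which is the point taken up in Section 2.2.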

2. Acceptance of the new census method

2.1 High acceptance in the population

Compared with the trauma caused among politicians and statisticians by the open rejection of the last population census of 1987 in western Germany by major societal groups, this year's census passed off virtually without any opposition. This is due not only to the changed legal and societal framework conditions (data protection provisions were considerably enhanced after the 1987 population census, the cases of misuse of personal data have so far not involved official statistics, and many people today are rather generous in dealing with their own data) but also to the new census method. As data of the population registers are used, only a tenth of the population were directly concerned by the household survey. The postal census of buildings and housing was limited to a population group with a lower potential for opposition (owners of residential space). The number of queries made by real property owners who are concerned because they have not yet received a questionnaire is much larger than the number of those who have not yet complied with their obligation to respond. However, there is still a long way to go before all responses have been received, so any assessment of whether the method has been accepted by the population remains provisional. In particular, no reminders entailing potential fines have been sent yet.

2.2 Considerable mistrust among local authorities

It is already obvious today that there is a lack of acceptance among regional and local authorities regarding the census results, which are not even available yet. One of the reasons is the mere fact that understanding the basic principles of the new census method requires some knowledge of sampling theory, whereas people can understand the basic principles of a traditional population census or of a pure register count without any special knowledge.
Understanding the sampling and expansion methods applied in the 2011 Census even requires very detailed knowledge of sampling methodology. As the official number of inhabitants is one of the underlying factors of the revenue equalisation system (e.g. revenue-sharing of taxes, horizontal fiscal equalisation), it is a major basis determining the financial resources of the regional and local authorities. Consequently, some municipalities are already preparing to take legal action against the official number of inhabitants that will be ascertained by the 2011 Census. The main point of criticism is that the number of inhabitants, rather than being obtained through a complete count alone, includes an estimate, which involves a sampling error. What is estimated for the municipalities by means of a sample survey is the share of outdated entries (= overcounting) or missing entries (= undercounting) in the population register data. That share is then taken into account in determining the official number of inhabitants. The level of the sampling error of the official number of inhabitants will not be available before the census has been completed. However, the Census Act lays down that the sample design should aim at a relative standard error of no more than 0.5 percent.

3. Combining data from various data sources without personal ID

Even though collecting and processing the data from the various data sources of the 2011 Census is far from finished, the challenges that will have to be overcome when combining the data at the address and personal levels are already emerging. In Germany there is no personal ID and, considering the data protection legislation, it is inconceivable at present that a standard personal ID used in all administrative registers could be introduced, even in the long term. The experience acquired so far in the 2011 Census has shown, however, that this situation does not preclude conducting a census in which data from traditional primary-statistical surveys have to be combined with data from administrative registers. Data combination is achieved here through a set of variables suited to identifying persons (among other things, name at birth, family name, first name, date of birth, place of birth and residential address). However, this requires much more effort than data combination by means of numerical identifiers.

The centre of the 2011 Census in Germany is the register of addresses and buildings (Anschriften- und Gebäuderegister, AGR), which contains all addresses of buildings with residential space in Germany. It is just a temporary statistical register that has been set up specifically for the purposes of the 2011 Census and may be used only to conduct the 2011 Census.
Due to legal provisions, the register and all its data will have to be deleted 6 years after the census reference date at the latest. To set up the AGR, three data sources were used: the geo-referenced address data of the land surveying administration, the data from the population registers (which are maintained in a decentralised way by the approximately 12,500 municipalities in about 5,400 computing centres) and the registers of persons of the Federal Employment Agency. Setting up the AGR required combining these data sources, which involved some text matching (street names).

The AGR is the universe of the census of buildings and housing. It contains the data required for dispatching the questionnaires (name and address of the owner of the building) and provides the sampling frame for the household survey. In addition, it is used in the data collection and processing phases to combine the data from the various data sources at the level of persons. Generally, that combination is a two-stage process. At the first stage, the personal data records are linked to the relevant address (using the variables of post code, official municipality code, sub-municipality, street name, house number and, where applicable, house number supplement). Only at the second stage are persons matched within the relevant address.

When combining the data sources to set up the AGR, three weak points emerged. First, the renaming of streets (e.g. in the case of territorial status changes) is not done simultaneously in the three data sources; second, street names have different spellings (between the three data sources and even within a data source); and third, house number supplements are used in different ways. In most cases, the different spellings of street names were clarified through computer-aided processing, e.g. by standardising abbreviations or removing special characters and numbers. However, many cases remained which had to be settled by manual checking. For the approximately 1.2 million streets in Germany, there were about 5.1 million different spellings in the three data sources used to set up the AGR. Using the knowledge obtained on the different street name spellings, a street thesaurus was compiled, which was used when updating the AGR with new data deliveries. That was necessary because new data deliveries contained the old, non-standardised street names, as it was not allowed to transmit the standardised spellings back to the agencies maintaining the registers. So, when integrating the update deliveries into the AGR, it was in most cases no longer necessary to repeat the automated standardisation steps or, in particular, the time-consuming manual checks.

4. Developing further the new census method

4.1 Using standard address identifiers in the administrative registers

Combining different data sources could be done much more efficiently in future mixed-mode censuses if uniform standards for address variables were introduced in the administrative registers used for the census.
An obvious code that could be used here is the geo-coordinate. As a starting point for developing a standard list of addresses, the thesaurus of streets developed for the 2011 Census could be used. However, some issues would have to be settled first. For instance, there are licence issues and the question of how the process of updating the street thesaurus (mainly adding new street names) can be organised efficiently. An official list of addresses might be linked to the List of Municipalities that is maintained by the statistical offices of the Federation and the Länder. Concretely, this means that the statistical offices would offer the list of addresses free of charge, with liberal licensing terms that are easy to understand, and through state-of-the-art application programming interfaces (APIs). In turn, this would ensure that a standardised stock of addresses would be available to the statistical offices when they access administrative data.
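
The effort that a standard address identifier would save can be seen from a minimal sketch of the two-stage combination described in Section 3: street names are first standardised (or looked up in a thesaurus of known spellings), records are grouped by address, and persons are then compared only within the same address. The standardisation rules, field names and functions below are illustrative assumptions; the paper does not document the actual rules or matching criteria used for the AGR.

```python
import re


def normalise_street(name: str) -> str:
    """Reduce a street name to a comparison key (illustrative rules only)."""
    text = name.lower().replace("ß", "ss")
    # Expand the abbreviation "str." or "str" at the end of a word.
    text = re.sub(r"str\.?(?=\s|$)", "strasse", text)
    # Drop special characters, digits and whitespace.
    return re.sub(r"[^a-zäöü]", "", text)


def canonical_street(name: str, thesaurus: dict[str, str]) -> str:
    """Prefer a spelling already recorded in the thesaurus, else normalise."""
    return thesaurus.get(name, normalise_street(name))


def address_key(rec: dict, thesaurus: dict[str, str]) -> tuple:
    house_no = rec["house_no"].lower().replace(" ", "")
    return (rec["postcode"], canonical_street(rec["street"], thesaurus), house_no)


def person_key(rec: dict) -> tuple:
    return (rec["family_name"].strip().lower(),
            rec["first_name"].strip().lower(),
            rec["date_of_birth"])


def match_records(register: list[dict], survey: list[dict],
                  thesaurus: dict[str, str]) -> list[tuple]:
    """Return (register id, survey id) pairs for persons found at the same address."""
    # Stage 1: group register records by standardised address.
    by_address: dict[tuple, list[dict]] = {}
    for rec in register:
        by_address.setdefault(address_key(rec, thesaurus), []).append(rec)
    # Stage 2: compare persons only within the same address.
    pairs = []
    for s in survey:
        for r in by_address.get(address_key(s, thesaurus), []):
            if person_key(r) == person_key(s):
                pairs.append((r["id"], s["id"]))
    return pairs
```

With a shared address code in every register, stage 1 would reduce to an exact comparison of identifiers, and neither the standardisation rules nor the thesaurus would be needed. The sketch uses exact comparison of names at stage 2; in practice, spelling variants of personal names would call for a more tolerant comparison.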

4.2 Abandoning the updating of census results

To provide current data on the demographic structure of the population in Germany, the results of the latest census are updated in a detailed regional breakdown (down to municipality level) and a detailed demographic breakdown (age, sex, marital status and citizenship). This is done by means of data from vital statistics (births and deaths) and from the statistics of arrivals and departures (only where these occur across municipality borders). Such updating of population figures is rather error-prone, especially if it has to be done over a long period. It should therefore be attempted in the medium term to replace intercensal population updates by regular evaluations (at least once a year) of the population registers by the statistical offices of the Federation and the Länder. This would involve at least the following two requirements.

1. The overcoverage and undercoverage of the population registers as ascertained by the 2011 Census would have to be within tolerable limits.

2. The population registers, which so far have been decentralised and non-networked, would have to be networked at Land level, so that a Germany-wide register stock would be created.

Another considerable quality improvement of a networked register could be achieved by regularly (e.g. once a year) checking and adjusting the register for double entries and by conducting a survey among persons that are registered in the population register only with a secondary residence. It would also be necessary to take better account of the requirements of official statistics when the contents of population registers are defined. For example, the population registers should store, for all persons, their former places of residence within the last five years and, for naturalised persons, their former citizenships. Such data are necessary to delimit the target population of the census according to international standard definitions and to record migration variables.
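
Why updating becomes error-prone over long periods can be shown with the usual balancing calculation: the census figure is carried forward year by year using births, deaths, arrivals and departures, so any systematic gap in one component accumulates. The sketch below is a generic illustration with hypothetical names and made-up figures; it does not reproduce the procedure of the German statistical offices.

```python
from dataclasses import dataclass


@dataclass
class Flows:
    """Recorded yearly movements for one municipality (illustrative)."""
    births: int
    deaths: int
    arrivals: int
    departures: int


def update_population(base: int, yearly_flows: list[Flows]) -> int:
    """Carry a census count forward with the recorded yearly flows."""
    population = base
    for f in yearly_flows:
        population += f.births - f.deaths + f.arrivals - f.departures
    return population


# Ten years of identical recorded flows, carried forward from a census count.
recorded = [Flows(births=100, deaths=90, arrivals=300, departures=280)] * 10
updated = update_population(10_000, recorded)

# Assumption for illustration: 5 departures per year are never deregistered,
# so the true number of departures is 285 rather than 280.
actual = [Flows(births=100, deaths=90, arrivals=300, departures=285)] * 10
drift = updated - update_population(10_000, actual)
```

In this example the updated figure overstates the population by 5 persons after one year and by 50 after ten, which is the kind of drift a regular register evaluation would avoid.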
4.3 Setting up a permanent register of buildings and dwellings

The census of buildings and housing accounts for half of the costs of the German 2011 Census. In the register-based census conducted in Germany, this largest primary-statistical survey component is necessary because there are no Germany-wide registers of buildings and dwellings. Also, the census of buildings and housing in Germany has to provide auxiliary variables that can be used to generate households and to combine them with the dwellings data. For that purpose, not only data on the building and the dwelling are collected but also up to two names of main tenants or owner-occupiers per dwelling.

The 2011 census of buildings and housing was conducted as a postal survey among all owners of residential buildings (in Germany, there are just under 20 million buildings with residential space). To this end, a register of addresses had been set up containing the name and address of the owner for every address. The data on the owners of buildings had to be researched in administrative registers (e.g. in the real property tax data kept by the municipalities or in the administrative data of the supply and waste management enterprises). That research turned out to be highly work-intensive. It also turned out that the quality of the data on owners can differ considerably between regions. The partly poor quality and, in particular, the inadequate up-to-dateness of the data on owners as researched for the 2011 census of buildings and housing not only necessitated huge data collection efforts but also led to considerable problems of acceptance on the part of the respondents.

It is therefore obvious that the German statistical offices would like to extend the success story of the use of the population registers and of the business register to other areas. Efforts are therefore being made to arrive at a situation where the data on buildings, dwellings and households can also be collected on the basis of registers, which would mean rapid and low-cost data collection. A register of buildings and dwellings, which has yet to be set up, would create the basis for this. Such a register would have to contain complete and always up-to-date data on all buildings and dwellings, including the basic data needed to meet the requirements of the EU census regulation (year of construction, size of dwelling, ownership structure, type of heating).
It would also have to include a unique dwelling identifier, which would be matched with a suitable counterpart in the population register, so that in a future census the required data on households could be provided without synthetisation. By linking the register of buildings and dwellings to the population register, a relation between an individual and a registered dwelling would be permanently available. This would create informational added value, so that, for example, reliable and up-to-date data could be obtained on the structure of (residential) households in a detailed regional breakdown. In addition, a register of buildings and dwellings, enriched by basic variables from the population register (number of persons per building and number of persons per dwelling), would provide a well-suited and up-to-date sampling frame for household sample surveys that are conducted to meet an ad-hoc information need.

The first-time set-up of such a register could be assigned to the statistical offices of the Federation and the Länder because, from setting up the register of addresses and buildings for the 2011 Census, they have acquired wide experience and because, without a new register of buildings and dwellings, they would have to set up a register of addresses and buildings again for the 2021 Census in a few years' time. It would be helpful, however, if not only official statistics but also the public administration could benefit from such a register. Therefore, it should not be maintained as a statistical register, because in that case it would not be allowed to transmit data back to the administration; it should rather be maintained outside the protected sphere of official statistics.

The availability of such a register would avoid the enormous amount of work that arises for the statistical offices about every ten years when a census is to be conducted. In addition, it would benefit many other official statistics. For example, updating the statistics on building activity would achieve a level of quality and up-to-dateness that has never been achieved so far. Also, the additional survey on the housing situation, which is conducted as part of the microcensus, could be abandoned completely, with the loss of only a few variables such as rent and energy source of heating; this would further contribute to reducing primary-statistical surveys to what is absolutely necessary.
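
How a unique dwelling identifier would make household data available without synthetisation can be sketched as a simple join. The example assumes, hypothetically, that every population register record carries a `dwelling_id` that also appears in the register of buildings and dwellings; all field and function names are illustrative, since no such register exists yet.

```python
from collections import defaultdict


def build_households(persons: list[dict], dwellings: list[dict]) -> dict[str, dict]:
    """Attach the registered persons to each dwelling via a shared dwelling ID."""
    members = defaultdict(list)
    for p in persons:
        members[p["dwelling_id"]].append(p["person_id"])
    # One entry per registered dwelling; dwellings without persons get size 0.
    return {
        d["dwelling_id"]: {
            "building_id": d["building_id"],
            "persons": sorted(members[d["dwelling_id"]]),
            "household_size": len(members[d["dwelling_id"]]),
        }
        for d in dwellings
    }


# Illustrative data: two occupied dwellings and one vacant dwelling.
persons = [
    {"person_id": "P1", "dwelling_id": "D1"},
    {"person_id": "P2", "dwelling_id": "D1"},
    {"person_id": "P3", "dwelling_id": "D2"},
]
dwellings = [
    {"dwelling_id": "D1", "building_id": "B1"},
    {"dwelling_id": "D2", "building_id": "B1"},
    {"dwelling_id": "D3", "building_id": "B2"},
]
households = build_households(persons, dwellings)
```

The same join yields the number of persons per dwelling and, summed over `building_id`, per building, which are the basic variables the paper proposes for a sampling frame.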