Final technical report on Improvement of the use of administrative sources (ESS.VIP ADMIN WP6 Pilot studies and applications)

Ref. Ares(2017)888280 - 17/02/2017

REPORT
2016-11-03
Claus-Göran Hjelm

Contents
Description of methodology and procedures
Legal Framework in Sweden
Contacts and communication with data providers
Processing and editing of the data
Example of editing with single or multiple data sources
Quality evaluation
Plan for further changes in the statistical production as a result of the action
Modeling concept
Common Swedish hub for energy data as a national example of new administrative data
Usability of methods and procedures in other contexts
Summary of problems encountered
Methodological
Legal
Practical/technical
Further actions that can improve the use of administrative sources
Legislation issues
Future plans and work
Lessons learned

Description of methodology and procedures

Statistics Sweden (SCB) has for some years had successful cooperation with data providers in Sweden concerning administrative data from other authorities. Historically we have always had legislation to back us up. A field where this does not apply is new both for us as a producer of statistics and for the data owners. We are in the process of establishing routines and agreements, but this is taking longer than expected.

Legal Framework in Sweden

Important laws and regulations at the national level for production of official statistics in Sweden are:
- Official Statistics Act (Lag 2001:99 om den officiella statistiken)
- Official Statistics Regulation (Förordning 2001:100 om den officiella statistiken)
- Public Access to Information and Secrecy Act (Offentlighets- och sekretesslag 2009:400)
- The Personal Data Act (Personuppgiftslag 1998:204) (will be replaced by Regulation (EU) 2016/679 on data protection)

The legal framework provides Statistics Sweden with the necessary legal rights to collect and process data, and to disseminate aggregated information, while at the same time protecting the rights of individuals, households, businesses, and organizations.

Contacts and communication with data providers

We have had meetings with Svensk Energi, the industry association for the network owners in Sweden. One of the issues discussed at these meetings was access to data at apartment level and the possibility of securing access to this data for the whole of 2011. Quality aspects have been raised and these call for further discussion, as do the legal issues regarding the relationship between the network owners and Statistics Sweden as a national statistical institute (NSI).

There are other areas where energy data could make a big difference to the quality of official statistics. It is well known that the entire Western world today sees a trend of steadily declining response rates in statistical surveys. Statistics Sweden is not spared from this, and great efforts are being made to prevent and cope with the nonresponse. One of the surveys at Statistics Sweden hardest hit by large nonresponse is the Household Budget Survey (HBS). In this survey, currently conducted about every four years, data is collected from a statistical sample of households.

Contact has been made with some important data providers in the grocery retail sector with whom we are interested in cooperating. We would receive scanner data and they would receive feedback in the form of statistics at aggregate level. We are trying to find a win-win situation with mutually beneficial cooperation. The identifier is the transaction and not the individual. We will try to apply the same approach in our dealings with the network owners in Sweden.

We have also had one meeting with the major network owner in Sweden and are in the phase of defining what win-win situations exist for both them and us. We have developed a contract template between us and them, which is now being reviewed in both organisations. It is currently available only in Swedish because both parties will still make changes to it.

Processing and editing of the data

There are four main roles involved: the data owner, the data recipient, register management, and the end user of the administrative data within the organisation. We call the bulk of the editing work production editing. It involves editing tasks on both register populations and variables. Some editing is part of register maintenance and of maintaining register quality, some is aimed at specific surveys and their (perhaps integrated) microdata, and some is performed on microdata with the aim of improving statistical systems.

The principle of doing editing work at the earliest possible stage has to be adapted to the multiple uses of register data and to other practical circumstances of the administrative data source. The possibilities of contacting the sources are normally limited, which makes corrections difficult. Caution should also be exercised about applying imputation techniques too early if there are many uses and users of the data. Below is the common model we will use for quality assurance of the data from the network owners.
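Separately from the quality assurance model referred to above (not reproduced here), the idea of production editing without early imputation can be illustrated with a minimal sketch. It assumes a hypothetical record layout with a delivery point identifier and a yearly consumption value; the field names and the four edit rules are invented for illustration and are not taken from Statistics Sweden's production systems. Records are only flagged, never changed, so that later users of the data can decide for themselves how to treat them.

```python
from collections import Counter
from typing import List, Optional, TypedDict


class MeterRecord(TypedDict):
    """Hypothetical layout of one incoming record from a network owner."""
    delivery_point_id: Optional[str]
    kwh: Optional[float]


def flag_records(records: List[MeterRecord]) -> List[List[str]]:
    """Return one list of edit flags per record; the records are not modified."""
    id_counts = Counter(
        r["delivery_point_id"] for r in records if r["delivery_point_id"]
    )
    all_flags: List[List[str]] = []
    for r in records:
        flags: List[str] = []
        ident = r["delivery_point_id"]
        if not ident:
            flags.append("missing_id")        # population check: unit not identifiable
        elif id_counts[ident] > 1:
            flags.append("duplicate_id")      # population check: unit appears twice
        if r["kwh"] is None:
            flags.append("missing_value")     # variable check: flagged, not imputed
        elif r["kwh"] < 0:
            flags.append("negative_value")    # variable check: implausible value
        all_flags.append(flags)
    return all_flags


# Invented example records, for illustration only.
records: List[MeterRecord] = [
    {"delivery_point_id": "DP1", "kwh": 2500.0},
    {"delivery_point_id": "DP2", "kwh": None},
    {"delivery_point_id": "DP2", "kwh": -10.0},
    {"delivery_point_id": None, "kwh": 120.0},
]
flags = flag_records(records)
```

Keeping the flags apart from the values is one possible way to respect the caution about early imputation when the same register data serves several uses.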

Example of editing with single or multiple data sources

[Figure: Two-phase cycle of error microdata, single (primary) source and integrated (secondary) source]

The figures are an extension of Bakker (2010) and borrow inspiration from the total survey error framework by Groves et al. Conceptually, they illustrate the sources of error, and roughly the types of error, in a specific register survey, i.e. when there is a determined target population, given target parameters, domains, etc.

Quality evaluation

The data was intended to improve and validate the quality of the last census. The basis for this was that we saw a need to validate whether an apartment recorded in our register-based census really was empty or whether a person was in fact living in it. One issue we can address with this data is that of uninhabited apartments in, for example, Stockholm. The official number of uninhabited apartments there is around 50,000, but we suspect that most of them are sublet at second or even third hand because of the housing shortage, and we think that energy consumption can indicate whether an apartment is inhabited or not.

Plan for further changes in the statistical production as a result of the action

According to its directives, Statistics Sweden is responsible for producing and disseminating official statistics in a number of (but not all) subject matter areas. The recently established Strategy 2020 states that Statistics Sweden meets the current and future needs for reliable statistics as a basis for analysis, discussion and decision-making. It further declares: "To ensure input data for statistical purposes, Statistics Sweden steadily seeks and examines new potential sources of information." We see an increasing use of new administrative sources to validate and raise the quality of existing statistical products and to create new ones in our production environment. We noted that in some countries the discussion is about Big Data, but in this report we have talked about administrative data, in the sense that the data is ordered and has unique identifiers.

Modeling concept

HBS is an important statistical product that is used in many places, both within Statistics Sweden and by decision-makers in society. The survey has been given high priority, and a variety of efforts to deal with the shortfall in responses have been discussed. One possible measure is to introduce alternative data sources for HBS, and it is in this context that this request belongs.
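For the energy data, the validation idea described under Quality evaluation (energy consumption as an indicator of whether an apartment is inhabited) can be illustrated with a minimal sketch. It assumes, hypothetically, that yearly electricity consumption can be linked to each apartment in the register, which is not the case for the delivery point data discussed in this report. The class, the field names and the cut-off of 300 kWh per year are invented for illustration; a real cut-off would have to be derived from actual data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Apartment:
    apartment_id: str
    registered_residents: int  # persons registered at the apartment in the census
    annual_kwh: float          # metered electricity use over one year


# Hypothetical cut-off separating standby use from normal household use.
STANDBY_KWH_PER_YEAR = 300.0


def classify(apartment: Apartment, threshold: float = STANDBY_KWH_PER_YEAR) -> str:
    """Compare register status with energy use and return a validation label."""
    in_use = apartment.annual_kwh > threshold
    if apartment.registered_residents == 0 and in_use:
        return "registered_empty_but_in_use"
    if apartment.registered_residents > 0 and not in_use:
        return "registered_occupied_but_idle"
    return "consistent"


# Invented example records, for illustration only.
apartments: List[Apartment] = [
    Apartment("A1", 0, 2400.0),
    Apartment("A2", 0, 50.0),
    Apartment("A3", 2, 3100.0),
    Apartment("A4", 1, 20.0),
]
summary = Counter(classify(a) for a in apartments)
```

In this sketch the label "registered_empty_but_in_use" corresponds to the suspected sublet apartments, while "registered_occupied_but_idle" would point to possible over-coverage in the register.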
We have noted that this situation is similar to the issue regarding energy data: we need to find models in order to break down aggregated data to lower levels. A development project has started at Statistics Sweden in which we will look at possible data sources other than telephone interviews, diaries and receipts, and evaluate their potential to improve the statistics from HBS. Many of our respondents have limited knowledge of how large the energy consumption of their apartment or house is, so the use of energy data could increase the quality of the HBS. This matters all the more because energy is costly in Sweden and, this being a cold country, energy costs are a large part of monthly expenses.

HBS needs to bring in alternative data sources. Because of the response rate in the survey, bringing in, among other things, electricity consumption in combination with the current electricity price has been discussed. HBS has also tried to bring in other sources where the information is available at an aggregate level, and has since begun to test some models that may have to be used for this purpose when the data in question are aggregated. Alternative data sources could serve as support in the work, for example to validate the collected data directly, as weights or for calibration of estimates, or to reduce the burden on respondents. We do not have the information at household level, which is the unit of measurement for the survey. Therefore we must have a model-based procedure. We hope we can also use this experience in this project if we do not get the possibility to obtain data at apartment level.

As an example of how this model thinking can be adopted in other statistical products, it is worth mentioning that we are experimenting with using scanner data to predict food consumption for different types of Swedish households. Scanner data are files of transactions uniquely identified by a product code. Scanner data from retail are already in use at Statistics Sweden as input to the Consumer Price Index, but so far they have not been used for household statistics, simply because they do not contain any household information. Here we try a model approach to compensate for this lack of information. The primary aim of our approach is to allocate estimates of total household expenditures to domains of the population of households.
We assume access to estimates of total household population expenditures on groups of commodities or services. Such estimates can be used as benchmark estimates in the design of a survey in which consumption patterns among households are studied. In this presentation, we focus on one of the possible methods, in which retail sales in a small geographical area, or at a local store, are modeled as a function of the demographic characteristics of the area. The idea is to exploit the way different local demographics are reflected in local sales patterns. We think of this method as a projection of consumption onto household domains.
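The projection idea can be illustrated with a minimal sketch. The report does not specify the model form, so the sketch assumes the simplest possible one: sales in each area are a linear function, without intercept, of the number of households in each domain, fitted by ordinary least squares. The resulting domain totals are then scaled to a benchmark total. All function names and figures are invented for illustration.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: domains cannot be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_spend_per_household(households: List[List[float]],
                            sales: List[float]) -> List[float]:
    """Least squares fit of sales[a] = sum over d of beta[d] * households[a][d]."""
    k = len(households[0])
    xtx = [[sum(row[i] * row[j] for row in households) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(households, sales)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def allocate_to_domains(beta: List[float], domain_sizes: List[float],
                        benchmark_total: float) -> List[float]:
    """Project spend onto domains and scale so the parts sum to the benchmark."""
    raw = [b * n for b, n in zip(beta, domain_sizes)]
    scale = benchmark_total / sum(raw)
    return [r * scale for r in raw]


# Invented data: households per area in two domains (with / without children)
# and total sales per area.
households = [[10.0, 40.0], [30.0, 20.0], [25.0, 25.0], [5.0, 60.0]]
sales = [11000.0, 13000.0, 12500.0, 13500.0]

beta = fit_spend_per_household(households, sales)
allocated = allocate_to_domains(beta, [1000.0, 3000.0], benchmark_total=990000.0)
```

Scaling to the benchmark is one simple way to keep the domain estimates consistent with the assumed known population total; a production model would also need to handle covariates, heterogeneity between stores and estimation uncertainty.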

Common Swedish hub for energy data as a national example of new administrative data

Another Swedish project is in the process of defining and managing a hub for Swedish energy data, and Statistics Sweden takes an active part in it by defining requirements for the statistical parts of the hub. The types of information in this hub are:
1. Installation data for every apartment
2. Customer data
3. Supplier exchanges
4. Daily electricity data

We see many potential areas for us as an NSI here. Examples include energy price statistics; statistics on changes of energy supplier; energy use in farming and households; energy use of electric cars; distinguishing electricity produced in various ways (fossil, renewable); calculation of over-coverage; and uninhabited apartments. The problem is that this solution will not be implemented for about 7-8 years. Until then, we must work with other solutions in close cooperation with the data providers in the energy area.

Usability of methods and procedures in other contexts

What we have addressed is the possibility of using Data Science/Big Data techniques in the production of statistics from administrative data and vice versa, that is, using methods we have developed for administrative data when working with Big Data. As an NSI with long experience of using administrative data in statistical production, we have developed techniques for cleaning and checking administrative data, and since we noted that energy data is in fact well structured, there are many similarities with the methods we use in our ordinary statistical production.

Another good example of the integration between methods is that we have learned many techniques and methods from the data scientists, who use techniques and methods that are new to us and not normally used in our statistical production. This involves both statistical and infological methods as well as new approaches to statistical matching, which is a new area for us. The figure below shows the normal way in which we handle structured (administrative) data in statistical production:

Summary of problems encountered

Methodological

The network owners do not have the actual address or flat, but instead the delivery point for the electricity distribution. When we raised the question of apartments that are supposed to be empty but show normal energy consumption, they referred to the fact that apartments (in larger cities) are often sublet at second or even third hand without the knowledge of the apartment owner. One of the outcomes of this project is that, since our definition of an apartment for statistical purposes differs from the network owners' definition for energy purposes, the data are not compatible and not useful at micro level. This is not a problem for the quality assurance of the census.

Legal

The legal situation at Statistics Sweden is today somewhat uncertain in the context of using new data sources for evaluation of official statistics. The legislation is under investigation and, as it stands today, we cannot access and use administrative sources that are not directly connected to official statistics. From an R&D perspective we are hopeful that the interpretation will be in our favour, but as mentioned earlier the legal situation today is not clear.

Practical/technical

We started with the approach that we should use techniques and methods from the Big Data world, i.e. Data Science techniques, but we ended up using what are, in our experience, traditional methods and techniques from our long use of administrative data. The main reason is that the techniques developed for Big Data work on unstructured data, whereas the techniques and methods we use when we produce statistics from administrative data require the data to be well structured, which is the case here for the energy data. The fact that we have not had access to microdata from the data owners during the project period has meant that we have had to develop methods without any real testing. We are certain that the methods and experiences we have gained during the project period will be most helpful in the future, when the legal and practical issues have been resolved.
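Since no microdata were available for testing, one way to exercise methods in the meantime is on synthetic records, an option the report itself raises under future plans. The following is a minimal sketch of such a generator. The record layout, the share of delivery points in use and the consumption distributions are all invented assumptions, since the structure of the real data is not described here.

```python
import random
from typing import Dict, List, Union

Record = Dict[str, Union[str, bool, float]]


def make_synthetic_meter_data(n_points: int, seed: int = 1) -> List[Record]:
    """Generate hypothetical delivery point records with yearly consumption."""
    rng = random.Random(seed)  # fixed seed so that test runs are repeatable
    records: List[Record] = []
    for i in range(n_points):
        in_use = rng.random() < 0.9  # assumed share of delivery points in use
        if in_use:
            # Assumed log-normal spread around roughly 2,200 kWh per year.
            annual_kwh = rng.lognormvariate(7.7, 0.4)
        else:
            # Assumed standby consumption for delivery points not in use.
            annual_kwh = rng.uniform(0.0, 200.0)
        records.append({
            "delivery_point_id": f"DP{i:06d}",
            "in_use": in_use,
            "annual_kwh": round(annual_kwh, 1),
        })
    return records


synthetic = make_synthetic_meter_data(1000)
```

Because the true status of each synthetic record is known, a classification or linkage method can be checked against it before any real data are delivered. Results obtained this way say nothing about the quality of the real data.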

Further actions that can improve the use of administrative sources

The implementation of the European Data Protection Initiative in Sweden may affect the possibility for Statistics Sweden as an NSI to get access to data. In Sweden, as in most European countries, it is under investigation how this regulation will be implemented nationally.

Legislation issues

We are in the process of defining the legal basis for our possibilities of using external data for new statistics and for validation of existing statistical products. For this purpose we have developed a list of checkpoints (i.e. our mandate as an NSI, personal integrity, information security, the principle of public access to official records, archiving).

Future plans and work

The level of data that we will have access to in the future will determine how we form future project plans. Another aspect connected to quality is the metadata and knowledge of the data, primarily from a process data perspective, i.e. how the data is collected and changed. Furthermore, if this pilot shows good results, the formation of good relations with the data providers and a stable process for production of the above-mentioned new statistics are essential. The next step is to test the data we will (probably) have in the future; the likely way forward here is to use synthetic data with a structure similar to that of the real data.

Lessons learned

Altogether we have established contacts with the industry association for the network owners and with the individual companies. This took longer than we were used to and had prepared for. What was also new for us was that the industry association looked after its members and the individual companies looked after their customers. During the meetings we discussed and explained how statistical confidentiality works and that we cannot give other external parties access to this data.
In practice this means that we, from the public as well as the private sector, have to define and find win-win situations for both parties, as well as legal solutions that are good for both parties. We also found out that we had been using different definitions and had therefore been discussing different things. We learned that within Statistics Sweden we have to clarify the legal preconditions for deliveries of external data in the form of a legal investigation. In that investigation we subsequently noted that we have to define and check the preconditions in the form of legal agreements etc. From an NSI perspective this is new to us, since we are normally used to having a legal mandate to use administrative data (normally from other agencies and not private companies). In this context we put a lot of effort into finding win-win situations for both us and the companies. This ended up in a situation where we do not expect to get access to data from the network owners at micro level but instead in aggregated form, and the focus is now on building models for this evaluation.

References

Bakker, B. (2010). Micro-integration: State of the Art. Paper for the Joint UNECE/Eurostat Expert Group Meeting on Register-Based Censuses, The Hague, The Netherlands.

Hjelm et al. (2007). Register-based statistics in the Nordic countries. United Nations publication.

Laitila, T., Wallgren, A. and Wallgren, B. (2011). Quality Assessment of Administrative Data. Research and Development Methodology reports from Statistics Sweden 2011:2.

Wooldridge, J.M. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, Massachusetts.

Wallgren, A. and Wallgren, B. (2007). Register-based Statistics: Administrative Data for Statistical Purposes. New York: Wiley.