The IPUMS-Europe project: Integrating the Region's Census Microdata

European Population Conference 2006
Topic 9 (Data and Methods)

Dr. Albert Esteve (Centre d'Estudis Demogràfics)
Prof. Robert McCaa (University of Minnesota and Minnesota Population Center)
Prof. Anna Cabré (Universitat Autònoma de Barcelona and Centre d'Estudis Demogràfics)

Abstract.- Census microdata are an invaluable resource for social science and policy research. Other sources such as demographic and labor force surveys often offer greater subject coverage and detail than do census data, but no alternate source offers comparable sample density, chronological depth, and geographic coverage. This paper describes the IPUMS-Europe project, a consortium led by the Minnesota Population Center and the Centre d'Estudis Demogràfics to anonymize, harmonize and distribute census microdata of eighteen European countries from the 1960s to the present. The database will contain anonymized microdata samples, encompassing as many as 50 censuses and totaling more than 70 million person records. Custom-tailored extracts will be delivered, at no charge, to bona fide researchers via the Internet. The new database will allow social scientists to make comparisons across European nations during decades of marked demographic change and extraordinary political and economic restructuring.

Introduction.- Census microdata are an invaluable resource for social science research. Other sources such as demographic and labor force surveys often offer greater subject focus and detail than do census data, but no alternate source offers comparable sample density, chronological depth, and geographic coverage. A vast quantity of census microdata covering Europe in the period since the 1960s survives in machine-readable form. Yet for much of Europe, census microdata are either unavailable to researchers or restricted, and are therefore seldom used.
In the United States and Canada, however, census microdata have been available to researchers for almost forty years and have become an indispensable component of social science infrastructure. Thanks to the support of official statistical agencies in 14 European countries and major funding by the National Institutes of Health and the European Commission Sixth Framework Programme, the Integrated European Census Microdata database, one of the world's largest integrated research infrastructures for the study of human populations, is now under construction. The database will contain anonymized microdata samples encompassing as many as 50 censuses and totaling more than 70 million person records.

The National Institutes of Health (NIH) have awarded the Minnesota Population Center (MPC) a major grant to undertake a five-year initiative to create integrated and fully documented samples of over sixty European censuses and micro-censuses from the 1960s to the present (IPUMS-Europe project). The project will integrate and disseminate the census microdata of Austria, Belarus, Bulgaria, the Czech Republic, France, Germany, Greece, Hungary, the Netherlands, Portugal, Romania, Slovenia, Spain and the United Kingdom.

In addition, the Centre d'Estudis Demogràfics (CED) has secured support from the European Union Sixth Framework Programme for coordination, dissemination and harmonization. EU funds have already provided for an inaugural workshop, held in Barcelona in July 2005, at which census experts discussed harmonization strategies to integrate European census microdata across space and time (Coordinating the Integration of European Census Microdata, CIECM project). The Sixth Framework Programme will also support a three-year initiative to build a European web-based data extract and dissemination site, housed at the Centre d'Estudis Demogràfics, which will make the European microdata and metadata more widely available for scholarly and educational research (Disseminating the Integrated European Census Microdata, DIECM project). Finally, to fully capitalize on the potential of European census microdata, a third project has been approved to design harmonizations for approximately 50 priority variables for each census and country for which microdata samples are entrusted to the project, ensuring that the coding schemes reflect the census practices of European states as well as the principles and recommendations of Eurostat with regard to census concepts and nomenclatures (Harmonizing the Integrated European Census Microdata, HIECM project).
Confidentiality protections.- The IPUMS-Europe project distributes integrated microdata of individuals and households only by agreement of the corresponding national statistical offices and under the strictest of confidence. These protections involve three elements:

1. dissemination agreements between the University of Minnesota and each national statistical institute (NSI);
2. user licenses between, on the one hand, the University of Minnesota or another authorized distributor such as the CED, and, on the other, each researcher;
3. data protection measures to prevent the identification of individuals, families or other entities in the data.

Data Quality.- In addition to providing harmonized codes for variables and accompanying documentation, the IPUMS-Europe project is carrying out a variety of additional tasks to improve data quality, not all of which have been implemented in the first release of the data. These tasks include the following:

- Cleaning data to eliminate duplicate records, inappropriately merged households, and other errors.
- Developing internal consistency checks to maximize data integrity. This includes, for example, examining consistency between age and marital status, occupation, and school attendance; looking for persons with multiple spouses in countries where this is not an accepted custom; and checking for agreement between household and individual characteristics.
- Implementing allocation procedures to impute values for missing or inconsistent data items, using logical edits together with probabilistic "hot deck" methodology. A data quality flag identifies allocated data items.
- Creating constructed variables to simplify data analysis, including family interrelationship variables. A system of logical rules identifies the record number within each household of the individual's mother, father, or spouse, if they were present in the household. These pointers allow users to automatically attach the characteristics of these kin or to construct measures of fertility and family composition. Other constructed variables describe family and household characteristics at the individual and household level (such as family and subfamily membership, family and subfamily size, and number of own children).

Harmonization.- European census samples employ differing numeric classification systems, and reconciling these codes is a major effort. Variables must be easy to use for comparisons across time and space. This requires that we provide the lowest common denominator of detail that is fully comparable. On the other hand, we must retain all meaningful detail in each sample, even when it is unique to a single dataset. For most variables, it is impossible to construct a single uniform classification without losing information. Some samples provide far more detail than others, so the lowest common denominator of all samples inevitably loses important information. Composite coding schemes offer a solution. Following an approach similar to that used by the International Labour Organization for occupations and industries, we apply composite coding to each variable, retaining all original detail while providing comparable codes across countries and censuses.
The first one or two digits of the code provide information available across all samples. The next one or two digits provide additional information available in a broad subset of samples. Finally, trailing digits provide detail only rarely available.

IPUMS-Europe users will have access to at least 50 priority variables harmonized according to intra-European coding schemes and disseminated by the DIECM project. These variables cover all census topics (Demography, Education, Economic Activity, Migration, Household Composition, and Dwelling Characteristics) and are available in the vast majority of countries (see Figure 1).
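The composite coding idea can be illustrated with a minimal sketch. It assumes a hypothetical three-digit code in which the first digit holds detail available in all samples, the second digit detail available in a broad subset, and the third digit rarely available detail. The digit widths, the example code values, and the function name `collapse` are illustrative assumptions, not the project's actual coding scheme.

```python
# Hypothetical 3-digit composite code (an assumption for illustration):
#   digit 1 = detail available in all samples
#   digit 2 = detail available in a broad subset of samples
#   digit 3 = detail only rarely available
CODE_WIDTH = 3
LEVEL_DIGITS = {"general": 1, "intermediate": 2, "detailed": 3}


def collapse(code: str, level: str) -> str:
    """Truncate a composite code to the requested level of detail.

    Digits beyond the requested level are replaced by zeros, so that
    codes from samples with different amounts of detail become comparable.
    """
    if len(code) != CODE_WIDTH or not code.isdigit():
        raise ValueError(f"expected a {CODE_WIDTH}-digit code, got {code!r}")
    keep = LEVEL_DIGITS[level]
    return code[:keep] + "0" * (CODE_WIDTH - keep)


# Two samples recording the same (hypothetical) category with different detail:
rich_sample_code = "213"    # sample with full detail
sparse_sample_code = "200"  # sample with only the general category

# At the general level both collapse to the same comparable code.
same_general = collapse(rich_sample_code, "general") == collapse(sparse_sample_code, "general")
```

Collapsing to the general level gives the fully comparable lowest common denominator, while the untouched original code keeps whatever extra detail a given sample provides.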

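The family interrelationship pointers described under Data Quality can be sketched in a similar way. The example below assumes a deliberately simplified rule set: the household head and the person recorded as spouse point to each other, and each person recorded as a child of the head points to the head and spouse as father or mother according to sex. The field names (`pernum`, `relate`, `spouse_loc`, `mother_loc`, `father_loc`) and the rules themselves are hypothetical and far cruder than a production system of logical rules would be.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    pernum: int          # record number within the household
    relate: str          # "head", "spouse", "child", or "other"
    sex: str             # "F" or "M"
    spouse_loc: int = 0  # record number of spouse, 0 if not present
    mother_loc: int = 0  # record number of mother, 0 if not present
    father_loc: int = 0  # record number of father, 0 if not present


def attach_pointers(household: List[Person]) -> List[Person]:
    """Fill in spouse/mother/father pointers using simplified, assumed rules."""
    head = next((p for p in household if p.relate == "head"), None)
    spouse = next((p for p in household if p.relate == "spouse"), None)

    # Rule 1 (assumed): head and spouse point to each other.
    if head is not None and spouse is not None:
        head.spouse_loc = spouse.pernum
        spouse.spouse_loc = head.pernum

    # Rule 2 (assumed): children of the head point to head/spouse by sex.
    parents = [p for p in (head, spouse) if p is not None]
    for person in household:
        if person.relate != "child":
            continue
        for parent in parents:
            if parent.sex == "F":
                person.mother_loc = parent.pernum
            elif parent.sex == "M":
                person.father_loc = parent.pernum
    return household


household = attach_pointers([
    Person(1, "head", "M"),
    Person(2, "spouse", "F"),
    Person(3, "child", "F"),
    Person(4, "other", "M"),
])

# Example use: count own children present in the household for person 2.
own_children_of_2 = sum(1 for p in household if p.mother_loc == 2)
```

Once pointers exist, attaching a mother's or spouse's characteristics to a person's record reduces to a lookup by record number within the household.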
Figure 1. Selected variable topic availability, by country - 2000 Census round

                                         AUS BLR BUL CZ  FRA GER GRE HUN POL POR ROM RUS SLV SPA UK
PERSON VARIABLES
Demographic and social
Relationship to household head           X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
Age                                      X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
Sex                                      X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
Marital status                           X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
Age at first marriage                    .   .   X   X   .   X   X   X   X   .   X   .   .   .   .
Citizenship                              X   X   X   X   X   X   X   X   X   X   .   X   X   .   .
Religion                                 X   .   X   X   .   .   .   X   .   X   X   .   X   .   X
Language                                 X   .   X   X   .   .   .   X   X   .   X   X   X   X   X
National and/or ethnic group             .   .   X   X   .   .   .   X   X   .   X   .   X   .   X
Children ever born                       X   X   X   .   .   X   X   X   X   .   X   X   X   .   .
Education
Literacy                                 X   X   X   .   .   .   X   .   .   X   .   X   .   X   .
School attendance                        .   X   X   .   X   X   X   X   X   X   X   X   X   X   X
Educational attainment                   X   .   X   X   X   X   X   X   X   X   X   X   X   X   X
Economics
Employment status                        X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
Time worked                              X   .   .   .   X   X   X   X   .   X   X   .   .   X   X
Unemployment duration                    .   .   .   .   X   X   .   .   X   .   X   .   .   .   X
Occupation                               X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
Industry                                 .   .   .   .   X   X   X   .   .   .   X   X   X   X   X
Socio-economic status in employment      X   X   .   X   .   .   X   .   X   .   .   .   X   .   X
Class of worker                          X   X   .   .   X   X   X   X   .   .   X   X   X   X   X
Place of work                            X   X   X   X   X   X   X   X   .   .   X   X   X   X   X
Mode of transport to work                X   .   X   X   X   X   .   X   .   X   .   .   X   X   X
Length and frequency of journey to work  X   .   .   X   .   X   X   X   .   .   .   .   X   X   X
Disability                               .   .   X   .   .   .   .   X   X   X   .   .   .   .   X
Migration
Place of usual residence                 X   X   X   .   X   X   X   X   X   X   X   .   X   X   X
Size of place, urban/rural               X   X   X   .   X   X   X   X   X   X   X   .   X   .   .
Place of birth, within country           X   .   X   .   X   .   X   .   X   X   X   X   X   X   X
Place of previous residence              .   X   X   X   X   X   X   X   X   X   X   .   X   X   X
Country of birth                         X   X   X   .   X   .   X   .   X   .   X   X   X   X   X
Reason for immigration                   .   X   .   .   .   .   X   .   X   .   .   X   X   .   .
Country of citizenship                   .   X   X   .   X   X   X   X   .   .   X   X   .   .   .
Year/period of immigration               .   .   .   .   X   X   X   .   .   .   .   .   X   .   .
HOUSEHOLD VARIABLES
Household characteristics
Location                                 X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
Tenure status                            X   X   X   .   X   .   X   X   .   X   X   X   X   .   X
Rent                                     X   X   .   .   .   .   .   .   .   X   X   X   .   .   .
Number of vehicles                       .   .   .   X   X   .   .   .   .   .   .   .   .   X   X
Living quarters
Ownership                                X   X   X   X   X   .   X   X   X   X   X   .   X   X   X
Vacancy status                           .   .   .   X   X   .   X   X   X   X   X   .   X   .   .
Number of occupants                      .   X   X   .   .   .   X   .   .   X   .   X   X   .   .
Number of rooms                          X   X   X   .   X   .   X   X   X   X   .   X   X   X   X
Useful and/or living floor space         X   .   .   X   X   .   .   X   X   .   X   X   X   X   .
Facilities
Electricity                              .   X   X   .   .   .   X   .   .   X   X   X   X   .   .
Water / hot water                        X   X   X   X   .   .   X   X   X   X   X   X   X   .   .
Sewage                                   .   .   X   X   X   .   X   X   X   X   X   X   X   .   .
Toilet                                   X   X   X   X   X   .   X   X   X   X   X   .   X   .   .
Bathing facilities                       X   X   X   X   X   .   X   .   X   X   X   X   X   .   X
Type of heating                          X   X   X   X   X   .   X   X   X   X   X   X   X   X   X
Piped gas                                .   X   .   X   .   .   .   X   X   .   X   X   X   X   .
Building characteristics
Type with regard to construction / use   X   X   X   X   X   .   X   X   X   X   X   X   X   .   .
Period of construction                   .   X   X   X   X   .   X   X   X   .   X   X   X   X   .
Position of dwelling in the building     .   .   .   X   X   X   X   .   .   .   X   X   X   .   .
Number of dwellings in the building      X   .   X   X   X   .   X   .   .   .   X   .   X   .   .
Construction materials                   .   X   X   X   .   .   .   X   .   .   .   X   X   .   .

Note: X marks a topic available for that country; a dot marks a topic not available. A single variable topic in this table can represent multiple variables in the source data.
AUS: Austria, BLR: Belarus, BUL: Bulgaria, CZ: Czech Rep., FRA: France, GER: Germany, GRE: Greece, HUN: Hungary, POL: Poland, POR: Portugal, ROM: Romania, RUS: Russia, SLV: Slovenia, SPA: Spain, UK: United Kingdom

Potential Impact.- The availability of consistent microdata for all of Europe over a broad time span will have a profound effect on the practice of social science research. The new database will allow social scientists to make comparisons across European nations during decades of marked demographic change and extraordinary political and economic restructuring, including the shift to free market economies in Eastern Europe and the growth and development of the European Union. In concert with data from other census integration projects, these European data will also stimulate international comparative research across continental boundaries. The data will result in an outpouring of new scientific and policy-relevant research on population aging, economic transformation, demographic change, international migration, and many other topics. The European microdata series will help policymakers and scholars make informed decisions about the most obvious topics of analysis.