Best Practices for Automated Linking Using Historical Data: A Progress Report

1 Best Practices for Automated Linking Using Historical Data: A Progress Report. Preliminary; comments are welcome. Ran Abramitzky (Stanford), Leah Boustan (Princeton), Katherine Eriksson (UC Davis), James Feigenbaum (Boston University), Santiago Perez (UC Davis). World Economic History Congress, August

2 The Linking Challenge. A recent explosion of research uses linked historical data: census to census, and other sources to the census. In historical records, we usually don't have unique IDs, so we need to link people from one census to another on fields that will not change across censuses: names, year of birth, place of birth.

3 James Alexander in

5 James Alexander in 1910. Nice and simple? What if... there are 20 more James Alexanders from Wales with the same age? There is another 29-year-old James Alexander? How about another Jmaes Alexander or Jim Alexander?
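
The near-miss spellings on this slide ("Jmaes", "Jim") are the kind of variation a string similarity score is meant to capture. The deck later names Jaro-Winkler distance as one such measure. The following is a minimal sketch of the standard Jaro-Winkler similarity, not the authors' code; the function names and the conventional defaults (prefix weight 0.1, prefix length capped at 4) are assumptions made for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are this close in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)
```

Under this sketch a transposed spelling such as "JMAES" scores much closer to "JAMES" than the nickname "JIM" does, which is why a pure string score may still miss nickname links that a human would accept.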

7 Linking is inevitably imperfect. Lots of things can go wrong: enumeration error, transcription error, mortality, return migration, under-enumeration between census years, and people with the same attributes make it impossible to know the correct match with certainty. We face a trade-off: erroneously deeming two unrelated records a match (Type I error) versus erroneously neglecting true matches (Type II error).

8 Two General Ways to Link. 1. Linking by hand. Advantage: we humans trust other humans. Disadvantages: expensive; not replicable; impossible to search for a single record in a census without some use of automated algorithms; names of immigrants may be very hard for US hand linkers.

9 Two General Ways to Link. 2. Linking using automated algorithms. Advantages: rule-based, cheap, replicable, can compare any two records. Disadvantage: may not match the holistic judgments of name similarity that humans make based on experience.

10 The Goal of this Paper. Evaluate various automated methods. Discuss their pros and cons: discrepancy rates, sample size, representativeness relative to the population of interest. Compare to human linkers: what do humans and automated linkers see when linking data? Suggest best practices and provide transparent automated linking code.

12 Paper's Take-Away. Automated methods almost entirely reproduce hand links when humans and machines use the same information for linking. Hand and automated methods both have some discrepancies with hand links that are based on a rich set of genealogical variables, but discrepancies are reasonably low. In cases of discord, it is not at all obvious who is right: hand or machine. We evaluate three commonly used automated methods and find all perform very well. All three trade off Type I and Type II errors at different rates.

13 We Evaluate Three Automated Linking Methods. 1. ABE/Ferrie-style: using exact names, NYSIIS-adjusted names, or Jaro-Winkler distance, with or without requiring uniqueness within a 5-year band. 2. Expectation Maximization (EM): combining age and name distance into a single score reflecting the probability that any two records are a true match. 3. Machine Learning (ML): train an algorithm with data linked by hand to make matches like a human RA, using various record features.
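
To make the first method concrete, here is a rough sketch of rule-based linking on exact names, loosely following the slide's description. It is an illustration under stated assumptions, not the authors' released code: the record fields (`id`, `first`, `last`, `birthplace`, `birth_year`), the function name `link_exact`, and the reading of the "5-year band" as a birth-year window of plus or minus 2 years are all hypothetical choices.

```python
from collections import Counter, defaultdict


def _key(rec):
    """Blocking key: exact (case-insensitive) first name, last name, birthplace."""
    return tuple(rec[f].strip().lower() for f in ("first", "last", "birthplace"))


def link_exact(source, target, max_gap=2, unique_in_band=False):
    """Link source records to target records; returns {source_id: target_id}."""
    src_groups, tgt_groups = defaultdict(list), defaultdict(list)
    for rec in source:
        src_groups[_key(rec)].append(rec)
    for rec in target:
        tgt_groups[_key(rec)].append(rec)

    links = {}
    for key, src_group in src_groups.items():
        tgt_group = tgt_groups.get(key, [])
        for s in src_group:
            def gap(other):
                return abs(other["birth_year"] - s["birth_year"])

            # The source record must itself be unique: on exact birth year,
            # or within the whole +/- max_gap band in the conservative variant.
            src_limit = max_gap if unique_in_band else 0
            if sum(gap(o) <= src_limit for o in src_group) > 1:
                continue
            # Conservative variant: at most one target candidate in the band.
            if unique_in_band and sum(gap(t) <= max_gap for t in tgt_group) > 1:
                continue
            # Widen the birth-year gap step by step; stop at the first hit.
            for g in range(max_gap + 1):
                hits = [t for t in tgt_group if gap(t) == g]
                if len(hits) == 1:
                    links[s["id"]] = hits[0]["id"]
                if hits:
                    break  # one hit: linked; several hits: ambiguous, dropped

    # Drop any target claimed by more than one source record.
    claims = Counter(links.values())
    return {s: t for s, t in links.items() if claims[t] == 1}
```

Setting `unique_in_band=True` discards more records and so links fewer people, in exchange for fewer ambiguous matches. This is the Type I versus Type II trade-off described earlier in the deck.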

16 Need Ground Truth Data to Study Type I vs Type II Errors. 100% for-sure ground truth linked census data does not exist. 1. Data linked from the Union Army Records to the 1900 Census by Dora Costa's team. Data was created by trained RAs who had access to extra information not typically available (e.g. pension records); carefully (and expensively) hand-collected. 2. Two independent hand linkers observing only the variables we show the automated methods (names, year of birth, state of birth): a trained Assistant Professor and an undergrad, who made very similar links. 3. Hand-linked Iowa 1915 to 1940 from two different transcriptions.

18 How Do Automatic Methods Do? Two ways to assess links and methods. PPV (Positive Predictive Value) is a measure of accuracy or precision: PPV = CorrectMatches / Matches. Of the links made by the algorithm, what share agree with the reference data? TPR (True Positive Rate) is a measure of recall: TPR = CorrectMatches / Observations. Of the links to be made, what share did the algorithm find? These measures trade off against one another. We calculate these measures for our algorithms and include several variations of each algorithm in the following graphs.
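
The two measures on this slide can be computed directly from a set of algorithm links and a set of reference links. The sketch below assumes both are dictionaries mapping a source record ID to a target record ID, and that "Observations" means the number of links in the reference data; the function name `ppv_tpr` and these conventions are illustrative assumptions.

```python
def ppv_tpr(predicted, reference):
    """Return (PPV, TPR) for predicted links measured against reference links.

    PPV = correct matches / matches made by the algorithm.
    TPR = correct matches / links present in the reference data.
    """
    correct = sum(1 for src, tgt in predicted.items() if reference.get(src) == tgt)
    ppv = correct / len(predicted) if predicted else float("nan")
    tpr = correct / len(reference) if reference else float("nan")
    return ppv, tpr


# Example: three links made, two agree with the reference; four links to be made.
predicted = {"a1": "b1", "a2": "b2", "a3": "b9"}
reference = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
ppv, tpr = ppv_tpr(predicted, reference)
```

In this example PPV is 2/3 and TPR is 2/4. An algorithm that declined to link the doubtful record `a3` would raise PPV to 1 without changing TPR, and stricter rules that also dropped a correct link would lower TPR, which is how the two measures trade off.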

19 [Graph; not captured in the transcription]

20 [Graph; not captured in the transcription]

21 Automated methods almost entirely reproduce hand links when using the same set of matching variables. We have two datasets in which human linkers and automated algorithms made links, all using the same information for linking (names, year of birth, state of birth): 1. Union Army data linked by hand to the 1900 census. 2. Iowa 1915 sons linked by hand to the 1940 census.

22 Linking Union Army to 1900 with Same Information: All Automated Methods Have ~90% Accuracy. [Figure: PPV (#correct/#matches) plotted against TPR (#correct/#observations) for ABE, ABE-JW, EM, and ML, each also shown in a (DoraB) variant. Solid shapes treat human links with limited information as the truth.] % agreement between hand and automated methods

23 Iowa records linked by hand to 1940 census. Over 90% agreement between hand and automated methods.

24 When the human and machine disagree... who is correct?

30 Why is linking so hard? Common names, name changes, limited variables, age misreporting and age heaping, enumeration errors, death, return migration. But we think transcription errors are a BIG deal: cursive is really hard to read. We have two copies of the 1940 Federal Census, from FamilySearch and Ancestry.com. We know for sure which links are true because the underlying census manuscript pages are the same. This allows us to isolate transcription errors.

31 We find that transcription error is generally high (~10 to 15%), and higher for the foreign born.
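
One way to arrive at a transcription error rate of this kind is to compare two transcriptions of the same manuscript line and count how often they disagree. The sketch below is only an illustration of that idea, not the authors' procedure: it assumes each transcription is a dictionary keyed by a shared manuscript position (a hypothetical page and line identifier) and that a record counts as discrepant when any compared field differs after trimming and lowercasing.

```python
def transcription_disagreement_rate(copy_a, copy_b, fields=("first", "last")):
    """Share of shared manuscript lines whose transcribed fields differ."""
    shared = copy_a.keys() & copy_b.keys()
    if not shared:
        raise ValueError("the two transcriptions share no manuscript lines")

    def norm(value):
        # Ignore case and surrounding whitespace, which are not real disagreements.
        return value.strip().lower()

    differing = sum(
        any(norm(copy_a[k][f]) != norm(copy_b[k][f]) for f in fields)
        for k in shared
    )
    return differing / len(shared)
```

Because both copies come from the same page, any disagreement must come from transcription rather than from enumeration or from the person changing their answer. The rate measures disagreement between the two copies, so it does not say which copy is wrong.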

32 An Upper Bound for Linking? When we link 1940 to itself: of the links made, automated methods get almost 100% correct. But we can only link 43 to 67% of the census, because of common names and brutally bad transcription.

34 How do linked samples compare? Ultimately, we want to do economics with the linked data. Will the results of our analysis depend on which method we choose?

35 Parameter Estimates Are Very Stable Across Linking Methods. We regress occupation score in 1900 on occupation score in the Union Army data and height in the Union Army data. We also regress occupation score on height within the Union Army data. Hand and automated linked samples recover similar estimated effects. Occupation score at enlistment correlates positively with occupation score in 1900: all methods estimate similar positive correlations, though only ABE and ML have sufficient statistical power. The effect of height on occupation score is a null in the Costa data, and all methods recover that.

36 Parameter Estimates Are Very Stable Across Linking Methods: Occupation Score Intragenerational Elasticity

37 Parameter Estimates Are Very Stable Across Linking Methods: Height Effects on Occupation Score in Union Army Records

38 Why is our takeaway so different from Bailey et al. (2017)? Our takeaway on automated methods is more positive. Our paper: 3-25% false positives. Bailey et al.: 25-82% false positives. 1. Available matching variables matter. The Bailey et al. LIFE-M data have middle name, exact date of birth, etc. Bailey et al. assume hand links are ground truth. We find similar discrepancy rates for hand-linked and automated samples using the same matching variables.

39 Why is our takeaway so different from Bailey et al. (2017)? 2. Bailey et al. exclude better-performing automated methods designed to reduce false positive rates and include rarely used methods. The widely used conservative ABE method, which requires name uniqueness within a 5-year band, is only discussed in the online appendix (= 19% false positives). Better-performing second-generation automated methods are not evaluated (JW, EM). Never-used and rarely used methods known to have high false positive rates are included (e.g., ABE with Soundex [= 43% false positives] is never used; random matches [= 54-69% false positives] are rarely used).

40 Thanks! This project is preliminary and comments are welcome!

LIFE-M. Longitudinal, Intergenerational Family Electronic Microdata

LIFE-M. Longitudinal, Intergenerational Family Electronic Microdata LIFE-M Longitudinal, Intergenerational Family Electronic Microdata Martha J. Bailey Professor of Economics and Research Professor, Population Studies Center University of Michigan What is LIFE-M? A large

More information

February 24, [Click for Most Updated Paper] [Click for Most Updated Online Appendices]

February 24, [Click for Most Updated Paper] [Click for Most Updated Online Appendices] ONLINE APPENDICES for How Well Do Automated Linking Methods Perform in Historical Samples? Evidence from New Ground Truth Martha Bailey, 1,2 Connor Cole, 1 Morgan Henderson, 1 Catherine Massey 1 1 University

More information

Socio-Economic Status and Names: Relationships in 1880 Male Census Data

Socio-Economic Status and Names: Relationships in 1880 Male Census Data 1 Socio-Economic Status and Names: Relationships in 1880 Male Census Data Rebecca Vick, University of Minnesota Record linkage is the process of connecting records for the same individual from two or more

More information

Learning Objectives. Getting Started With Your Family History. US Census: Population Schedules. Why census data is valuable to family history

Learning Objectives. Getting Started With Your Family History. US Census: Population Schedules. Why census data is valuable to family history Learning Objectives Getting Started With Your Family History Ancestors in the Census outline when US censuses were conducted & when made publicly available locate online & use freely available US censuses

More information

USING CENSUS RECORDS IN GENEALOGICAL RESEARCH AN ONLINE COURSE

USING CENSUS RECORDS IN GENEALOGICAL RESEARCH AN ONLINE COURSE IN GENEALOGICAL RESEARCH AN ONLINE COURSE Syllabus An NGS Online Course IN GENEALOGICAL RESEARCH SYLLABUS Copyright 2009 National Genealogical Society 3108 Columbia Pike, Suite 300 Arlington, Virginia

More information

Census - General info

Census - General info By Clint Williams Quitta family Census - General info Censuses are available from 1790-1940 in ten year increments (except for 1890 and a few other burned or lost records). Note that the most useful censuses

More information

Chapter 1: Economic and Social Indicators Comparison of BRICS Countries Chapter 2: General Chapter 3: Population

Chapter 1: Economic and Social Indicators Comparison of BRICS Countries Chapter 2: General Chapter 3: Population 1: Economic and Social Indicators Comparison of BRICS Countries 2: General 3: Population 3: Population 4: Economically Active Population 5: National Accounts 6: Price Indices 7: Population living standard

More information

Data Integration Activities on the Way to the Dutch Virtual Census of 2011

Data Integration Activities on the Way to the Dutch Virtual Census of 2011 Data Integration Activities on the Way to the Dutch Virtual Census of 2011 Eric Schulte Nordholt Statistics Netherlands Division Social and Spatial Statistics Department Support and Development Section

More information

Linking Together the Entire US Population. Joe Price rll.byu.edu

Linking Together the Entire US Population. Joe Price rll.byu.edu Linking Together the Entire US Population Joe Price joe_price@byu.edu rll.byu.edu Two Audacious goals for 2018 [1] Handwriting recognition + NLP Convert records in archives, libraries, churches and courthouses

More information

MATRIX SAMPLING DESIGNS FOR THE YEAR2000 CENSUS. Alfredo Navarro and Richard A. Griffin l Alfredo Navarro, Bureau of the Census, Washington DC 20233

MATRIX SAMPLING DESIGNS FOR THE YEAR2000 CENSUS. Alfredo Navarro and Richard A. Griffin l Alfredo Navarro, Bureau of the Census, Washington DC 20233 MATRIX SAMPLING DESIGNS FOR THE YEAR2000 CENSUS Alfredo Navarro and Richard A. Griffin l Alfredo Navarro, Bureau of the Census, Washington DC 20233 I. Introduction and Background Over the past fifty years,

More information

Using Administrative Records for Imputation in the Decennial Census 1

Using Administrative Records for Imputation in the Decennial Census 1 Using Administrative Records for Imputation in the Decennial Census 1 James Farber, Deborah Wagner, and Dean Resnick U.S. Census Bureau James Farber, U.S. Census Bureau, Washington, DC 20233-9200 Keywords:

More information

Searching US Records for Your Immigrant Ancestor

Searching US Records for Your Immigrant Ancestor Searching US Records for Your Immigrant Ancestor Western New York Genealogical Society, http://www.wnygs.org/ 22 March 2014 Dennis Hogan, Slides are Online At http://www.wnygs.org/ &, click on Lectures

More information

Population Censuses and Migration Statistics. Keiko Osaki Tomita, Ph.D.

Population Censuses and Migration Statistics. Keiko Osaki Tomita, Ph.D. Population Censuses and Migration Statistics Keiko Osaki Tomita, Ph.D. Global Compact for Safe, Orderly and Regular Migration Objective 1: Collect and utilize accurate and disaggregated data as a basis

More information

Resources for Family History Project

Resources for Family History Project Resources for Family History Project Historical Record Type St. Louis County Library-History & Genealogy Location/Place of Residence: Census City directories Immigration: Passenger lists Naturalization

More information

Census Records. P. J. Smith

Census Records. P. J. Smith Census Records P. J. Smith What is a census? Regularly occurring and official count of a particular population Apportioning Congressional representatives Apportioning taxes Provides statistics for planning

More information

Follow your family using census records

Follow your family using census records Census records are one of the best ways to discover details about your family and how that family changed every 10 years. You ll discover names, addresses, what people did for a living, even which ancestor

More information

Health, gender and mobility: Intergenerational correlations in longevity over time

Health, gender and mobility: Intergenerational correlations in longevity over time Health, gender and mobility: Intergenerational correlations in longevity over time John Parman September 17, 2017 Abstract Changes in intergenerational mobility over time have been the focus of extensive

More information

Where Do I Begin? Basic Forms Family Group Sheet. Where Do You Start? Basic Forms-Pedigree Chart. Where Do I Begin? 7 October 2017

Where Do I Begin? Basic Forms Family Group Sheet. Where Do You Start? Basic Forms-Pedigree Chart. Where Do I Begin? 7 October 2017 Where Do You Start? Where Do I Begin? GenCOMO October 7, 2017 Start with yourself and work backwards Gather pictures and documents Put your ancestor in a specific time and place Record all the facts Full

More information

SPECIAL FEDERAL CENSUS SCHEDULES AN ONLINE COURSE

SPECIAL FEDERAL CENSUS SCHEDULES AN ONLINE COURSE SPECIAL FEDERAL CENSUS SCHEDULES AN ONLINE COURSE Syllabus An NGS Online Course SYLLABUS Copyright 2009 National Genealogical Society 3108 Columbia Pike, Suite 300 Arlington, Virginia 22204-4370 Telephone:

More information

An Automated Record Linkage System - Linking 1871 Canadian census to 1881 Canadian Census

An Automated Record Linkage System - Linking 1871 Canadian census to 1881 Canadian Census An Automated Record Linkage System - Linking 1871 Canadian census to 1881 Canadian Census Luiza Antonie Peter Baskerville Kris Inwood Andrew Ross Abstract This paper describes a recently developed linkage

More information

Census Response Rate, 1970 to 1990, and Projected Response Rate in 2000

Census Response Rate, 1970 to 1990, and Projected Response Rate in 2000 Figure 1.1 Census Response Rate, 1970 to 1990, and Projected Response Rate in 2000 80% 78 75% 75 Response Rate 70% 65% 65 2000 Projected 60% 61 0% 1970 1980 Census Year 1990 2000 Source: U.S. Census Bureau

More information

Percentage Change in Population for Nebraska Counties: 2010 to 2016

Percentage Change in Population for Nebraska Counties: 2010 to 2016 Percentage Change in Population for Nebraska Counties: 2010 to 2016 Percentage Change in Population: 2010-2016 State of Nebraska Increased by 4.4% from 2010-2016 Population Loss of more than 5% (17 counties)

More information

Digit preference in Nigerian censuses data

Digit preference in Nigerian censuses data Digit preference in Nigerian censuses data of 1991 and 2006 Tukur Dahiru (1), Hussaini G. Dikko (2) Background: censuses in developing countries are prone to errors of age misreporting due to ignorance,

More information

Methods and Techniques Used for Statistical Investigation

Methods and Techniques Used for Statistical Investigation Methods and Techniques Used for Statistical Investigation Podaşcă Raluca Petroleum-Gas University of Ploieşti raluca.podasca@yahoo.com Abstract Statistical investigation methods are used to study the concrete

More information

Monday, 1 December 2014

Monday, 1 December 2014 Monday, 1 December 2014 9:30 10:00 Welcome/opening remarks Introduction of the participants 10:00-11:00 Introduction to evaluation of census data Objectives of evaluation of census data, types and sources

More information

Chapter 1: Introduction to Statistics

Chapter 1: Introduction to Statistics Section 1 1: Descriptive Statistics: Chapter 1: Introduction to Statistics The first 3 chapters of this course will develop the concepts involved with Descriptive Statistics. Descriptive Statistics is

More information

Introduction to New Jersey Genealogy Regina Fitzpatrick, Genealogy Librarian

Introduction to New Jersey Genealogy Regina Fitzpatrick, Genealogy Librarian Introduction to New Jersey Genealogy Regina Fitzpatrick, Genealogy Librarian Introduction New Jersey is one of the thirteen original colonies, with European settlements dating from the 17 th Century. New

More information

LINKING HISTORICAL CENSUSES: A NEW APPROACH STEVEN RUGGLES

LINKING HISTORICAL CENSUSES: A NEW APPROACH STEVEN RUGGLES LINKING HISTORICAL CENSUSES: A NEW APPROACH STEVEN RUGGLES This article describes a new initiative at the Minnesota Population Center (MPC) to create linked representative samples of individuals and family

More information

POPULAT A ION DYNAMICS

POPULAT A ION DYNAMICS POPULATION DYNAMICS POPULATIONS Population members of one species living and reproducing in the same region at the same time. Community a number of different populations living together in the one area.

More information

Adjusting for linkage errors to analyse coverage of the Integrated Data Infrastructure (IDI) and the administrative population (IDI-ERP)

Adjusting for linkage errors to analyse coverage of the Integrated Data Infrastructure (IDI) and the administrative population (IDI-ERP) Adjusting for linkage errors to analyse coverage of the Integrated Data Infrastructure (IDI) and the administrative population (IDI-ERP) Hochang Choi, Statistical Analyst, Stats NZ Paper prepared for the

More information

Tommy W. Gaulden, Jane D. Sandusky, Elizabeth Ann Vacca, U.S. Bureau of the Census Tommy W. Gaulden, U.S. Bureau of the Census, Washington, D.C.

Tommy W. Gaulden, Jane D. Sandusky, Elizabeth Ann Vacca, U.S. Bureau of the Census Tommy W. Gaulden, U.S. Bureau of the Census, Washington, D.C. 1992 CENSUS OF AGRICULTURE FRAME DEVELOPMENT AND RECORD LINKAGE Tommy W. Gaulden, Jane D. Sandusky, Elizabeth Ann Vacca, U.S. Bureau of the Census Tommy W. Gaulden, U.S. Bureau of the Census, Washington,

More information

Country presentation

Country presentation Country presentation on Experience of census in collecting data on emigrants and returned migrants: questionnaire design; quality assessment; data dissemination; plan for the next round Muhammad Mizanoor

More information

The Census questions. factsheet 9. A look at the questions asked in Northern Ireland and why we ask them

The Census questions. factsheet 9. A look at the questions asked in Northern Ireland and why we ask them factsheet 9 The Census questions A look at the questions asked in Northern Ireland and why we ask them The 2001 Census form contains a total of 42 questions in Northern Ireland, the majority of which only

More information

Techniques on how to use websites for Cherokee Research, Part 1 & 2

Techniques on how to use websites for Cherokee Research, Part 1 & 2 Techniques on how to use websites for Cherokee Research, Part 1 & 2 April 8, 2014 Gene Norris, Genealogist Cherokee National Historical Society, Inc. Tahlequah, Cherokee Nation www.ancestry.com Although

More information

Perry County Pioneers Lineage Society. Rules and Application Procedures

Perry County Pioneers Lineage Society. Rules and Application Procedures Perry County Pioneers Lineage Society Rules and Application Procedures Read these rules and procedures before starting the process Perry County Pioneers is a way to honor those people who settled in Perry

More information

Automating NSF HERD Reporting Using Machine Learning and Administrative Data

Automating NSF HERD Reporting Using Machine Learning and Administrative Data Automating NSF HERD Reporting Using Machine Learning and Administrative Data Rodolfo H. Torres CIMA Session: The Use of Advance Analytics to Drive Decisions 2018 APLU Annual Meeting New Orleans Marriott,

More information

Supplementary questionnaire on the 2011 Population and Housing Census SWITZERLAND

Supplementary questionnaire on the 2011 Population and Housing Census SWITZERLAND Supplementary questionnaire on the 2011 Population and Housing Census SWITZERLAND Supplementary questionnaire on the 2011 Population and Housing Census Fields marked with are mandatory. INTRODUCTION As

More information

Age Validation in the Long Life Family Study Through a Linkage to Early-Life Census Records

Age Validation in the Long Life Family Study Through a Linkage to Early-Life Census Records Elo, I.T., Mykyta, L., Sebastiani, P., Christensen, K., Glynn, N.W., & Perls, T. (2013). Age validation in the long life family study through a linkage to early-life census records. Journals of Gerontology,

More information

Linking Migration Administrative Migration Records And. The Electoral List For Estimating The Number Of Costa

Linking Migration Administrative Migration Records And. The Electoral List For Estimating The Number Of Costa Linking Migration Administrative Migration Records And The Electoral List For Estimating The Number Of Costa Rican Emigrants. Brenes-Camacho, Gilbert Centro Centroamericano de Poblacion, University of

More information

Workshop on Census Data Evaluation for English Speaking African countries

Workshop on Census Data Evaluation for English Speaking African countries Workshop on Census Data Evaluation for English Speaking African countries Organised by United Nations Statistics Division (UNSD), in collaboration with the Uganda Bureau of Statistics Kampala, Uganda,

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE IN STATISTICS, 2011 MODULE 3 : Basic statistical methods Time allowed: One and a half hours Candidates should answer THREE questions. Each

More information

Advanced Concepts. Genealogy and History. Genealogy and History

Advanced Concepts. Genealogy and History. Genealogy and History Genealogy and History Advanced Concepts What we call history, our ancestors called current events! Laws defined type and content of records! Laws indicated when and how events occurred in our ancestors

More information

CONTRIBUTIONS OF THE INTERNATIONAL METROPOLIS PROJECT TO THE GLOBAL DISCUSSIONS ON THE RELATIONS BETWEEN MIGRATION AND DEVELOPMENT 1.

CONTRIBUTIONS OF THE INTERNATIONAL METROPOLIS PROJECT TO THE GLOBAL DISCUSSIONS ON THE RELATIONS BETWEEN MIGRATION AND DEVELOPMENT 1. UN/POP/MIG-16CM/2018/11 12 February 2018 SIXTEENTH COORDINATION MEETING ON INTERNATIONAL MIGRATION Population Division Department of Economic and Social Affairs United Nations Secretariat New York, 15-16

More information

Founders and Survivors Linkage Strategy

Founders and Survivors Linkage Strategy Founders and Survivors Linkage Strategy John Bass, University of Tasmania Sandra Silcot, University of Melbourne Len Smith, Australian National University Founders and Survivors Prosopography Database

More information

For Online Publication APPENDIX VII. UP FROM SLAVERY? AFRICAN AMERICAN INTERGENERATIONAL MOBILITY SINCE 1880

For Online Publication APPENDIX VII. UP FROM SLAVERY? AFRICAN AMERICAN INTERGENERATIONAL MOBILITY SINCE 1880 For Online Publication APPENDIX TO UP FROM SLAVERY? AFRICAN AMERICAN INTERGENERATIONAL MOBILITY SINCE 1880 APPENDIX I. APPENDIX II. DATA APPENDIX a. Construction of Samples i. Linked Sample Construction

More information

Traces through time: a case-study of applying statistical methods to refine algorithms for linking biographical data

Traces through time: a case-study of applying statistical methods to refine algorithms for linking biographical data Traces through time: a case-study of applying statistical methods to refine algorithms for linking biographical data Mark Bell, Sonia Ranade The National Archives, Kew, London E-mail: { mark.bell; sonia.ranade

More information

Theodore Thorner b: October 15, 1878 in Poland Minnie Sommerman Parents Children June 1, 1905 August 2, 1906 Petition for Naturalization

Theodore Thorner b: October 15, 1878 in Poland Minnie Sommerman Parents Children June 1, 1905 August 2, 1906 Petition for Naturalization Theodore Thorner b: October 15, 1878 in Poland d: June 02, 1963 in California +Minnie Sommerman b: April 18, 1887 in Austria d: December 14, 1951 m: November 29, 1906 Parents Gedalia Lazer/ Eleazar Chrzadowski

More information

Census Records, City Directories, Maps

Census Records, City Directories, Maps This is a very high-level explanation of the complex topic, census records. An excellent source of detailed information can be found in The Source, A Guidebook of American Genealogy, Loretto Dennis Szucs,

More information

2 3, MAY 2018 ANKARA, TURKEY

2 3, MAY 2018 ANKARA, TURKEY SEVENTH SESSION OF OIC STATISTICAL COMMISSION 2 3, MAY 2018 ANKARA, TURKEY CRVS for the 2020 Round of Population and Housing Census Mr. Nyakassi M.B. Sanyang, The Gambia Presentation Outline Introduction

More information

Digit preference in Iranian age data

Digit preference in Iranian age data Digit preference in Iranian age data Aida Yazdanparast 1, Mohamad Amin Pourhoseingholi 2, Aliraza Abadi 3 BACKGROUND: Data on age in developing countries are subject to errors, particularly in circumstances

More information

Introduction to the course, lecturers, participants and the European Census 2021

Introduction to the course, lecturers, participants and the European Census 2021 Introduction to the course, lecturers, participants and the European Census 2021 Eric Schulte Nordholt Statistics Netherlands Division Socio-economic and spatial statistics THE CONTRACTOR IS ACTING UNDER

More information

End of the Census. Why does the Census need reforming? Seminar Series POPULATION PATTERNS. seeing retirement differently

End of the Census. Why does the Census need reforming? Seminar Series POPULATION PATTERNS. seeing retirement differently Seminar Series End of the Census The UK population is undergoing drastic movement, with seachanges in mortality rates, life expectancy and how long individuals can hope to live in good health. In order

More information
