Examples of Record Linkage Studies from Norway and Bosnia
  EGM on Record Linkage Studies to Assess Completeness of Death Registration
  Beirut, December 21-22, 2017, ESCWA
  Helge Brunborg, Statistics Norway
Outline
  Assessing completeness of death registration:
  1. Record linkage based on the Norwegian population registration system
     Using unique ID number (PIN)
  2. Record linkage of missing and dead persons in Bosnia and Herzegovina during the armed conflict 1992-1995
     Using name, date of birth, etc.
  Neither study is an example of record linkage to assess completeness of death registration:
    In Norway the completeness is about 100%. I show examples of how a good registration system can be used to study mortality and other issues.
    The Bosnian study shows how record linkage can be used to link data from different sources.
Nordic model of population registration
  Nordic countries: Norway, Denmark, Finland, Iceland and Sweden
  Modern system established in the 1960s
  Other countries with related registers: Netherlands, Belgium, Slovenia, Hungary
  Without central population register and unique identification number: Italy, Germany, France, UK, USA
Modern population registration in Norway
  1964: Central Population Register (CPR) established from the 1960 Population Census
  1964: Unique ID numbers introduced
  1994: All 435 local offices computerised and on-line with the CPR database
  2001: Last census using paper forms
  2011: First register-based census
Characteristics of the Nordic model of population registration
  1. Central Population Register with on-line links to local offices
  2. Unique Personal Identification Number (PIN)
  3. CPR continuously updated with data on births, deaths, migrations, address changes, marriages, name changes, citizenship
  4. CPR and PIN widely used for administrative purposes, by both public and private institutions
  5. Widely used for administration, statistics and research
  6. The wide use of the unique PIN makes it simple to link and use data from different administrative registers
  7. The register owner (the administrative institution) has, or should have, a strong interest in keeping and updating its register(s)
  8. Frequent use, including statistical use, improves the quality
  9. Close cooperation (and trust) between public institutions
  10. Data protected through legislation and a data inspectorate
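Point 6 can be illustrated with a minimal sketch of exact linkage on a shared PIN. The register names, field names and PIN values below are invented for illustration and do not reflect the actual layout of the Norwegian registers.

```python
# Hypothetical extracts from two administrative registers that share a PIN.
# All PINs, field names and values are invented for illustration.
population_register = [
    {"pin": "A001", "status": "resident", "municipality": "Oslo"},
    {"pin": "A002", "status": "deceased", "municipality": "Bergen"},
    {"pin": "A003", "status": "emigrated", "municipality": "Tromsø"},
]
education_register = [
    {"pin": "A001", "highest_education": "tertiary"},
    {"pin": "A002", "highest_education": "secondary"},
]


def link_on_pin(left, right):
    """Left join: keep every record in `left`, add fields from `right` when the PIN matches."""
    right_by_pin = {row["pin"]: row for row in right}
    linked = []
    for row in left:
        match = right_by_pin.get(row["pin"], {})  # empty dict when there is no match
        linked.append({**row, **match})
    return linked


linked = link_on_pin(population_register, education_register)
for row in linked:
    print(row)
```

Because the key is unique and present in both sources, the link is a plain dictionary lookup; no name comparison or manual review is needed.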
Transfer of data to CPR
Distribution of data from CPR
Administrative use: some users of CPR data
  Directorate of Taxes
    Income (PIN, age, sex, address, family members, relatives)
    Wealth
    Taxes
  Municipal treasurers
  Defense (people eligible to be drafted for military service)
  Police
    Passports
    Indicted and sentenced persons
    Prisoners
  Directorate of Roads
    Names of driving license holders
    Names of car owners
  Ministry of education and all educational institutions
    Enrolled students at all levels
    Completed exams
    Highest attained education
  Social security administration
    Old-age and disability pensions
    Unemployment benefits
    Social support receivers
  Hospitals (names and ages of residents in district)
Most important status (stock) variables in the CPR
  Personal Identification Number (includes date of birth and sex)
  Residence status (resident, deceased, emigrated, no permanent address, disappeared)
  Address
    Municipality
    Dwelling number
  Place of birth (municipality or country)
  Name (incl. first and middle names)
  Surname prior to marriage
  Citizenship
  Country of immigration
  Country of emigration
  Marital status
  PIN of spouse, mother and father
    Links between siblings, cousins, children and grandparents
  All changes in these variables are registered, including the date of change
Most important flow variables registered by the CPR
  Births
  Deaths
  Marital changes (marriages incl. same-sex marriages, separations, divorces, annulments)
  Emigrations and immigrations
  Internal moves
  Address changes
  Name changes
  Citizenship changes
  Gender changes
  PIN changes
All individual historical data are kept
  Number of residents: 5.2 million
  Persons in the CPR: About 8 million
  Nobody (no record) is deleted, but the status (resident, dead, emigrated) is changed
  Official date of registration and date of entry are recorded for every new data entry
Other person registers with PIN
  Population censuses 1960, 1970, 1980, 1990 and 2001
    Later censuses based on administrative registers only and not on census questionnaires
  Refugees and other immigrants
  Cause of death
  Cancer cases
  Tuberculosis cases
  Medical personnel
  Prescriptions
  Vaccinations
  Soldiers
  Bank accounts
  Insurance registers
  Income and wealth
  Pensions
  Educational activity and attainment, incl. examination results
  Occupation
Statistics and research based on data from CPR: examples
  Register-based censuses
  Distribution of women by number of children, including childlessness
  Total fertility rate by education
  Life expectancy by occupation and by education
  Making life histories on
    Births
    Marriages and divorces
    Migrations: external, internal, all
  For how long have immigrants been living in Norway?
  How many people reimmigrate to Norway?
  Integration of immigrants: How do they do with regard to
    Education
    Labour force activity
    Income
    Social transfers
    Crime
Stages of the development of a CRVS system and a Central Population Register
  1. Basic CR
     Registration of births and deaths
     Limited local recording of information on paper
     Birth and death certificates are issued
  2. Computerized register of births and deaths
     Local and/or national level
  3. Computerized register of all vital events, including
     Marriages and divorces
     Immigrations and emigrations
     Internal migrations
  4. Civil Registration with PIN (Personal Identification Number)
     Assigning a unique personal identification number (PIN) to all newborns
     PIN entered into birth register
     PIN on birth certificates and other documents
     Assigning a PIN to all new immigrants
  5. Comprehensive (Central) Population Register
     All residents are registered with name, date of birth, PIN and address
     Register is regularly updated with data on new events
  6. Integrated system of registers
     Links between registers of persons, properties (incl. land titles) and companies based on PINs
Concluding remarks
  Completeness of death registration in Norway is about 100%
    Perhaps minor under-registration of deaths abroad and of deaths in Norway of persons who are not residents of Norway
  No point in using record linkage to estimate the completeness of death registration
  Instead, the registration system may be used to estimate mortality measures by variables such as occupation and education
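As a rough sketch of the last point: once death records are linked to an education register by PIN, a crude death rate by education is deaths divided by person-years at risk in each group. The records below are invented; real analyses would also standardise by age and sex.

```python
from collections import defaultdict

# Hypothetical linked records (population register + education register).
# `person_years` is time at risk during the observation period; values are invented.
linked_records = [
    {"education": "primary", "person_years": 1.0, "died": False},
    {"education": "primary", "person_years": 0.5, "died": True},
    {"education": "primary", "person_years": 1.0, "died": False},
    {"education": "primary", "person_years": 1.5, "died": False},
    {"education": "tertiary", "person_years": 2.0, "died": False},
    {"education": "tertiary", "person_years": 2.0, "died": True},
    {"education": "tertiary", "person_years": 1.0, "died": False},
]


def death_rates_per_1000(records, group_key):
    """Crude death rate per 1,000 person-years for each value of `group_key`."""
    deaths = defaultdict(int)
    exposure = defaultdict(float)
    for rec in records:
        group = rec[group_key]
        exposure[group] += rec["person_years"]
        if rec["died"]:
            deaths[group] += 1
    return {group: 1000 * deaths[group] / exposure[group] for group in exposure}


rates = death_rates_per_1000(linked_records, "education")
print(rates)
```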
Missing and dead in Bosnia
  Worked for the International Criminal Tribunal for the former Yugoslavia (ICTY)
  Asked to estimate the number of missing and dead after the fall of Srebrenica in 1995
    Verify and confirm the identity of all victims
    Verify deaths if possible
  To be used in investigations and trials at the ICTY
Objective
  Srebrenica Project
  Determine the minimum number of dead and missing persons related to the fall of the enclave on 11 July 1995
Methodology
  Compare lists of missing persons compiled by
    ICRC (International Committee of the Red Cross)
    PHR (Physicians for Human Rights)
  Determine Srebrenica-related places of disappearance
  Determine Srebrenica-related dates of disappearance
  Compare list of missing persons with post-war lists of voters
Data sources
  Main data source: Lists of missing and dead from ICRC (International Committee of the Red Cross)
  This list was merged with several other lists, including
    List of dead and missing from PHR (Physicians for Human Rights)
    Population census 1991
    Voters' registers 1997 and 1998
    Refugees
    IDPs (Internally Displaced Persons)
    Persons known to be dead
    Exhumation records
Main variables
  ID number
  Date of birth
  Sex
  Name
    First name, surname, father's name
  Municipality / place of residence, birth, voting
  Religion or ethnicity
  Variables often missing or wrong, partly or fully
  Methods to improve the data:
    correct data
    fuzzy matching
    visual inspection of duplicates and difficult cases
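The slide does not say which fuzzy matching technique was used. As one possible illustration, the sketch below scores name similarity with Python's standard `difflib.SequenceMatcher` and keeps candidates above a threshold. The threshold of 0.75 and the reference names are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between two names in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_candidates(name, reference_names, threshold=0.75):
    """Return (score, reference name) pairs at or above `threshold`, best first.

    The threshold is an illustrative assumption; in practice it would be tuned
    against manually reviewed cases.
    """
    scored = [(similarity(name, ref), ref) for ref in reference_names]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical reference list of correctly spelled surnames.
reference = ["Musić", "Hodžić"]
candidates = fuzzy_candidates("Mvsić", reference)
print(candidates)
```

Candidates found this way would still go to visual inspection, in line with the conservative approach of the study.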
Methodology
  Data analysis method: dBASE
  The lists of missing and dead were first checked for duplicates, cleaned and merged
  Records in the missing list were compared with records for the same variables in other lists of events or stocks
    Name (first, family, father's), date of birth, sex
  Unique matches were marked
  Another search for matches was run, with new criteria
    For example by adding place of birth and/or place of residence
  Multiple and other matches were inspected visually
  This was repeated until no more likely matches occurred
  Conservative approach: Questionable matches were dropped
  Final outcome: A list of persons with name, age, sex, place of birth or residence, place last seen alive, etc.
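The iterative procedure above can be sketched as a multi-pass deterministic linkage: each pass links only unique matches on a set of key variables, and records with several candidates are set aside for visual inspection unless a later pass resolves them. This is a simplified illustration in Python, not the original dBASE code; the records and field names are invented.

```python
FIELDS = ("id", "first", "surname", "father", "dob", "sex", "birthplace")


def rec(*values):
    """Build a record dict from positional values (helper for the example data)."""
    return dict(zip(FIELDS, values))


def link_pass(missing, other, keys, already_linked):
    """One pass: link records that agree on `keys` with exactly one candidate."""
    index = {}
    for r in other:
        index.setdefault(tuple(r[k] for k in keys), []).append(r)
    links, ambiguous = {}, []
    for r in missing:
        if r["id"] in already_linked:
            continue
        candidates = index.get(tuple(r[k] for k in keys), [])
        if len(candidates) == 1:
            links[r["id"]] = candidates[0]["id"]
        elif len(candidates) > 1:
            ambiguous.append(r["id"])  # left for a later pass or visual inspection
    return links, ambiguous


def multi_pass_link(missing, other, passes):
    """Run the passes in order; return unique links and ids needing manual review."""
    linked, review = {}, set()
    for keys in passes:
        links, ambiguous = link_pass(missing, other, keys, linked)
        linked.update(links)
        review.update(ambiguous)
    review.difference_update(linked)  # resolved in a later pass
    return linked, review


# Invented example records.
missing = [
    rec("M1", "Adam", "Example", "Ben", "1960-01-01", "M", "TownA"),
    rec("M2", "Carl", "Sample", "Dan", "1970-05-05", "M", "TownB"),
    rec("M3", "Emil", "Test", "Finn", "1980-03-03", "M", "TownC"),
]
voters = [
    rec("V1", "Adam", "Example", "Ben", "1960-01-01", "M", "TownA"),
    rec("V2", "Carl", "Sample", "Dan", "1970-05-05", "M", "TownB"),
    rec("V3", "Carl", "Sample", "Dan", "1970-05-05", "M", "TownX"),
    rec("V4", "Emil", "Test", "Finn", "1980-03-03", "M", "TownC"),
    rec("V5", "Emil", "Test", "Finn", "1980-03-03", "M", "TownC"),
]
base = ("first", "surname", "father", "dob", "sex")
linked, review = multi_pass_link(missing, voters, [base, base + ("birthplace",)])
print(linked, review)
```

In the example, M1 is linked in the first pass, M2 only after place of birth is added, and M3 still has two candidates and is left for manual review.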
Problems with the linkage
  Missing variables
  Missing data
  Incomplete data
  Fuzzy data
  Errors in data
  Duplicates
    How to identify duplicates?
    When are duplicates real and when are they not?
    Which of two duplicates do we keep?
  Linking dead and missing (in 1995) with Muslim men enumerated in Srebrenica in the 1991 census: 87% match
Problem: Spelling of names
  Surname, first name, father's name
  Errors often caused by optical scanning
  Correctly spelled names required to match records from different sources
  Methods to correct names:
    Substituting most common errors (e.g. Q → LJ)
    Algorithms based on Bosnian syntax (e.g. id → ić)
    Comparing names of household members (e.g. Mvsić → Musić)
    Manually checking lists of names
    Using lists of correct names (surnames, first names)
  Result: The number of different surnames (including errors) in a list of voters was reduced from 87,000 to fewer than 40,000
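A minimal sketch of how the first two correction methods might be combined with a list of correct names. The two rules come from the examples on the slide; the reference list, the function name and the rule of accepting a correction only when the result is a known name are assumptions made for illustration.

```python
# Common scanning errors, taken from the slide's example (Q read instead of LJ).
SUBSTITUTIONS = {"Q": "LJ"}


def correct_name(name, known_names):
    """Return a corrected, upper-cased name if the rules yield a known name.

    Conservative by design: if the corrected form is not in `known_names`,
    the cleaned original is returned unchanged for manual checking.
    """
    cleaned = name.strip().upper()
    if cleaned in known_names:
        return cleaned
    candidate = cleaned
    for wrong, right in SUBSTITUTIONS.items():
        candidate = candidate.replace(wrong, right)
    if candidate.endswith("ID"):  # suffix rule from the slide: id -> ić
        candidate = candidate[:-2] + "IĆ"
    return candidate if candidate in known_names else cleaned


# Hypothetical list of correctly spelled surnames.
known = {"MUSIĆ", "LJUBIĆ"}
print(correct_name("Musid", known))
print(correct_name("Qubid", known))
print(correct_name("Said", known))
```

The third call returns the name unchanged: applying the suffix rule would give a form that is not in the reference list, so the correction is rejected.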
Srebrenica-related missing or dead persons
  On both ICRC and PHR lists                                            + 5,712
  On ICRC list only                                                     + 1,586
  On PHR list only                                                      +   192
  Srebrenica-related missing persons registered by ICRC and/or PHR      = 7,490
  Found in Voters Registers 1997 and 1998                               -     9
  Srebrenica-related victims, excl. persons found in Voters Registers   = 7,481
  Found alive by ICRC since Jan. 1997 (identities unknown to us)        -     6
  Minimum number of victims                                             = 7,475
  Source: Helge Brunborg, Torkild Hovde Lyngstad and Henrik Urdal: Accounting for genocide: How many were killed in Srebrenica? European Journal of Population 19: 229-248, September 2003. Reprinted in E. Tabeau (ed.) Conflict in Numbers. Casualties of the 1990s Wars in the Former Yugoslavia. Testimonies vol. 33, Helsinki Committee for Human Rights in Serbia. Belgrade 2009. http://www.helsinki.org.rs/doc/testimonies33.pdf
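The accounting in this table can be checked with a few lines of arithmetic. The figures are those on the slide; only the variable names are introduced here.

```python
# Counts from the slide.
on_both_lists = 5712
icrc_only = 1586
phr_only = 192
found_in_voters_registers = 9
found_alive_by_icrc = 6

# Everyone registered as missing by ICRC and/or PHR.
registered_missing = on_both_lists + icrc_only + phr_only

# Remove persons who appear in the post-war Voters Registers.
victims_excl_voters = registered_missing - found_in_voters_registers

# Remove persons later found alive by ICRC.
minimum_victims = victims_excl_voters - found_alive_by_icrc

print(registered_missing, victims_excl_voters, minimum_victims)
```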
Srebrenica-related missing men
  As per February 2000: 7427 men (and 48 women)
  Based on lists of missing persons
  [Figure: number of missing men by five-year age group, 0-4 to 85-89; vertical axis from 0 to 1200]
Age distribution of missing persons and exhumed bodies
  [Figure: bar chart in per cent (axis 0 to 90) for the age groups 8-12 years, 13-24 years and 25+ years; series: Missing (7481 persons) and Exhumations (1900 bodies); bar labels as extracted: 82,2, 73,6, 26,4, 17,5, 0,0, 0,4]
Estimating the number of missing persons from Srebrenica, using dual recording methodology (capture-tag-recapture)
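The slide does not show which estimator was applied. As a sketch of the general idea, the classical two-list (Lincoln-Petersen) estimator is N = n1 × n2 / m, where n1 and n2 are the sizes of the two lists and m is the number of persons on both. It assumes the lists are independent and that matching is error-free, which may not hold here. The counts below are the ICRC/PHR figures given earlier in the talk; the result is only an illustration of the arithmetic, not the estimate reported by the author.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list estimate of total population size.

    n1, n2: number of persons on list 1 and list 2
    m: number of persons found on both lists
    Assumes independent lists and perfect matching.
    """
    if m <= 0:
        raise ValueError("the lists must overlap for the estimator to be defined")
    return n1 * n2 / m


# Counts from the ICRC/PHR comparison earlier in the talk.
on_both = 5712
icrc_only = 1586
phr_only = 192

n_icrc = on_both + icrc_only  # everyone on the ICRC list
n_phr = on_both + phr_only    # everyone on the PHR list

estimate = lincoln_petersen(n_icrc, n_phr, on_both)
missed_by_both = estimate - (on_both + icrc_only + phr_only)
print(round(estimate), round(missed_by_both))
```

Under these assumptions the two lists together would have missed only a few dozen persons, because the overlap between them is very large.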
Probability of being missing for Muslim men from Srebrenica
  [Figure: missing probability (0 % to 60 %) by age at the end of 1995, five-year groups 11-15 to 86-90; series: Unadjusted missing probability; Additional missing probability due to 1991-95 normal mortality]
Development over time of Srebrenica-related dead and missing
  Minimum numbers
    Feb. 2000: 7,477; 70 identified as dead (1%)
    Nov. 2005: 7,661; 2,054 identified as dead (27%)
    April 2009: 7,905; 5,274 identified as dead (67%)
    April 2013: 6,849 identified as dead (ca 86%)
    Final number: Probably about 8,000, with more than 90% identified as dead
  Exhumations and DNA analysis result in
    Persons on the ICTY missing list identified as dead
    Additional identified dead persons found
    Some bodies found in Srebrenica-related graves that cannot be identified
Further work
  Record linkage used to estimate the consequences of the armed conflict in many areas, including:
    Death toll in the siege of Sarajevo
      Source: E. Tabeau, M. Żółtkowski and J. Bijak, 2002, Population Losses in the Siege of Sarajevo 10 September 1992 to 10 August 1994. Research Report prepared for the case of Stanislav Galić (IT-98-29-I). Reprinted in E. Tabeau (ed.) Conflict in Numbers. Casualties of the 1990s Wars in the Former Yugoslavia. Testimonies vol. 33, Helsinki Committee for Human Rights in Serbia. Belgrade 2009. http://www.helsinki.org.rs/doc/testimonies33.pdf
    Ethnic composition in municipalities to the number of Missing variables
      Source: Tabeau and Bijak, 2005, War-related Deaths in the 1992-1995 Armed Conflicts in Bosnia and Herzegovina: A Critique of Previous Estimates and Recent Results, European J. Pop. 21(2-3)
Concluding remarks
  Completeness of death registration in Norway is about 100%
    No point in using record linkage to estimate the completeness of death registration
    Record linkage relatively easy for estimating mortality measures by variables such as occupation and education
  Record linkage necessary for estimating and validating the number of missing and dead after the fall of Srebrenica in Bosnia
    Complicated because unique identifiers not available in all data sources
    Errors in the data: cleaning necessary