Data mining in the Dutch Civil Registration from 1811-present
  Gerrit Bloothooft 1,2,3, Kees Mandemakers 2, Leendert Brouwer 3, Matthijs Brouwer 3
  1 Universiteit Utrecht / 2 IISG KNAW / 3 Meertens Instituut KNAW, The Netherlands
  Paris workshop 9-10/12/2010
names in family trees
intergenerational (family names)
  [Figure: four-generation family tree (generations I-IV, births 1895 to 1993) labelled with the family name Hoeksema]
family names
  Patrilineal
    » daughters keep family name of father
  Start ~17th century, compulsory 1811
  Limited geographic spread
  Linguistic properties
    » language (dialect)
    » suffixes (-s(e)ma, -stra, -ing, -ink, ..)
    » length
    » patronyms, occupation, provenance
intergenerational (first names)
  [Figure: four-generation family tree (generations I-IV, births 1895 to 1993) labelled with first names: Johannes, Cornelia Maria, Willem Dirk, Corrie, Jan, Priscilla, Semantha]
first names
  Traditional naming
    » after grandparents in prescribed order
    » dominant until 1960
    » little effect of social class (in The Netherlands)
  Modern naming
    » fashion during one generation
    » correlation with education and income, lifestyle
  geographic spread
  socio-onomastic features (modern)
ideal source: Civil Registration
  Names
  Dates and places of birth, marriage, death
  Family relations (parents, partners)
  Occupation
    » not in modern CR
restrictions CR
  Privacy restrictions
    modern: since 2000 available for research (in NL)
    » no identification of individuals allowed
    » but names are intended to identify
  Digital availability
    modern: since 1994
    historical: certificates digitized by volunteers
    » from 1811-1909 (birth), 1811-1934 (marriages), 1811-1959 (death) [50% completed]
    » LINKS project to reconstruct families
modern data (full population)
  first names (selection 2006: 16+6 million)
    all first names
    date, place and country of birth
    current residence (postal code)
    id
    id of parents (and their names, date, place and country of birth)
  family names (selection 2007: 16 million)
    prefix and family name
    date, place and country of birth
    current residence
different names
  first names: 500.000
    » 300.000 in first position
    » 5.000.000 as full name
  family names: 314.000
online name corpora
  first names: www.meertens.knaw.nl/nvb (June 3, 2010)
  family names: www.meertens.knaw.nl/nfb (December 3, 2009)
on show
  all presentations absolute & relative
  first names
    name & gender
    # as first name, # as following name (totals 2006)
    births per year (since 1880)
    places of birth (for 2006 population), 468 municipalities in 2006
    explanations (20.000, ~4000 extensive)
  family names
    name
    # in 2007, # in 1947
    places of residence in 2006, provinces in 1947
    explanations (90.000, ~4000 extensive)
    relational network of names
privacy issues
  shown: all names
  not shown: any figure < 5
    » Norway: <3, France: <5, Belgium: all
  if identifiable:
    » not on map for small municipalities (by rules)
    » unless in telephone directory
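The rule "not shown: any figure < 5" amounts to threshold suppression of small counts before publication. A minimal sketch in Python, assuming counts are held in a plain dictionary; the function names and the sample figures are hypothetical and do not come from the name corpora.

```python
from typing import Dict, Optional

# Threshold taken from the slide: any figure below 5 is not shown.
SUPPRESSION_THRESHOLD = 5


def publishable(count: int, threshold: int = SUPPRESSION_THRESHOLD) -> Optional[int]:
    """Return the count if it may be shown, or None if it must be withheld."""
    return count if count >= threshold else None


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Apply the threshold to every figure; the names themselves stay visible."""
    return {name: publishable(n, threshold) for name, n in counts.items()}


# Invented figures, for illustration only.
sample = {"Jansen": 1200, "Hoeksema": 5, "Zeldzaam": 4}
shown = suppress_small_counts(sample)
```

Passing `threshold=3` would give the Norwegian variant mentioned on the slide. The additional rule for identifiable bearers in small municipalities needs population data per municipality and is not covered by this sketch.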
name search
  exact: matthijs
  begins with: matt (pick one from list)
  ends with: ijs (pick one from list)
  contains: th (pick one from list)
  advanced, regular expression: ^ma.*(ij|y)s$ (aggregated data)
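The search modes on this slide map directly onto simple string tests. A minimal sketch in Python, assuming case-insensitive matching; the function `search`, the mode labels and the sample list are hypothetical, and the pattern `^ma.*(ij|y)s$` assumes that the alternation bar dropped out of the slide's example.

```python
import re
from typing import Callable, Iterable, List


def search(names: Iterable[str], query: str, mode: str = "exact") -> List[str]:
    """Return the names matching `query` under one of the slide's search modes."""
    q = query.lower()
    test: Callable[[str], bool]
    if mode == "exact":
        test = lambda n: n == q
    elif mode == "begins":
        test = lambda n: n.startswith(q)
    elif mode == "ends":
        test = lambda n: n.endswith(q)
    elif mode == "contains":
        test = lambda n: q in n
    elif mode == "regex":
        pattern = re.compile(query, re.IGNORECASE)
        test = lambda n: pattern.search(n) is not None
    else:
        raise ValueError(f"unknown search mode: {mode}")
    # Compare on the lower-cased name, but return the name as stored.
    return [n for n in names if test(n.lower())]


# Hypothetical sample, not taken from the corpus.
names = ["Matthijs", "Mattheus", "Thijs", "Matthys", "Marijs", "Mathilde"]
variants = search(names, r"^ma.*(ij|y)s$", mode="regex")
```

With this sample the regular expression selects Matthijs, Matthys and Marijs, so a single pattern gathers spelling variants whose figures can then be aggregated.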
family names
main page
  external links to, among others, the national bureau for genealogy
  network of related names
surname maps
  [Maps: relative figures for the surnames Janse, Janssen and Jansen; scale bar 100 km]
properties of sets of surnames
  regular expression: stra$
  results in 483 surnames on -stra
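Once a set of surnames has been selected with a pattern such as `stra$`, the bearers can be aggregated per region to obtain relative figures for the whole set. The slides do not say how the website computes these figures; the sketch below is one plausible reading, and the row layout, the function name and the numbers are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # (surname, region, number of bearers)


def share_by_region(rows: Iterable[Row], pattern: str) -> Dict[str, float]:
    """Fraction of bearers per region whose surname matches `pattern`."""
    rx = re.compile(pattern, re.IGNORECASE)
    matched: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for surname, region, bearers in rows:
        total[region] += bearers
        if rx.search(surname):
            matched[region] += bearers
    # Regions without any bearers are skipped to avoid division by zero.
    return {region: matched[region] / n for region, n in total.items() if n}


# Invented figures, for illustration only.
rows = [
    ("Hoekstra", "Friesland", 30),
    ("Jansen", "Friesland", 70),
    ("Dijkstra", "Limburg", 5),
    ("Janssen", "Limburg", 95),
]
shares = share_by_region(rows, r"stra$")
```

Dividing by the regional total gives relative rather than absolute figures, so densely populated regions do not dominate the map.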
surnames on -stra
  [Map: distribution of surnames on -stra; annotations: Protestant, Catholic, Industry and mines; scale bar 100 km]
first names
popularity of Gerrit (numbers)
popularity of Gerrit (relative)
from tradition to fashion
  Traditional naming
  [Figure: example names Jan, Kevin, Nelly, Jayden, Ingrid, Femke]
map of Gerrit (relative)
co-variation: from many to few
  assumption
    parents choose names for their children that fit their social environment
      traditional names (traditional Dutch, latinized)
      Frisian, Arabic, Turkish names
      English, French, Scandinavian, Spanish, Italian names
      names from the Old Testament
      names from history and culture
      names from nature
      ..
  analysis of names of children found in the same family
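The analysis of names of children found in the same family starts from counting which names occur together among siblings. The slides do not describe the actual procedure; the sketch below only shows the raw co-occurrence count that a grouping step could build on, and the function name, the input layout and the sample families are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def sibling_name_pairs(children: Iterable[Tuple[int, str]]) -> Counter:
    """Count in how many families each pair of first names occurs together.

    `children` holds (family_id, first_name) records; a family id could be
    derived from the ids of the parents.
    """
    by_family: Dict[int, Set[str]] = {}
    for family_id, name in children:
        by_family.setdefault(family_id, set()).add(name)

    pairs: Counter = Counter()
    for names in by_family.values():
        # Sorting makes (a, b) and (b, a) the same key.
        for a, b in combinations(sorted(names), 2):
            pairs[(a, b)] += 1
    return pairs


# Invented families, for illustration only.
children = [
    (1, "Jan"), (1, "Nelly"),
    (2, "Kevin"), (2, "Jayden"),
    (3, "Jan"), (3, "Nelly"), (3, "Ingrid"),
]
pairs = sibling_name_pairs(children)
```

Names that often share a family end up with high pair counts, which is the kind of signal that can reduce many individual names to a few name groups.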
first names and religion
  [Map: traditional Dutch names; annotations: traditional Protestants, Dutch bible belt]
Map of current first names
  Typical name group per postal code area
  [Map legend, name groups in extracted order: Traditional Latinized Traditional Dutch Old Testament Frisian PreModern Dutch Elite French Nordic French Modern Dutch Modern Italian Spanish English Arabic & Turkish]
socio-economics of first names
  Socio-economic data at family level
  Names of children in the family
  known for 281.751 households (2000-2005) with children (questionnaire)
    » name group and
    » income
    » highest education
    » lifestyle profile
two dimensions
  [Scatter plot of name groups; axes: low income / high income and traditional / trendy, both running from -2 to 2; groups: Italian-Spanish, Arabic2, Arabic1, Turkish, English, Modern French, Mixed(Nordic), Dutch-preModern, Dutch-Modern, Hebrew, Frisian, Elite, Traditional]
migration
  we know
    family relations (in the database of first names)
    places of birth
  descendants, where were they born
  ancestors, where were they born
descendants of people from Sneek
  places of birth of grandchildren of males born in Sneek between 1880 and 1900
  This example concerns the town of Sneek, but in the interactive application any municipality can be chosen
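Because each record in the first-name data carries an id and the ids of the parents, a descendant query comes down to following parent links two generations down. A minimal sketch in Python, assuming an in-memory list of records; the `Person` fields, the function name and the sample persons are hypothetical and do not reflect the actual database layout.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Person:
    id: int
    sex: str  # "M" or "F"
    birth_year: int
    birth_place: str
    father_id: Optional[int] = None
    mother_id: Optional[int] = None


def grandchildren_birthplaces(
    persons: Iterable[Person], place: str, first_year: int, last_year: int
) -> Counter:
    """Birthplaces of grandchildren of males born in `place` in the given years."""
    persons = list(persons)
    # Index children by parent id so each generation step is a lookup.
    children_of: Dict[int, List[Person]] = defaultdict(list)
    for p in persons:
        for parent_id in (p.father_id, p.mother_id):
            if parent_id is not None:
                children_of[parent_id].append(p)

    founders = [
        p for p in persons
        if p.sex == "M" and p.birth_place == place
        and first_year <= p.birth_year <= last_year
    ]
    # Key by id so a grandchild reached along two lines is counted once.
    grandchildren: Dict[int, Person] = {}
    for founder in founders:
        for child in children_of.get(founder.id, []):
            for grandchild in children_of.get(child.id, []):
                grandchildren[grandchild.id] = grandchild
    return Counter(g.birth_place for g in grandchildren.values())


# Invented persons, for illustration only.
people = [
    Person(1, "M", 1885, "Sneek"),
    Person(2, "M", 1910, "Sneek", father_id=1),
    Person(3, "F", 1912, "Leeuwarden", father_id=1),
    Person(4, "M", 1940, "Amsterdam", father_id=2),
    Person(5, "F", 1942, "Sneek", father_id=2),
    Person(6, "M", 1945, "Amsterdam", mother_id=3),
    Person(7, "M", 1905, "Sneek"),
    Person(8, "M", 1930, "Sneek", father_id=7),
    Person(9, "F", 1960, "Utrecht", father_id=8),
]
places = grandchildren_birthplaces(people, "Sneek", 1880, 1900)
```

The ancestor query works the same way in the opposite direction, following `father_id` and `mother_id` upwards instead of the child index downwards.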
ancestors of people from Sneek
  places of birth of great-grandparents of current male inhabitants of Sneek who are between 30 and 50 years of age
extension to 1811
  automatic family reconstruction in progress for full population
    » 50% of CR certificates available; to be completed in 2020
    » many record linkage issues to be solved
    » ~15 million persons 1811-1930
  Historical Sample of The Netherlands
    » 78.000 life courses manually reconstructed
    » unbiased sample 1812-1922
in conclusion
  names are exciting challenges for:
    pattern recognition
    space-time representations
    historical, linguistic and (socio)onomastic studies
    (inter)national data sharing (& privacy issues)