Y-Chromosome Haplotype Origins via Biogeographical Multilateration


Michael R. Maglio

Abstract

Current Y-chromosome migration maps cover only the broadest strokes of the highest-level haplogroups. Existing methods generalize geographic patterns from genetic frequency and diversity in large populations. New tools are required to illuminate our nomadic, stationary and genealogical histories. Biogeographical Multilateration (BGM) illustrates directional flow as well as chronological and physical origins at the individual haplotype level.

Introduction

Traditional genealogy relies on paper records and can take us back only as far as those records exist. This is perhaps 300 or 400 years, or as much as 1,000 years if we can connect to wealthy or royal families. Y-chromosome testing can illuminate our haplogroup origins. Genetic migration maps (Fig. 1) show our history from over 100,000 years ago to about 10,000 years ago. That leaves a large gap in time and geographic location for our nomadic ancestors and covers only the broad strokes of the highest-level haplogroups. Heat maps get us down to the distribution of high-level SNPs within a haplogroup. These distributions are based on the current locations of test populations (Underhill et al 2001). We must be careful not to misinterpret the genetic gradient of an organic process (Chikhi et al 2002).

Fig. 1 Migration routes based on male Y-chromosome data. Source: The Genographic Project

How do we get to the migration patterns at the individual haplotype level? We need the ability to map the four phases of our history.
The phases are: historical (the present to 400+ years ago), the portion of our family history that is well documented; stationary (~500 to ~1,500 years ago), when our ancestors made the rural to urban transition (Malanima 2007) and stayed in roughly the same location for centuries; nomadic (~1,500 to 10,000 years ago), multiple millennia of migration; and origin (10,000+ years ago), the approximate birthplace of our haplogroups. Genealogy can cover the historical phase, and Y-chromosome test results reveal the origin. New tools are required to resolve the stationary and nomadic phases. There is a fifth phase, Out of Africa (OoA), which is common across the majority of haplogroups.

Anthropological and genetic evidence shows our nomadic ancestors migrating during the Neolithic at about 1 km per year. The agricultural revolution required our ancestors to settle and farm. It also allowed them to build larger families and communities. With each generation, the population dispersed to exploit available resources (Hazelwood et al 2004), Fig. 2(a).

Fig. 2 Population dispersal: (a) with directionality; (b) directionality in an environment with no geographic obstacles and equal resources; (c) real-world dispersal bounded by geographic obstacles (mountains, rivers and bodies of water); (d) post rural to urban migration

Given an environment with no geographic boundaries and equal resources, Fig. 2(b), dispersal would create a population expansion wave moving away from the land already occupied and farmed by the extended tribe. Real-world boundaries (mountains, rivers and bodies of water) and varying resources shaped these waves of migration, Fig. 2(c). The migrations continued until the rural to urban transition began in the Middle Ages, Fig. 2(d). This rural exodus marks the end of the nomadic phase, as a portion of our ancestors sought greater opportunities in the cities and ultimately became geographically stationary.

Methods

Record selection is restricted to records that contain the most distant known paternal ancestor (MDKPA) and a self-reported origin. This origin may be the result of genealogical research or completely anecdotal. Traditionally, genetic data collection is done in situ to validate the geographic component, which limits the number of collection sites. Using self-reported geographic origins increases the number of locations, yet introduces a potential margin of error. As the analysis will show, the margin of error is trivial.

Fig. 3 Phylogenetic tree (n=10), haplogroup I-L22
Fig. 4 Bilateration analysis

Data sets consist of multiple records, each with a 37-marker STR haplotype and corresponding SNP (YCC 2002). In this exercise, I-L22 is used. Time to most recent common ancestor (TMRCA) is generated at 95% confidence (Walsh 2001) using FTDNA-derived mutation rates. This output is then passed to the Neighbor-joining method, which is part of the PHYLIP package for inferring phylogenetic relationships. A phylogenetic tree, Fig. 3, and chronological distances are produced for each data set. Data points are mapped using genealogical origin, and a radius is drawn on a Mercator projection, calculated using the upper value of the Neolithic migration rate of 30 km per 25 years, or 1.2 km/yr (Cavalli-Sforza 2002, Hazelwood et al 2004). The resulting intersection between pairs, Fig. 4, represents the approximate location of the common ancestor.

A Time Difference of Arrival (TDoA) approach is used for detecting the origin (Peter et al 2013). Traditional TDoA uses two or more beacons with known locations and a measurement of the time it takes to receive a signal from each. The time is converted to a distance, and a current location can then be deduced. In this analysis, the beacons are the paternal ancestor geographic origins. The signal (an electromagnetic wave in traditional TDoA) is the population expansion wave, measured in time as TMRCA and converted to a distance using the migration rate. A location for the common ancestor can then be deduced. Bilateration analysis of pair data proceeds through the network of phylogenetic data, illustrating haplotype directional flow as well as chronological and geographic origins.

Discussion

Suppose that each Neolithic generation spawned a new set of villages, Fig. 5. This doesn't mean that the entire previous village up and left. Some stayed, because they had an investment in land and resources. Others migrated, looking for opportunities. In a perfect scenario, there would be a descendant from the original tribe in each village.
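
The radius calculation described in Methods comes down to one multiplication: years of separation times a migration rate. The following is a minimal sketch of that arithmetic, assuming a constant rate of 30 km per 25 years. The function and variable names are illustrative and not part of the original analysis; the two sample branch lengths are the PA03 and PA04 values reported in Table 1.

```python
# Assumed constant: upper Neolithic migration rate quoted in Methods,
# 30 km per 25 years, i.e. 1.2 km/yr.
MIGRATION_RATE_KM_PER_YEAR = 30 / 25


def dmrca_km(tmrca_years, rate=MIGRATION_RATE_KM_PER_YEAR):
    """Convert a TMRCA in years into a search radius (DMRCA) in kilometres."""
    if tmrca_years < 0:
        raise ValueError("TMRCA must be non-negative")
    return tmrca_years * rate


# Branch lengths in years for two paternal ancestors (values from Table 1).
branch_years = {"PA03": 106.8, "PA04": 253.1}

# Radius of the circle drawn around each self-reported origin.
radii_km = {name: dmrca_km(years) for name, years in branch_years.items()}
```

The computed radii agree with the tabulated DMRCA values to within about 1 km; small differences are presumably due to rounding of the published TMRCA figures.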
Fig. 5 Serial founder effect
Fig. 6 Tracing genetic markers

Across time and geography, genetic differentiation increased. Each village would carry only a subset of the genetic diversity of the previous village (Deshpande et al 2009) and have a unique genetic signature. Collecting and comparing Y-DNA would show us the exact path that each tribal branch took as it migrated. Unfortunately, there isn't a perfect scenario. Not every genetic branch, or even every village, survived. Only a small fraction of Y-DNA has been tested and made available for comparison. We are left with fragmented data.

The self-reported origin that accompanies the Y-DNA in this study identifies the ancestral location in the stationary phase. Given that not all villages still exist and that the rural to urban transition consolidated the population, we should not expect to trace a genetic line back to an exact geographic location. An approximate path can be determined by walking backwards, first through STR mutations and eventually SNP mutations, following genetic breadcrumbs.

Take any two haplotypes with self-reported origins and generate a TMRCA. Multiply the years by 1.2 to get the distance to most recent common ancestor (DMRCA) in km. Initial analysis confirms the upper bound of the Neolithic migration rate, 1.2 km/yr. Using this rate allows the solution to converge in fewer steps. Using a Mercator projection, plot two circles centered on the ancestral origins with the DMRCA as the radius. Each perimeter specifies the range for the common ancestor, and the intersection(s) indicate the possible location. Bilateration is used to visually mark the geographic location of the common ancestor. If necessary, the exact coordinates could be calculated.

Fig. 7 Bilateration scenarios: (a) intersecting ranges; (b) non-intersecting ranges; (c) range within a range

There are three bilateration scenarios to consider (Cota-Ruiz et al 2012). In Fig. 7(a), two ranges meet tangentially, creating a single intersection, or they overlap, creating two intersections. In the case of two intersections, additional haplotype samples are used to disambiguate. Multiple bilaterations turn into a multilateration analysis.

In Fig. 7(b), two ranges do not intersect. This may indicate that one or both migration rates were higher than average. This is most common when a body of water separates the two samples. It may also indicate that one of the self-reported origins is in error. In the case of a body of water, the common ancestor has the potential to exist on either coast, represented by points a2a and a2b. Disambiguation, employing additional bilaterations, is required.

In Fig. 7(c), one migration range exists completely within the second range. This suggests that the migration rate of the sample with the larger range was actually slower, that the rate of the sample with the smaller range was faster, or both. As with the previous scenarios, which generated multiple common ancestor locations, a complete multilateration can determine the correct point. Phylogenetic data from additional analyses is available in Appendix 1.

Table 1. Neighbor-joining branch lengths from Paternal Ancestors to Common Ancestor with corresponding distances (1.2 km/yr)

Paternal Ancestor   Common Ancestor   TMRCA (years)   DMRCA (km)
PA01                2                 137.1           165
PA02                5                 695.6           835
PA03                1                 106.8           128
PA04                1                 253.1           304
PA05                3                 453.7           545
PA06                3                 356.2           428
PA07                6                 612.5           735
PA08                8                 297.2           357
PA09                8                 362.8           435
PA10                2                 372.8           447

Table 2. Neighbor-joining branch lengths from Common Ancestors to their Common Ancestor with corresponding distances (1.2 km/yr)

Common Ancestor   Their Common Ancestor   TMRCA (years)   DMRCA (km)
1                 4                       402.7           483
3                 4                       54.7            66
4                 6                       276.2           332
5                 2                       249.4           299
6                 7                       38.4            46
7                 5                       70.3            84
8                 7                       34.7            42

Stepping through the sample data in Table 1, Fig. 8 shows the migration ranges of paternal ancestors PA03, with a radius of 128 km, and PA04, with a radius of 304 km. For this first pair there is nearly an exact intersection, labeled CA1 for their common ancestor. Fig. 9 illustrates the intersection between paternal ancestors PA01 and PA10.

Fig. 8 Bilateration analysis of paternal ancestors PA03 and PA04 identifying common ancestor CA1.
Fig. 9 Bilateration analysis of paternal ancestors PA01 and PA10 identifying ambiguous common ancestor CA2a or CA2b.
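
The three bilateration scenarios and the use of a further sample to choose between two intersections can be sketched in a few lines of geometry. The example below is a hedged illustration, not the procedure used to draw the figures: it assumes origins have already been projected onto a flat plane measured in km, and the function names, coordinates and the third "anchor" point are all hypothetical. Only the 128 km and 304 km radii come from Table 1.

```python
import math


def bilaterate(c1, r1, c2, r2, tol=1e-9):
    """Classify two ranges as in Fig. 7 and return candidate common-ancestor points.

    c1 and c2 are (x, y) origins in km on a flat plane (a simplifying assumption);
    r1 and r2 are the DMRCA radii in km.
    """
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = math.hypot(dx, dy)  # distance between the two reported origins
    if d > r1 + r2 + tol:
        return "non-intersecting", []       # Fig. 7(b)
    if d < abs(r1 - r2) - tol:
        return "range within a range", []   # Fig. 7(c)
    if d <= tol:
        return "coincident", []             # same origin and radius: no unique point
    # Distance from c1, along the c1 -> c2 line, to the chord joining the intersections.
    a = (r1**2 - r2**2 + d**2) / (2 * d)
    mid = (c1[0] + a * dx / d, c1[1] + a * dy / d)
    if abs(d - (r1 + r2)) <= tol or abs(d - abs(r1 - r2)) <= tol:
        return "intersecting", [mid]        # Fig. 7(a), tangent: one point
    h = math.sqrt(max(r1**2 - a**2, 0.0))   # half-length of the chord
    return "intersecting", [                # Fig. 7(a), overlap: two points
        (mid[0] + h * dy / d, mid[1] - h * dx / d),
        (mid[0] - h * dy / d, mid[1] + h * dx / d),
    ]


def disambiguate(candidates, anchor, expected_km):
    """Keep the candidate whose distance to a further known point best matches expected_km."""
    return min(candidates, key=lambda p: abs(math.dist(p, anchor) - expected_km))


# Hypothetical origins placed 432 km apart, using the PA03 and PA04 radii.
scenario, points = bilaterate((0.0, 0.0), 128.0, (432.0, 0.0), 304.0)

# Hypothetical overlapping pair, then a third distance constraint to pick one point.
_, pair = bilaterate((0.0, 0.0), 5.0, (8.0, 0.0), 5.0)
chosen = disambiguate(pair, anchor=(4.0, 10.0), expected_km=7.0)
```

With the origins exactly 128 + 304 km apart, the two ranges touch at a single point, which mirrors the near-exact intersection described for CA1. In the second example the third constraint selects one of the two mirror-image candidates, which is the same idea the multilateration applies across the whole tree.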
In Fig. 9, the overlapping regions create two ambiguous intersecting points, CA2a and CA2b. The location of downstream common ancestor CA5 needs to be determined in order to identify the true CA2. Analysis continues through the tabulated data, generating a series of common ancestor points. Fig. 10 shows the process at the placement of common ancestor CA5, based on paternal ancestor PA02, and just prior to placing CA6. CA5 relates to CA7 by a distance of 84 km and CA7 is related to CA8 by 42 km. This allows us to remove CA8a and keep CA8b. There is still uncertainty around the location of CA5, which creates CA5a, CA5b, CA7a and CA7b. The location of CA6 is constrained by PA07, CA4 and CA7. At this stage of the process, there are two CA4s and two CA7s. The only location for CA6 that satisfies 332 km from CA4 and 46 km from CA7 places CA6 between CA7a and CA8b, as seen in Fig. 11. This eliminates CA7b and CA4b. These eliminations cascade and deliver a fully disambiguated solution.

Fig. 10 Multilateration analysis prior to full convergence and disambiguation.
Fig. 11 Multilateration analysis is required for disambiguation.

Four of the eight common ancestors, CA5, CA6, CA7 and CA8, cluster over a body of water, and water travel increases the migration rate. That would put these common ancestors on the European mainland.

Conclusions

A small sample of 10 records was used in this analysis for simplification. Much larger data sets are recommended and would be required to determine the genetic flow in a greater geographic and chronological view. Fig. 12 shows the phylogenetic tree connecting the plotted nodes of paternal ancestors and common ancestors. Fig. 13 displays a simplified migration pattern. This haplotype potentially has a Scandinavian origin with a Frisian Coast connection as a staging area. In an expansion of this data set, we would expect to see paternal ancestor records and the resulting common ancestors from those North Sea regions. The phylogenetic tree root distance can give us an estimated age of each common ancestor. CA2 is the oldest at 1,290 years ago. CA5, CA6, CA7 and CA8 cluster together in the next age range of 830 to 975 years ago. The haplogroup, dates and locations are all consistent with the Norse and Viking raids on the British Isles.

Fig. 12 Fully networked biogeographical phylogenetic tree (n=18).
Fig. 13 Simplified genetic flow.

In the event that haplotype data does not have self-reported origins, biogeographical multilateration (BGM) has the potential to narrow the range, as the analysis can be used to solve for an unknown location. Finding a previously unidentified historical homeland can aid genealogical research. Clustering of common ancestor data may indicate the stationary phase sites. As the common ancestor sites span the continent, we can see the intermediate nomadic locations that connect to the origins of our haplogroups. BGM can be a major tool in developing genetic migration patterns at the individual haplotype level, bridging the gap between the modern era and the maps of our Neolithic origins.

Acknowledgements

I thank all of the DNA donors who have made their results publicly accessible for review. Special thanks to Dean McGee for making his DNA analysis website available.

Web Resources

Y-Utility: Y-DNA Comparison Utility, http://www.mymcgee.com/tools/yutility.html?mode=ftdna_mode

References

Cavalli-Sforza LL (2002). Demic diffusion as the basic process of human expansions. In Examining the farming/language dispersal hypothesis. Cambridge: McDonald Institute for Archaeological Research, 79-88.

Chikhi L, Nichols RA, Barbujani G, & Beaumont MA (2002). Y genetic data support the Neolithic demic diffusion model. Proceedings of the National Academy of Sciences, 99(17), 11008-11013.

Cota-Ruiz J, Rosiles JG, Sifuentes E, & Rivas-Perea P (2012). A low-complexity geometric bilateration method for localization in wireless sensor networks and its comparison with least-squares methods. Sensors, 12(1), 839-862.

Deshpande O, Batzoglou S, Feldman MW, & Cavalli-Sforza LL (2009). A serial founder effect model for human settlement out of Africa. Proceedings of the Royal Society B: Biological Sciences, 276(1655), 291-300.

Hazelwood L, & Steele J (2004). Spatial dynamics of human dispersals: constraints on modelling and archaeological validation. Journal of Archaeological Science, 31(6), 669-679.

Malanima P (2007). Decline or Growth? European Cities and Rural Economies 1300-1600. University of Vienna.

Peter BM, & Slatkin M (2013). Detecting range expansions from genetic data. arXiv preprint arXiv:1303.7475.

Underhill PA, Passarino G, Lin AA, Shen P, Mirazon Lahr M, Foley RA, Oefner PJ, & Cavalli-Sforza LL (2001). The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Annals of Human Genetics, 65(1), 43-62.

Walsh B (2001). Estimating the time to the most recent common ancestor for the Y chromosome or mitochondrial DNA for a pair of individuals. Genetics, 158(2), 897-912.

Y-Chromosome-Consortium (2002). A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Research, 12, 339-348.