DNA Basics, Y DNA Marker Tables, Ancestral Trees and Mutation Graphs: Definitions, Concepts, Understanding
by Dr.-Ing. Robert L. Baber
2014 July 26
Rights reserved, see the copyright notice at http://gengen.rlbaber.de
Topics
1. DNA Basics
   o types of DNA and their inheritance
   o DNA structure, STR, SNP
   o Haplogroups
2. Y DNA Marker Tables
3. Ancestral Trees
   o paternal only vs. both parents
   o all vs. only branching ancestors (MRCAs)
4. Properties of Ancestral Trees
   o usual assumptions
5. Mutation Graphs
6. Comparison of Marker Tables, Ancestral Trees and Mutation Graphs
   o differences, similarities, correspondence
7. Summary
1. DNA Basics: Types of DNA
Nearly every cell in the human body contains many copies of mitochondrial DNA (mtDNA) outside the nucleus and 23 pairs of chromosomes in the nucleus.
The 23rd pair consists of either
   o one X and one Y chromosome (male) or
   o two X chromosomes (female)
1. DNA Basics: Inheritance of the Types of DNA
Y DNA (the Y chromosome) is passed from father to son unchanged except for occasional mutations.
mtDNA is passed from mother to each child, son or daughter, unchanged except for rare mutations. Each male receives mtDNA from his mother but does not pass it on to his children.
The DNA in the other 22 chromosome pairs is called autosomal DNA. Random selections from the autosomal DNA of the two parents are combined and passed down to the child.
The X chromosome follows its own pattern: a son receives one X from his mother, a daughter receives one X from each parent.
1. DNA Basics: DNA Structure
DNA has a helical structure, that is, the structure of a twisted or spiral ladder.
Each rung in the ladder consists of a pair of submolecules, each called a nucleobase or simply base. That is, each rung in the ladder is a base pair.
There are four different types of bases in DNA: adenine, cytosine, guanine and thymine, abbreviated A, C, G and T respectively.
Any sequence of rungs in the DNA ladder can be specified or described by the corresponding sequence of the letters A, C, G and T.
1. DNA Basics: DNA Size
The 23 pairs of chromosomes together contain more than 3 billion bp (3,000,000,000 base pairs).
The X chromosome contains about 155 million bp (155,000,000 base pairs).
The Y chromosome contains about 59 million bp (59,000,000 base pairs).
Each copy of the mtDNA contains 16,569 positions.
1. DNA Basics: STR
A short tandem repeat, usually abbreviated STR, is a sequence of base pairs within a chromosome in which a shorter subsequence is repeated a number of times.
Such an STR is also often called a marker.
The value of the marker is the number of times the shorter subsequence is repeated within the STR.
STRs are commonly used in Y DNA analyses at the family (surname) level.
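The definition of a marker value can be illustrated with a small Python sketch. It counts the longest uninterrupted run of a repeated subsequence in a string of bases. The function name, the motif GATA and the sequence are hypothetical and chosen only for illustration; real test laboratories apply marker-specific counting conventions that this sketch does not model.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Count consecutive copies of the motif beginning at this position.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical sequence: flanking bases around 5 copies of GATA.
sequence = "CCTA" + "GATA" * 5 + "TTGC"
marker_value = count_tandem_repeats(sequence, "GATA")
print(marker_value)  # 5
```

Every start position is tried, so a run that begins in the middle of an earlier partial match is not missed.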
1. DNA Basics: SNP
A single nucleotide polymorphism or SNP is a single base pair (a single rung in the DNA ladder) that occurs in different forms, i.e. in both an original and a mutated form.
Mutated forms of certain SNPs are, for example, used as a basis for defining haplogroups and their subclades at all levels.
One example is the SNP M75. A mutation of the relevant base from the original G to the mutated A identifies a particular subclade of Y haplogroup E.
SNPs also convey many other types of genetic information.
1. DNA Basics: Haplogroups
In the course of time, mutations giving rise to SNPs have occurred in the Y chromosome.
These mutations repeatedly subdivide the collection of Y chromosomes in humans, forming a hierarchical relationship between the individuals in the population.
The results of the first subdivision are called haplogroups. The subgroups are called subclades.
There are (currently) 20 top level Y haplogroups, each denoted by a letter of the alphabet. Each of these haplogroups is further subdivided hierarchically into subclades.
mtDNA is similarly classified into haplogroups and subclades.
1. DNA Basics: Haplogroups
Two notational forms are currently in use for Y haplogroups and their subclades.
The first denotes a subclade by a sequence of letters and digits specifying in full detail the subclade's hierarchical position in the haplogroup tree, e.g. E1b1b1a1b1a4.
As the tree grows, these names become ever longer and less readable.
Also, as modifications are made to the structure of the tree when new information becomes available, the names are changed to reflect the new hierarchy. For example, E1b1b1a1b1a6 was an earlier name for the current E1b1b1a1b1a4.
1. DNA Basics: Haplogroups
The second notational form consists of the highest level letter followed by the SNP defining the lowest level subclade.
For subclade E1b1b1a1b1a4 in the previous slide, the second notational form is E-L241. This notation will not change.
Of course, if a person is retested and found to be in a lower level subclade than previously known, then his subclade notation will change to the SNP defining his newly known lowest level subclade. E.g. if a person previously known only to be in subclade E-V13 is retested as L241 positive, then his subclade will be changed to E-L241, which is a subclade of E-V13.
The second notational form is not (yet?) used for mtDNA.
2. Y DNA Marker Tables
A marker table contains each tested person's marker values. Such a sequence of marker values is called a haplotype.
A convenient form for a marker table is
   o one data row for each tested person's marker values (haplotype),
   o one data column for each marker.
A marker (column) with only one value (all tested persons have the same value) can be omitted.
2. Y DNA Marker Tables
If there are n different values for a marker,
   o one value can come from the common ancestor,
   o every other value, i.e. n - 1 values, must come from a mutation.
Add n - 1 over all markers. The result is the minimum number of mutations needed to explain the observed Y DNA test results, assuming a common ancestor.
2. Y DNA Marker Tables: Example
Marker group:     1-12   13-25         26-37         38-67         68-111
Marker name:      DYS    DYS    DYS    DYS    CDYb   DYS    DYS    DYS    DYS    Y-G    DYS
Name              439    449    464c   570           481    617    710    589    A10    650
Bert              12     33     16     22     35     22     13            11
Carl              13            16                   23     14            11
Fred              13     33     17     22     35     23     14     34     0      13     22
Guy               13     33     16     22     36     23     13     35     11     12     21
Hugh              13     33     16     22     35     22     13     34     11     13     22
Jim               14     32     16     23     35     22     13
Paul              14     32     16     23     35     22     13
min # mutations   2      1      1      1      1      1      1      1      1      1      1
At least 12 mutations from a common ancestor are needed to explain these results.
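The counting rule (n - 1 per marker, summed over all markers) can be sketched in Python using the example values above. The names MARKERS, TABLE and min_mutations_per_marker are hypothetical. Two points are assumptions rather than statements from the slides: the marker name Y-GATA-A10 is an expansion of the abbreviated column heading, and the columns in which Bert's and Carl's partial results are placed were chosen so that they agree with the per-marker minimum counts given on the slide. Untested markers are represented by None.

```python
MARKERS = ["DYS439", "DYS449", "DYS464c", "DYS570", "CDYb", "DYS481",
           "DYS617", "DYS710", "DYS589", "Y-GATA-A10", "DYS650"]

# One row (haplotype) per tested person; None = marker not tested.
# Assumption: column placement of Bert's and Carl's partial results.
TABLE = {
    "Bert": [12, 33, 16, 22, 35, 22, 13, None, 11, None, None],
    "Carl": [13, None, 16, None, None, 23, 14, None, 11, None, None],
    "Fred": [13, 33, 17, 22, 35, 23, 14, 34, 0, 13, 22],
    "Guy":  [13, 33, 16, 22, 36, 23, 13, 35, 11, 12, 21],
    "Hugh": [13, 33, 16, 22, 35, 22, 13, 34, 11, 13, 22],
    "Jim":  [14, 32, 16, 23, 35, 22, 13, None, None, None, None],
    "Paul": [14, 32, 16, 23, 35, 22, 13, None, None, None, None],
}


def min_mutations_per_marker(table):
    """For each marker: number of distinct observed values minus one."""
    counts = []
    for column in zip(*table.values()):
        distinct = {value for value in column if value is not None}
        counts.append(max(0, len(distinct) - 1))
    return counts


per_marker = min_mutations_per_marker(TABLE)
for name, count in zip(MARKERS, per_marker):
    print(f"{name}: {count}")
print("minimum total:", sum(per_marker))  # minimum total: 12
```

A marker with a single observed value contributes 0, which is why such columns can be omitted from the table without changing the total.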
2. Y DNA Marker Tables: Example
Again: One value for any particular marker can come from the common ancestor. Each other value for that marker must come from a mutation.
Therefore, for any marker with n different values, at least n - 1 mutations must have occurred.
Duplicate or exactly reversing mutations are uncommon in ancestral trees typical of family research, with fewer than about 50 to 100 generations along all paths.
In larger trees, duplicate or reversing mutations occur sometimes. Then more than the minimum number of mutations as calculated above will have occurred.
3. Ancestral Trees
Several different types:
   o family tree, with mothers and fathers, brothers and sisters, spouses, uncles, aunts, cousins, etc.
   o paternal lines only, with fathers, sons, brothers, uncles and male cousins only (e.g. no spouses).
   o paternal lines only, with tested persons and their most recent common ancestors (MRCAs) only.
A Y DNA ancestral tree (paternal lines only) can show how the various persons' Y DNA marker values were derived from those of the common ancestor.
3. Ancestral Trees: Example
[Figure: ancestral tree with top node A8, ancestors A1 to A7 and the tested persons at the bottom level]
Tested persons: Bert, Carl, Fred, Guy, Hugh, Jim and Paul
3. Ancestral Trees: Tested Persons and Branching Nodes (MRCAs) Only
[Figure: the same ancestral tree reduced to the branching nodes A8, A7, A6, A4, A5 and A2 and the tested persons Bert, Carl, Fred, Guy, Hugh, Jim and Paul]
3. Ancestral Trees: Ancestors' Marker Values
If, for example, Carl and Fred have the same value 13 for some marker, but their MRCA had a different value for that marker, then two identical mutations and hence more than the minimum number would be needed.
Typically, therefore, their MRCA will have the same value.
This principle can be used to determine many marker values of ancestors.
[Figure: A2, 13; Carl, 13; Fred, 13]
3. Ancestral Trees: Ancestors' Marker Values
More generally, if any two persons in an ancestral tree have the same value of a marker, then every person on the ancestral path between those two persons will typically have the same value of that marker.
Otherwise, more than the minimum number of mutations would be required.
[Figure: A4, 16; A2, 16; Carl, 16; Fred, 17; Guy, 16]
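The path principle can be sketched in Python as follows. The function name and the child-to-parent dictionary are hypothetical. The tree shape used in the example (A2 as MRCA of Carl and Fred, A4 as MRCA of A2 and Guy) is an assumption read from the figure labels, and the sketch presumes a minimal tree, i.e. no duplicate or reversing mutations.

```python
def infer_ancestor_values(parent, leaf_values):
    """parent: child -> parent map; leaf_values: tested person -> marker value.

    Returns the marker values implied for ancestors lying on the path
    between any two tested persons who share the same value.
    """
    def path_to_top(node):
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    inferred = {}
    people = list(leaf_values)
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            if leaf_values[a] != leaf_values[b]:
                continue
            path_a, path_b = path_to_top(a), path_to_top(b)
            mrca = next(node for node in path_a if node in path_b)
            # Ancestors from a up to and including the MRCA, then from b
            # up to but excluding the MRCA.
            between = (path_a[1:path_a.index(mrca) + 1]
                       + path_b[1:path_b.index(mrca)])
            for node in between:
                inferred[node] = leaf_values[a]
    return inferred


# Assumed tree shape: A4 -> (A2, Guy), A2 -> (Carl, Fred).
parent = {"Carl": "A2", "Fred": "A2", "A2": "A4", "Guy": "A4"}
values = {"Carl": 16, "Fred": 17, "Guy": 16}
print(infer_ancestor_values(parent, values))  # {'A2': 16, 'A4': 16}
```

Carl and Guy share the value 16, so A2 and A4 on the path between them receive 16; Fred's 17 is then explained by a single mutation between A2 and Fred.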
4. Properties of Ancestral Trees
   o Each node represents one person.
   o A top node is present and represents the common ancestor.
   o The ancestor-descendant relationship is shown.
   o The numbers of generations between nodes are usually known and indicated.
   o A Y DNA ancestral tree branches downward only.
4. Properties of Ancestral Trees
Definition: A minimal ancestral tree is a paternal-line-only ancestral tree
   o that includes only at least two bottom level persons, the MRCA1 of all bottom level persons and the various intermediate MRCAs, and
   o in which only the minimum possible number of mutations occurred.
In such an ancestral tree all upper level nodes are branching nodes.
A minimal tree is usually assumed until shown to be impossible for the given Y DNA marker values.
[Figure: minimal tree with upper level nodes MRCA1, MRCA2, MRCA3 and MRCA4 and bottom level persons A, B, C, D and E]
4. Properties of Ancestral Trees
[Figure: minimal tree with upper level nodes MRCA1 to MRCA4, a dashed line, and bottom level persons A to E]
If, in a minimal tree, there is a mutation crossing over the dashed line, then the bottom level person receiving the mutation will have a unique value of the marker in question. That is, no other bottom level person will have that marker value.
If, on the other hand, there is no mutation crossing over the dashed line, then at least two nodes will have the same haplotype. In this example, nodes A and B will have the same haplotype, as will nodes D and E.
4. Properties of Ancestral Trees
Therefore, in a minimal tree
   o either some bottom level person has a unique value
   o OR at least two bottom level persons have the same haplotype.
If there is neither a unique value nor two bottom level persons with the same haplotype, then the tree is not minimal; it requires more than the minimum number of mutations.
4. Properties of Ancestral Trees: Additional Property for Mutations
Restating:
IF two or more persons (haplotypes) have a common ancestor
AND the minimum number of mutations suffices to explain how the haplotypes derive from a common ancestor
THEN either at least two persons have the same haplotype
OR at least one person has a unique value.
This property enables us to derive the structure of an ancestral tree from the Y DNA marker values only. Such derivation is the topic of another document on this web site.
A minimal tree is usually assumed until shown to be impossible for the given Y DNA marker values.
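The property can be turned into a simple test, sketched here in Python. The function name and both example tables are hypothetical, and the sketch assumes complete haplotypes (every person tested on every marker). A result of False shows that no minimal tree exists; a result of True means only that this particular condition does not rule one out.

```python
from collections import Counter


def passes_minimal_tree_test(haplotypes):
    """haplotypes: person -> tuple of marker values (all of equal length)."""
    rows = list(haplotypes.values())
    if len(set(rows)) < len(rows):
        return True  # at least two persons have the same haplotype
    for column in zip(*rows):
        if 1 in Counter(column).values():
            return True  # some person has a value that nobody else has
    return False


# Hypothetical two-marker haplotypes.
possible = {"P": (13, 16), "Q": (13, 17), "R": (14, 16)}
impossible = {"P": (13, 16), "Q": (13, 17), "R": (14, 16), "S": (14, 17)}
print(passes_minimal_tree_test(possible))    # True
print(passes_minimal_tree_test(impossible))  # False
```

In the second table each marker has two values, so the minimum is 2 mutations, yet four different haplotypes cannot be connected with fewer than 3 mutations; the test reports this.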
5. Mutation Graphs
A mutation graph
   o is a graph connecting several different haplotypes with each other.
   o is, in effect, a compressed, abstract version of an ancestral tree.
   o has no top.
   o does not indicate the direction of ancestral relationships.
   o can be derived from a marker table.
5. Mutation Graphs
   o Each node represents a haplotype (not a person).
   o Each line between nodes represents one or more mutations.
   o A mutation graph has the branching structure of a tree; it contains no cycle.
   o A mutation graph shows the mutational relationships between the several persons' haplotypes.
5. Mutation Graphs
Initial assumption for constructing/deducing a mutation graph: A minimal ancestral tree exists for the tested persons and their MRCA.
Deriving a mutation graph from a marker table (Y DNA test results) is the subject of another unit in this series.
5. Mutation Graphs
The mutation graph below corresponds to the example of the marker table in Section 2 and the ancestral tree in Section 3 above.
[Figure: mutation graph with nodes Carl, H1, Hugh, Bert, H2, Fred, Guy and Jim/Paul]
5. Mutation Graphs: Deducing an Ancestral Tree
But... where is the top? One cannot tell from the mutation graph alone.
In the case of the preceding slide, we know that the tested persons Bert, Carl, Fred, Guy, Hugh, Jim and Paul are at the bottom level of the tree. But only Fred, Guy and Jim/Paul are outer, or end, or bottom nodes, that is, nodes with only one connecting line each. The others (Bert, Carl and Hugh) must be extracted from their nodes. (They share their nodes with some of their ancestors.)
The top (common ancestor of all) is still unknown. It could be any ancestor of Carl and Fred, of Guy, of Hugh, of Bert, or of Jim and Paul.
Picking up any of these nodes lets the rest of the mutation graph fall into the form of an ancestral tree, with the tested persons at the bottom level. Any of these trees would be a possible ancestral tree for the persons tested.
A known top can be introduced:
   o the mode of the haplotypes of other surnames in the same subclade,
   o a known earlier ancestor or
   o a more distant relative.
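"Picking up" a node can be sketched in Python as a breadth-first traversal that orients every line of the graph away from the chosen top. The function name and the four-node graph with nodes P, Q, R and S are hypothetical, not the example graph of this presentation. As a simplification, the sketch places the top at an existing node and does not extract tested persons from inner nodes to the bottom level.

```python
from collections import deque


def orient_from_top(edges, top):
    """Return a node -> parent map for an undirected, cycle-free graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    parent = {top: None}
    queue = deque([top])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in parent:  # not reached yet, so it hangs below node
                parent[nxt] = node
                queue.append(nxt)
    return parent


# Hypothetical mutation graph: Q is connected to P, R and S.
edges = [("P", "Q"), ("Q", "R"), ("Q", "S")]
print(orient_from_top(edges, "P"))  # {'P': None, 'Q': 'P', 'R': 'Q', 'S': 'Q'}
print(orient_from_top(edges, "S"))  # {'S': None, 'Q': 'S', 'P': 'Q', 'R': 'Q'}
```

The same graph yields a different tree for each choice of top, which is why additional information is needed to fix the top.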
6. Comparison of Marker Tables, Ancestral Trees and Mutation Graphs
A marker table lists for each tested person his marker values (haplotype).
An ancestral tree shows
   o how the various persons descended from their common ancestor and
   o how the various persons' marker values were derived from those of the common ancestor.
A mutation graph shows the mutational relationships between the several persons' haplotypes.
6. Comparison: an Ancestral Tree and its Mutation Graph
[Figure: the ancestral tree (A1 to A8 above Bert, Carl, Fred, Guy, Hugh, Jim and Paul) with green dashed borders marking the mutation graph nodes, including H1 and H2]
Each green dashed border encloses one node in the mutation graph.
6. Comparison: an Ancestral Tree and its Mutation Graph
Each green dashed border in the preceding slide encloses one node in the mutation graph.
One node in the mutation graph includes all persons in the ancestral tree with the same haplotype.
Persons on the lines between nodes in the mutation graph have haplotypes that are intermediate between the connected nodes.
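The rule that one node collects all persons with the same haplotype amounts to grouping by haplotype, sketched here in Python. The function name is hypothetical. The haplotypes are reduced to two markers (DYS439, DYS481) for brevity; Bert's and Hugh's values come from the example marker table, while the ancestors' values are assumptions made only for this illustration.

```python
from collections import defaultdict


def mutation_graph_nodes(haplotype_of):
    """Group persons of an ancestral tree by identical haplotype."""
    nodes = defaultdict(list)
    for person, haplotype in haplotype_of.items():
        nodes[haplotype].append(person)
    return dict(nodes)


# (DYS439, DYS481) per person; ancestors' values are assumed.
haplotype_of = {
    "Bert": (12, 22), "A1": (12, 22), "A7": (12, 22),
    "Hugh": (13, 22), "A6": (13, 22),
    "A4": (13, 23),
}
for haplotype, persons in mutation_graph_nodes(haplotype_of).items():
    print(haplotype, persons)
```

Each resulting group corresponds to one node of the mutation graph; a group may contain a single ancestor and no tested person.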
6. Comparison: the Ancestral Tree and its Mutation Graph
Node in Mutation Graph   Node(s) in Ancestral Tree
Bert                     Bert, A1, A7
Carl                     Carl, A2
Fred                     Fred
Guy                      Guy
Hugh                     Hugh, A6
Jim/Paul                 Jim, Paul, A3, A5
H1                       A4
H2                       A8
6. Comparison: an Ancestral Tree and a Mutation Graph
Property                               ancestral tree                 mutation graph
each node represents                   one person                     one haplotype
has a top node                         yes                            no
lines between nodes indicate
  direction of ancestral relationship  yes                            no
mutations between nodes                not necessarily (0 or more)    at least 1 (1 or more)
persons, generations per node          1 person                       1 or more
nodes per haplotype                    1 or more                      1
number of generations between nodes    often identified and shown     cannot be determined or shown
Ancestral trees and mutation graphs are closely related, but also fundamentally different.
7. Summary
A marker table
   o lists the tested persons' Y DNA marker values.
   o is a basis for analysis.
An ancestral tree shows the relationships between tested persons and their ancestors.
A mutation graph shows the mutational relationships between haplotypes.
All three are useful in family genealogical research.
7. Summary
From a marker table one can derive a mutation graph.
From a mutation graph one can derive an ancestral tree.
From a marker table and an ancestral tree one can deduce the ancestors' Y DNA.
7. Summary
[Diagram: Marker Table, Mutation Graph, Ancestral Tree, Ancestors' Y DNA]
Our ancestors never completely died. They continue to live in our DNA.
The End