DNA, Ancestry, and Your Genealogical Research: Segments and Centimorgans
Walter Steets
Houston Genealogical Forum DNA Interest Group
January 6, 2018
Today's agenda
- Brief review of previous DIG session
- Structure of chromosomes
- Inheritance of DNA
- Uses of shared DNA
- Retrieving amounts of shared DNA from Ancestry
- Expected amounts of shared DNA from basic genetics
- Detailed review of most recent Shared cM Project results
- Online interactive charts of Shared cM Project results
- Comparison of shared DNA between siblings
Identifying DNA Segments
DNA: two strands, each a long chain of small molecules called nucleotides.
Chromosome: a DNA molecule whose two strands are bound together as a double helix by nucleotide base pairs: Adenine-Thymine (A-T) and Cytosine-Guanine (C-G).
About 99.9% of DNA is identical between any two humans.
SNP (Single Nucleotide Polymorphism): a difference in a single nucleotide at one position along DNA strands that are otherwise identical. The parts that differ can be identified by SNPs, which are used as markers in DNA testing.
Segments: differences in DNA are identified by patterns of hundreds or thousands of SNPs. These patterns of SNPs are used to identify DNA segments.
[Figure: DNA base pairs (A-T, C-G); a DNA segment on chromosome 1 for Person A and for Person B, with SNP 1 and SNP 2 separated by identical sequences of hundreds of base pairs.]
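To make the SNP idea concrete, here is a minimal Python sketch that compares two aligned stretches of bases and reports the positions where they differ, plus the partner strand implied by A-T and C-G pairing. The sequences are made up for illustration, and the function names are hypothetical; real tests read genotypes at preselected SNP positions rather than comparing full sequences like this.

```python
# Base-pairing rule from the slide: A pairs with T, C pairs with G.
BASE_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner strand implied by A-T and C-G base pairing."""
    return "".join(BASE_PAIRS[base] for base in strand)


def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return 0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical stretches of chromosome 1 for Person A and Person B.
person_a = "ACGTTAGC"
person_b = "ACGATAGT"
print(snp_positions(person_a, person_b))  # [3, 7]
print(complement(person_a))  # TGCAATCG
```

Everything except the two differing positions is identical, which mirrors the slide's point that only a tiny fraction of positions distinguishes two people.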
Mapping and Measuring DNA
Chromosome maps: the genealogically interesting parts of chromosomes are the bits in which people differ from each other.
Mega base pairs: physical distance along a chromosome is measured in millions of base pairs (Mb).
Centimorgans: some sections of a chromosome are more likely than others to be broken up by crossing over (recombination) when DNA is passed from parent to child. Genetic distance is therefore measured in centimorgans (cM): two points 1 cM apart have about a 1% chance of being separated by a crossover in one generation, so sections of equal cM length have about the same chance of recombining, whatever their length in Mb.
Segments: SNP markers identify DNA segments. The start and end of a segment are specified in Mb; the length of a segment is given in cM.
[Figure: segment map identified by SNP markers. Adapted from NHGRI Fact Sheets, Genome.gov]
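Why a segment's ends are reported in Mb but its length in cM can be shown with a small sketch: look up the genetic position of each end on a genetic map (linear interpolation between map points) and subtract. The four map points below are invented, and this is an assumed illustration of the idea, not any testing company's actual method; real tools use published human genetic maps with many thousands of points.

```python
from bisect import bisect_right

# HYPOTHETICAL genetic map: (physical position in Mb, cumulative genetic position in cM).
GENETIC_MAP = [(0.0, 0.0), (10.0, 20.0), (20.0, 25.0), (30.0, 45.0)]


def mb_to_cm(pos_mb: float, genetic_map=GENETIC_MAP) -> float:
    """Linearly interpolate the genetic position (cM) of a physical position (Mb)."""
    positions = [mb for mb, _ in genetic_map]
    if not positions[0] <= pos_mb <= positions[-1]:
        raise ValueError("position is outside the genetic map")
    i = bisect_right(positions, pos_mb) - 1
    if i == len(genetic_map) - 1:  # exactly at the last map point
        return genetic_map[-1][1]
    (x0, y0), (x1, y1) = genetic_map[i], genetic_map[i + 1]
    return y0 + (y1 - y0) * (pos_mb - x0) / (x1 - x0)


def segment_length_cm(start_mb: float, end_mb: float, genetic_map=GENETIC_MAP) -> float:
    """Length in cM of a segment whose start and end are given in Mb."""
    return mb_to_cm(end_mb, genetic_map) - mb_to_cm(start_mb, genetic_map)


# Two segments with the same physical length (10 Mb) but different genetic lengths.
print(segment_length_cm(5.0, 15.0))   # 12.5
print(segment_length_cm(20.0, 30.0))  # 20.0
```

Both segments span 10 Mb, but the second sits in a region where this made-up map has more recombination per Mb, so it is longer in cM.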
Segments which are Identical by Descent (IBD): Example of 1st Cousins
The origin of IBD segments is depicted via a pedigree of 12 individuals. Each box (male) and circle (female) represents a chromosome pair for the named person; for example, the bars could be the chromosome pair for chromosome 1. Due to crossing over, offspring inherit recombinant chromosomes from their parents. The first cousins in the bottom row, Karen and Louis, share one IBD segment (borders marked by grey lines). Both inherited this IBD segment from the same individual, their grandfather Carl (orange-colored chromosome in the top row).
[Figure: pedigree of Albert, Bertha, Carl, Donna, Edward, Fiona, Gregory, Helen, Ian, Janice, Karen, and Louis. Adapted from Gklambauer, Wikimedia Commons]
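The crossing-over process in the pedigree can be mimicked with a toy model that labels each position of a chromosome with the ancestral copy it came from. This is a simplified sketch under stated assumptions: a hypothetical 10-position chromosome, made-up crossover points, labels C1 and C2 for the grandfather's two copies, and O1/O2 for DNA from other ancestors. It does not reproduce the figure's exact pedigree; `recombine` and `shared_runs` are illustrative names.

```python
def recombine(copy_a: list[str], copy_b: list[str], crossovers: list[int]) -> list[str]:
    """Build a transmitted chromosome, switching between the two copies at each crossover."""
    result, use_a, prev = [], True, 0
    for cut in sorted(crossovers) + [len(copy_a)]:
        source = copy_a if use_a else copy_b
        result.extend(source[prev:cut])
        use_a = not use_a
        prev = cut
    return result


def shared_runs(x: list[str], y: list[str], labels: set[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, label) runs where x and y carry the same labeled ancestral copy."""
    runs, start, current = [], None, None
    for i, (a, b) in enumerate(zip(x, y)):
        label = a if (a == b and a in labels) else None
        if label != current:  # a run starts, ends, or switches ancestral copy here
            if current is not None:
                runs.append((start, i, current))
            start, current = i, label
    if current is not None:
        runs.append((start, len(x), current))
    return runs


# The grandfather's two copies of one hypothetical chromosome.
grandpa_1, grandpa_2 = ["C1"] * 10, ["C2"] * 10
# Each of two children receives a recombinant copy from the grandfather.
child_x = recombine(grandpa_1, grandpa_2, [4])
child_y = recombine(grandpa_1, grandpa_2, [7])
# Each child passes on a recombinant of that copy and the copy from the other parent.
cousin_1 = recombine(child_x, ["O1"] * 10, [6])
cousin_2 = recombine(["O2"] * 10, child_y, [2])
print(shared_runs(cousin_1, cousin_2, {"C1", "C2"}))  # [(2, 4, 'C1')]
```

As in the figure, the two cousins end up sharing a single IBD segment, and both copies of it trace back to the same grandparent.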
Degree of Separation
The degree of separation is a measure of the distance between two people on separate lines of descent from a Most Recent Common Ancestor (MRCA) in a descendant tree.
Genealogically: the number of parent-child steps from each person up to the MRCA, added together (for first cousins, 2 + 2 = 4).
Genetically: the number of DNA parent-child transmissions between the two people.
The larger the degree of separation between two people, the less DNA they will share on average.
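The genealogical definition can be computed directly from a tree. The sketch below assumes a hypothetical child-to-parents dictionary; it walks up from each person breadth-first, then takes the common ancestor with the smallest combined number of generations. It counts parent-child links only and does not adjust for half relationships, which some tables place one degree further out.

```python
from collections import deque
from typing import Optional


def generations_up(person: str, parents: dict[str, list[str]]) -> dict[str, int]:
    """Map the person and each of their ancestors to the number of generations up."""
    depth = {person: 0}
    queue = deque([person])
    while queue:  # breadth-first, so the first depth recorded is the shortest
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in depth:
                depth[parent] = depth[current] + 1
                queue.append(parent)
    return depth


def degree_of_separation(a: str, b: str, parents: dict[str, list[str]]) -> Optional[int]:
    """Smallest total number of parent-child links from a and b to a common ancestor."""
    up_a = generations_up(a, parents)
    up_b = generations_up(b, parents)
    common = up_a.keys() & up_b.keys()
    if not common:
        return None
    return min(up_a[x] + up_b[x] for x in common)


# HYPOTHETICAL family tree: child -> [parents].
family = {
    "me": ["mom", "dad"],
    "cousin": ["uncle", "aunt"],
    "mom": ["grandpa", "grandma"],
    "uncle": ["grandpa", "grandma"],
}
print(degree_of_separation("me", "cousin", family))  # 4 (first cousins)
print(degree_of_separation("me", "uncle", family))   # 3
```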
Uses of Shared DNA Data in Genealogy
1. Total shared DNA: measures the total amount of shared DNA in cM and the number of segments
   a. Disprove or help confirm possible relationships identified using records-based genealogy
   b. Suggest possible relationships to matches identified through DNA testing
2. Segment analysis: identifies individual DNA segments inherited from specific ancestors
   a. Determine the Most Recent Common Ancestor shared with relatives
DNA Data from Ancestry
Obtaining the Shared DNA Data from an Ancestry Match Screen
1. Go to the screen for one of your Shared Matches.
2. Click on the icon following the Confidence: level. The number of cM and segments will be displayed.
3. Click on the notes icon to open the notes box.
4. Copy and paste the number of cM and segments into the notes box.
The notes box may be viewed on the Shared Matches pages next to the match's name. You may also record the data in an Excel spreadsheet.
DNA Data from Ancestry
Downloading Data Using DNAGedcom.com
Downloads from AncestryDNA
- Ancestry's own screens download only the raw DNA file.
- This file is useful for uploading to FTDNA and to several third-party tools such as GEDmatch and Genome Mate Pro, but not for direct analysis by genealogists.
- The file contains only the tester's data; it does not contain any information about the tester's matches.
DNAGedcom
- DNAGedcom can download Ancestry DNA data and match data to a CSV file which can be used by Excel.
- Downloading Ancestry data requires the DNAGedcom PC software, which costs $5/month.
- Ancestry matches, trees, and shared matches with DNA data can be downloaded.
- All Ancestry matches sharing more than about 7 cM are downloaded. For me, this is over 13,000 matches.
- Downloads can take from 30 minutes to several hours.
- Instructions for use of the DNAGedcom PC software may not match the current PC version.
Ancestry Data Downloaded using DNAGedcom
1. Match data
   - Match name
   - Number of people in the match's pedigree tree
   - Range of possible relationships
   - Ancestry's level of confidence in the match
   - Amount of shared DNA in cM
   - Number of shared segments
   - Any notes you may have entered for the match
2. Shared Matches (ICW, "in common with") data
   - Match name
   - Shared match (ICW) name
   - The amount of DNA shared between the match and the shared match is NOT included. This is different from GEDmatch, which does include this data.
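Once the match and ICW data are in CSV files, a short script can list a match's shared matches alongside the cM each one shares with the tester. This is a sketch under loud assumptions: the column names (`name`, `shared_cm`, `segments`, `match`, `icw`) and the sample rows are hypothetical and are not DNAGedcom's actual layout. Because the ICW file has no match-to-match cM, the only amount that can be attached is each shared match's cM with the tester.

```python
import csv
import io

# HYPOTHETICAL CSV layouts and sample rows; DNAGedcom's real column names may differ.
MATCHES_CSV = """name,shared_cm,segments
Match A,338,20
Match B,168,9
Match C,72,5
"""

ICW_CSV = """match,icw
Match A,Match B
Match A,Match C
Match B,Match C
"""


def load_matches(text: str) -> dict[str, tuple[float, int]]:
    """Map each match name to (shared cM with the tester, number of shared segments)."""
    return {
        row["name"]: (float(row["shared_cm"]), int(row["segments"]))
        for row in csv.DictReader(io.StringIO(text))
    }


def shared_matches(match: str, icw_text: str,
                   matches: dict[str, tuple[float, int]]) -> list[tuple[str, float]]:
    """List ICW matches of `match` with each one's cM shared with the tester, largest first."""
    result = []
    for row in csv.DictReader(io.StringIO(icw_text)):
        if row["match"] == match and row["icw"] in matches:
            result.append((row["icw"], matches[row["icw"]][0]))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


matches = load_matches(MATCHES_CSV)
print(shared_matches("Match A", ICW_CSV, matches))
# [('Match B', 168.0), ('Match C', 72.0)]
```

For real downloads, the strings would be replaced by files opened with `open(path, newline="")` and passed to `csv.DictReader` after adjusting the column names to match the actual export.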
Average autosomal DNA shared by pairs of relatives, in percentages and centimorgans
% shared | Total cM shared half-identical (or better) | Degrees of separation | Relationship
100% (Method I)/50% (Method II) | 3400.00 | | Identical twins (monozygotic twins)
50% | 3400.00 | 1 | Parent/child
50% (Method I)/37.5% (Method II) | 2550.00 | 2 | Full siblings
25% | 1700.00 | | Grandparent/grandchild, aunt-or-uncle/niece-or-nephew, half-siblings
25% (Method I)/23.4375% (Method II) | 1593.75 | | Double first cousins
12.5% | 850.00 | 4 | First cousins, great-uncle or aunt/great-nephew or niece
6.25% | 425.00 | 5 | First cousins once removed, half first cousins
6.25% | 425.00 | | Double second cousins
3.125% | 212.50 | 6 | Second cousins, first cousins twice removed, half first cousins once removed
1.563% | 106.25 | 7 | Second cousins once removed, half second cousins
0.781% | 53.13 | 8 | Third cousins
0.391% | 26.56 | 9 | Third cousins once removed
0.195% | 13.28 | 10 | Fourth cousins, third cousins twice removed
Adapted from ISOGG Wiki, Autosomal DNA Statistics
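Most rows of the table follow from one rule of basic genetics: each parent-child step halves the expected sharing, and relatives descended from a couple have two common ancestors instead of one. The table's cM column equals the percentage times 6,800 cM, twice the 3,400 cM parent/child figure. The sketch below encodes that rule; the function name is hypothetical, and it does not apply to full siblings or double cousins, who can share both copies of a region.

```python
# Haploid autosomal length implied by the parent/child row of the table (cM).
GENOME_CM = 3400.0


def expected_shared(degree: int, common_ancestors: int) -> tuple[float, float]:
    """Expected fraction of DNA shared and expected half-identical cM.

    degree: genealogical degree of separation.
    common_ancestors: 2 when the relatives descend from a couple (full cousins,
    aunt/niece), 1 for a direct line or a half relationship.
    Not valid for full siblings or double cousins, who can share both copies.
    """
    fraction = common_ancestors * 0.5 ** degree
    shared_cm = fraction * 2 * GENOME_CM  # two copies of each autosome = 6800 cM
    return fraction, shared_cm


print(expected_shared(4, 2))  # (0.125, 850.0) first cousins
print(expected_shared(4, 1))  # (0.0625, 425.0) half first cousins
print(expected_shared(8, 2))  # (0.0078125, 53.125) third cousins
```

A half relationship at a given degree gives the same expectation as a full relationship one degree further out, which is why the table groups half first cousins with first cousins once removed.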
Probability that a Relationship Between Two Cousins May Be Detected
AncestryDNA includes more processing steps than the other companies, which allows it to claim better detection. Ancestry documents the following steps:
- Underdog: identifies the maternal and paternal copies of each autosome (phasing) based on segment statistics.
- J-Germline: finds matching DNA segments between samples in the Ancestry database.
- Timber: statistically weights chromosome segments based on their occurrence in the population.
Relationship | Family tree degree of separation | 23andMe | AncestryDNA | FTDNA Family Finder
First cousins | 4 | 100% | 100% | 100%
Second cousins | 6 | 100% | 100% | >99%
Third cousins | 8 | 89.7% | 98% | >90%
Fourth cousins | 10 | 45.9% | 71% | >50%
Fifth cousins | 12 | 14.9% | 32% | >10%
Sixth cousins | 14 | 4.1% | 11% | Remote (typically less than 2%) [2]
Courtesy of ISOGG Wiki, Cousin Statistics
The Shared cM Project
What is it?
- A data collection project to gather data on the amount of shared DNA in cM for various known relationships. The latest published version of its results is based on over 25,000 known relationships.
- An analysis project which presents charts and tables showing the amount of shared DNA in cM for a range of relationships.
Why is it needed?
Most similar published sets of charts and tables appear to be less well documented, and may be less accurate, than the Shared cM Project.
- Data collection projects: other charts and tables based on data collections don't seem to provide information about the number of samples they're based on or which testing companies the samples come from. They also don't indicate which are the most recent versions. An example is the green chart from DNA Detectives.
- Simulation projects: simulation studies make many assumptions about the population they're modeling (e.g., birth rates) and about the biology of chromosome inheritance (e.g., recombination rates) which are not always well documented and may not match the population of interest to the genealogist. They also don't disclose sensitivity studies showing how their results change when the simulation assumptions change.
Contents of Shared cM Project Version 3 PDF File
No. | Title | Page | Most valuable pages for use with AncestryDNA results
1 | The Shared cM Project Version 3.0 | 1 |
2 | Using the Shared cM Project | 2 |
3 | Table 1. The Cluster Chart | 3 |
4 | Figure 1. The Relationship Chart | 4 | One-page chart summarizing multi-company results
5 | Histograms | 5 |
6 | Table 2. Relationship Histograms | 6-21 |
7 | Table 3. Company and Endogamy Breakdown | 22-29 | Important to use company-specific results for more distant relationships
8 | Table 4. Relationship Chart | 30-31 | Two-page table summarizing multi-company results
Review of the Shared cM Project
Shared cM Project PDF
Interactive Access to the Shared cM Project Chart
DNAPainter.com
Comparison of Ancestry and Shared cM Project Relationship Ranges
Ancestry has published the ranges of shared cM it uses for estimating various relationships. The Shared cM Project is a collaborative data collection project managed by Blaine Bettinger; it has published ranges of shared cM based on DNA test results for more than 25,000 known relationships. The chart shows the averages and ranges of shared cM for various relationships from Ancestry and from the Shared cM Project. For some relationships, the Ancestry and Shared cM Project ranges differ substantially. The chart also shows the average shared cM as a percentage of total chromosomal cM from the Shared cM Project, alongside the percentage expected from basic genetics.
Genetic degrees of separation | Ancestry predicted relationship | Ancestry's range of shared DNA (cM) | Shared cM Project observed relationship | Shared cM Project Ancestry average (cM) | Shared cM Project Ancestry 5th-95th percentile range (cM) | Shared cM Project average percent shared | Expected percentage of shared cM from basic genetics
1 | Parent/Child | 3475 | Parent/Child | 3445 | 3283-3671 | 50.0% | 50.0%
2 | Immediate Family | 2400-2800 | Siblings | 2585 | 2309-2841 | 37.5% | 37.5%
2 | | | Grandparent | 1729 | 1245-2184 | 25.1% | 25.0%
3 | Close Family | 1450-2050 | Aunt/Uncle/Niece/Nephew | 1720 | 1428-1998 | 25.0% | 25.0%
4 | 1C | 680-1150 | 1C | 849 | 636-1094 | 12.3% | 12.5%
4 | Great Aunt/Uncle | | Great Aunt/Uncle | 876 | 406-1491 | 12.7% | 12.5%
5 | 1C1R | | 1C1R | 420 | 215-635 | 6.1% | 6.3%
6 | 2C | 200-620 | 2C | 219 | 93-390 | 3.2% | 3.1%
6 | 1C2R | | 1C2R | 217 | 67-384 | 3.1% | 3.1%
7 | 2C1R | | 2C1R | 112 | 31-221 | 1.6% | 1.6%
8 | 3C | 90-180 | 3C | 64 | 14-146 | 0.93% | 0.78%
8 | 2C2R | | 2C2R | 64 | 18-141 | 0.93% | 0.78%
9 | 3C1R | | 3C1R | 39 | 9-100 | 0.57% | 0.39%
10 | 3C2R | | 3C2R | 30 | 7-69 | 0.44% | 0.20%
10 | 4C | 20-85 | 4C | 29 | 7-68 | 0.42% | 0.20%
11 | 4C1R | | 4C1R | 22 | 6-50 | 0.32% | 0.10%
12 | 5C-8C | 6-20 | 5C | 18 | 6-39 | 0.26% | 0.05%
13 | 5C1R | | 5C1R | 16 | 6-39 | 0.23% | 0.02%
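The Shared cM Project ranges can be used to list which relationships are consistent with a given amount of shared DNA. The sketch below is a simple range lookup over the 5th-95th percentile ranges copied from the table above; it is not the probability-based method used by online tools, and a value outside every range only means it falls outside the middle 90% of observed results.

```python
# Shared cM Project (Ancestry) 5th-95th percentile ranges in cM, copied from the table above.
RANGES = {
    "Parent/Child": (3283, 3671),
    "Siblings": (2309, 2841),
    "Grandparent": (1245, 2184),
    "Aunt/Uncle/Niece/Nephew": (1428, 1998),
    "1C": (636, 1094),
    "Great Aunt/Uncle": (406, 1491),
    "1C1R": (215, 635),
    "2C": (93, 390),
    "1C2R": (67, 384),
    "2C1R": (31, 221),
    "3C": (14, 146),
    "2C2R": (18, 141),
    "3C1R": (9, 100),
    "3C2R": (7, 69),
    "4C": (7, 68),
    "4C1R": (6, 50),
    "5C": (6, 39),
    "5C1R": (6, 39),
}


def candidate_relationships(shared_cm: float) -> list[str]:
    """Relationships whose 5th-95th percentile range contains shared_cm."""
    return [rel for rel, (low, high) in RANGES.items() if low <= shared_cm <= high]


print(candidate_relationships(915))  # ['1C', 'Great Aunt/Uncle']
print(candidate_relationships(338))  # ['1C1R', '2C', '1C2R']
```

Even a fairly large amount such as 338 cM fits three different relationships, which is why total shared cM can suggest, but rarely settle, a relationship on its own.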
Differences in Ancestry Shared DNA between Siblings
Each tester's cell gives: Ancestry predicted relationship (cousins) / shared DNA (cM) / shared DNA segments
Name | Relation | Walter | Walter's Sister | Walter's Brother
Sister | sibling | Immediate / 2,490 / 65 | Immediate | Immediate / 2,317 / 61
Brother | sibling | Immediate / 2,626 / 54 | Immediate | Immediate / 2,626 / 54
Match 1 | 1C | 1st / 915 / 38 | 1st / 676 / 36 | 1st / 1,079 / 45
Match 2 | 2C | 2nd / 338 / 20 | 2nd / 235 / 14 | 3rd / 199 / 10
Match 3 | 2C1R | 3rd / 168 / 9 | 3rd / 96 / 5 | 4th / 29.7 / 3
Match 4 | | 3rd / 157 / 8 | 3rd / 97 / 5 | 3rd / 111 / 8
Match 5 | 2C1R | 3rd / 130 / 8 | 3rd / 155 / 8 | 3rd / 160 / 9
Match 6 | 3C | 3rd / 113 / 6 | 3rd / 106 / 7 | 3rd / 154 / 7
Match 7 | 3C | 3rd / 109 / 6 | 3rd / 95 / 7 | 3rd / 96 / 6
Match 8 | 3C | 3rd / 107 / 6 | 4th / 69 / 5 | 3rd / 165 / 9
Match 9 | 3C | 3rd / 98 / 4 | 4th / 31 / 3 | 3rd / 191 / 7
Match 10 | | 3rd / 95 / 6 | 4th / 71 / 6 | 4th / 26.9 / 3
Match 11 | | 4th / 72 / 5 | missing | 4th / 39 / 5
Match 12 | 3C | 4th / 68 / 4 | 4th / 21.5 / 2 | 4th / 55 / 3
Match 13 | | 4th / 63 / 4 | 3rd / 123 / 6 | 4th / 38 / 3
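Full siblings are equally related to any cousin match, so differences in their shared cM reflect random inheritance alone. A small sketch can quantify that spread for a few rows copied from the table above; `sibling_spread` is an illustrative name, and `None` marks a match missing from one sibling's list.

```python
# Shared cM as (Walter, Walter's Sister, Walter's Brother), copied from the table above.
# None marks a match that does not appear in that sibling's match list.
SHARED_CM = {
    "Match 1": (915, 676, 1079),
    "Match 2": (338, 235, 199),
    "Match 3": (168, 96, 29.7),
    "Match 9": (98, 31, 191),
    "Match 11": (72, None, 39),
}


def sibling_spread(values) -> tuple[float, float, float]:
    """Return (lowest cM, highest cM, highest/lowest ratio) over the siblings who match."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    return low, high, high / low


for name, values in SHARED_CM.items():
    low, high, ratio = sibling_spread(values)
    print(f"{name}: {low}-{high} cM, ratio {ratio:.1f}")
```

For Match 3 and Match 9 the highest and lowest sibling values differ by a factor of more than five, so testing several siblings gives a fuller picture of a match than any one sibling's result.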
Questions?