Large scale kinship: familial searching and DVI. Seoul, ISFG workshop


29 August 2017

Large scale kinship
- Familial searching: search a DNA database for a relative of an unidentified offender whose DNA profile is available, with the intention of identifying the offender indirectly.
- DVI: search a list of unidentified persons against each other and against a list of missing persons, with the intention of making identifications.
- Familial searching is used to generate investigative leads. It is generally impossible to achieve a very high power (the probability of finding a relative if one is present) without also producing too many false positives.
- In DVI it is important not to overlook any identification.
- Both are large-scale applications: their statistical properties need to be understood to apply the methods optimally and to interpret the results.

Familial searching
- Compute likelihood ratios for paternity (the paternity index, PI) or for being siblings (the sibling index, SI).
- The most efficient strategy, in terms of the number of false positives per true positive, is to extract everyone whose LR exceeds a prespecified threshold.
- Large threshold: fewer false positives, but also a lower probability of finding a relative.
- Small threshold: a higher probability of finding a relative, but also more false positives.
- The theory of Block 2 can be used to make ROC curves. Writing the threshold as $10^t$, we plot
  $$\left(\log_{10} P(LR > 10^t \mid H_d),\ P(LR > 10^t \mid H_p)\right) = \left(\log_{10} \mathrm{FPR}(10^t),\ \mathrm{TPR}(10^t)\right)$$
  as a function of $t$.
- This can be done for a specific profile, or averaged over profiles.
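A minimal sketch of how such ROC points could be estimated by simulation rather than computed exactly. It assumes unlinked loci in Hardy-Weinberg equilibrium, no mutation, no subpopulation correction, and full-sibling IBD probabilities (1/4, 1/2, 1/4). The loci in `LOCI` and their allele frequencies are hypothetical toy values, not NGM or SGM Plus data, and all function names are illustrative.

```python
import random

# Hypothetical toy loci: allele -> frequency (NOT real NGM / SGM Plus frequencies).
LOCI = [
    {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1},
    {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25},
    {"a": 0.5, "b": 0.2, "c": 0.2, "d": 0.1},
    {"a": 0.3, "b": 0.3, "c": 0.3, "d": 0.1},
]
K0, K1, K2 = 0.25, 0.5, 0.25  # P(0, 1, 2 alleles IBD) for full siblings


def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_one_ibd(g2, g1, freqs):
    """P(g2 | g1, exactly one allele IBD); each allele of g1 is the shared one w.p. 1/2."""
    c, d = g2
    total = 0.0
    for shared in g1:
        if c == d:
            total += 0.5 * (freqs[c] if shared == c else 0.0)
        else:
            total += 0.5 * ((freqs[d] if shared == c else 0.0)
                            + (freqs[c] if shared == d else 0.0))
    return total


def sibling_index(g1, g2, freqs):
    """Single-locus LR: full siblings versus unrelated."""
    p2 = genotype_prob(g2, freqs)
    ibd2 = 1.0 if sorted(g1) == sorted(g2) else 0.0
    return K0 + (K1 * prob_one_ibd(g2, g1, freqs) + K2 * ibd2) / p2


def draw_allele(freqs, rng):
    return rng.choices(list(freqs), weights=list(freqs.values()))[0]


def draw_genotype(freqs, rng):
    return (draw_allele(freqs, rng), draw_allele(freqs, rng))


def draw_sibling(g1, freqs, rng):
    """Genotype of a full sibling of a person with genotype g1."""
    u = rng.random()
    if u < K0:
        return draw_genotype(freqs, rng)                  # nothing IBD
    if u < K0 + K1:
        return (rng.choice(g1), draw_allele(freqs, rng))  # one allele IBD
    return g1                                             # both alleles IBD


def simulate_si(related, n, rng):
    """Multi-locus SI for n pairs simulated under Hp (siblings) or Hd (unrelated)."""
    values = []
    for _ in range(n):
        lr = 1.0
        for freqs in LOCI:
            g1 = draw_genotype(freqs, rng)
            g2 = draw_sibling(g1, freqs, rng) if related else draw_genotype(freqs, rng)
            lr *= sibling_index(g1, g2, freqs)
        values.append(lr)
    return values


def roc_points(lr_hd, lr_hp, ts):
    """(t, FPR(10^t), TPR(10^t)) as fractions of simulated LRs exceeding 10^t."""
    points = []
    for t in ts:
        threshold = 10.0 ** t
        fpr = sum(lr > threshold for lr in lr_hd) / len(lr_hd)
        tpr = sum(lr > threshold for lr in lr_hp) / len(lr_hp)
        points.append((t, fpr, tpr))
    return points


if __name__ == "__main__":
    rng = random.Random(1)
    hd = simulate_si(False, 10_000, rng)
    hp = simulate_si(True, 10_000, rng)
    for t, fpr, tpr in roc_points(hd, hp, [0, 1, 2, 3]):
        print(f"t={t}: FPR={fpr:.4f}, TPR={tpr:.4f}")
```

Simulation is used here only because it is short to write; with four toy loci the tail of the FPR curve is poorly estimated, which is exactly where exact calculation pays off.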

[Figure: ROC curve for the sibling index, averaged over profiles. Black: NGM; dotted: SGM Plus. TPR: true positive rate; FPR: false positive rate.]
For example, on NGM profiles an LR threshold of 10,000 corresponds to an FPR of about $10^{-5}$ and a TPR of about 0.6.

[Figure: ROC curve for the paternity index, averaged over profiles.]

Different case, different challenge
- If someone has very common alleles, he will share common alleles with his relatives. These give rise to a low LR, since it is fairly easy to obtain them by chance. Because they are easy to obtain by chance, relatively many unrelated people will genetically look as if they could be related. Hence relatives are hard to find.
- If someone has very rare alleles, he is likely to share rare alleles with his relatives. These give rise to a high LR, since it is fairly hard to obtain them by chance. Because they are hard to obtain by chance, relatively few unrelated people will genetically look as if they could be related. Hence relatives are easier to find.
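A single-locus calculation makes the contrast concrete. Assuming Hardy-Weinberg equilibrium, no mutation and no subpopulation correction, two full siblings who share the same homozygous genotype for an allele of frequency p have sibling index 1/4 + (1/2)(p/p^2) + (1/4)(1/p^2). The frequencies 0.3 and 0.02 below are hypothetical, chosen only to contrast a common and a rare allele.

```python
K0, K1, K2 = 0.25, 0.5, 0.25  # full-sibling IBD probabilities


def si_shared_homozygote(p):
    """SI when both persons are homozygous for the same allele of frequency p.

    P(G2) = p**2 under unrelatedness; P(G2 | G1, one allele IBD) = p; IBD2 gives 1.
    """
    return K0 + K1 * p / p ** 2 + K2 / p ** 2


def si_shared_heterozygote(pa, pb):
    """SI when both persons have the same heterozygous genotype (a, b)."""
    p_genotype = 2 * pa * pb
    p_one_ibd = 0.5 * (pa + pb)  # shared allele is a or b, each with probability 1/2
    return K0 + (K1 * p_one_ibd + K2) / p_genotype


for p in (0.3, 0.02):  # hypothetical common vs rare allele
    print(f"p = {p}: SI = {si_shared_homozygote(p):.2f}")
```

The common allele gives an SI of about 4.7 at this locus, the rare one about 650: the same observation (identical genotypes) carries very different weight.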

[Figure: SI, SGM Plus loci, various profiles.]

[Figure: siblings, LR ranking in 100,000 SGM Plus profiles.]

Fully Bayesian interpretation
Assume that, in a database with $N$ individuals:
- there is at most one relative of a given kind;
- the probability that this relative is person $i$ is $\pi_i$, and the probability that there is no such relative is $\pi_0 = 1 - \sum_{i=1}^{N} \pi_i$.
Suppose the obtained likelihood ratios are $r_1, \ldots, r_N$, and set $r_0 = 1$. Then the probability that person $i$ is the relative we are looking for is
$$\frac{\pi_i r_i}{\sum_{j=0}^{N} \pi_j r_j}.$$
Interpretation: the posterior probability is proportional to the prior probability and to the LR; for people outside the database no information is obtained.
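The posterior formula translates directly into code. This is a minimal sketch; the function name, the priors (0.1 per person) and the LRs in the example are hypothetical.

```python
def posterior_relative(priors, lrs):
    """Posteriors [P(no relative in database), P(person 1 is the relative), ...].

    priors[i-1] = pi_i and lrs[i-1] = r_i for persons i = 1..N;
    pi_0 = 1 - sum(pi_i) and r_0 = 1 (no information about people outside the database).
    """
    if len(priors) != len(lrs):
        raise ValueError("need one prior per likelihood ratio")
    pi0 = 1.0 - sum(priors)
    if pi0 < -1e-12:
        raise ValueError("priors for persons 1..N must sum to at most 1")
    weights = [max(pi0, 0.0) * 1.0] + [p * r for p, r in zip(priors, lrs)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three database members, prior 0.1 each, so pi_0 = 0.7.
post = posterior_relative([0.1, 0.1, 0.1], [1000.0, 1.0, 0.5])
print([round(x, 4) for x in post])
```

Here person 1 ends up with posterior 100/100.85, roughly 0.99, while the other two stay near their (tiny) shares.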

Case of equal priors
If all $\pi_i$ ($i = 1, \ldots, N$) are the same, then the LR for the hypotheses
1. $H_1$: the database contains a relative of the specified type,
2. $H_2$: the database does not contain a relative of the specified type,
is equal to
$$\frac{1}{N} \sum_{i=1}^{N} r_i,$$
i.e., the average of the LRs obtained for all individuals.
Note that if there is no relative in the whole database, each $r_i$ has expectation 1, so the LR for the two hypotheses above is also 1 in expectation, as general theory requires. Equivalently, if there is no relative in the database, we expect the sum of all LRs obtained to equal the number of people searched against.

Analogy with a possibly tricked deck of cards
Familial search:
- Database of size $N$.
- Person 1 has PI = $N$ and all other PIs are 0.
- Hypothesis: person 1 is a parent/child of the unknown offender.
- $\sum_{i=1}^{N} \mathrm{PI}_i = N$.
- No evidence that the database contains a relative.
- If it does contain a relative, it has to be person 1.
Deck of cards:
- Deck of $N = 52$ cards.
- The first card drawn is the ace of spades.
- Hypothesis: the deck is tricked so that it contains only aces of spades.
- The LR in favour of "tricked as aces of spades" is 52; the LR in favour of any other trick is 0; the sum of the 52 LRs is 52.
- No evidence that the deck is tricked.
- If it is tricked, it has to be tricked as aces of spades.
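Both claims, that an LR has expectation 1 for an unrelated person and that with equal priors the LR for "a relative is in the database" is the mean LR, can be checked numerically. This sketch uses the paternity index at one locus under Hardy-Weinberg equilibrium without mutation; the allele frequencies and function names are hypothetical.

```python
from itertools import combinations_with_replacement


def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission_prob(child, parent, freqs):
    """P(child | parent): the parent passes each allele w.p. 1/2, the other allele is random."""
    c, d = child
    total = 0.0
    for passed in parent:
        if c == d:
            total += 0.5 * (freqs[c] if passed == c else 0.0)
        else:
            total += 0.5 * ((freqs[d] if passed == c else 0.0)
                            + (freqs[c] if passed == d else 0.0))
    return total


def paternity_index(parent, child, freqs):
    return transmission_prob(child, parent, freqs) / genotype_prob(child, freqs)


def expected_pi_if_unrelated(parent, freqs):
    """E[PI] over all genotypes of an unrelated person (exact sum, no simulation)."""
    genotypes = combinations_with_replacement(sorted(freqs), 2)
    return sum(genotype_prob(g, freqs) * paternity_index(parent, g, freqs)
               for g in genotypes)


def equal_prior_lr(lrs):
    """LR for 'database contains the relative' vs 'it does not', equal priors: mean LR."""
    return sum(lrs) / len(lrs)


freqs = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical single locus
print(expected_pi_if_unrelated(("a", "b"), freqs))
print(equal_prior_lr([52.0] + [0.0] * 51))  # the card analogy: 52 LRs summing to 52
```

The first value is 1 up to rounding, because the sum of P(G) times PI over all genotypes is just the sum of the transmission probabilities; the second is exactly 1.0, the "no evidence" of the card analogy.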


Strategy at the NFI
1. Autosomal search by PI and SI:
   - LR threshold for further investigation: 1,000, irrespective of the number of loci compared. This is enough to warrant interest, but not nearly enough for an identification.
   - Carry out additional DNA testing until either LR < 1,000 (stop) or LR > 1,000,000,000.
   - When no more additional testing can or needs to be done, also compute the half-sibling index (HSI) to consider other types of relatedness.
   - Report any PI, SI or HSI of 100,000 or more.
2. Y-chromosomal search: carry out further typing on all profiles with at most one difference.
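Step 1 of this strategy can be read as a small decision procedure. The sketch below is a hypothetical encoding for illustration only (function names and return labels are made up, and the Y-chromosomal step is not modelled); it uses the three thresholds from the slide.

```python
SCREEN_LR = 1_000        # enough to warrant interest
STOP_TESTING_LR = 1e9    # additional testing stops above this
REPORT_LR = 100_000      # report any PI / SI / HSI at or above this


def next_step(lr, more_testing_possible):
    """Hypothetical triage of one database candidate given its current LR."""
    if lr < SCREEN_LR:
        return "stop"
    if lr <= STOP_TESTING_LR and more_testing_possible:
        return "continue testing"
    # Testing is finished: compute PI, SI and HSI and report those that qualify.
    return "final evaluation"


def indices_to_report(indices):
    """Names of the kinship indices (e.g. PI, SI, HSI) at or above the reporting threshold."""
    return [name for name, value in indices.items() if value >= REPORT_LR]


print(next_step(5e6, True))                                  # continue testing
print(indices_to_report({"PI": 10.0, "SI": 2e5, "HSI": 3e4}))  # ['SI']
```

Re-running `next_step` after each round of additional typing captures the "until either LR < 1,000 or LR > 10^9" loop.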

By the book
- Utrecht: serial sex offender, unknown.
- Familial search: top-ranked SI equals 5 million.
- Additional profiling: SI > $10^9$, Y-STR 22/23 match, one mtDNA mismatch.
- A brother of the database person was arrested and turned out to give a direct DNA match. Convicted.

Not by the book
- Familial search yields a woman with PI = 39,000.
- Further testing: a parent-child relation is excluded.
- Further testing: the mitochondrial profiles match.
- Age of the woman and time of the crime: full or maternal half-siblings unlikely; paternal half-siblings?
- The mitochondrial match is best explained by a maternal relationship.
- Best supported by the autosomal and mitochondrial profiles together: the woman is a sister of the offender's mother. This was indeed verified with a full match.

Familial searching illustrates:
- The utility of LR distributions for case pre-assessment: ROC curves for the specific profile at hand, or averaged over profiles to judge the applicability of the method.
- The irrelevance of LR distributions once the LRs have been calculated: posterior probabilities depend only on the evidence we have, not on evidence we could have had.
- That evidence should not be interpreted in terms of error rates: even if the false positive rate is very small, most of the positives may still be false positives.
- That one should be careful with the equal prior used in paternity testing: applying it in familial searching could give a probability of paternity above 99.99% for several alleged fathers at once, which is clearly absurd. An LR alone cannot be used to draw conclusions about the true relationship.
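The last two points can be illustrated with a few lines of arithmetic. The FPR of $10^{-5}$ and TPR of 0.6 are the NGM example values quoted for the sibling-index ROC curve; the database size of one million and the 5% prior chance that a relative is present are hypothetical.

```python
def expected_hits(database_size, fpr, tpr, prob_relative_in_db):
    """Expected false and true positives when everyone above the LR threshold is extracted.

    Assumes at most one relative in the database, so the expected number of
    unrelated members is database_size - prob_relative_in_db.
    """
    false_pos = (database_size - prob_relative_in_db) * fpr
    true_pos = prob_relative_in_db * tpr
    return false_pos, true_pos


def paternity_probability_equal_prior(lr):
    """Posterior probability with prior odds 1 (the 'equal prior' of paternity testing)."""
    return lr / (lr + 1.0)


fp, tp = expected_hits(1_000_000, 1e-5, 0.6, 0.05)  # hypothetical database and prior
print(f"expected false positives {fp:.2f}, true positives {tp:.3f}")
print(f"share of positives that are true: {tp / (fp + tp):.4f}")
# Several candidates each with LR = 100,000 would all get > 99.99% under an equal prior.
print([paternity_probability_equal_prior(1e5) for _ in range(3)])
```

With these assumptions about 10 false positives are expected against 0.03 true ones, so fewer than 1% of positives are true despite the tiny FPR.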

Disaster Victim Identification
- Large lists of missing persons $MP_i$ and of unidentified individuals $UI_j$.
- Software usually computes $LR_{i,j}$, the LR for "$MP_i = UI_j$" versus "$MP_i$ is not related to $UI_j$".
- If missing persons are related to each other, then neither hypothesis may be true.
- In order not to overlook a possible identification, the NFI uses a uniform mutation model. Real mutations are often single-step, but a uniform model can be helpful when inconsistencies between genotypes are due to silent alleles or clerical errors.
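A minimal sketch of what a uniform mutation model can look like, assuming (hypothetically) a total mutation probability mu spread evenly over all other alleles at the locus. The slide does not give the NFI's parametrisation; the function name, the allele list and mu = 0.005 are illustrative.

```python
def uniform_mutation_matrix(alleles, mu):
    """Transition probabilities P(transmitted allele = b | parental allele = a).

    With probability 1 - mu the allele is passed on unchanged; otherwise it mutates
    to any of the other len(alleles) - 1 alleles with equal probability, however
    many repeat units away they are.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two alleles")
    off_diagonal = mu / (n - 1)
    return {a: {b: (1.0 - mu if a == b else off_diagonal) for b in alleles}
            for a in alleles}


alleles = [11, 12, 13, 14, 15, 16]  # hypothetical repeat numbers at one STR locus
m = uniform_mutation_matrix(alleles, mu=0.005)
print(m[12][13], m[12][16])  # a one-step and a four-step change get the same probability
```

Under a stepwise model the four-step change would be far less likely than the one-step one; the uniform model keeps it non-negligible, so an inconsistency caused by a clerical error or silent allele does not drive the LR to (nearly) zero.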

Pedigree with several MPs
[Figure: pedigree with individuals MN, MG, FN, FG, FGS, MS1, MS2, MS3, M, F, FS1, FS1P, FS2, FS2P, V, X, Y, Z, J, K, L. Squares: men; circles: women. Legend: reported killed, not available, reference sample.]
Unidentified victims: V1, V2, V3, V4, V5 (5 victims).

Hypothesis choosing
With our 5 victims and 12 missing persons, we can construct many hypotheses.
- $H_1$ specifies that some victims are some of the MPs. The number of possible such propositions is:
  - 1 victim: 60
  - 2 victims: 1,320
  - 3 victims: 13,200
  - 4 victims: 59,400
  - 5 victims: 95,040
- $H_2$ can specify that all victims are unrelated to the MPs, but relations between the victims are also possible: a huge number of combinations!
For conceptual and computational reasons, we choose to start with one victim versus one MP, comparing identity with unrelatedness. Result: 60 likelihood ratios.
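The counts for $H_1$ follow from choosing which k victims are identified, C(5, k), and assigning them to k distinct missing persons in order, 12!/(12 - k)! ways. A short check (the function name is illustrative):

```python
from math import comb, perm


def n_identity_propositions(n_victims, n_missing, k):
    """Ways to choose k victims and assign them to k distinct missing persons."""
    return comb(n_victims, k) * perm(n_missing, k)


counts = [n_identity_propositions(5, 12, k) for k in range(1, 6)]
print(counts)       # [60, 1320, 13200, 59400, 95040]
print(sum(counts))  # 169020 propositions in total, before any H2 variants
```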

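Resulting LRs > 100 (m = male, f = female)
- Victim 1 (m): V (f), LR = $6 \times 10^{4}$
- Victim 2 (m): FS2 (m) / F (m), LR = $2 \times 10^{4}$; FN (f), LR = $8 \times 10^{3}$; J (m) / K (m) / L (f), LR = $4 \times 10^{2}$
- Victim 3 (f): M (f), LR = $1 \times 10^{9}$
- Victim 4 (m): Y (m) / Z (f), LR = $2 \times 10^{5}$
- Victim 5 (f): FS2 (m) / F (m), LR = $4 \times 10^{4}$; J (m) / K (m) / L (f), LR = $5 \times 10^{2}$; FN (f), LR = $1 \times 10^{2}$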

Identification
Since
$$\frac{P(\text{DNA} \mid \text{Victim 3 is M})}{P(\text{DNA} \mid \text{Victim 3 is unrelated to M})} = 10^{9}$$
and no other LRs relating Victim 3 to the pedigree are large, we decide that Victim 3 is M. Now we can continue in the same way as before, calculating $4 \times 11 = 44$ LRs:

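Calculation 2: 4 victims with pedigree
- Victim 1 (m): V (f), LR = $2 \times 10^{6}$ (GENDER!)
- Victim 2 (m): FS2 (m) / F (m), LR = $2 \times 10^{4}$; FN (f), LR = $8 \times 10^{3}$; J (m) / K (m) / L (f), LR = $4 \times 10^{2}$
- Victim 3 = M
- Victim 4 (m): Y (m) / Z (f), LR = $2 \times 10^{5}$
- Victim 5 (f): FS2 (m) / F (m), LR = $4 \times 10^{4}$; J (m) / K (m) / L (f), LR = $5 \times 10^{2}$; FN (f), LR = $1 \times 10^{2}$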

Combined Likelihood Ratio
For analogous reasons, we decide that Victim 1 is V. The joint LR for (Victim 3 = M, Victim 1 = V) versus unrelated is the product of the two LRs we have computed. Let $G(V3)$ be the DNA profile of Victim 3, and analogously $G(V1)$. Then
$$\frac{P(G(V3), G(V1) \mid V3 = M, V1 = V)}{P(G(V3), G(V1) \mid \text{unrelated})} = \frac{P(G(V3) \mid V3 = M, V1 = V)\, P(G(V1) \mid G(V3), V3 = M, V1 = V)}{P(G(V3) \mid \text{unrelated})\, P(G(V1) \mid G(V3), \text{unrelated})} = 10^{9} \times 2 \times 10^{6} = 2 \times 10^{15}.$$
However, the unrelatedness assumption is questionable: V3 and V1 have a large LR for being parent and child! If we choose as the alternative hypothesis that they are parent and child but unrelated to the pedigree, then the combined LR is again one billion.

Pedigree so far
[Figure: the pedigree with V3 placed at position M and V1 at position V. Legend: missing persons, not available, reference sample, victim added.]
Remaining victims: V2, V4, V5 (3 victims left).

Calculation 3: 3 victims with pedigree
- Victim 1 = V
- Victim 2 (m): F (m), LR = $1 \times 10^{9}$; FS2 (m), LR = $5 \times 10^{6}$; FN (f), LR = $4 \times 10^{6}$ (gender mismatch!)
- Victim 3 = M
- Victim 4 (m): Y (m) / Z (f), LR = $2 \times 10^{5}$
- Victim 5 (f): FS2 (m) / F (m), LR = $2 \times 10^{5}$; J (m) / K (m) / L (f), LR = $1 \times 10^{3}$; FN (f), LR = $6 \times 10^{2}$

Branching
- The LR for (V2 = F) versus (V2 unrelated to the pedigree) is $10^{9}$.
- The LR for (V2 = FS2) versus (V2 unrelated to the pedigree) is $5 \times 10^{6}$.
This is not a contradiction: it just shows that the alternative hypothesis (V2 unrelated to the pedigree) is far less likely than either of these two propositions.
Conclusion: without other information (i.e. with equal prior odds), the probability that V2 = F is (almost) 99.5%, and the probability that V2 = FS2 is (almost) 0.5%.
Continuing with both pedigrees does not alter these probabilities in this case: we get high LRs for Victim 4 = Y and Victim 5 = FN, and these positions in the pedigree cannot distinguish between F and FS2 either.
So high LRs alone are not sufficient to draw hard conclusions.
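The 99.5% / 0.5% split follows from normalising the LRs, since all are taken against the same reference hypothesis (V2 unrelated to the pedigree, LR = 1) and priors are equal. A minimal sketch; like the conclusion above it considers only F and FS2, leaving out FN (flagged as a gender mismatch), and the function name is illustrative.

```python
def posterior_from_lrs(lrs):
    """Posteriors for mutually exclusive hypotheses with equal priors.

    Every LR is relative to the same reference hypothesis, which must be
    included with LR = 1.
    """
    total = sum(lrs.values())
    return {h: lr / total for h, lr in lrs.items()}


post = posterior_from_lrs({"V2 = F": 1e9, "V2 = FS2": 5e6, "V2 unrelated": 1.0})
for hypothesis, p in post.items():
    print(f"{hypothesis}: {p:.4f}")
```

V2 = F gets about 0.995 and V2 = FS2 about 0.005, while the unrelated hypothesis is left with roughly $10^{-9}$.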