Methods of Parentage Analysis in Natural Populations

Using molecular markers, genetic maternity or paternity can be estimated by excluding as parents all adults whose genotypes are incompatible with those of the offspring under consideration.

Parentage analysis is an important tool for:

- Parentage and breeding systems: differences between observed and actual mating systems, and mate choice.
- Estimating the reproductive success of males and females, and the traits of successful males and females (e.g., body size, social status).
- Managers: breeding assessments for captive individuals, and identification of individuals dispersing into the population.
- Population estimates: mark-recapture estimates using genetic signatures of known family groups.
- Identifying dead-beat dads (a human issue).

Methods

Exclusion
- Earliest and conceptually simplest technique.

Categorical and fractional likelihood
- Used when complete exclusion is not possible.
- Assigns progeny to non-excluded parents based on likelihood scores derived from their genotypes.

Genotype reconstruction
- Uses multilocus genotypes of parents and offspring to reconstruct the genotypes of unknown parents contributing gametes to a progeny array for which one parent is known a priori.

Exclusion

Based on Mendelian rules of inheritance. Uses incompatibilities between parents and offspring to reject particular parent-offspring hypotheses.

Example:
  Female genotype:     A/A
  Offspring genotype:  A/B
  Candidate male A/C:  excluded (cannot supply the B allele)
  Candidate male A/B:  not excluded

* Powerful when there are few candidate parents and highly polymorphic genetic markers are available.
* Impractical if the pool of candidate parents becomes large, because of the large number of loci needed to yield a single non-excluded parent.
* Many exclusion programs let the user specify the number of mismatches required for an exclusion to be considered valid, which makes the method more robust to mutations and scoring errors.
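The exclusion rule can be sketched in a few lines of Python. This is a minimal illustration, not the logic of any particular exclusion program: the function names, the genotype representation (one tuple of two alleles per locus), and the `max_mismatches` parameter are all assumptions made for the example.

```python
def mismatches(candidate, offspring, known_parent=None):
    """Count loci at which the candidate cannot be a parent of the offspring.

    Each genotype is a list of (allele, allele) tuples, one tuple per locus.
    If known_parent is given, the candidate must supply the allele that the
    known parent did not contribute.
    """
    count = 0
    for i, (off, cand) in enumerate(zip(offspring, candidate)):
        a, b = off
        if known_parent is None:
            # Without a known parent, sharing any allele is enough.
            compatible = a in cand or b in cand
        else:
            kp = known_parent[i]
            compatible = (a in kp and b in cand) or (b in kp and a in cand)
        if not compatible:
            count += 1
    return count


def is_excluded(candidate, offspring, known_parent=None, max_mismatches=0):
    """Exclude the candidate only if it mismatches at more loci than allowed."""
    return mismatches(candidate, offspring, known_parent) > max_mismatches


# The single-locus example from the slide: mother A/A, offspring A/B.
mother = [("A", "A")]
offspring = [("A", "B")]
print(is_excluded([("A", "C")], offspring, mother))  # candidate A/C
print(is_excluded([("A", "B")], offspring, mother))  # candidate A/B
```

Raising `max_mismatches` above zero mirrors the tolerance for mutation or scoring error described above, at the cost of leaving more candidates non-excluded.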

Categorical allocation

Categorical allocation uses likelihood-based approaches to select the most likely parent from a pool of non-excluded parents.

The method calculates the logarithm of a likelihood ratio (LOD score): the likelihood that an individual (or pair of individuals) is the parent (or parents) of a given offspring, divided by the likelihood that these individuals are unrelated to the offspring.

- Offspring are assigned to the parent (or parental pair) with the highest LOD score.
- If the LOD score is 0 or negative, the offspring is left unassigned.

In contrast to strict exclusion methods, likelihood-based allocation methods usually allow for some degree of transmission error due to genotype misreading or mutation.
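As a rough sketch of the assignment rule just described, the snippet below picks the candidate with the highest LOD score and leaves the offspring unassigned when that score is not positive. The function names and the dictionary input are illustrative assumptions; real programs also deal with ties, error models, and significance thresholds, which are omitted here.

```python
import math


def lod(likelihood_parent, likelihood_unrelated):
    """Natural log of the ratio of the two likelihoods."""
    return math.log(likelihood_parent / likelihood_unrelated)


def categorical_assignment(lod_scores):
    """Return the candidate with the highest LOD, or None if it is <= 0.

    lod_scores maps a candidate identifier to its LOD score.
    """
    if not lod_scores:
        return None
    best = max(lod_scores, key=lod_scores.get)
    return best if lod_scores[best] > 0 else None


print(categorical_assignment({"male_1": 0.5, "male_2": 0.4}))
print(categorical_assignment({"male_1": -0.2, "male_2": 0.0}))
```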

Fractional allocation

The fractional allocation method assigns some fraction, between 0 and 1, of each offspring to all non-excluded candidate parents. The portion of an offspring allocated to a particular candidate parent is proportional to its likelihood of having parented the offspring relative to all other non-excluded candidate parents.

- Single-parent and parent-pair likelihoods are calculated in the same way as in the categorical allocation method.
- Assumes genotypes are known for all parents in the population and that one parent is known for the offspring under consideration.

The fraction of offspring k (O = k) awarded to a candidate male j (MP = j), conditional on female i (FP = i), is denoted $F_{ij}$.

$\mathrm{LOD} = \ln(L_1 / L_2)$, the natural log of the ratio of two likelihoods; note that $\ln(1) = 0$.

  Male   LOD   F_ij
  1      0.5   0.8
  2      0.4   0.2
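The proportionality rule stated above can be illustrated as follows. The exact expression for $F_{ij}$ did not survive in the slide text, so this sketch simply normalises each candidate's likelihood by the sum over all non-excluded candidates; that normalisation, the function name, and the input format are assumptions for illustration only.

```python
def fractional_allocation(likelihoods):
    """Split one offspring among candidates in proportion to their likelihoods.

    likelihoods maps a candidate identifier to the likelihood that the
    candidate parented the offspring (assumed non-negative).
    """
    total = sum(likelihoods.values())
    if total == 0:
        # No candidate has any support, so nothing is allocated.
        return {candidate: 0.0 for candidate in likelihoods}
    return {candidate: value / total for candidate, value in likelihoods.items()}


fractions = fractional_allocation({"male_1": 3.0, "male_2": 1.0})
print(fractions)
```

By construction the fractions for one offspring sum to 1, so summing a male's fractions over all offspring gives his estimated share of the progeny.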

Parental reconstruction

Uses the multilocus genotypes of parents and offspring to reconstruct the genotypes of unknown parents contributing gametes to a progeny array for which one parent is known a priori.

- Existing techniques reconstruct the minimum number of parental genotypes necessary to explain the data set.
- When the mother is known, all possible paternal genotypes consistent with at least one progeny in the data set are tested in combination to determine which minimum set of paternal genotypes can explain the entire progeny array.
- The algorithms are extremely computationally intensive, especially for progeny arrays with more than six fathers.

Example (one locus with alleles A, B, C, D, E):
  Genotypes in offspring array:  A/A, A/C, C/C, C/D
  Female alleles (known):        A and C (starred in the original grid)
  Male alleles (unknown):        to be inferred from the offspring genotypes
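A much-simplified, single-locus version of this reasoning is sketched below: subtract the maternal contribution from each offspring, find the smallest set of paternal alleles that explains every offspring, and note that each father can carry at most two alleles. It takes the mother to be A/C, as the starred alleles in the example suggest. The function names and the brute-force search are assumptions for illustration; published reconstruction methods work across many loci at once and are far more involved.

```python
from itertools import combinations
from math import ceil


def paternal_options(offspring, mother):
    """Alleles the father could have contributed, given the mother's genotype."""
    a, b = offspring
    options = set()
    if a in mother:
        options.add(b)  # mother gave a, so the father gave b
    if b in mother:
        options.add(a)  # mother gave b, so the father gave a
    return options


def min_fathers(progeny, mother):
    """Lower bound on the number of fathers at one locus."""
    options = [paternal_options(o, mother) for o in progeny]
    if any(not opts for opts in options):
        raise ValueError("an offspring is incompatible with the known mother")
    alleles = sorted(set().union(*options))
    # Smallest allele set that covers every offspring's paternal options.
    for size in range(1, len(alleles) + 1):
        for subset in combinations(alleles, size):
            if all(opts & set(subset) for opts in options):
                return ceil(size / 2)  # a diploid father carries two alleles
    return 0


progeny = [("A", "A"), ("A", "C"), ("C", "C"), ("C", "D")]
print(min_fathers(progeny, ("A", "C")))
```

For this array the paternal alleles A, C, and D are all required, so at least two fathers are needed at this locus.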

Prior to analysis

Collection of data for parentage analysis is just as important as the management of the compiled data set.

- Ideal situation: large groups of offspring are collected from known mated pairs of adults, so molecular techniques are needed only to verify parentage.
- Parentage estimates are still accurate if offspring can be collected in family groups with their mothers and a complete sample of males from the population is obtained.
- As the sample of candidate parents shrinks, the likelihood that the missing samples contain the true parental genotype increases, and our ability to assign offspring correctly decreases.

Jones and Ardren (2003) emphasized the importance of knowing the constraints of your particular study. The proportion of adults that can be sampled, the techniques and markers to be used, and how the analysis will proceed are all critical to the design of an experimental or hypothesis-driven study.

Markers?

Statistical power increases as a function of (a) the number of loci used and (b) allelic diversity and heterozygosity.

- Analyses assume Hardy-Weinberg equilibrium (i.e., population genotype frequencies, and the expectation of observing a genotype at random in the population, can be inferred from the allele frequencies).
- Microsatellites are the most powerful markers for biological systems.
- The number of loci used will depend upon exclusion probabilities.
- The analytical techniques can also be applied to dominant markers such as amplified fragment length polymorphisms (AFLPs) and to uniparentally inherited cytoplasmic (e.g., mitochondrial) markers.

No BC male in the pool of candidates: are the offspring data more consistent with one male parent or with more than one?

Genetic determination of parentage

Situation: an offspring or progeny array of unknown parentage is found and we wish to assign parents.
- One parent (usually the female) is known.
- Neither parent is known.

Analyses become more computationally difficult when fewer data are available. Relevant factors:
- assignment of one vs. two parents
- number of progeny in the progeny array
- number of loci available (high or low)
- characteristics of loci (allelic diversity and distribution of allele frequencies); with k alleles there are $(k^2 + k)/2$ possible genotypes
- background information and the setting in which offspring and parents are placed

Difficulty also depends on whether analyses are based on exclusions or, where not all putative parents (or parental pairs) can be excluded, on assigning parentage on the basis of probabilities.
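The genotype count for a locus with k alleles is easy to check by enumeration. The short sketch below compares the closed form $(k^2 + k)/2$ with an explicit list of unordered allele pairs; the function names are made up for the example.

```python
from itertools import combinations_with_replacement


def n_genotypes(k):
    """Number of distinct diploid genotypes at a locus with k alleles."""
    return (k * k + k) // 2


def enumerate_genotypes(alleles):
    """All unordered allele pairs, homozygotes included."""
    return list(combinations_with_replacement(sorted(alleles), 2))


alleles = ["A", "B", "C", "D", "E"]
print(n_genotypes(len(alleles)), len(enumerate_genotypes(alleles)))
```

The count grows quadratically with allelic diversity, which is one reason highly polymorphic loci give more power but also more genotypes to evaluate.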

Uses of paternity analysis

1. In the absence of observational data on movements, analyses provide a measure of the distances males (or their gametes) moved.
2. Actual rates of selfing and outcrossing can be obtained (plants), as can the relative frequency of inbred vs. outbred matings.
3. Genetic relatedness of progeny from a single female can be determined (the proportion of progeny that are full vs. half sibs).
4. The number of males that have fertilized a single female can be determined.
5. Paternity studies provide insights into sperm competition and sperm storage.
6. Relative reproductive contributions of males as a function of phenotype or other ecological correlates can be determined.
7. Differential survival and fertility of offspring from specific mating events can be followed.
8. In the absence of known pedigrees, analyses aid the design of breeding programs.

Evaluation of statistical power

Probability of exclusion
Exclusion = mismatch (the putative parent is not possible given the genotypes of the offspring and parent(s)).

The probability of exclusion and the probability of assignment reflect the probability of finding a specific genotype in the population, and depend on:
1. number of loci assayed
2. degree of polymorphism
3. allele frequency distribution
4. number of potential parents
5. number of progeny

Inclusion
Inclusion = no mismatches. However, programs account for the possibility of mutation or error, and for the fact that not all possible parents have been sampled.

Estimating the likelihood of multiple paternity or maternity

Based on exclusions: in a clutch from a single pair of parents, the most alleles that can occur at a single Mendelian locus is 4 (e.g., both parents heterozygous for different alleles).

Based on probabilities of concurrent paternity: even when multiple paternity is not observed on the basis of the presence of foreign alleles, there is often a non-zero probability that the genotypes of the progeny array are consistent with multiple parentage. This should be tested against the probability of single parentage.

Statistical power increases as a function of the number of loci, allelic diversity, and the number of offspring in the clutch.
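The exclusion-based check (at most four alleles per locus from a single pair of parents) can be written as a simple allele count. This is only a sketch with hypothetical function names; it ignores mutation and scoring error, and a count of four or fewer does not rule out multiple parentage, which is why the probability-based test above is still needed.

```python
def distinct_alleles(clutch):
    """Set of alleles seen at one locus across all offspring in a clutch."""
    return {allele for genotype in clutch for allele in genotype}


def exceeds_single_pair(clutch):
    """True if the clutch carries more alleles than one pair of parents can supply."""
    return len(distinct_alleles(clutch)) > 4


clutch = [("A", "C"), ("B", "D"), ("A", "D")]
print(exceeds_single_pair(clutch))
print(exceeds_single_pair(clutch + [("E", "A")]))
```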

Descriptive statistics: heterozygosity

Expected heterozygosity:

$H_e = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$

where n is the number of individuals used to determine the allele frequencies and $p_i$ is the frequency of the ith allele.

Observed heterozygosity:

$H_o = N_{AB} / N$

where AB represents a heterozygous genotype (i.e., A and B are different alleles), $N_{AB}$ is the number of heterozygous individuals, and N is the total number of individuals.
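Both statistics can be computed directly from a list of single-locus genotypes. The sketch below follows the slide's definitions, including its use of n as the number of individuals in the correction factor; the function names and the tuple representation of genotypes are assumptions for the example.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """H_e = n / (n - 1) * (1 - sum of squared allele frequencies).

    genotypes is a list of (allele, allele) tuples; requires at least two
    individuals so that n - 1 is non-zero.
    """
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    sum_p_squared = sum((c / total_alleles) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p_squared)


def observed_heterozygosity(genotypes):
    """Proportion of individuals whose two alleles differ."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


sample = [("A", "A"), ("A", "B"), ("A", "B"), ("B", "B")]
print(expected_heterozygosity(sample), observed_heterozygosity(sample))
```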

Descriptive statistics: Hardy-Weinberg equilibrium

  Genotype                AA       AB       BB
  Genotypic frequency     p^2      2pq      q^2
  Expected count          Np^2     N2pq     Nq^2
  Observed count          N_AA     N_AB     N_BB
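The expected row of the table can be filled in from the observed counts by first estimating p from the data. The following is a minimal sketch for a two-allele locus; the function name and the returned dictionary are illustrative choices, and no significance test is included.

```python
def hwe_expected_counts(n_aa, n_ab, n_bb):
    """Expected genotype counts under Hardy-Weinberg for a two-allele locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    return {"AA": n * p * p, "AB": n * 2 * p * q, "BB": n * q * q}


observed = {"AA": 20, "AB": 20, "BB": 60}
expected = hwe_expected_counts(observed["AA"], observed["AB"], observed["BB"])
for genotype in ("AA", "AB", "BB"):
    print(genotype, observed[genotype], round(expected[genotype], 1))
```

Large gaps between the observed and expected rows, such as the heterozygote deficit in this made-up sample, suggest that the Hardy-Weinberg assumption behind the likelihood calculations may not hold.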

Estimating the likelihood of parentage in the absence of exclusions (after Meagher and Thompson, 1986)

Consider an ordered triplet of genotypes $(g_B, g_C, g_D)$ at a single autosomal locus for 3 individuals (B, C, D). We are interested in identifying triplets consisting of an offspring (B) and its maternal (C) and paternal (D) parents.

The statistical properties of triplets under different relationships are:
- (UU) B, C, and D are unrelated, so the triplet contains neither parent of B.
- (QU) C is the parent of B but D is unrelated, so the triplet contains one parent.
- (QQ) C and D are the parents of B, so the triplet contains both parents.

Estimating the likelihood of paternity given non-exclusion (cont.)

The probabilities of these triplets are denoted $P(g_B, g_C, g_D \mid R)$, where the relationship R is one of the 3 previous possibilities (UU, QU, QQ):

$P(g_B, g_C, g_D \mid UU) = P(g_B)\,P(g_C)\,P(g_D)$

$P(g_B, g_C, g_D \mid QU) = P(\text{offspring } g_B \mid \text{parent } g_C)\,P(g_C)\,P(g_D) = T(g_B \mid g_C, -)\,P(g_C)\,P(g_D)$

$P(g_B, g_C, g_D \mid QQ) = P(\text{offspring } g_B \mid \text{parents } g_C, g_D)\,P(g_C)\,P(g_D) = T(g_B \mid g_C, g_D)\,P(g_C)\,P(g_D)$

To decide which relationship is more likely given the data, $P(R \mid \text{data})$, use LOD scores.

$P(g_i)$ is the expected frequency of the ith genotype (under Hardy-Weinberg) and T denotes the transmission probabilities from putative parents to offspring.
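The three triplet probabilities can be computed for a single locus with a few helper functions. This is a sketch under simplifying assumptions (Hardy-Weinberg genotype frequencies, Mendelian transmission, no mutation or genotyping error, and in the QU case a second allele drawn at random from the population); all function names are hypothetical and the allele frequencies are example values.

```python
from itertools import product


def genotype_freq(genotype, freqs):
    """Hardy-Weinberg frequency of an unordered genotype."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission_two_parents(g_b, g_c, g_d):
    """T(g_B | g_C, g_D): fraction of the four gamete pairs that give g_B."""
    target = tuple(sorted(g_b))
    hits = sum(tuple(sorted(pair)) == target for pair in product(g_c, g_d))
    return hits / 4


def transmission_one_parent(g_b, g_c, freqs):
    """T(g_B | g_C, -): one allele from C, the other drawn from the population."""
    target = tuple(sorted(g_b))
    total = 0.0
    for from_parent in g_c:  # each parental allele is passed on with probability 1/2
        for other, p in freqs.items():
            if tuple(sorted((from_parent, other))) == target:
                total += 0.5 * p
    return total


def triplet_likelihoods(g_b, g_c, g_d, freqs):
    """Probabilities of the triplet under UU, QU and QQ."""
    p_c, p_d = genotype_freq(g_c, freqs), genotype_freq(g_d, freqs)
    return {
        "UU": genotype_freq(g_b, freqs) * p_c * p_d,
        "QU": transmission_one_parent(g_b, g_c, freqs) * p_c * p_d,
        "QQ": transmission_two_parents(g_b, g_c, g_d) * p_c * p_d,
    }


freqs = {"A": 0.3, "a": 0.7}
print(triplet_likelihoods(("A", "a"), ("A", "A"), ("a", "a"), freqs))
```

Multi-locus likelihoods would be obtained by multiplying across loci (or summing logs), assuming the loci are independent.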

Assignment statistics: LOD scores

Likelihood:

$T(g_B \mid g_C, g_D)\,P(g_C)\,P(g_D)$

where T is the probability of allele transmission from parents (C and D) to offspring B, and $g_B$, $g_C$, and $g_D$ are the genotypes of offspring B and candidate parents C and D.

Likelihood ratio:

$L(H_1, H_2 \mid D) = \frac{P(D \mid H_1)}{P(D \mid H_2)}$

where $H_1$ is the hypothesis that the candidate parental pair is the true parental pair, $H_2$ is the hypothesis that another candidate parental pair is the true parental pair, and D denotes the data in the form of offspring and parental genotypes.

LOD scores (logarithm of odds) are used when there is more than one possible relationship, to show which is more likely:

$\mathrm{LOD} = \log_e \frac{P(D \mid H_1)}{P(D \mid H_2)}$

A fish example using real data to show how the formulas are applied

Here, population allele frequencies of the A and a alleles were estimated to be 0.3 and 0.7, respectively, so the expected frequency of each genotype can be estimated.

  C, possible mother:                                AA   p^2 = 0.09
  D, possible father:                                aa   q^2 = 0.49
  B, offspring whose parents we wish to find:        Aa   2pq = 0.42

Likelihood of relationship QQ:

$T(g_B \mid g_C, g_D)\,P(g_C)\,P(g_D) = 1 \times 0.09 \times 0.49 = 0.0441$

Likelihood of relationship UU:

$P(g_B, g_C, g_D \mid UU) = P(g_B)\,P(g_C)\,P(g_D) = 0.42 \times 0.09 \times 0.49 = 0.0185$

$\mathrm{LOD} = \log_e(0.0441 / 0.0185) \approx 0.87$

So, based on this one locus alone, adults C and D are about 2.4 times more likely to be the parents than 2 random adults from the population.
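The arithmetic of this example can be reproduced in a few lines. The sketch below uses only the allele frequencies and genotypes given above; the function name and the returned dictionary are illustrative, and the transmission probability is set to 1 because an AA mother and an aa father can only produce Aa offspring.

```python
import math


def fish_example(p=0.3, q=0.7):
    """Likelihoods, likelihood ratio and LOD for the single-locus fish example."""
    p_mother = p ** 2         # AA
    p_father = q ** 2         # aa
    p_offspring = 2 * p * q   # Aa
    transmission = 1.0        # AA x aa always yields Aa
    l_qq = transmission * p_mother * p_father
    l_uu = p_offspring * p_mother * p_father
    ratio = l_qq / l_uu
    return {"QQ": l_qq, "UU": l_uu, "ratio": ratio, "LOD": math.log(ratio)}


result = fish_example()
print({key: round(value, 4) for key, value in result.items()})
```

Because the parental genotype frequencies cancel, the ratio reduces to the transmission probability divided by the offspring genotype frequency, here 1 / 0.42, or about 2.38, for a LOD of about 0.87.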