Kinship and Population Subdivision


Henry Harpending
University of Utah

The coefficient of kinship between two diploid organisms describes their overall genetic similarity to each other relative to some base population. For example, kinship between parent and offspring of 1/4 describes gene sharing in excess of random sharing in a random mating population. In a subdivided population the statistic $F_{ST}$ describes gene sharing within subdivisions in the same way. Since $F_{ST}$ among human populations on a world scale is reliably 10 to 15%, kinship between two individuals of the same human population is equivalent to kinship between grandparent and grandchild or between half siblings. The widespread assertion that this is small and insignificant should be reexamined.

KEY WORDS: coefficient of kinship; coefficient of relationship; inclusive fitness.

Please address correspondence to Henry Harpending, Department of Anthropology, University of Utah, Salt Lake City, UT 84112, USA; email: henry.harpending@anthro.utah.edu.

Population and Environment, Vol. 24, No. 2, November 2002. © 2002 Human Sciences Press, Inc.

COEFFICIENT OF KINSHIP

It is easy to understand why parental care evolved in many lineages: parents and offspring share genes, so that parental effort devoted to offspring is in fact effort devoted to the parent's own genes. Hamilton (1964) formalized this insight and extended it to arbitrary degrees of relationship. When Hamilton and others described the theory they often spoke in terms of gene identity by descent, thinking for example of the one half of the nuclear genes in a diploid offspring that are identical to those in the parent. Many authors also spoke of shared genes. Neither of these descriptions is completely accurate. I may share many genes with, say, an onion, but this gene sharing is not relevant to the evolution of social behavior within humans.

A better way to think of kinship, relationship, and Hamilton's theory is to think of gene sharing in excess of random gene sharing. A parent shares many more than half his genes with an offspring, but in a random mating population half of the offspring's genes are surely identical because they came from the parent, while gene sharing with the other half of the offspring's genome is just what is shared with any random member of the population.

While Hamilton wrote his theory in terms of the coefficient of relationship, most population geneticists reason instead with the coefficient of kinship. Once kinship is known, relationship follows immediately from a simple formula (Bulmer, 1994).

Here is the definition of kinship between person $x$ and person $y$: pick a random gene at a locus from $x$ and let the population frequency of this gene be $p$. Now pick a gene from the same locus from $y$. The probability $p_y$ that the gene in $y$ is the same as the gene picked from $x$ is

$$p_y = F_{xy} + (1 - F_{xy})\,p.$$

An interpretation of this is that with probability $F_{xy}$ the genes are the same, and with probability $1 - F_{xy}$ they are independent draws, in which case the probability of identity is just the population frequency $p$ (Harpending, 1979). Rearrangement gives the definition of the coefficient of kinship:

$$F_{xy} = \frac{p_y - p}{1 - p} \tag{1}$$

Kinship coefficients in a random mating diploid population are simple and well known. For example, pick a gene from me, then pick another gene from the same locus from me. With probability 1/2 we picked the same gene, while with probability 1/2 we picked the other gene at that locus. Therefore the probability that the second gene is the same as the first is just $1/2 + p/2$, and substitution of this conditional frequency in the formula for kinship shows that my kinship with myself is just 1/2.
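
The self-kinship argument can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original paper; the function names and the example frequency are assumptions made for illustration. It uses exact fractions so that the result does not depend on rounding.

```python
from fractions import Fraction


def conditional_frequency(f_xy, p):
    """Probability that the gene drawn from y matches the gene drawn from x,
    given kinship f_xy and population frequency p (the relation preceding
    equation 1)."""
    return f_xy + (1 - f_xy) * p


def kinship(p_y, p):
    """Equation (1): kinship recovered from the conditional frequency p_y."""
    return (p_y - p) / (1 - p)


# Hypothetical allele frequency, chosen only for the example.
p = Fraction(3, 10)

# Kinship with oneself: half the time the same gene is drawn again,
# half the time the other gene, which matches with probability p.
p_self = Fraction(1, 2) + p / 2
print(kinship(p_self, p))  # 1/2

# Equation (1) inverts the conditional-frequency relation for any kinship.
print(kinship(conditional_frequency(Fraction(1, 4), p), p))  # 1/4
```

The result does not depend on the value chosen for $p$, which is the point of defining kinship relative to the population frequency.
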
The same reasoning leads to the well known values of 1/4 with my child, 1/8 with my grandchild, my half-sib, or my nephew, and so on.

It is very important that the coefficient of kinship not be confused with the coefficient of relationship. These are conceptually and numerically different creatures. The coefficient of relationship can be thought of as the fraction of shared genes between two organisms. This coefficient is familiar to many biologists since W. D. Hamilton developed his famous theory of kin selection in terms of the coefficient of relationship. However, most subsequent development of the theory has been in terms of kinship coefficients.

In a random mating diploid population the relationship between the two coefficients is simple: the coefficient of relationship is just twice the coefficient of kinship. This simple rule of thumb breaks down as soon as any complications like inbreeding or population structure are introduced. The best general definition of the coefficient of relationship $R_{xy}$ between individuals $x$ and $y$ is (Bulmer, 1994)

$$R_{xy} = \frac{F_{xy}}{F_{xx}},$$

where $F_{xy}$ is the kinship between $x$ and $y$ and $F_{xx}$ is the kinship of $x$ with himself. This has the interesting property that it is not necessarily symmetric: $R_{xy}$ is not in general equal to $R_{yx}$.

POPULATION SUBDIVISION

Most of the applications of Hamilton's theory in biology have used kinship and relationship derived from genealogical relationships. For example, parental care evolves, we think, because parents and offspring share genes. But gene sharing (in excess of random gene sharing, always) can arise in other situations. In a subdivided population, individuals share genes with other members of the same deme, and these shared genes are fuel for evolution by inclusive fitness effects in exactly the same way that pedigree relationships like that between parent and child are.

I derive here the relationship between population subdivision and kinship in a very simple case, but the formulae apply much more generally than this simple derivation implies. At this point I must mention that these derivations apply to large populations. In the case of small groups ("trait groups," as D. S. Wilson calls them) I would have to consider that if we pick a gene from an individual, the frequency of that gene in the rest of the deme gene pool is slightly reduced. An exact treatment of small demes leads to annoying algebraic terms of order $1/n$, where $n$ is the deme size.
I am concerned with large groups and I ignore these terms.

Consider a population made up of two demes of exactly the same size and a genetic locus with exactly two alleles. The conclusion of the algebra below is that the familiar statistic that describes population subdivision, $F_{ST}$, is precisely kinship between members of the same deme. In other words, genetic differences between demes imply genetic similarity within demes, and $F_{ST}$ is just the coefficient of kinship between members of the same deme due to the population structure. For example, $F_{ST}$ among human populations is about 1/8, and this is just the coefficient of kinship in a single population between grandparent and grandchild, uncle and nephew, or half-sibs. In a diverse world, members of the same population are related to each other to the same degree that grandparents and grandchildren are related to each other in a single population.

There are two demes of equal size labelled A and B. At a locus the frequency of a gene is $p_A$ in deme A and $p_B$ in deme B. The frequencies in the two demes of the alternate allele are $q_A$ and $q_B$. The overall mean frequencies are simply $p$ and $q$. It is convenient to use a slightly different notation to describe the gene frequencies:

$$p_A = p + \delta, \qquad p_B = p - \delta,$$

so of course

$$q_A = q - \delta, \qquad q_B = q + \delta.$$

Now imagine that we pick a gene at random from the population, then pick another gene from the same locus from the same deme. What is the coefficient of kinship within demes? In order to find this we use formula (1) above. With probability 1/2 we pick someone from population A initially, and with probability $p_A$ we pick the allele whose frequency is $p_A$. With probability $q_A = 1 - p_A$ we pick the alternate allele. Putting these possibilities into equation (1) we have

$$F = \tfrac{1}{2}\,p_A\,\frac{p_A - p}{q} + \tfrac{1}{2}\,p_B\,\frac{p_B - p}{q} + \tfrac{1}{2}\,q_A\,\frac{q_A - q}{p} + \tfrac{1}{2}\,q_B\,\frac{q_B - q}{p}.$$

Using the substitutions above, this becomes

$$F = \frac{(p + \delta)(\delta) + (p - \delta)(-\delta)}{2q} + \frac{(q - \delta)(-\delta) + (q + \delta)(\delta)}{2p} = \frac{2\delta^2}{2q} + \frac{2\delta^2}{2p},$$

and since $p + q = 1$,

$$F = \frac{4\delta^2}{4pq} = \frac{\delta^2}{pq}.$$

This is simply the $F_{ST}$ genetic distance between the two populations: the variance of the gene frequency divided by the product of the mean gene frequency and its complement. When $F_{ST}$ is reported for a collection of populations, it is essentially an average of all the pairwise population $F_{ST}$ statistics. The statistic is computed for each allele at each locus, then averaged over all loci.

Many studies agree that $F_{ST}$ in world samples of human populations is between ten and fifteen percent. If small long-isolated populations are included, the figure is usually somewhat higher. A conservative general figure for our species is $F_{ST} \approx 0.125 = 1/8$. This number was given by Cavalli-Sforza in 1966, and a widely cited paper by Lewontin (1972) argued at length that this is a small number implying that human population differences are trivial. An alternative perspective is that kinship between grandparent and grandchild, equivalent to kinship within human populations, is not so trivial. For further discussion see Klein and Takahata (2002, pp. 387–390).

Kinship in a Subdivided Population

Equation 1 and its derivation show that if we pick a gene at random from a population of two demes and find that its overall frequency is $p$, then the frequency of that gene in the same deme is on average

$$p_{same} = p + (1 - p)F_{ST},$$

while the frequency of that gene in the other deme is on average

$$p_{other} = p - (1 - p)F_{ST}.$$

Using equation 1 and these relations we can derive kinship and relationship coefficients within and between demes easily. An individual's coefficient of kinship with someone from his own deme is just $F_{ST}$, while his kinship with someone from the other deme is $-F_{ST}$. What about kinship with oneself in a subdivided population?
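
Before taking up that question, the two-deme algebra can be checked numerically. The sketch below is an illustration in Python rather than anything from the original paper; the function names and the example values of $p$ and $\delta$ are assumptions. It evaluates the four-term average directly and compares it with the closed form $\delta^2/pq$.

```python
from fractions import Fraction


def within_deme_kinship(p, delta):
    """Four-term average from the two-deme derivation: each term is the
    probability of the first draw (deme, allele) times equation (1)
    applied to the second draw from the same deme."""
    q = 1 - p
    p_a, p_b = p + delta, p - delta
    q_a, q_b = q - delta, q + delta
    half = Fraction(1, 2)
    return (half * p_a * (p_a - p) / q
            + half * p_b * (p_b - p) / q
            + half * q_a * (q_a - q) / p
            + half * q_b * (q_b - q) / p)


def fst_two_demes(p, delta):
    """Closed form: variance of the deme frequencies over p * q."""
    return delta ** 2 / (p * (1 - p))


# Hypothetical values: mean frequency 2/5, demes at 1/2 and 3/10.
p, delta = Fraction(2, 5), Fraction(1, 10)
print(within_deme_kinship(p, delta))  # 1/24
print(fst_two_demes(p, delta))        # 1/24
```

As a point of scale, with $p = 1/2$ a value of $F_{ST} = 1/8$ corresponds to $\delta^2 = 1/32$, so $\delta \approx 0.18$: the two demes would have frequencies of roughly 0.68 and 0.32.
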
Pick a gene from an individual, then pick another at random from the same individual: with probability 1/2 we picked the same gene and with probability 1/2 we picked the other one, in which case the probability it is the same is $p + (1 - p)F_{ST}$. Therefore

$$p_{self} = \tfrac{1}{2}\left(1 + p + (1 - p)F_{ST}\right).$$

Using equation 1, we find that

$$F_{self} = \tfrac{1}{2}\left(1 + F_{ST}\right)$$

rather than the simple 1/2 kinship with self in a single random mating population.

It is simple to derive familiar family kinship coefficients in the same way: for example, kinship with a child when the other parent is from the same deme is

$$F_{child} = \frac{1}{4} + \frac{3F_{ST}}{4},$$

and so on. In general, if the kinship in a random mating population with a relative is $1/x$, then in a subdivided population the kinship with that same relative is

$$F_{\text{relative of degree } x} = \frac{1}{x} + \frac{(x - 1)F_{ST}}{x}. \tag{2}$$

With $x = 4$ this reproduces $F_{child}$ above, and with $x = 2$ it reproduces $F_{self}$.

What about kinship with a relative who is a hybrid between the populations? Consider, for example, a child whose other parent is from the other deme. Pick a gene from the parent: the probability of picking the same gene from the child is 1/4, the probability of picking a gene from the child not identical to the first but from the same deme as the parent is 1/4, and the probability of picking a gene from the other deme is 1/2. Putting these together, the probability of picking the same gene is

$$p_{hybrid\ offspring} = \frac{1}{4} + \frac{1}{4}\left(p + (1 - p)F_{ST}\right) + \frac{1}{2}\left(p - (1 - p)F_{ST}\right).$$

Using equation 1, this becomes

$$F_{hybrid\ offspring} = \frac{1}{4} - \frac{F_{ST}}{4}.$$

In general the same derivation shows that kinship with a hybrid relative of degree $x$, meaning a relative with whom kinship in a random mating population would be $1/x$, is

$$F_{\text{hybrid relative of degree } x} = \frac{1}{x} - \frac{F_{ST}}{x}. \tag{3}$$
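
The family coefficients above can be recomputed from the conditional frequencies. The following Python sketch is an illustration only, not the author's code; the function names and the example values are assumptions. It also assumes that the coefficient of $F_{ST}$ in equation 2 is $(x - 1)/x$, the reading that reproduces $F_{child} = 1/4 + 3F_{ST}/4$.

```python
from fractions import Fraction


def kinship(p_y, p):
    """Equation (1): kinship from the conditional frequency p_y."""
    return (p_y - p) / (1 - p)


def same_deme_relative(x, fst):
    """Equation (2), assuming the coefficient of F_ST is (x - 1) / x."""
    return Fraction(1, x) + (x - 1) * fst / x


def hybrid_relative(x, fst):
    """Equation (3): kinship with a hybrid relative of degree x."""
    return Fraction(1, x) - fst / x


# Hypothetical inputs: F_ST = 1/8 and an allele of overall frequency 3/10.
fst, p = Fraction(1, 8), Fraction(3, 10)
p_same = p + (1 - p) * fst    # frequency in the same deme, on average
p_other = p - (1 - p) * fst   # frequency in the other deme, on average

# Conditional frequencies built from the verbal arguments in the text.
p_self = Fraction(1, 2) * (1 + p_same)
p_child = Fraction(1, 4) + Fraction(3, 4) * p_same
p_hybrid = Fraction(1, 4) + Fraction(1, 4) * p_same + Fraction(1, 2) * p_other

print(kinship(p_self, p), same_deme_relative(2, fst))   # 9/16 9/16
print(kinship(p_child, p), same_deme_relative(4, fst))  # 11/32 11/32
print(kinship(p_hybrid, p), hybrid_relative(4, fst))    # 7/32 7/32
```

With $F_{ST} = 1/8$, kinship with a child from a within-deme mating rises from 1/4 to 11/32, while kinship with a hybrid child falls to 7/32; the two differ by exactly $F_{ST}$.
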

The difference between equations 2 and 3 is just $F_{ST}$, the difference between kinship with an intra-demic relative and a hybrid relative. Notice also that as $x$ becomes large, equation 2 shows that kinship with a random member of the same deme is $F_{ST}$, and equation 3 shows that kinship with an otherwise unrelated hybrid is 0.

REFERENCES

Bulmer, M. (1994). Theoretical Evolutionary Ecology. Sunderland, Massachusetts: Sinauer.
Cavalli-Sforza, L. L. (1966). Population structure and human evolution. Proceedings of the Royal Society Series B, 164, 362–379.
Hamilton, W. D. (1964). The genetic evolution of social behavior, parts 1 and 2. Journal of Theoretical Biology, 7, 1–51.
Harpending, H. (1979). The population genetics of interactions. American Naturalist, 113, 622–630.
Klein, J., & Takahata, N. (2002). Where Do We Come From: The Molecular Evidence For Human Descent. Berlin: Springer.
Lewontin, R. C. (1972). The apportionment of human diversity. Evolutionary Biology, 6, 381–398.