Coalescence
History, Model, and Application

Outline
History
  Origins of theory/approach
  Trace the incorporation of others' ideas
Coalescence
  Definition and descriptions
The Model
  Assumptions and uses
Application
  Population genetics
  Phylogenetics
Coalescence
The merging of ancestral lineages going back in time.
Rosenberg & Nordborg 2002

History
History
Timeline, 1940s to 1980s:
  Gustave Malécot's path toward the coalescent
  Harris (1966) and Lewontin & Hubby (1966) begin measurements of molecular variation
  Ewens (1972) sampling formula
  Watterson (1974-76) gene frequencies
  Griffiths (1980) molecular variation
Wakeley 2009

History: According to Kingman
Timeline, 1970s to 2010s:
  Ewens (1972) sampling formula
  1974: Australia and the Wright-Fisher model of evolution
  Watterson's gene frequencies
  Genealogical connection?
  Kingman's (1982) publication on the coalescent
  Hudson and Tajima (1983) publications on a similar topic
  Hudson (1990) review of the coalescent
  Wakeley's (2009) text on Coalescent Theory
Wakeley 2009, Nordborg 2001, Kingman 2000
Kingman's argument
The Wright-Fisher model is equivalent to the rule that each member of a generation chooses its mother at random from the previous generation, and each member's choice is independent
Two members of the same generation have probability $(1 - N^{-1})^r$ of having different ancestors r generations back (if N > )
Trace the lines back until they coalesce or the number of lines is reduced to one, by means of a Markov chain
Kingman 2000

Kingman's Moral
Articles on coherent random walks are very mathematically heavy
If the equations had been thought about probabilistically, the family tree wouldn't have been overlooked
Simplification: mutation is non-recurrent (the mutant is independent of the parent)
Those who analyze stochastic models should always lift their eyes from their equations to ask what they actually mean.
Kingman 2000
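The probability in Kingman's argument can be checked numerically. The sketch below is illustrative only: it assumes a haploid Wright-Fisher population of constant size N, and the function names, trial count, and seed are hypothetical choices, not taken from Kingman (2000). It compares the closed form (1 - 1/N)^r with a direct simulation in which two lineages each pick a parent uniformly at random in every generation.

```python
import random


def prob_distinct_exact(N, r):
    """Probability that two lineages still have different ancestors r generations back."""
    return (1.0 - 1.0 / N) ** r


def prob_distinct_simulated(N, r, trials=20000, seed=1):
    """Monte Carlo estimate of the same probability (seed and trial count are arbitrary)."""
    rng = random.Random(seed)
    distinct = 0
    for _ in range(trials):
        for _ in range(r):
            # Each lineage independently chooses its parent among N individuals.
            parent_a = rng.randrange(N)
            parent_b = rng.randrange(N)
            if parent_a == parent_b:
                break  # the two lineages have coalesced
        else:
            distinct += 1  # no coalescence in any of the r generations
    return distinct / trials


print(prob_distinct_exact(50, 30), prob_distinct_simulated(50, 30))
```

The two printed values should agree closely, which is the sense in which "choosing a mother at random" and the formula describe the same process.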
Coalescence
Definitions and Descriptions

Coalescence
The merging of ancestral lineages going back in time.
Rosenberg & Nordborg 2002
Lines of Descent
Crandall & Templeton 1993

Coalesce vs. Diverge
http://home.cc.umanitoba.ca/%7eumbagher/39.769/presentation/presentation.html
Population Genetics
Aim: understand the forces that produce and maintain genetic variation within species
  Mutation, recombination, natural selection, population structure, and random transmission of genetic material from parents to offspring
Coalescent theory is a part of theoretical population genetics
Wakeley 2009

Coalescent Theory
Describes the connection between demographic history and genetic data, and provides a framework for extracting information from samples of DNA sequences
Often too simple to explain all aspects of variation
Wakeley 2009
Coalescent Theory
Describes the genetic ancestry of a sample of sequences and makes predictions about patterns of genetic variation
Gene genealogy: the set of ancestral relationships among the members of the sample
  Times to common ancestry
Gene genealogies are unobservable and are treated as random variables in a statistical setting
Wakeley 2009

Lines of Descent
Genealogy
Crandall & Templeton 1993
The Model
Assumptions and Uses

Population Genetics: Natural population
Fundamental problems:
  1) No replication of the experiment: only one run of evolution is available to be studied
  2) Starting conditions of the experiment are unknown
Allelic states are statistically dependent because of linkage and shared ancestry
  Mutation, recombination, and coalescence of lineages in the ancestry of the sample
Rosenberg & Nordborg 2002
Population Genetics: Natural population
Heuristics don't fully account for uncertainty from the inherent randomness of evolution
Solution: the past is modeled stochastically, and the model constructs random genealogies (the coalescent)
To model a genealogy, you need to consider recombination and coalescence of lineages
Rosenberg & Nordborg 2002

Basic principle
In the absence of selection:
  Sampled lineages can be viewed as randomly picking their parents as they go back in time
  Whenever two lineages pick the same parent, they coalesce
  Eventually all lineages coalesce into a single lineage, the MRCA (most recent common ancestor) of the sample
Rosenberg & Nordborg 2002
The source of genetic variation
Polymorphism at a particular site results from mutations along branches of the genealogical tree, which connects sampled copies of the site to their MRCA.
Rosenberg & Nordborg 2002

The basic principle behind the coalescent
It is only necessary to keep track of the times between coalescence events [T(3) and T(2)] and the topology (which lineages coalesce with which)
Rosenberg & Nordborg 2002
Basic principle
The rate at which lineages coalesce depends on:
  Number of lineages picking their parents: more lineages = faster rate
  Size of the population: more parents to choose from = slower rate
Because selectively neutral mutations do not affect reproduction, they can be superimposed on the tree afterwards
Rosenberg & Nordborg 2002

Factors included
Changes to the rate of coalescence:
  variation in reproductive success
  age structure
  skewed sex ratios
Changes to the shape of genealogical trees:
  population structure
  fluctuation in population size
Recombination (random graph vs. tree)
Selection: the real difficulty!
  some genotypes reproduce more than others (i.e., lineages do not randomly pick parents)
Rosenberg & Nordborg 2002
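The basic principle (track only the waiting times and which lineages merge) can be sketched as a short simulation. This is a minimal illustration, not the authors' code. It assumes the standard neutral coalescent time scaling, in which each pair of lineages coalesces at rate 1, so k lineages coalesce at total rate k(k-1)/2. That scaling and the function name `simulate_coalescent` are assumptions introduced here, not stated on the slides.

```python
import random


def simulate_coalescent(n, rng):
    """Simulate one genealogy for n sampled lineages.

    Returns (times, merges): times[i] is the waiting time while n - i lineages
    remain (in coalescent time units), and merges[i] is the pair of lineages
    that coalesce at the end of that interval. Each lineage is represented by
    the tuple of sample labels it is ancestral to.
    """
    lineages = [(i,) for i in range(n)]
    times, merges = [], []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2.0            # more lineages = faster rate
        times.append(rng.expovariate(rate))
        i, j = rng.sample(range(k), 2)      # every pair is equally likely to merge
        merges.append((lineages[i], lineages[j]))
        merged = lineages[i] + lineages[j]
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return times, merges


rng = random.Random(7)  # arbitrary seed
times, merges = simulate_coalescent(3, rng)
print("T(3), T(2):", times)
print("topology:", merges)
```

In these units a larger population does not change the code; it only stretches the conversion from coalescent units back to generations, which is one way to read "more parents to choose from = slower rate".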
Classical vs. Coalescent
Traditional: simulates the evolution of the entire population forwards in time until equilibrium is reached, then takes a sample
The forward-in-time approach is more appropriate for studies of how the long-term behavior of evolutionary systems depends on initial conditions
Rosenberg & Nordborg 2002

Classical vs. Coalescent
Coalescent: simulates the genealogy of the sample going back in time until the MRCA, then adds mutations forwards along the branches of the new trees
Suited to studies of the effects of past evolutionary forces on current genetic variation
Uses only individuals that are ancestral to the sample
Computational efficiency is increased
Rosenberg & Nordborg 2002
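The coalescent recipe (build the genealogy backwards, then drop neutral mutations on its branches) can be illustrated with a small sketch. The assumptions here are not from the slides: an infinite-sites mutation model, time measured in coalescent units so that mutations fall on the tree as a Poisson process with rate θ/2 per unit branch length, and hypothetical function names. Under those assumptions the expected number of segregating sites is θ times the sum of 1/i for i = 1 to n-1, which the simulation should approximately reproduce.

```python
import random


def poisson_draw(mean, rng):
    """Poisson variate: count arrivals of a unit-rate process before time `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n, theta, rng):
    """Number of mutations on one random genealogy of n samples."""
    total_length = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2.0)  # time with k lineages
        total_length += k * t_k                    # k branches exist during t_k
    # Neutral mutations are superimposed on the finished tree.
    return poisson_draw(theta / 2.0 * total_length, rng)


def expected_segregating_sites(n, theta):
    """Expected number of segregating sites under the assumptions above."""
    return theta * sum(1.0 / i for i in range(1, n))


rng = random.Random(3)  # arbitrary seed
reps = 5000
mean_s = sum(simulate_segregating_sites(10, 5.0, rng) for _ in range(reps)) / reps
print(mean_s, expected_segregating_sites(10, 5.0))
```

Only the n sampled lineages and their ancestors are simulated, never the whole population, which is where the gain in computational efficiency comes from.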
Coalescence and Phylogenetics
Phylogenetics: What is the true tree?
Coalescence: What caused the tree?
Both methods give a tree and the parameters
Probability distributions used (Bayesian):
  Phylogenetics: probability distribution for the tree, including uncertainty in the parameters
  Coalescence: probability distribution for the parameters, including uncertainty in the tree
http://www.rni.helsinki.fi/~boh/teaching/bayes/lecture9.pdf

Genealogical and Phylogenetic
Fundamentally different
Phylogenetics was developed to determine the pattern of species descent (assumed tree-like)
Sequences are taken from individuals, and the genealogy is estimated from the sequences
The estimated gene tree is used to draw conclusions about relationships between species
Gene tree is treated as equivalent to species tree
Rosenberg & Nordborg 2002
Gene Trees and Species Trees
Two levels of error:
  1) The gene tree for the sequences will be incorrectly inferred if there is sufficient random or systematic error
  2) Even if the gene tree is correctly inferred, deep gene coalescence (ancestral polymorphisms), gene duplication, and lateral gene transfer can produce a gene tree different from the true species tree
Slowinski et al. 1997

Resolved: as long as the time intervals between species-branching events are much greater than the time intervals between lineage-branching events within each species, gene and species divergences are likely to be nearly congruent.
Figure panels:
  Branches of species tree similar in length to the genealogical trees within species
  Branches of species tree much longer than the genealogical trees within species
Rosenberg & Nordborg 2002
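The congruence statement can be made concrete for the simplest case. The sketch below assumes three species, one sampled gene copy per species, and an internal species-tree branch of length T in coalescent units. None of this comes from the slides; it uses the standard neutral result that the gene tree matches the species tree with probability 1 - (2/3)e^(-T), and the function names are hypothetical.

```python
import math
import random


def congruence_probability(T):
    """P(gene tree matches species tree) for three species, one copy each."""
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


def simulate_congruence(T, reps, rng):
    """Monte Carlo check of the formula above."""
    hits = 0
    for _ in range(reps):
        if rng.expovariate(1.0) < T:
            # The two sister lineages coalesce within the internal branch.
            hits += 1
        elif rng.randrange(3) == 0:
            # Deep coalescence: three lineages remain and each of the three
            # possible pairs is equally likely to coalesce first.
            hits += 1
    return hits / reps


rng = random.Random(2)  # arbitrary seed
for T in (0.1, 1.0, 5.0):
    print(T, congruence_probability(T), simulate_congruence(T, 20000, rng))
```

Short internal branches give a probability near 1/3 (the gene tree is little better than a random guess), while long branches push it toward 1, matching the two figure panels.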
Application
Population Genetics and Phylogenetics

Application
Modeling tool for population genetics
Used to analyze DNA sequence polymorphism data
Based on the realization that genealogy is usually easier to model backward in time and that selectively neutral mutations can be superimposed afterwards
Nordborg 2001
Application
Widely applied in studies of evolution
Estimates time to common ancestor
Can provide evidence for balancing selection
Estimates recombination and the rate of selfing
Assesses migration patterns in human ancestry (Y chromosome and mtDNA)
Kingman 2000

Population Genetics
Population Genetics Approach
Development of coalescent-based statistical methods for analyzing DNA sequence samples
θ = 4Nµ
  Estimators via Watterson (1975) and Tajima (1983): unbiased under the neutral Wright-Fisher model
  Improvements by Felsenstein (1992), Fu and Li (1993), and Fu (1994)
Fu & Li 2002

Population Genetics Approach
Maximum Likelihood
  1) Griffiths and Tavaré (1994, 1995): Monte Carlo method
  2) Kuhner et al. (1995): Monte Carlo estimator and Metropolis-Hastings method
  3) Fu (1998): maximum-likelihood method
Fu & Li 2002
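The two θ estimators named above can be sketched in a few lines. This is an illustrative sketch only: it assumes aligned sequences of equal length with no missing data, computes per-locus (not per-site) values, and uses hypothetical function names and a toy alignment. Watterson's estimator divides the number of segregating sites by the sum of 1/i for i = 1 to n-1; Tajima's estimator is the mean number of pairwise differences.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that contain more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def tajima_pi(seqs):
    """Tajima's estimator: mean number of differences over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


# Toy alignment, invented for illustration only.
sample = ["AAAA", "AAAT", "AATT", "ATTT"]
print(segregating_sites(sample), watterson_theta(sample), tajima_pi(sample))
```

Both functions estimate the same quantity under the neutral model, so a large gap between them on real data is itself informative, which is the idea behind later test statistics built from these estimators.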
Ex: Population Genetics Approach
Palaeo-distributional model generated by projecting an ecological niche model (of the current distribution) onto a model of past climatic conditions
Coalescent simulations used to help model population genetic structure and compare phylogeography among different taxa
Carstens & Richards 2007

Phylogenetics
Ex1: Phylogenetic Approach
Gene tree parsimony: the terminal sequences of a gene tree have shared a single history represented by a binary tree
Finds the species tree that minimizes the weighted sum of the different kinds of incongruence needed to fit each gene tree to a species tree, via GeneTree (Page & Charleston 1997)
Slowinski et al. 1997

Ex2: Phylogenetic Approach
Incorporating a model of stochastic loss of gene lineages by genetic drift into a phylogenetic estimation procedure can provide a robust estimate of grasshopper species relationships
Use of ESP (estimated species phylogeny) with a coalescent-based approach vs. concatenation of multiple loci
Carstens & Knowles 2007
Grasshopper Results
Coalescent approach: accurate relationships estimated
  Provided direct statistical evaluation of the ESP, versus inferring it from the topology of a gene tree
Concatenation approach: forced topological congruence
  Estimated trees did not accurately reflect the species tree (with recently derived species)
Carstens & Knowles 2007

Grasshopper Results
They suggested that the coalescent approach may bridge the gap between systematics and population genetics
The ESP chosen maximizes the probability of the gene trees
Carstens & Knowles 2007
Ex3: Phylogenetic Approach
Methods for estimating gene trees (model-based estimation of sequence parameters (Ronquist & Huelsenbeck 2003)) are commonly used
Methods to estimate lineage trees (phylogenetics) from one or more gene trees using coalescent methods are underdeveloped
Belfiore et al. 2008

Ex3: Phylogenetic Approach
Better solution: incorporate models of stochastic mutation along with gene coalescence directly into the estimation of lineage trees (Felsenstein, Maddison, Takahata)
  Can increase efficiency and accuracy
  By increasing the number of loci and individuals, can infer lineage relationships in cases of rapid radiation
Belfiore et al. 2008
Ex3: Phylogenetic Approach
Problem: individual gene trees often fail to match the lineage tree when divergence times are very short relative to the effective population size of the ancestral populations
Belfiore et al. 2008

Ex3: Phylogenetic Approach
Solution: increase the number of loci sampled, or increase the number of gene copies per taxon where a larger number of coalescence events occur in common ancestors
  Gain information on relative divergence times and the topology of the lineage tree to overwhelm noise from stochastic lineage sorting
Belfiore et al. 2008
Ex3: Belfiore et al. 2008
Rapid radiation of Thomomys: species borders and relationships
Partitioned Bayesian analysis of concatenated sequences (Ronquist & Huelsenbeck 2003) vs. a new Bayesian method using a coalescent framework to simultaneously estimate gene trees and species trees from multi-locus data (Edwards et al. 2007, Liu & Pearl 2007)
Resolution and comparison to previous phylogenetic analyses

Phylogenetic Approach
Evaluate an extension of the coalescent approach for use in recent radiations (estimating species trees when multiple individuals are sequenced per taxon)
Previous methods were based on the assumption that loci are congruent and monophyletic within species; otherwise a different approach is needed to avoid wrongly assuming that all genes have the same history
Belfiore et al. 2008
Phylogenetic Approach
Coalescent-based: estimates the species tree from a single sampled allele per taxon (Liu & Pearl 2007)
New method: a coalescent-based approach that allows for divergent histories of independent genes and directly infers the species tree, given samples of multiple alleles per gene per species (Belfiore et al. 2008)
Belfiore et al. 2008

Phylogenetic Approach
Concatenated:
  Each locus is considered a partition and assigned its own substitution model
  Assumes that all loci have the same evolutionary history (the species tree estimate is the same as the gene tree estimate)
Belfiore et al. 2008
Phylogenetic Approach
BEST (Bayesian Estimation of Species Trees)
  Bayesian hierarchical model that estimates species trees from the distribution of gene trees (across multiple loci)
  Modified to incorporate multiple alleles from each taxon into the probability density function of gene trees, given species trees (Liu et al. 2008)
  Assumes no reticulation among taxa
Belfiore et al. 2008

Results
Concatenated method did not show the level of conflict among gene trees
BEST method directly estimates relationships among taxa, rather than individuals
  More biologically realistic, and captures the basic principles of lineage sorting
Belfiore et al. 2008
Belfiore's final thoughts
Call for coalescent methods that can be applied at the interface of phylogenetic and population processes
Powerful tool: a coalescent method that can test between hypotheses of recent reticulation versus a relatively recent rapid speciation event (resulting in incomplete lineage sorting)
Belfiore et al. 2008

Future
Molecular data
  Most applications use samples of mtDNA and the Y chromosome
  For a better picture, more loci in the nuclear genome need to be looked at
Population genetics
  Continuation of the Wright-Fisher model
  Little knowledge of the natural selection model
  A better model would include migration and population growth
Fu & Li 2002
Summary
The history of the coalescent and coalescence involved many great thinkers
The model is mathematically complex but has a simple biological theme
Applications began in population genetics but are being introduced to phylogenetics
Story of old ways versus new ways

Questions?