Coalescence: History, Model, and Application

Outline
History: origins of the theory and approach; tracing the incorporation of others' ideas
Coalescence: definition and descriptions
The Model: assumptions and uses
Application: population genetics; phylogenetics

Coalescence
The merging of ancestral lineages going back in time (Rosenberg & Nordborg 2002).

History

History (timeline, 1940s to 1980s)
Gustave Malécot's path toward the coalescent
Harris (1966) and Lewontin & Hubby (1966) begin measurements of molecular variation
Ewens (1972): sampling formula
Watterson (1974-76): gene frequencies
Griffiths (1980): molecular variation
(Wakeley 2009)

History: According to Kingman (timeline, 1970s to 2010s)
Ewens (1972): sampling formula
1974: Australia and the Wright-Fisher model of evolution
Watterson's gene frequencies
Kingman's (1982) publication on the coalescent
Hudson and Tajima (1983): publications on a similar topic
Hudson (1990): review of the coalescent
Wakeley's (2009) text on Coalescent Theory
Genealogical connection?
(Wakeley 2009, Nordborg 2001, Kingman 2000)

Kingman's argument
The Wright-Fisher model is equivalent to the rule that each member of a generation chooses its mother at random from the previous generation, and each member's choice is independent.
Two members of the same generation have probability $(1 - N^{-1})^r$ of having different ancestors $r$ generations back (if N > ).
Trace the lines back until they coalesce, or until the number of lines is reduced to one, by means of a Markov chain.
(Kingman 2000)

Kingman's Moral
Articles on coherent random walks are very mathematically heavy.
If the equations had been thought about probabilistically, the family tree would not have been overlooked.
Simplification: mutation is non-recurrent (the mutant is independent of the parent).
Those who analyze stochastic models should always lift their eyes from their equations to ask what they actually mean.
(Kingman 2000)
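The probability in Kingman's argument can be checked numerically. The following is a minimal sketch, assuming a haploid Wright-Fisher population of constant size N in which each lineage picks its parent uniformly at random; the function names are hypothetical and chosen only for illustration.

```python
import random


def prob_distinct_ancestors(N, r):
    """Closed form: probability that two lineages still have
    different ancestors r generations back, (1 - 1/N)^r."""
    return (1.0 - 1.0 / N) ** r


def simulate_distinct_ancestors(N, r, trials=20000, seed=1):
    """Monte Carlo check: follow two lineages back r generations;
    in each generation both pick a parent at random from N."""
    rng = random.Random(seed)
    still_distinct = 0
    for _ in range(trials):
        coalesced = False
        for _ in range(r):
            if rng.randrange(N) == rng.randrange(N):
                coalesced = True  # same parent chosen: lineages merge
                break
        if not coalesced:
            still_distinct += 1
    return still_distinct / trials


if __name__ == "__main__":
    print(prob_distinct_ancestors(50, 20))
    print(simulate_distinct_ancestors(50, 20))
```

The simulated frequency should approach the closed form as the number of trials grows, which is the probabilistic reading of the formula that Kingman's moral recommends.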

Coalescence: Definitions and Descriptions

Coalescence
The merging of ancestral lineages going back in time (Rosenberg & Nordborg 2002).

[Figure: Lines of Descent (Crandall & Templeton 1993)]

[Figure: Coalesce vs. Diverge (http://home.cc.umanitoba.ca/%7eumbagher/39.769/presentation/presentation.html)]

Population Genetics
Aims to understand the forces that produce and maintain genetic variation within species: mutation, recombination, natural selection, population structure, and the random transmission of genetic material from parents to offspring.
Coalescent theory is a part of theoretical population genetics.
(Wakeley 2009)

Coalescent Theory
Describes the connection between demographic history and genetic data, and provides a framework for extracting information from samples of DNA sequences.
Often too simple to explain all aspects of variation.
(Wakeley 2009)

Coalescent Theory
Describes the genetic ancestry of a sample of sequences and makes predictions about patterns of genetic variation.
Gene genealogy: the set of ancestral relationships among the members of the sample, together with the times to common ancestry.
Gene genealogies are unobservable and are treated like random variables in a statistical setting.
(Wakeley 2009)

[Figure: Lines of Descent, Genealogy (Crandall & Templeton 1993)]

The Model: Assumptions and Uses

Population Genetics: Natural population
Fundamental problems:
1) no replication of the experiment; only one run of evolution is available to be studied
2) the starting conditions of the experiment are unknown
Allelic states are statistically dependent because of linkage and shared ancestry: mutation, recombination, and coalescence of lineages in the ancestry of the sample.
(Rosenberg & Nordborg 2002)

Population Genetics: Natural population
Heuristics don't fully account for the uncertainty that comes from the inherent randomness of evolution.
Solution: the past is modeled stochastically, and the model constructs random genealogies (the coalescent).
To model a genealogy, you need to consider recombination and coalescence of lineages.
(Rosenberg & Nordborg 2002)

Basic principle
In the absence of selection, sampled lineages can be viewed as randomly picking their parents as they go back in time.
Whenever two lineages pick the same parent, their lineages coalesce.
Eventually all lineages coalesce into a single lineage, the MRCA (most recent common ancestor) of the sample.
(Rosenberg & Nordborg 2002)
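The basic principle can be sketched as a short backward-in-time simulation. This is an illustrative sketch only, assuming a haploid, selectively neutral Wright-Fisher population of constant size N with non-overlapping generations; the function name is hypothetical.

```python
import random


def generations_to_mrca(n, N, rng):
    """Trace n sampled lineages back in time. In every generation
    each surviving lineage picks one of N parents at random;
    lineages that pick the same parent coalesce. Returns the number
    of generations until a single lineage (the MRCA) remains."""
    lineages = n
    generations = 0
    while lineages > 1:
        # The set keeps one entry per distinct parent chosen.
        parents = {rng.randrange(N) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [generations_to_mrca(5, 100, rng) for _ in range(2000)]
    print(sum(runs) / len(runs))  # average time to the MRCA
```

Only the sampled lineages are tracked, never the whole population, which is the source of the computational efficiency the slides attribute to the coalescent.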

The source of genetic variation
Polymorphism at a particular site results from mutations along branches of the genealogical tree, which connects sampled copies of the site to their MRCA.
(Rosenberg & Nordborg 2002)

The basic principle behind the coalescent
It is only necessary to keep track of the times between coalescence events [T(3) and T(2)] and the topology (which lineages coalesce with which).
(Rosenberg & Nordborg 2002)

Basic principle
The rate at which lineages coalesce depends on:
Lineages picking their parents: more lineages = faster rate
Size of the population: more parents to choose from = slower rate
Because selectively neutral mutations do not affect reproduction, they can be superimposed on the tree afterwards.
(Rosenberg & Nordborg 2002)

Factors included
Changes to the rate of coalescence: variation in reproductive success, age structure, skewed sex ratios
Changes to the shape of genealogical trees: population structure, fluctuation in population size
Recombination (random graph vs. tree)
Selection, the real difficulty: some genotypes reproduce more than others (i.e. lineages do not randomly pick parents)
(Rosenberg & Nordborg 2002)
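The dependence of the coalescence rate on the number of lineages and on population size can be made concrete with the standard large-N approximation. The sketch below assumes a haploid neutral Wright-Fisher population of size N (for a diploid population, replace N with 2N); under that assumption k lineages coalesce at rate k(k-1)/(2N) per generation, so the expected time with k lineages is 2N/(k(k-1)) generations. The function names are hypothetical.

```python
def expected_coalescence_times(n, N):
    """Expected number of generations T(k) spent with k lineages,
    for k = n down to 2, under the large-N approximation
    T(k) = 2N / (k (k - 1))."""
    return {k: 2.0 * N / (k * (k - 1)) for k in range(n, 1, -1)}


def expected_tmrca(n, N):
    """Expected time to the MRCA: the sum of the T(k),
    which simplifies to 2N (1 - 1/n)."""
    return sum(expected_coalescence_times(n, N).values())


if __name__ == "__main__":
    print(expected_coalescence_times(3, 100))  # T(3) and T(2)
    print(expected_tmrca(3, 100))
```

More lineages shorten T(k) and a larger N lengthens it, matching the two bullet points on the slide; the last interval T(2) is always the longest.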

Classical vs. Coalescent
Traditional: simulates the evolution of the entire population, forwards in time, until equilibrium is reached; then a sample is taken.
The forward-in-time approach is more appropriate for studies of how the long-term behavior of evolutionary systems depends on initial conditions.
(Rosenberg & Nordborg 2002)

Classical vs. Coalescent
Coalescent: simulates the genealogy of the sample going back in time until the MRCA, then adds mutations forwards along the branches of the new trees.
Suited to studies of the effects of past evolutionary forces on current genetic variation.
Uses only individuals that are ancestral to the sample.
Computational efficiency is increased.
(Rosenberg & Nordborg 2002)
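The coalescent recipe described above (build the genealogy backwards, then drop mutations on its branches) can be sketched in a few lines. This is a simplified illustration, not any published simulator: it assumes the standard neutral coalescent with time scaled so that each pair of lineages coalesces at rate 1, an infinite-sites mutation model with mutations arriving at rate theta/2 per unit of branch length, and it records only the total branch length because the number of segregating sites does not depend on the topology. All names are hypothetical.

```python
import random


def poisson_draw(rng, mean):
    """Poisson variate: count unit-rate exponential arrivals
    that fall before time `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n, theta, rng):
    """Step 1: go back in time; with k lineages the waiting time
    to the next coalescence is exponential with rate k(k-1)/2.
    Step 2: superimpose neutral mutations on the tree afterwards."""
    total_branch_length = 0.0
    k = n
    while k > 1:
        waiting_time = rng.expovariate(k * (k - 1) / 2.0)
        total_branch_length += k * waiting_time  # k branches extend
        k -= 1
    return poisson_draw(rng, theta / 2.0 * total_branch_length)


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [simulate_segregating_sites(10, 5.0, rng) for _ in range(3000)]
    print(sum(draws) / len(draws))  # compare with theta * sum(1/i, i=1..n-1)
```

Under these assumptions the average number of segregating sites should be close to theta times the harmonic sum 1 + 1/2 + ... + 1/(n-1), without ever simulating the rest of the population.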

Coalescence and Phylogenetics
Phylogenetics: What is the true tree?
Coalescence: What caused the tree?
Both methods give a tree and the parameters.
Probability distributions used (Bayesian):
Phylogenetics: probability distribution for the tree, including uncertainty in the parameters
Coalescence: probability distribution for the parameters, including uncertainty in the tree
(http://www.rni.helsinki.fi/~boh/teaching/bayes/lecture9.pdf)

Genealogical and Phylogenetic
Fundamentally different.
Phylogenetic methods were developed to determine the pattern of species descent (assumed tree-like).
Sequences come from individuals, and the genealogy is estimated from the sequences.
The estimated gene tree is used to draw conclusions about relationships between species.
The gene tree is taken as equivalent to the species tree.
(Rosenberg & Nordborg 2002)

Gene Trees and Species Trees
Two levels of error:
1) the gene tree for the sequences will be incorrectly inferred if there is sufficient random or systematic error
2) even if the gene tree is correctly inferred, deep gene coalescence (ancestral polymorphisms), gene duplication, and lateral gene transfer can produce a gene tree different from the true species tree
(Slowinski et al. 1997)

[Figure panel: Branches of species tree similar in length to the genealogical tree within species]
Resolved: as long as time intervals between species-branching events are much greater than time intervals between lineage-branching events in each species, gene and species divergences are likely to be nearly congruent.
[Figure panel: Branches of species tree much longer than the genealogical tree within species]
(Rosenberg & Nordborg 2002)
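The statement about branch lengths can be illustrated with the classical three-species result, which is not taken from the slides. Assuming one sampled gene copy per species, neutrality, and a constant ancestral population size, the gene tree matches the species tree with probability 1 - (2/3)e^{-T}, where T is the length of the internal species-tree branch in coalescent units (generations divided by 2N_e for a diploid population). The function name below is hypothetical.

```python
import math


def prob_gene_tree_matches_species_tree(T):
    """Three species, one gene copy each. The two sister lineages
    coalesce within the internal branch with probability
    1 - exp(-T); otherwise all three lineages enter the root
    population and the matching topology arises 1/3 of the time."""
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


if __name__ == "__main__":
    for T in (0.1, 1.0, 5.0):
        print(T, prob_gene_tree_matches_species_tree(T))
```

When the internal branch is short relative to the effective population size the probability falls toward 1/3, and when it is long the probability approaches 1, which is the "nearly congruent" regime described on the slide.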

Application: Population Genetics and Phylogenetics

Application
Modeling tool for population genetics.
Used to analyze DNA sequence polymorphism data.
Based on the realization that a genealogy is usually easier to model backward in time and that selectively neutral mutations can be superimposed afterwards.
(Nordborg 2001)

Application
Widely applied in studies of evolution.
Estimates time to common ancestor.
Can provide evidence for balancing selection.
Estimates of recombination and rate of selfing.
Assessing migration patterns in human ancestry (Y chromosome and mtDNA).
(Kingman 2000)

Population Genetics

Population Genetics Approach
Development of coalescent-based statistical methods for analyzing DNA sequence samples.
θ = 4Nµ estimators via Watterson (1975) and Tajima (1983): unbiased under the neutral Wright-Fisher model.
Improvements by Felsenstein (1992), Fu and Li (1993), and Fu (1994).
(Fu & Li 2002)

Population Genetics Approach: Maximum Likelihood
1) Griffiths and Tavaré (1994, 1995): Monte Carlo method
2) Kuhner et al. (1995): Monte Carlo estimator and Metropolis-Hastings method
3) Fu (1998): maximum-likelihood method
(Fu & Li 2002)
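The two classical θ estimators named above can be written down directly. The sketch below assumes aligned sequences of equal length with no missing data and an infinite-sites reading of the alignment; it returns per-locus (not per-site) estimates. Watterson's estimator divides the number of segregating sites S by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n-1), and Tajima's estimator is the mean number of pairwise differences. The function names and the toy alignment are made up for illustration.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: S / a_n."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def tajima_pi(seqs):
    """Tajima's estimator: average pairwise difference count."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    sample = ["AATG", "AATG", "ACTG", "ACTA"]  # toy data, not real sequences
    print(segregating_sites(sample), watterson_theta(sample), tajima_pi(sample))
```

Both estimators target the same θ under the neutral Wright-Fisher model, so a large gap between them on real data is often read as a departure from that model.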

Ex: Population Genetics Approach
A palaeo-distributional model is generated by projecting an ecological niche model (current distribution) onto a model of past climatic conditions.
Coalescent simulations are used to help model population genetic structure and to compare phylogeography among different taxa.
(Carstens & Richards 2007)

Phylogenetics

Ex1: Phylogenetic Approach
Gene tree parsimony: the terminal sequences of a gene tree have shared a single history represented by a binary tree.
Finds the species tree that minimizes the weighted sum of the different kinds of incongruence needed to fit each gene tree to a species tree, via GeneTree (Page & Charleston 1997).
(Slowinski et al. 1997)

Ex2: Phylogenetic Approach
Incorporating a model of stochastic loss of gene lineages by genetic drift into a phylogenetic estimation procedure can provide a robust estimate of grasshopper species relationships.
Use of ESP (estimated species phylogeny) with a coalescent-based approach vs. concatenation of multiple loci.
(Carstens & Knowles 2007)

Grasshopper Results
Coalescent approach: accurate relationships estimated. It provided a direct statistical evaluation of the ESP, rather than inferring it from the topology of a gene tree.
Concatenation approach: forced topological congruence. The estimated trees did not accurately reflect the species tree (with recently derived species).
(Carstens & Knowles 2007)

Grasshopper Results
The authors suggested that the coalescent approach may bridge the gap between systematics and population genetics.
The ESP chosen maximizes the probability of the gene trees.
(Carstens & Knowles 2007)

Ex3: Phylogenetic Approach
Methods for estimating gene trees (model-based estimation of sequence parameters; Ronquist & Huelsenbeck 2003) are commonly used.
Methods to estimate lineage trees (phylogenetics) from one or more gene trees using coalescent methods are underdeveloped.
(Belfiore et al. 2008)

Ex3: Phylogenetic Approach
Better solution: incorporate models of stochastic mutation along with gene coalescence directly into the estimation of lineage trees (Felsenstein, Maddison, Takahata).
This can increase efficiency and accuracy by increasing the number of loci and individuals.
It can infer lineage relationships in cases of rapid radiation.
(Belfiore et al. 2008)

Ex3: Phylogenetic Approach
Problem: individual gene trees often fail to match the lineage tree when divergence times are very short relative to the effective population size of the ancestral populations.
(Belfiore et al. 2008)

Ex3: Phylogenetic Approach
Solution: increase the number of loci sampled, or increase the number of gene copies per taxon, so that a larger number of coalescence events occur in common ancestors.
This gains information on relative divergence times and on the topology of the lineage tree, enough to overwhelm the noise from stochastic lineage sorting.
(Belfiore et al. 2008)

Ex3: Belfiore et al. 2008
Rapid radiation of Thomomys: species borders and relationships.
Partitioned Bayesian analysis of concatenated sequences (Ronquist & Huelsenbeck 2003) vs. a new Bayesian method that uses a coalescent framework to simultaneously estimate gene trees and species trees from multi-locus data (Edwards et al. 2007, Liu & Pearl 2007).
Resolution and comparison to previous phylogenetic analyses.

Phylogenetic Approach
Evaluate an extension of the coalescent approach for use in recent radiations (estimating species trees when multiple individuals are sequenced per taxon).
Previous methods were based on the assumption that loci are congruent and monophyletic within species; otherwise a different approach is needed to avoid wrongly assuming that all genes have the same history.
(Belfiore et al. 2008)

Phylogenetic Approach
Coalescent-based: estimates the species tree from a single sampled allele per taxon (Liu & Pearl 2007).
New method: a coalescent-based approach that allows for divergent histories of independent genes and directly infers the species tree, given samples of multiple alleles per gene per species.
(Belfiore et al. 2008)

Phylogenetic Approach
Concatenated: each locus is considered a partition and assigned its own substitution model.
Assumes that all loci have the same evolutionary history (the species tree estimate is the same as the gene tree estimate).
(Belfiore et al. 2008)

Phylogenetic Approach
BEST (Bayesian Estimation of Species Trees): a Bayesian hierarchical model that estimates species trees from the distribution of gene trees (across multiple loci).
Modified to incorporate multiple alleles from each taxon into the probability density function of gene trees, given species trees (Liu et al. 2008).
Assumes no reticulation among taxa.
(Belfiore et al. 2008)

Results
The concatenated method did not show the level of conflict among gene trees.
The BEST method directly estimates relationships among taxa, rather than among individuals.
It is more biologically realistic and captures the basic principles of lineage sorting.
(Belfiore et al. 2008)

Belfiore's final thoughts
Call for coalescent methods that can be applied at the interface of phylogenetic and population processes.
Powerful tool: a coalescent method that can test between hypotheses of recent reticulation versus a relatively recent rapid speciation event (resulting in incomplete lineage sorting).
(Belfiore et al. 2008)

Future
Molecular data: most applications use samples of mtDNA or the Y chromosome; for a better picture, more loci from the nuclear genome need to be examined.
Population genetics: a continuation of the Wright-Fisher model; there is little knowledge of a natural selection model; a better model would include migration and population growth.
(Fu & Li 2002)

Summary
The history of the coalescent and coalescence involved many great thinkers.
The model is mathematically complex, but has a simple biological theme.
Applications began in population genetics but are being introduced to phylogenetics.
A story of old ways versus new ways.

Questions?