Comparative method, coalescents, and the future. Correlation of states in a discrete-state model


Comparative method, coalescents, and the future
Joe Felsenstein
Depts. of Genome Sciences and of Biology, University of Washington

Comparative method, coalescents, and the future p.1/28

Correlation of states in a discrete-state model
[Figure: a phylogeny with the states of Character 1 and Character 2 marked on it, and two 2×2 (Y/N) tables cross-classifying characters #1 and #2: one counting species states, the other counting branch changes.]

A simple model: Brownian motion

A simple case to show effects of phylogeny

Two uncorrelated characters evolving on that tree

Identifying the two clades

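A tree on which we are to observe two characters
[Figure: a tree with tips a, b, c, d, e. Branch lengths shown: a 0.3, b 0.1, 0.25 below the ancestor of a and b, d 0.1, e 0.1, and the labels 0.65, (0.7), (0.2) and 0.9 on the remaining branches.]

This turns out to be statistically equivalent to...
[Figure: the same tree with a and b replaced by a single tip ab, the weighted average of a and b with weights 1/0.3 and 1/0.1. The branch of length 0.25 below ab has extra length $\frac{(0.3)(0.1)}{0.3 + 0.1} = 0.075$ added to it. The other tips and branch labels (c, d, e; 0.1, 0.1, 0.65, (0.7), (0.2), 0.9) are unchanged.]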

Contrasts on that tree

Contrast                                                                                                  Variance proportional to
$y_1 = x_a - x_b$                                                                                         0.4
$y_2 = \frac{1}{4} x_a + \frac{3}{4} x_b - x_c$                                                           0.975
$y_3 = x_d - x_e$                                                                                         0.2
$y_4 = \frac{1}{6} x_a + \frac{1}{2} x_b + \frac{1}{3} x_c - \frac{1}{2} x_d - \frac{1}{2} x_e$           1.11666

Plotting the contrasts against each other
[Figure: scatter plot of the contrasts for one character against the contrasts for the other; both axes run from -3 to 3.]
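The pruning step and the first rows of the contrasts table can be checked with a short sketch. This is an illustration rather than the code behind the slides: the nested-tuple tree encoding and the function name `prune` are assumptions made here, and the length 0.65 is assigned to the branch leading to c because that is the value implied by the listed variance of $y_2$. The two deepest branches are left out, so the variance of $y_4$ is not recomputed, only its coefficients.

```python
from fractions import Fraction

def prune(node):
    """Pruning of a tree for one character (hypothetical helper).

    node is a tip name (str) or a pair ((child, length), (child, length)).
    Returns (coeffs, extra, contrasts): the tip weights that make up the
    node's value, the extra branch length to add below the node, and a
    list of (contrast coefficients, variance) pairs found in the subtree.
    """
    if isinstance(node, str):
        return {node: Fraction(1)}, Fraction(0), []
    (left, v_left), (right, v_right) = node
    c_left, e_left, found_left = prune(left)
    c_right, e_right, found_right = prune(right)
    # Lengthen each branch by the extra length produced by pruning its child.
    v_left = Fraction(v_left) + e_left
    v_right = Fraction(v_right) + e_right
    total = v_left + v_right
    # The contrast is the left value minus the right value.
    contrast = dict(c_left)
    contrast.update({tip: -w for tip, w in c_right.items()})
    # Weights are proportional to 1/v_left and 1/v_right.
    w_left, w_right = v_right / total, v_left / total
    coeffs = {tip: w_left * w for tip, w in c_left.items()}
    coeffs.update({tip: w_right * w for tip, w in c_right.items()})
    extra = v_left * v_right / total
    return coeffs, extra, found_left + found_right + [(contrast, total)]

# Subtrees of the example tree (branch lengths given as decimal strings).
ab = (("a", "0.3"), ("b", "0.1"))
abc = ((ab, "0.25"), ("c", "0.65"))
de = (("d", "0.1"), ("e", "0.1"))

coeffs_abc, extra_abc, contrasts_abc = prune(abc)
coeffs_de, extra_de, contrasts_de = prune(de)

for coeffs, variance in contrasts_abc + contrasts_de:
    print({tip: str(w) for tip, w in coeffs.items()}, float(variance))
```

Under these assumptions the sketch gives variances 0.4, 0.975 and 0.2 for $y_1$, $y_2$ and $y_3$, and `coeffs_abc` and `coeffs_de` hold the weights 1/6, 1/2, 1/3 and 1/2, 1/2 that appear in $y_4$.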

Gene copies in a population of 10 individuals
[Figure: a random mating population]

Going back one generation
[Figure: a random mating population]

... and one more
[Figure: a random mating population]

... showing ancestry of gene copies
[Figure: a random mating population]

The genealogy of gene copies is a tree
[Figure: genealogy of gene copies, after reordering the copies]

Ancestry of a sample of 3 copies
[Figure: genealogy of a small sample of genes from the population]

Here is that tree of 3 copies in the pedigree

Kingman's coalescent

- Coalescent trees of gene copies within species (Kingman, 1982)
- Random collision of lineages as we go back in time (sans recombination)
- Collision is faster the smaller the effective population size

[Figure: a coalescent tree with the intervals between successive coalescences labelled $u_9, u_8, \ldots, u_2$.]

In a diploid population of effective population size $N$:

- Average time for $k$ copies to coalesce to $k-1$ = $\frac{4N}{k(k-1)}$ generations
- Average time for two copies to coalesce = $2N$ generations
- Average time for $n$ copies to coalesce = $4N\left(1 - \frac{1}{n}\right)$ generations
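The three averages above are consistent with one another: summing the interval means $4N/(k(k-1))$ from $k = n$ down to $k = 2$ gives $4N(1 - 1/n)$. A minimal sketch that checks this with exact arithmetic follows; the function names and the value of `N` are hypothetical choices for illustration.

```python
from fractions import Fraction

def expected_interval(k, N):
    """Mean generations for k lineages to coalesce to k - 1 (diploid size N)."""
    return Fraction(4 * N, k * (k - 1))

def expected_total(n, N):
    """Mean generations for n copies to reach a single common ancestor."""
    return sum(expected_interval(k, N) for k in range(2, n + 1))

N = 10_000  # hypothetical effective population size
n = 10      # hypothetical sample size

total = expected_total(n, N)
last_pair = expected_interval(2, N)
print(total, last_pair, last_pair / total)
```

With these values the final interval (two lineages down to one) accounts for 5/9 of the expected depth of the tree, which is why the deepest part of a coalescent tree is typically the longest.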

Coalescence is faster in small populations
Change of population size and coalescents
[Figure: a curve of $N_e$ against time shown beside the tree, with the coalescence events marked along the time axis.]
The changes in population size will produce waves of coalescence.
The parameters of the growth curve for $N_e$ can be inferred by likelihood methods as they affect the prior probabilities of those trees that fit the data.

Migration can be taken into account
[Figure: genealogy with lineages moving between population #1 and population #2]
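The "waves of coalescence" can be seen in a small simulation. The sketch below is an illustration, not the method used by any of the programs mentioned in these slides: it steps back one generation at a time and lets at most one pair of lineages coalesce per generation with probability $k(k-1)/(4N_e)$, which is only a good approximation when that probability is small. The bottleneck history in `bottleneck` (size 10,000, dropping to 100 between 500 and 600 generations ago) is a made-up example.

```python
import random

def coalescence_times(n, pop_size, rng, max_gen=10**6):
    """Generations (counted back in time) at which coalescences occur.

    pop_size(t) gives the diploid effective size t generations ago.
    At most one coalescence per generation is allowed (an approximation).
    """
    times, k, t = [], n, 0
    while k > 1 and t < max_gen:
        t += 1
        # Chance that some pair among k lineages shares a parent copy.
        p = k * (k - 1) / (4.0 * pop_size(t))
        if rng.random() < min(1.0, p):
            times.append(t)
            k -= 1
    return times

def bottleneck(t):
    """Hypothetical history: a 100-generation bottleneck 500-600 generations ago."""
    return 100 if 500 <= t < 600 else 10_000

rng = random.Random(1)  # fixed seed so the run is repeatable
times = coalescence_times(10, bottleneck, rng)
in_bottleneck = [t for t in times if 500 <= t < 600]
print(times)
print(len(in_bottleneck), "of", len(times), "coalescences fall in the bottleneck")
```

Lineages that survive to the start of the bottleneck tend to coalesce in a burst inside it, which is the signal that likelihood methods use to infer past changes in $N_e$.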

Recombination creates loops
[Figure: a genealogy with a loop created by a recombination event (labelled "Recomb.")]
Different markers have slightly different coalescent trees

We want to be able to analyze human evolution
"Out of Africa" hypothesis
[Figure: tree of populations labelled Europe, Asia and Africa (vertical scale is not time or evolutionary change)]

Coalescent and gene trees versus species trees
Consistency of gene tree with species tree
[Figure: a gene tree drawn inside a species tree, with the coalescence time marked]

If the branch is more than $N_e$ generations long...
Gene tree and Species tree
[Figure: a species tree whose branches have population sizes $N_1, N_2, N_3, N_4, N_5$ and whose splits are at times $t_1$ and $t_2$]
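How long the interior branch must be for the gene tree to match the species tree can be made concrete for the simplest case. The formula used below is the standard three-species result, $P = 1 - \frac{2}{3} e^{-t/(2N_e)}$, for one gene copy per species, an interior branch of $t$ generations and diploid effective size $N_e$ on that branch; it is not stated on the slides, and the function name is a hypothetical choice.

```python
import math

def prob_congruent(t, Ne):
    """Probability that a three-species gene tree matches the species tree.

    t  : length of the interior species-tree branch, in generations
    Ne : diploid effective population size along that branch
    Standard three-species result, assumed here rather than taken from the slides.
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-t / (2.0 * Ne))

Ne = 1000  # hypothetical effective population size
for t in (0, Ne, 2 * Ne, 10 * Ne):
    print(t, round(prob_congruent(t, Ne), 4))
```

Under this formula a branch of exactly $N_e$ generations still leaves the gene tree discordant about 40% of the time, and the probability of agreement approaches 1 only when the branch is several times $N_e$ long.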

What to do with coalescents?

- They are poorly estimated (often only a modest number of sites is available for each tree).
- Our interest is not in the coalescent tree itself, it is in the population and genetic parameters (population size, mutation rate, migration rate, population growth rate, rate of recombination).
- So we want to sum up likelihoods over our uncertainty about the tree, or do the equivalent in Bayesian terms.
- Got that? Our objective is not to get the tree! We don't end up with a tree!
- This can be done by Markov Chain Monte Carlo (MCMC) methods, in programs such as LAMARC, BEAST, MIGRATE, IMa or BEST (there are others too).

Topics for the future...

- Use of many loci
- Use of SNP data on a large scale (if relevant)
- Use of whole-genome sequences (in the longer run)
- Integration of between-species and between-population studies with multiple loci across multiple species. IMPORTANT: If you are within a species, not all loci will have the same tree (we have just explained why, in the discussion of recombination). So you ought to consider coalescents that differ between loci, between SNPs and not just infer the tree. (Also, please do not do phylogenies of individuals.)
- Integration of between-species and between-population studies with QTL mapping
- Integration of between-species and between-population studies with morphological characters
- Inferences of, and using, genomic changes (comparative genomics)
- More rigorous statistical models for quantitative traits, especially in fossils (hominoid fossils, anyone?)
- Using phylogenies to analyze multispecies microarray data
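The instruction under "What to do with coalescents?" to sum likelihoods over our uncertainty about the tree can be written compactly. The notation here is an assumption made for illustration: $D$ is the sequence data, $G$ a genealogy (coalescent tree with branch lengths), and $\Theta$ the population parameters of interest.

```latex
L(\Theta) \;=\; \sum_{G} \Pr(D \mid G)\, \Pr(G \mid \Theta)
```

The first factor is the usual phylogenetic likelihood of the data on a given genealogy and the second is the coalescent prior on genealogies. The sum runs over far too many genealogies to enumerate, which is why MCMC sampling of $G$ is used, and why the output is an estimate of $\Theta$ rather than a tree.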

References

Comparative methods

Felsenstein, J. 1985. Phylogenies and the comparative method. American Naturalist 125: 1-15. [The contrasts method]

Harvey, P. H. and M. D. Pagel. 1991. The Comparative Method in Evolutionary Biology. Oxford University Press, Oxford. [Reviews early work by me, Mark Ridley and the authors on comparative methods]

Pagel, M. 1994. Detecting correlated evolution on phylogenies: A general method for the comparative analysis of discrete characters. Proceedings of the Royal Society of London, Series B 255: 37-45. [Method for two-state discrete characters]

Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [Especially chapter 25 which covers comparative methods]

Felsenstein, J. 2012. A comparative method for both discrete and continuous characters using the threshold model. American Naturalist 179: 145-156. [Using Sewall Wright's 1934 threshold model to get a comparative method that can handle both discrete and continuous characters]

The coalescent

Griffiths, R. C. and S. Tavaré. 1994a. Sampling theory for neutral alleles in a varying environment. Philosophical Transactions of the Royal Society of London, Series B (Biological Sciences) 344: 403-10. [The pioneering sampling method]

Kuhner, M. K., J. Yamato, and J. Felsenstein. 1995. Effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140: 1421-1430. [Our MCMC coalescent likelihood method]

Hein, J., M. Schierup, and C. Wiuf. 2005. Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. Oxford University Press, Oxford. [One of two books so far on coalescents. Light on estimation issues.]

Wakeley, J. 2008. Coalescent Theory. Roberts and Co., Greenwood Village, Colorado. [One of two books so far on coalescents. Light on estimation issues.]

Nielsen, R. and M. Slatkin. 2013. An Introduction to Population Genetics: Theory and Applications. Sinauer Associates, Sunderland, Massachusetts. [Population genetics textbook with more coverage of coalescents than usual.]

Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [Especially chapter 27 which covers MCMC likelihood approaches (but explanation of logic of Griffiths/Tavaré method is wrong)]

Felsenstein, J. 2007. Trees of genes in populations. pp. 3-29 in Reconstructing Evolution: New Mathematical and Computational Advances, ed. by O. Gascuel and M. Steel. Oxford University Press, Oxford. [Review of coalescents including MCMC, for a somewhat mathematical audience]