Viral epidemiology and the Coalescent

Philippe Lemey and Marc A. Suchard
Department of Microbiology and Immunology, K.U. Leuven, and Departments of Biomathematics and Human Genetics, David Geffen School of Medicine at UCLA, Department of Biostatistics, UCLA School of Public Health

Population genetics
- population size
- change in population size
- migration and selection

Genealogy-based population genetics
[Figure: Phylogeny and Genealogy]

Genealogy-based population genetics
- The Coalescent is a model of the ancestral relationships of a random sample of individuals taken from a large background population.
- The Coalescent describes a probability distribution/density on ancestral genealogies (trees) given a population history.
- Therefore the Coalescent can convert information from ancestral genealogies into information about population history, and vice versa.
- The Coalescent is a model of ancestral genealogies, not sequences, and its simplest form assumes neutral evolution.

Demographic inference
- change in population size through time
- applications include:
  - reconstructing infectious disease epidemics
  - investigating viral dynamics within hosts
  - using viral sequences as genetic markers for their wild hosts and the host demographics
- population bottlenecks:
  - caused by change in climate/environment? Aridification, ice ages, et cetera
  - competition with other species? Humans?
  - transmission bottlenecks in viruses

Information pipeline
- Randomly sample individuals from population
- Obtain gene sequences from sampled individuals
- Reconstruct tree / trees from sequences
- Infer Coalescent results from tree / trees
- Simultaneously: infer Coalescent results directly from sequences using MCMC

Coalescent inference

COALESCENT THEORY

A model of virus reproduction
[Figure: Generation 1, Generation 2, Generation 3]
For a randomly chosen pair of individuals, the probability that they share a common ancestor in the previous generation is $1/N$.

Wright-Fisher reproduction model
- Discrete generations
- A constant population size of $N$ individuals (usually $2N$ gene copies in a diploid population)
- Each individual in a new (non-overlapping) generation chooses its parent from the previous generation at random, with replacement
- No geographic/social structure, no recombination, no selection
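The $1/N$ pairwise probability can be checked with a short simulation. This is a minimal sketch under the Wright-Fisher assumptions listed above; the function name `share_parent_frequency` and the values of `trials` and `seed` are illustrative choices, not part of the slides.

```python
import random


def share_parent_frequency(N, trials=200_000, seed=1):
    """Estimate the chance that two individuals pick the same parent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each individual chooses its parent uniformly at random,
        # with replacement, from the N individuals of the previous generation.
        parent_a = rng.randrange(N)
        parent_b = rng.randrange(N)
        hits += parent_a == parent_b
    return hits / trials


N = 10  # same population size as the example genealogy in the slides
estimate = share_parent_frequency(N)
print(f"simulated: {estimate:.4f}   expected 1/N: {1 / N:.4f}")
```

The simulated frequency should sit close to $1/N$, and the agreement tightens as `trials` grows.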

A sample genealogy from an idealized Wright-Fisher population
[Figure: a sample genealogy of 3 sequences from a population ($N = 10$), drawn over discrete generations from past to present]

Kingman (discrete-time) coalescent
- 2 individuals coalesce in 1 generation w.p. $\frac{1}{N}$
- 2 individuals coalesce in $j$ generations w.p. $\frac{1}{N}\left(1 - \frac{1}{N}\right)^{j-1}$
- $k$ individuals coalesce in $j$ generations w.p. $\binom{k}{2}\frac{1}{N}\left(1 - \binom{k}{2}\frac{1}{N}\right)^{j-1}$

Kingman (continuous-time) coalescent
Kingman (1982) J Appl Prob 19A, 27-42
Kingman (1982) Stoch Proc Appl 13, 235-48
- Let $t = j/N$ define a rescaled time in the past, and assume a sample of $n$ individuals with $n \ll N$.
- Then the waiting time $T_k$ for $k$ individuals to have $k - 1$ ancestors satisfies $P(T_k \le t) = 1 - e^{-\binom{k}{2} t}$.
- Exponential (memoryless), defines a continuous-time Markov chain
- $E(T_k) = \frac{2N}{k(k-1)}$ (in generations)
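The step from the discrete-time to the continuous-time coalescent can be checked numerically. The sketch below compares the geometric waiting-time distribution with its exponential limit and estimates $E(T_k)$ by Monte Carlo. The function names and the values of `N`, `k` and `j` are illustrative assumptions.

```python
import math
import random


def pairs(k):
    """Number of distinct pairs among k lineages, i.e. k choose 2."""
    return k * (k - 1) // 2


def discrete_cdf(k, j, N):
    """P(first coalescence among k lineages happens within j generations)."""
    p = pairs(k) / N  # per-generation coalescence probability
    return 1.0 - (1.0 - p) ** j


def continuous_cdf(k, t):
    """P(T_k <= t) in rescaled time t = j / N."""
    return 1.0 - math.exp(-pairs(k) * t)


def mean_waiting_time(k, N, draws=100_000, seed=1):
    """Monte Carlo mean of T_k in generations (exponential, rate pairs(k) / N)."""
    rng = random.Random(seed)
    rate = pairs(k) / N
    return sum(rng.expovariate(rate) for _ in range(draws)) / draws


N, k, j = 10_000, 5, 2_000
print("discrete  :", discrete_cdf(k, j, N))
print("continuous:", continuous_cdf(k, j / N))
print("simulated mean:", mean_waiting_time(k, N))
print("2N / (k(k-1)) :", 2 * N / (k * (k - 1)))
```

The two distribution functions agree closely when $k \ll N$, which is the regime in which the continuous-time approximation is intended to hold.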

Kingman coalescent: CTMC
- The number of sampled lineages decreases by one at each coalescence.
- The process continues until the most recent common ancestor (MRCA) is reached.
- What is the expected time to the MRCA?

$$E(T_{\mathrm{MRCA}}) = E\left(\sum_{k=2}^{n} T_k\right) = \sum_{k=2}^{n} E(T_k) = \sum_{k=2}^{n} \frac{2N}{k(k-1)} = 2N\left(1 - \frac{1}{n}\right) < 2\,E(T_2)$$

[Figure: genealogy with coalescent intervals T(2), T(3), ..., T(8)]

Kingman coalescent: its use here
- If we obtain a genealogy for a sample of individuals from a population, we can calculate the probability $P(\text{genealogy} \mid N)$.
- If we reconstruct a genealogy for a sample of gene sequences from a population, we can calculate the probability $P(\text{genealogy} \mid N)$.
[Figure: likelihood plotted against N for a genealogy reconstructed from sequences]
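Both calculations on these slides can be written in a few lines. This is a sketch under the constant-size Kingman coalescent with the haploid scaling used above (a given pair coalesces at rate $1/N$ per generation). The interval lengths are hypothetical numbers chosen for illustration, and the function names are not from the slides.

```python
import math


def pairs(k):
    """Number of distinct pairs among k lineages, i.e. k choose 2."""
    return k * (k - 1) // 2


def expected_tmrca(n, N):
    """Sum of E(T_k) = 2N / (k (k - 1)) for k = 2..n, in generations."""
    return sum(2 * N / (k * (k - 1)) for k in range(2, n + 1))


def log_likelihood(intervals, N):
    """log P(genealogy | N) for coalescent intervals measured in generations.

    `intervals` maps k (number of lineages) to the time spent with k lineages.
    """
    total = 0.0
    for k, t_k in intervals.items():
        # No coalescence for time t_k: exp(-pairs(k) * t_k / N).
        # Then one specific pair coalesces: rate 1 / N.
        total += -math.log(N) - pairs(k) * t_k / N
    return total


def max_likelihood_N(intervals):
    """Closed-form maximiser of log_likelihood with respect to N."""
    scaled = sum(pairs(k) * t_k for k, t_k in intervals.items())
    return scaled / len(intervals)


# Hypothetical interval lengths for a genealogy of n = 4 sequences.
intervals = {4: 150.0, 3: 400.0, 2: 1100.0}
n_hat = max_likelihood_N(intervals)
print("expected TMRCA for n=8, N=1000:", expected_tmrca(8, 1000))
print("maximum-likelihood N:", n_hat)
for N in (0.5 * n_hat, n_hat, 2.0 * n_hat):
    print(f"N = {N:8.1f}   log-likelihood = {log_likelihood(intervals, N):.3f}")
```

Evaluating `log_likelihood` over a grid of `N` values gives a curve of the kind shown in the likelihood figure, with its peak at `max_likelihood_N(intervals)`.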

N governs the rate of coalescence

But what about our assumptions?
The major weakness of the coalescent lies in its simplifying assumptions:
- neutral evolution?
- reproductive variance?
- panmictic population?

Solution: effective population size
- We consider an abstract parameter, the effective population size ($N_e$).
- The $N_e$ of a real biological population is the size of an idealized Wright-Fisher population that loses or gains genetic diversity at exactly the same rate.
- $N_e$ is generally smaller than the census population size.
- The coalescent $N_e$ provides the time-to-ancestry distribution for a sample genealogy from a real population.

Variable population size coalescent
- Changes in $N_e$ reflect changes in the census population.
[Figure: growing population; population size plotted from past to present]

Demographic models and tree shape
- The standard coalescent can be extended to accommodate various scenarios of demographic change through time:
  - constant size: $N(t) = N_0$
  - exponential growth: $N(t) = N_0 \exp[-rt]$
- Nested models can be compared using likelihood ratio tests (arrows represent valid comparisons).
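The effect of the demographic model on tree shape can be illustrated by simulating coalescent times under each model. This is a minimal sketch, assuming time $t$ is measured backwards from the present and exponential growth is written as $N(t) = N_0 e^{-rt}$ with $r \ge 0$. Waiting times are drawn by inverting the integrated coalescent rate. All function names and parameter values are illustrative.

```python
import math
import random


def pop_size(t, N0, r):
    """N(t) = N0 * exp(-r * t), with t measured backwards from the present."""
    return N0 * math.exp(-r * t)


def waiting_time(k, t_start, N0, r, rng):
    """Time until the next coalescence among k lineages, starting at t_start."""
    n_pairs = k * (k - 1) / 2
    e = rng.expovariate(1.0)  # unit-rate exponential draw
    if r == 0.0:
        return e * N0 / n_pairs  # constant size: plain exponential waiting time
    # Solve  integral_{t_start}^{t_start + w} n_pairs / N(s) ds = e  for w.
    return math.log(1.0 + e * r * pop_size(t_start, N0, r) / n_pairs) / r


def simulate_tmrca(n, N0, r, rng):
    """Sum the waiting times from n lineages down to the MRCA."""
    t = 0.0
    for k in range(n, 1, -1):
        t += waiting_time(k, t, N0, r, rng)
    return t


def mean_tmrca(n, N0, r, reps=2000, seed=1):
    """Average TMRCA over `reps` simulated genealogies."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, N0, r, rng) for _ in range(reps)) / reps


n, N0 = 5, 1000
print("constant size, mean TMRCA:", mean_tmrca(n, N0, 0.0))
print("growth r=0.01, mean TMRCA:", mean_tmrca(n, N0, 0.01))
```

Under growth the population was smaller in the past, so lineages coalesce sooner going backwards in time and the simulated genealogies are shorter than under constant size.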