The Two Phases of the Coalescent and Fixation Processes

Introduction

The coalescent process, which traces the current population back to a common ancestor, and the fixation process, which follows an individual forward until the population is fixed for its descendants, are heuristically inverse processes, yet the time reversal of one is seldom the other. This is because several consecutive generations will share the same most recent common ancestor, and several consecutive generations will first achieve fixation for one of their genes in the same generation. If the original individual is the most recent common ancestor of the present generation, and the present generation is the first generation in which the original individual is fixed, then the processes are inverses of each other. In general, however, if a gene is followed to fixation, the most recent common ancestor of the generation in which it becomes fixed will be more recent than the original gene. Similarly, if the present generation is traced back to its most recent common ancestor, that gene will have become fixed prior to the present generation.

Nevertheless, a fixation/coalescence inverse process, running from the most recent common ancestor to the first generation of fixation (or its reverse), will be a subset of any fixation or coalescent process. The present work considers this aspect of the structure of the coalescent and fixation processes. The inverse process shall be referred to as the transition phase, since it manifests the actual increase from a single copy to the entire population (or the contraction from the entire population to a single individual). The several generations of a coalescent process which share the same most recent common ancestor, and the several generations of a fixation process which attain fixation in the same generation, shall be called the stasis phase.

Because the expected fixation and coalescent times are equal, and those processes share the transition phase, the expected lengths of the stasis phase are the same for the coalescent and fixation processes.

Notation

The previous concepts can be elucidated by introducing appropriate notation. Start at some generation $t$, and let $T_i$ be the first generation in which the population is fixed for some gene in generation $t$; then the expected fixation time is the expected value of $T_i - t$. Next let $t_i$ be the generation of the most recent common ancestor of the population in generation $T_i$; then the expected length of the transition phase will be the expected value of $T_i - t_i$. $T_{i+1}$ ($T_{i-1}$) can be defined as the next (previous) generation when the population first became fixed for a different most recent common ancestor, and $t_{i+1}$ ($t_{i-1}$) as the generations of the respective most recent common ancestors. Then all the generations from $T_{i-1}$ to $T_i - 1$ share the same most recent common ancestor (in generation $t_{i-1}$), and all the generations from $t_i + 1$ to $t_{i+1}$ first attain fixation for one of their genes in the same generation ($T_{i+1}$). The same notation could have been defined starting at an arbitrary generation, and going back to the generation of its most recent common ancestor rather than forward to its fixation.

The intervals $T_{i-1}$ to $T_i - 1$ and $t_i + 1$ to $t_{i+1}$, which I shall denote as $\Delta T$ and $\Delta t$, contain the stasis phases for coalescence and fixation, respectively. Hence I shall call them stasis intervals. Note the usage of "phase" and "interval": the stasis phase is the realized stasis period; it is a stasis interval truncated at the initial (or present) generation. Because the initial generation can be anywhere in the stasis interval, the average fixation (coalescent) time should be half the expected value of $\Delta t$ ($\Delta T$), weighted by the lengths of the intervals, added to the expected value of the transition phase ($E[T_i - t_i]$). Hence the expected fixation time (which is equal to the expected coalescent time) is

$$\frac{1}{2}\,\frac{E[(\Delta t)^2]}{E[\Delta t]} + E[T_i - t_i].$$

The adjacent figure illustrates these definitions for a simulation of a haploid population of six individuals. The gene first becomes fixed in generation $T_1$, for which generation the most recent common ancestor is in generation $t_1$. Generations $t_1$ to $T_1$ and $t_2$ to $T_2$ are transition phases; generations $t_1 + 1$ to $t_2$ are a stasis interval for fixation; and generations $T_1$ to $T_2 - 1$ are a stasis interval for coalescence. The original generation $t$ would have occurred somewhere in a stasis interval for fixation.
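As a small numerical illustration of the weighted expression above, the following Python sketch evaluates it from lists of observed stasis-interval lengths and transition-phase lengths. The function name `expected_fixation_time` and the input values are hypothetical, chosen only to show why the length weighting matters; they are not taken from the simulations reported here.

```python
def expected_fixation_time(stasis_intervals, transition_lengths):
    """Evaluate (1/2) E[(dt)^2] / E[dt] + E[T_i - t_i] from sample lists.

    stasis_intervals   -- observed lengths of stasis intervals (delta t)
    transition_lengths -- observed lengths of transition phases (T_i - t_i)
    """
    count = len(stasis_intervals)
    mean_interval = sum(stasis_intervals) / count
    mean_square = sum(d * d for d in stasis_intervals) / count
    mean_transition = sum(transition_lengths) / len(transition_lengths)
    # A randomly chosen starting generation is more likely to fall in a long
    # stasis interval, hence the ratio of second to first moment.
    return 0.5 * mean_square / mean_interval + mean_transition


# Illustrative (made-up) values: two stasis intervals and two transitions.
weighted = expected_fixation_time([2, 6], [3, 3])
unweighted = 0.5 * (2 + 6) / 2 + 3  # naive version ignoring the weighting
print(weighted, unweighted)
```

With unequal interval lengths the weighted value exceeds the naive one, because longer intervals contain more possible starting generations.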

Characterization of the Stasis Phases

The transition phase is the actual increase of a gene from a single copy to the entire population for fixation, and the reverse for coalescence. The length of the transition phase is the difference between the generation in which the ancestral gene becomes fixed and the generation of the most recent common ancestor of that population.

The stasis phase of fixation heuristically has the ancestral gene as a single copy before it branches to spread through the population. In reality the gene may branch and have several copies during that phase, and the branches may persist during part of the transition phase, but those branched lineages will die out before fixation occurs. From the coalescent perspective of going back in time, the fixation stasis phase precedes the most recent common ancestor. The length of the fixation stasis phase is the difference between the initial generation and the generation of the most recent common ancestor of the population in which the original gene became fixed.

The stasis phase of coalescence consists of the generation(s) in which the entire population shares the same most recent common ancestor; the transition phase (which precedes the stasis phase in real time) begins (going backward in time) when the population contains an individual not descended from that ancestor (i.e., there is a more ancient branch in the pedigree). From the fixation perspective of going forward in time, the coalescence stasis phase comprises the generations after fixation for the most recent common ancestor of the population, up to the present generation. The length of the coalescence stasis phase is the difference between the present generation and the first generation in which the population has the specified most recent common ancestor.

Note that the stasis phase for a given coalescent (or fixation) process will be a subset of a stasis interval, which includes all generations sharing the same most recent common ancestor (or generation of first fixation), including generations after the present generation (or before the initial generation).

Extreme Examples

If every member of the population replaces itself for $n - 1$ generations, and then one individual parents the entire next generation, the length of the transition phase will be 1, and the length of the stasis interval ($\Delta T$ or $\Delta t$) will be $n - 1$. The average fixation/coalescence time will be $(n + 1)/2$ generations. If the length of the stasis interval were a random variable $X$, the weighted expected value would need to be calculated as noted above.

A stasis phase of length 0 is obtained if the member of the ancestral lineage (the individuals whose descendants will not go extinct) produces two progeny every generation and every other individual produces one progeny, except that the individual whose ancestors left the ancestral lineage furthest in the past does not reproduce. This follows since every generation will contain a most recent common ancestor for some future generation, and every generation will be the first generation of fixation for some previous generation. If the population has $N$ individuals, then an ancestral gene will become fixed in $N - 1$ generations; that will be the transition time, fixation time, and coalescent time.
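The first example can be checked by direct enumeration. The sketch below is a minimal illustration, assuming a cycle in which generations that are multiples of $n$ are parented entirely by one individual of the preceding generation and all other generations are exact self-replacements; the function name `fixation_time` and the population size of 4 are hypothetical choices.

```python
def fixation_time(start, n, pop_size=4):
    """Generations until the population is fixed for one gene of generation `start`.

    Assumed reproduction rule: in generations that are multiples of n, one
    individual parents everyone; otherwise each individual replaces itself.
    """
    labels = list(range(pop_size))  # which gene of generation `start` each individual carries
    generation = start
    while len(set(labels)) > 1:
        generation += 1
        if generation % n == 0:
            # Bottleneck: individual 0 parents the entire generation.
            labels = [labels[0]] * pop_size
        # Otherwise every individual replaces itself, so labels are unchanged.
    return generation - start


n = 6
times = [fixation_time(start, n) for start in range(n)]
mean_time = sum(times) / n
print(times, mean_time, (n + 1) / 2)
```

Averaging over every possible starting generation within one cycle reproduces $(n + 1)/2$, which also matches half the stasis interval, $(n - 1)/2$, plus the transition phase of length 1.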

Poisson Progeny Distribution

The binomial or Poisson progeny distribution is employed with the assumption that the future depends only on the present, and not on previous generations. In particular, at the time of a fixation event ($T_i$), the time until the next fixation event ($T_{i+1} - T_i$) will on average be less than or equal to the expected fixation time, because at time $T_i$ there may be multiple copies of the next gene destined for fixation. Therefore, the average length of the stasis interval will be less than the expected fixation time; however, this refers to the unweighted average of the length of the stasis interval.

Numerical simulations were performed for 1000 fixations each in haploid populations of 100 and 200 individuals. The average times until fixation were 199 and 390 generations, respectively, which are approximately equal to $2N$. At fixation, the average times since the most recent common ancestor were 97 and 195 generations, respectively. Hence the average length of the transition phase was about half of the fixation time, and the weighted average of the stasis intervals was approximately equal to the average fixation time.
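A simulation of this kind can be sketched in Python as follows. This is a minimal sketch and not the code used for the results above: it assumes a neutral haploid Wright-Fisher population in which each offspring picks its parent uniformly at random (giving binomial progeny numbers), and the names `simulate_fixation`, `origin`, and `parents` are hypothetical. The population size and number of replicates are kept small so the example runs quickly.

```python
import random


def simulate_fixation(pop_size, rng):
    """Return (fixation_time, transition_length) for one simulated fixation.

    fixation_time     -- generations from the start until the population is
                         fixed for one gene of the starting generation
    transition_length -- generations between that fixation and the most
                         recent common ancestor of the fixed population
    """
    parents = []                    # parents[g][j]: parent (in generation g) of individual j in g + 1
    origin = list(range(pop_size))  # starting-generation gene carried by each individual
    while len(set(origin)) > 1:
        chosen = [rng.randrange(pop_size) for _ in range(pop_size)]
        parents.append(chosen)
        origin = [origin[p] for p in chosen]
    fixation_time = len(parents)

    # Trace the fixed population backward until a single ancestor remains.
    ancestors = set(range(pop_size))
    transition_length = 0
    for chosen in reversed(parents):
        ancestors = {chosen[j] for j in ancestors}
        transition_length += 1
        if len(ancestors) == 1:
            break
    return fixation_time, transition_length


rng = random.Random(1)  # fixed seed so the run is reproducible
runs = [simulate_fixation(20, rng) for _ in range(200)]
mean_fixation = sum(f for f, _ in runs) / len(runs)
mean_transition = sum(s for _, s in runs) / len(runs)
print(mean_fixation, mean_transition)
```

In every run the transition length is at most the fixation time, and the difference between the two is the realized stasis phase for that run.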

Discussion

This study was motivated by the need to clarify the relation between coalescent and fixation events. Indeed, the expected coalescent and fixation times are equal, but the expected time since a common ancestor at the generation when fixation occurs is not in general the same as the expected coalescent time, nor is the expected time until fixation of a gene which is a most recent common ancestor in general equal to the expected fixation time. When studying the fixation of a gene or the coalescence of a population, the actual transition from a single copy to the entire population, or from an entire population to the single copy, will take less time than the fixation or coalescent time.

One implication of these results is that hitchhiking occurs in half of the fixation time (for random mating with a Poisson progeny distribution), because it is only during the transition phase that crossing over could affect monomorphism at a linked locus. Of course, this does not address the role of mutation in polymorphism. In fact, these results are really not important to the general questions of polymorphism and evolution. Dead-end lineages contribute to the variation of a population. The breadth of the coalescent process, as well as the coalescent time, affects how much mutation (which provides variation) occurs during fixation.

The minimal genetic history of a population is the lineage of single genes which eventually become fixed; for such a lineage there is no concept of variation or coalescence. The concise genetic history of a population is the lineage of the single genes in each generation which are destined for fixation. The genetic diversity which we study is the embellishment of that lineage. The present work provides another perspective on the nature of this embellishment.

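[Figure: simulated genealogy of a haploid population of six individuals. Time increases toward the bottom; the marked generations are $t$, $t_1$, $T_1$, $t_2$, and $T_2$.]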