POPULATION GENETICS: WRIGHT-FISHER MODEL AND COALESCENT PROCESS. Hailong Cui and Wangshu Zhang. Supervisor: Prof. Quentin Berger
POPULATION GENETICS: WRIGHT-FISHER MODEL AND COALESCENT PROCESS
by Hailong Cui and Wangshu Zhang
Supervisor: Prof. Quentin Berger
A Final Project Report Presented in Partial Fulfillment of the Requirements for Math 505b
April 2014
Acknowledgments

We want to thank Prof. Quentin Berger for introducing us to the Wright-Fisher model in lecture, which inspired us to choose population genetics as our project topic. The resources Prof. Berger provided have been excellent learning materials, and his feedback has helped us greatly in preparing this report. We would also like to acknowledge that the research papers in the reference list were an integral part of this process. They motivated us to learn about models beyond those covered in class, and gave us confidence that these probabilistic models can be used in real applications.
Contents

Acknowledgments
Abstract
1 Introduction to Population Genetics
2 Wright-Fisher Model
2.1 Random drift
2.2 Genealogy of the Wright-Fisher model
3 Coalescent Process
3.1 Kingman's Approximation
4 Applications
Reference List
Abstract

In this project on population genetics, we introduce models that lay the foundation for studying more complicated models. Specifically, we discuss the Wright-Fisher model and the coalescent process. These are of interest not only for their mathematical elegance: with the availability of massive amounts of sequencing data, we can use these models (or more advanced models incorporating variable population size, mutation effects, etc., which are outside the scope of this project) to answer real questions in molecular biology. First we explain concepts such as random drift, then discuss whether an allele can eventually become fixed in a population and what the probability is that genetic variation survives over generations. After this we illustrate in graphs the tree-like nature of tracing back to the most recent common ancestor (MRCA), then derive the distribution of the time back to the MRCA for a sample of size 2. For the remainder of the report, we provide a treatment of Kingman's approximation. Finally we move on to a literature review of an application to HIV-1: coalescent estimates of HIV-1 generation time in vivo.

Keywords: Population Genetics, Wright-Fisher model, Most Recent Common Ancestor (MRCA), Allele, Sequencing data, Heterozygosity, Genealogy, Coalescent process, Kingman's approximation, HIV-1 evolution
Chapter 1 Introduction to Population Genetics

With the advent of new sequencing technology [1, 2], we are harvesting large volumes of genetic data (at the DNA, RNA and even protein level) and making them publicly available [3, 4, 5, 6]. This enables researchers to analyze these sequencing data to tackle one of the most important challenges in modern molecular biology: how to make sense of the variation in genetic information, and how this variation translates into differences in phenotype. For example, can one capture the evolution among tumor cells and use the observed variability to infer the rate of disease progression? Another example is studying variation among human genetic sequences to identify genes related to diseases, such as the Egr3 gene for schizophrenia. These are some of the questions in population genetics. Within the scope of this project, we first introduce probabilistic models that form the basis of further research. We then briefly review some published papers that use these models, along with related methods and software in phylogenetics, all of which ultimately aim at a better understanding of disease.

Essentially, population genetics is the study of genetic variation within a population. We assume that a gene has two alleles and denote them by A and B. The population is then composed of individuals with two copies of each gene, i.e., AA, AB (BA) or BB. It is convenient to classify evolutionary problems by the time scales involved. A typical forward-looking question asks what will happen in the future, such as "how long does a new mutant survive in the population?" or "what is the chance that an allele becomes fixed in a population?". We can also think about these problems retrospectively, by asking where the population has been in the past.
Many factors can affect the evolution of a population, such as random drift, selection, mutation, recombination, and population subdivision. Nonetheless, we begin in the next section by introducing a simple model in which many of these effects are ignored.
Chapter 2 Wright-Fisher Model

In this section we introduce the simplest Wright-Fisher model (Fisher (1922), Wright (1931)). We assume the population is finite and of constant size, and that the locus of interest has only two alleles. We also ignore the effects of mutation, selection, etc., and assume that the population undergoes random mating. The resulting random fluctuation of allele counts from one generation to the next is what we call random drift, which is discussed more formally below.

2.1 Random drift

Let us denote the two alleles at the locus of interest by A and B as before, and assume no mutation occurs. Define $Y_r$ as the number of A alleles in generation $r$; then $N - Y_r$ is the number of B alleles in generation $r$. We make the following assumptions:

Assumption 1. Discrete, non-overlapping generations of equal size $N$.
Assumption 2. The parents of the next generation of genes are picked randomly with replacement from the preceding generation (genetic differences have no fitness consequences).

The population at generation $r+1$ is derived from the population at generation $r$ by binomial sampling of $N$ genes from a gene pool in which the fraction of A alleles is its current frequency, namely $\pi_i = i/N$. Hence, given $Y_r = i$, the probability that $Y_{r+1} = j$ is

$$p_{ij} = \binom{N}{j}\, \pi_i^{\,j}\, (1 - \pi_i)^{N-j}, \qquad 0 \le i, j \le N.$$

The process $\{Y_r,\ r = 0, 1, \ldots\}$ is a time-homogeneous Markov chain with transition matrix $P = (p_{ij})$ and state space $S = \{0, 1, \ldots, N\}$. The states $0$ and $N$ are clearly absorbing: if the population contains only one allele in some generation, then it remains so in every subsequent generation. In this case, we say that the population is fixed for that allele.
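The transition probabilities $p_{ij}$ above can be tabulated directly. The following is a minimal sketch, not part of the original report: the function name `wright_fisher_transition_matrix` and the choice $N = 4$ are illustrative assumptions.

```python
from math import comb

def wright_fisher_transition_matrix(n_genes):
    """Return P with P[i][j] = C(N, j) * (i/N)**j * (1 - i/N)**(N - j)."""
    matrix = []
    for i in range(n_genes + 1):
        freq = i / n_genes  # pi_i, the current frequency of allele A
        row = [comb(n_genes, j) * freq**j * (1 - freq)**(n_genes - j)
               for j in range(n_genes + 1)]
        matrix.append(row)
    return matrix

# Illustrative population of N = 4 genes (an assumed value).
P = wright_fisher_transition_matrix(4)
for row in P:
    print(["%.4f" % p for p in row])
```

Each row sums to one, and the first and last rows place all their mass on states $0$ and $N$ respectively, which is the absorbing behaviour described above.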
The binomial nature of the transition matrix makes some properties of the process easy to calculate. For example,

$$E(Y_r \mid Y_{r-1}) = N \cdot \frac{Y_{r-1}}{N} = Y_{r-1},$$

so by taking expectations on both sides we get $E(Y_r) = E(Y_{r-1})$, and by iteration $E(Y_r) = E(Y_0)$ for $r = 1, 2, \ldots$. Note that:

$$E(Y_{r+1} \mid Y_r = i) = N \cdot \frac{i}{N} = i,$$

$$\mathrm{Var}(Y_{r+1} \mid Y_r = i) = N \cdot \frac{i}{N}\left(1 - \frac{i}{N}\right).$$

Therefore the expected number of A (or B) alleles remains constant across generations; nonetheless, variability must eventually be lost. Hence the population will ultimately contain only A alleles or only B alleles, since states $0$ and $N$ are absorbing. Naturally we want to understand the probabilities of these events, so we define

$$a_i = P(\text{eventually all alleles are A} \mid \text{initially } i \text{ alleles are A}).$$

Clearly $a_0 = 0$ and $a_N = 1$, and $Y_r$ is a martingale, as can be seen from the equations above. If we define $T$ as the time of absorption at $0$ or $N$ and apply the optional stopping theorem (which is applicable because $Y_r$ is bounded and $T$ is almost surely finite), we get $E[Y_T] = E[Y_0] = i$. Since $Y_T$ takes only the values $0$ and $N$,

$$E[Y_T] = N \cdot P(Y_T = N) = N a_i = i, \qquad \text{so} \qquad a_i = \frac{i}{N}.$$

This means that an allele eventually becomes fixed in the population with probability equal to its initial proportion. As a side note, fixation in a genetic sequence makes it more difficult to trace back in time to determine common ancestors.

The next question of interest is how fast genetic variation is lost. To this end, we study another widely used quantity in population genetics: heterozygosity. It is defined as the probability $H_r$ that two genes chosen at random with replacement in generation $r$ are different.
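The fixation probability $a_i = i/N$ can be checked by simulation. The sketch below is only an illustration under assumed values ($N = 10$, $i = 3$, 10000 replicates, a fixed seed). The helper names are hypothetical, and the binomial draw is built from $N$ independent Bernoulli trials so that only the standard library is needed.

```python
import random

def simulate_fixation(n_genes, initial_a, rng):
    """Run one Wright-Fisher trajectory until absorption; return True if A fixes."""
    count_a = initial_a
    while 0 < count_a < n_genes:
        freq = count_a / n_genes
        # Binomial(N, freq) draw: each gene picks its parent independently.
        count_a = sum(1 for _ in range(n_genes) if rng.random() < freq)
    return count_a == n_genes

def estimate_fixation_probability(n_genes, initial_a, replicates, seed=0):
    """Monte Carlo estimate of a_i, the probability that allele A fixes."""
    rng = random.Random(seed)
    fixed = sum(simulate_fixation(n_genes, initial_a, rng)
                for _ in range(replicates))
    return fixed / replicates

# Assumed illustrative values: N = 10 genes, 3 of which are initially A.
estimate = estimate_fixation_probability(n_genes=10, initial_a=3, replicates=10000)
print("estimated a_3:", estimate, "theory:", 3 / 10)
```

The estimate should lie close to the theoretical value $3/10$, with the usual Monte Carlo error of order $1/\sqrt{\text{replicates}}$.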
If we define $P_r = Y_r / N$ to be the proportion of A alleles in generation $r$, then the heterozygosity is $H_r = 2 P_r (1 - P_r)$. Starting from $P_0 = p_0$, the expected heterozygosity after one generation is

$$\begin{aligned}
E(H_1) &= E\big(2 P_1 (1 - P_1)\big) = 2\big(E(P_1) - E(P_1^2)\big) \\
&= 2\big(E(P_1) - E(P_1)^2 - \mathrm{Var}(P_1)\big) \\
&= 2\left(p_0 - p_0^2 - \frac{p_0 (1 - p_0)}{N}\right) \\
&= 2 p_0 (1 - p_0)\left(1 - \frac{1}{N}\right) = H_0 \left(1 - \frac{1}{N}\right).
\end{aligned}$$

After $r$ generations:

$$E(H_r) = H_0 \left(1 - \frac{1}{N}\right)^r \approx H_0\, e^{-r/N}.$$

The probability $H_r$ measures the genetic variability surviving in the population, which decays at rate $1/N$ per generation. The decrease of heterozygosity is a measure of random drift. As the computation above shows, the expected heterozygosity decays to 0 as $r$ goes to infinity. The expected time until variation is lost is complicated to compute. Because an explicit expression is difficult to find, one may want to resort to approximation methods. Interested readers can refer to topics on diffusion approximations for further reading [7].

2.2 Genealogy of the Wright-Fisher model

In this section we study the genealogy of the Wright-Fisher model. We can imagine that each individual in a given generation carries either the A or the B allele. Assuming no mutation as before, all offspring of A individuals carry only A alleles. Below we introduce the concept of the most recent common ancestor (MRCA) by illustrating two simulation results in Fig 2.1 and Fig 2.2 [7]. Both are for a Wright-Fisher model of $N = 9$ individuals. Generations evolve vertically downward and the individuals are labelled $1, 2, \ldots, 9$ from left to right. Lines are directional, though drawn without arrows, and join individuals in two generations if one is the offspring of the other.

Figure 2.1: First simulation.

In Fig 2.1, we can see that individuals 3 and 4 have their MRCA 3 generations ago. This figure shows a rather tangled set of relationships and may look confusing. Fig 2.2, however, presents a clearer structure in a typical phylogenetic tree format. The order of the individuals is untangled in Fig 2.2, and we can see that the MRCA of individuals 6 and 7 is 11 generations ago, i.e., at the root of the tree.

Now we want to understand how long it takes for two alleles to trace back to their MRCA. Since individuals choose their parents at random, we see that

$$P(\text{2 individuals have 2 distinct parents}) = \lambda = 1 - \frac{1}{N}.$$
Figure 2.2: Second simulation in untangled form.

Since those parents are themselves a random sample from their generation, we may iterate this argument to see that

$$P(\text{first common ancestor more than } r \text{ generations ago}) = \lambda^r = \left(1 - \frac{1}{N}\right)^r. \qquad (2.1)$$

When the population size $N$ is large and time is measured in units of $N$ generations, the time to the MRCA of a sample of size 2 has approximately an exponential distribution with mean 1. To see this, rescale time so that $r = Nt$, and let $N \to \infty$ in (2.1). We see that this probability is

$$\left(1 - \frac{1}{N}\right)^{Nt} \to e^{-t}.$$
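The convergence of (2.1) to $e^{-t}$ can be seen numerically. This is a small sketch under assumed values ($t = 1.5$ and three population sizes); the function name is illustrative and not taken from the report.

```python
import math

def prob_no_common_ancestor(n_genes, generations):
    """P(first common ancestor of two genes is more than `generations` ago), eq. (2.1)."""
    return (1 - 1 / n_genes) ** generations

t = 1.5  # rescaled time, in units of N generations (assumed value)
for n_genes in (10, 100, 10000):
    r = int(n_genes * t)  # number of generations corresponding to time t
    print(n_genes, prob_no_common_ancestor(n_genes, r), math.exp(-t))
```

As $N$ grows, the second printed column approaches the third, which is the exponential tail probability with mean 1.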
Now we consider the probability $h_r$ that two individuals chosen with replacement from generation $r$ carry distinct alleles. The two draws are distinct individuals with probability $(N-1)/N$. Two distinct individuals carry different alleles if and only if their common ancestor is more than $r$ generations ago and their ancestors at time 0 carry different alleles. The latter probability is the chance that 2 individuals chosen without replacement at time 0 carry different alleles, which is just $E[2 Y_0 (N - Y_0)] / (N(N-1))$. Combining these results, with $\lambda = 1 - 1/N$,

$$h_r = \lambda^r \cdot \frac{N-1}{N} \cdot \frac{E[2 Y_0 (N - Y_0)]}{N (N-1)} = \lambda^r h_0,$$

just as for $H_r$ discussed in the previous section.

Here are more discrete-time properties:

- P(two genes have the same parent in the previous generation) is $\frac{1}{N}$.
- Number of generations since two genes first shared a common ancestor $\sim \mathrm{Geometric}\left(\frac{1}{N}\right)$.
- Number of generations since at least two genes in a sample of $k$ shared a common ancestor is approximately $\mathrm{Geometric}\left(\frac{k(k-1)}{2N}\right)$.

Proof. Define $G_{k,k}$ to be the probability that $k$ genes have $k$ distinct ancestors in the previous generation. Then

$$G_{k,k} = \frac{N-1}{N} \cdot \frac{N-2}{N} \cdots \frac{N-(k-1)}{N} = 1 - \frac{1 + 2 + \cdots + (k-1)}{N} + O\!\left(\frac{1}{N^2}\right) = 1 - \frac{k(k-1)}{2N} + O\!\left(\frac{1}{N^2}\right).$$

Therefore, the probability that at least two genes share a common ancestor in the previous generation is

$$1 - G_{k,k} = \frac{k(k-1)}{2N} + O\!\left(\frac{1}{N^2}\right).$$

Since this is the same in each generation, the number of generations until at least two genes in a sample of $k$ shared a common ancestor is approximately $\mathrm{Geometric}\left(\frac{k(k-1)}{2N}\right)$.
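The quality of the first-order approximation to $1 - G_{k,k}$ can be inspected numerically. The sketch below assumes a population of $N$ genes, as in Section 2.1, so that the leading term is $k(k-1)/(2N)$. The function names and the sample size $k = 5$ are illustrative assumptions.

```python
def prob_all_distinct_parents(n_genes, k):
    """Exact G_{k,k}: probability that k genes pick k distinct parents among n_genes."""
    prob = 1.0
    for j in range(1, k):
        prob *= (n_genes - j) / n_genes
    return prob

def coalescence_prob_approx(n_genes, k):
    """First-order approximation k(k-1)/(2N) to 1 - G_{k,k} (assumes N genes)."""
    return k * (k - 1) / (2 * n_genes)

for n_genes in (100, 1000, 10000):
    exact = 1 - prob_all_distinct_parents(n_genes, 5)
    approx = coalescence_prob_approx(n_genes, 5)
    print(n_genes, exact, approx, abs(exact - approx))
```

The absolute error in the last column shrinks roughly by a factor of 100 each time $N$ grows by a factor of 10, in line with the $O(1/N^2)$ remainder.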
Chapter 3 Coalescent Process

In this section we discuss a basic coalescent process, which is closely related to the MRCA introduced in the previous sections. The term coalescence means connection or coming together; it is the opposite of branching. When two alleles are descended from a common ancestor in some previous generation, we say that they coalesce in that generation. In the Wright-Fisher model above, we started from a population of size $N$ and moved forward in time to observe descendants. In the coalescent process, we begin from a certain generation and look backward in time. In this way the lineages of two individuals of interest merge in some previous generation.

Let us begin with the simplest statement of the coalescent model. Kingman proved this to be the limiting ancestral process for a broad class of population structures that includes the Wright-Fisher model. We trace the ancestral lineages, which are the series of genetic ancestors of the samples at a locus, back through time. The history of a sample of size $n$ comprises $n - 1$ coalescent events. Each coalescent event decreases the number of ancestral lineages by one. This takes the sample from the present day, when there are $n$ lineages, through a series of steps in which the number of lineages decreases from $n$ to $n - 1$, then from $n - 1$ to $n - 2$, and so on, until finally from two to one. The single lineage remaining at the final coalescent event is the MRCA of the entire sample. At each coalescent event, two of the lineages fuse into one common ancestral lineage. The result is a bifurcating tree like the one shown in Fig 3.1. The times $T_i$ on the right in Fig 3.1 are the times during which there were exactly $i$ lineages ancestral to the sample.

3.1 Kingman's Approximation

Discrete-time models can be cumbersome to work with, so we would like a representation in continuous time. Kingman (1982) considered the case where $N$ (population size) is very large relative to $n$ (sample size).
Figure 3.1: A coalescent genealogy of a sample of n = 9 items.

Recall that $G_{k,k}$ is the probability that $k$ genes had $k$ ancestors in the previous generation. Define $G_{i,j}$ as the probability that $i$ genes had $j$ ($j \le i$) ancestors in the previous generation. Then we can show that

$$G_{i,j} = \frac{S_i^{(j)}\, N_{[j]}}{N^i}, \qquad 1 \le j \le i,$$

where $N_{[j]} = N(N-1)\cdots(N-j+1)$ and $S_i^{(j)}$ are Stirling numbers of the second kind.

Important observation: when $N$ is large, it is unlikely for more than one coalescent event to occur in a single generation.

Under the assumption that coalescent events do not occur simultaneously, we look at the limit as $N \to \infty$:

- Consider a sample of $n$ lineages and follow the process backwards in time.
- Define $T_i$ as the time during which there are exactly $i$ lineages in the sample.
- $P(T_i > t)$ is the probability that the time to a coalescent event in a sample of $i$ lineages from a population of size $N$ is greater than $t$, where $t$ is measured in units of $N$ generations.
- $P(T_i > t) = (G_{i,i})^{[Nt]}$, where $[Nt]$ denotes the integer part of $Nt$.
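The formula for $G_{i,j}$ and the "important observation" can both be checked with a short computation. This is a minimal sketch, with hypothetical helper names and assumed values $N = 1000$ and $i = 5$. It uses the standard recurrence $S(i, j) = j\,S(i-1, j) + S(i-1, j-1)$ for Stirling numbers of the second kind.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(i, j):
    """Stirling number of the second kind: partitions of i items into j blocks."""
    if i == j:
        return 1
    if j == 0 or j > i:
        return 0
    return j * stirling2(i - 1, j) + stirling2(i - 1, j - 1)

def falling_factorial(n_genes, j):
    """N_[j] = N (N-1) ... (N-j+1)."""
    result = 1
    for m in range(j):
        result *= n_genes - m
    return result

def ancestor_count_prob(n_genes, i, j):
    """G_{i,j}: probability that i genes have exactly j distinct parents."""
    return stirling2(i, j) * falling_factorial(n_genes, j) / n_genes**i

# Assumed illustrative values: population of 1000 genes, sample of 5 lineages.
n_genes, i = 1000, 5
probs = [ancestor_count_prob(n_genes, i, j) for j in range(1, i + 1)]
for j, p in enumerate(probs, start=1):
    print(j, p)
print("total:", sum(probs))
```

The probabilities sum to one, and almost all of the mass sits on $j = i$ (no coalescence) and $j = i - 1$ (a single coalescence), while outcomes with $j \le i - 2$ are of order $1/N^2$ or smaller.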
For the Wright-Fisher model, with $t$ measured in units of $N$ generations and $[Nt]$ denoting the integer part of $Nt$:

$$P(T_i > t) = (G_{i,i})^{[Nt]} = \left(1 - \frac{i(i-1)}{2N} + O\!\left(\frac{1}{N^2}\right)\right)^{[Nt]} \to e^{-\binom{i}{2} t} \quad \text{as } N \to \infty.$$

In this case, with appropriate time units, the time to coalescence in a sample of $i$ lineages follows an Exponential distribution with mean $\mu = \frac{2}{i(i-1)}$.

The probability density function for $T_i$ is

$$f_{T_i}(t) = \binom{i}{2} e^{-\binom{i}{2} t}, \qquad t \ge 0, \quad i = 2, 3, \ldots, n.$$

The mean and variance are

$$E(T_i) = \frac{2}{i(i-1)}, \qquad \mathrm{Var}(T_i) = \left(\frac{2}{i(i-1)}\right)^2.$$

Fewer lineages mean a longer expected time to coalescence.

To generate a genealogy of $i$ genes under Kingman's coalescent:

- Draw an observation from an exponential distribution with mean $\mu = 2/(i(i-1))$. This will be the time of the first coalescent event (looking from the present backwards in time).
- Pick two lineages at random to coalesce.
- Decrease $i$ by 1.
- If $i = 1$, stop. Otherwise, repeat these steps [8, 9].
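The steps above can be sketched as a short simulation. This is an illustrative sketch rather than the implementation used in [8, 9]. The function name, the nested-parenthesis tree string, the sample size 5 and the seed are all assumptions made for the example. Note that `random.expovariate` takes the rate $\binom{i}{2}$, which is the reciprocal of the mean $\mu$.

```python
import random

def simulate_kingman_coalescent(sample_size, rng):
    """Return (events, tree): events lists (waiting_time, merged_pair), most recent first."""
    lineages = [str(label) for label in range(1, sample_size + 1)]
    events = []
    while len(lineages) > 1:
        i = len(lineages)
        rate = i * (i - 1) / 2              # C(i, 2); mean waiting time is 2 / (i (i - 1))
        waiting_time = rng.expovariate(rate)
        a, b = rng.sample(range(i), 2)      # two lineages chosen uniformly at random
        first, second = lineages[a], lineages[b]
        merged = "(" + first + "," + second + ")"
        # Remove the two chosen lineages and add their common ancestor.
        lineages = [x for idx, x in enumerate(lineages) if idx not in (a, b)]
        lineages.append(merged)
        events.append((waiting_time, (first, second)))
    return events, lineages[0]

rng = random.Random(1)  # fixed seed so the run is reproducible
events, tree = simulate_kingman_coalescent(5, rng)
tmrca = sum(t for t, _ in events)
print("tree:", tree)
print("time to MRCA (coalescent units):", tmrca)
```

A sample of size $n$ always produces $n - 1$ events. Averaged over many runs, the time to the MRCA should approach $\sum_{i=2}^{n} \frac{2}{i(i-1)} = 2\left(1 - \frac{1}{n}\right)$, which follows from summing the means $E(T_i)$ given above.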
Chapter 4 Applications

In this section we start with a review of a paper on coalescent estimates of HIV-1 generation time in vivo [10]. Though somewhat dated, this paper shows how a new method based on coalescent theory can be used to estimate HIV-1 generation time in vivo. The generation time of HIV-1 had been of interest to many researchers and had previously been estimated with a different mathematical model of viral dynamics. The first author, Allen Rodrigo (now a professor at Duke), used nucleotide sequencing data for the analysis, together with a reconstructed genealogy of sequences obtained over time. The study was on one single individual, a homosexual Caucasian male who was diagnosed as HIV-1 positive following an episode of aseptic meningitis in February of 1985, when he was 2 years old. Over the course of years beginning in 1989, blood was obtained at time points 7, 22, 2, and 4 months after the first specimen. The method is applied to sequences obtained from this long-term nonprogressing individual at the five sampling occasions above. The estimated average viral generation time using the coalescent method was 1.2 days per generation, which is close to that obtained by mathematical modeling (1.8 days per generation), thus strengthening confidence in estimates of a short viral generation time.

Readers interested in more recent papers with applications to sequence data can refer to the 2002 Nature Reviews Genetics paper by Noah Rosenberg [11] (now a professor at Stanford). The authors discussed the increased use of genetic polymorphism for inference about population phenomena, such as migration and selection, and employed the coalescent process for their analysis.

Beyond the scope of our stochastic modeling, there are also different approaches that use sequence alignment to infer phylogenetic trees, estimate rates of molecular evolution, etc. Readers can refer to a well-established software package called MEGA [12].
Reference List

[1] E. R. Mardis, Next-generation DNA sequencing methods, Annu. Rev. Genomics Hum. Genet., vol. 9, pp. 387-402, 2008.
[2] M. L. Metzker, Sequencing technologies: the next generation, Nature Reviews Genetics, vol. 11, no. 1, pp. 31-46, 2010.
[3] P. J. Cock, C. J. Fields, N. Goto, M. L. Heuer, and P. M. Rice, The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants, Nucleic Acids Research, vol. 38, no. 6, pp. 1767-1771, 2010.
[4] D. Karolchik, R. Baertsch, M. Diekhans, T. S. Furey, A. Hinrichs, Y. Lu, K. M. Roskin, M. Schwartz, C. W. Sugnet, D. J. Thomas, et al., The UCSC genome browser database, Nucleic Acids Research, vol. 31, no. 1, pp. 51-54, 2003.
[5] T. Hubbard, D. Barker, E. Birney, G. Cameron, Y. Chen, L. Clark, T. Cox, J. Cuff, V. Curwen, T. Down, et al., The Ensembl genome database project, Nucleic Acids Research, vol. 30, no. 1, pp. 38-41, 2002.
[6] K. D. Pruitt, T. Tatusova, and D. R. Maglott, NCBI reference sequence project: update and current status, Nucleic Acids Research, vol. 31, no. 1, pp. 34-37, 2003.
[7] S. Tavaré, Part I: Ancestral inference in population genetics, in Lectures on Probability Theory and Statistics, pp. 1-188, Springer, 2004.
[8] J. Wakeley, Chapter 3 of Coalescent Theory: An Introduction. webpages.uidaho.edu/hohenlohe/wakeley_ch.pdf, cited April 2014.
[9] L. Kubatko, Tutorial on coalescent theory. ~lkubatko/coalescent_theory_penn_state_part1.pdf, cited April 2014.
[10] A. G. Rodrigo, E. G. Shpaer, E. L. Delwart, A. K. Iversen, M. V. Gallo, J. Brojatsch, M. S. Hirsch, B. D. Walker, and J. I. Mullins, Coalescent estimates of HIV-1 generation time in vivo, Proceedings of the National Academy of Sciences, vol. 96, no. 5, pp. 2187-2191, 1999.
[11] N. A. Rosenberg and M. Nordborg, Genealogical trees, coalescent theory and the analysis of genetic polymorphisms, Nature Reviews Genetics, vol. 3, no. 5, pp. 380-390, 2002.
[12] K. Tamura, G. Stecher, D. Peterson, A. Filipski, and S. Kumar, MEGA6: Molecular evolutionary genetics analysis version 6.0, Molecular Biology and Evolution, vol. 30, no. 12, pp. 2725-2729, 2013.
Coalescent Theory: An Introduction for Phylogenetics
Coalescent Theory: An Introduction for Phylogenetics Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University lkubatko@stat.ohio-state.edu
More informationAlgorithms for Genetics: Basics of Wright Fisher Model and Coalescent Theory
Algorithms for Genetics: Basics of Wright Fisher Model and Coalescent Theory Vineet Bafna Harish Nagarajan and Nitin Udpa 1 Disclaimer Please note that a lot of the text and figures here are copied from
More informationThe Coalescent. Chapter Population Genetic Models
Chapter 3 The Coalescent To coalesce means to grow together, to join, or to fuse. When two copies of a gene are descended from a common ancestor which gave rise to them in some past generation, looking
More informationViral epidemiology and the Coalescent
Viral epidemiology and the Coalescent Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U. Leuven, and Departments of Biomathematics and Human Genetics David Geffen School
More informationCoalescence. Outline History. History, Model, and Application. Coalescence. The Model. Application
Coalescence History, Model, and Application Outline History Origins of theory/approach Trace the incorporation of other s ideas Coalescence Definition and descriptions The Model Assumptions and Uses Application
More informationGenealogical trees, coalescent theory, and the analysis of genetic polymorphisms
Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms Magnus Nordborg University of Southern California The importance of history Genetic polymorphism data represent the outcome
More informationPopulation Genetics using Trees. Peter Beerli Genome Sciences University of Washington Seattle WA
Population Genetics using Trees Peter Beerli Genome Sciences University of Washington Seattle WA Outline 1. Introduction to the basic coalescent Population models The coalescent Likelihood estimation of
More informationBioinformatics I, WS 14/15, D. Huson, December 15,
Bioinformatics I, WS 4/5, D. Huson, December 5, 204 07 7 Introduction to Population Genetics This chapter is closely based on a tutorial given by Stephan Schiffels (currently Sanger Institute) at the Australian
More information2 The Wright-Fisher model and the neutral theory
0 THE WRIGHT-FISHER MODEL AND THE NEUTRAL THEORY The Wright-Fisher model and the neutral theory Although the main interest of population genetics is conceivably in natural selection, we will first assume
More informationForward thinking: the predictive approach
Coalescent Theory 1 Forward thinking: the predictive approach Random variation in reproduction causes random fluctuation in allele frequencies. Can describe this process as diffusion: (Wright 1931) showed
More informationThe genealogical history of a population The coalescent process. Identity by descent Distribution of pairwise coalescence times
The coalescent The genealogical history of a population The coalescent process Identity by descent Distribution of pairwise coalescence times Adding mutations Expected pairwise differences Evolutionary
More informationAncestral Recombination Graphs
Ancestral Recombination Graphs Ancestral relationships among a sample of recombining sequences usually cannot be accurately described by just a single genealogy. Linked sites will have similar, but not
More informationComparative method, coalescents, and the future
Comparative method, coalescents, and the future Joe Felsenstein Depts. of Genome Sciences and of Biology, University of Washington Comparative method, coalescents, and the future p.1/36 Correlation of
More informationSome of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!
Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis
More informationThe Two Phases of the Coalescent and Fixation Processes
The Two Phases of the Coalescent and Fixation Processes Introduction The coalescent process which traces back the current population to a common ancestor and the fixation process which follows an individual
More informationCoalescents. Joe Felsenstein. GENOME 453, Autumn Coalescents p.1/48
Coalescents p.1/48 Coalescents Joe Felsenstein GENOME 453, Autumn 2015 Coalescents p.2/48 Cann, Stoneking, and Wilson Becky Cann Mark Stoneking the late Allan Wilson Cann, R. L., M. Stoneking, and A. C.
More informationThe Coalescent Model. Florian Weber
The Coalescent Model Florian Weber 23. 7. 2016 The Coalescent Model coalescent = zusammenwachsend Outline Population Genetics and the Wright-Fisher-model The Coalescent on-constant population-sizes Further
More informationBIOL Evolution. Lecture 8
BIOL 432 - Evolution Lecture 8 Expected Genotype Frequencies in the Absence of Evolution are Determined by the Hardy-Weinberg Equation. Assumptions: 1) No mutation 2) Random mating 3) Infinite population
More informationComparative method, coalescents, and the future. Correlation of states in a discrete-state model
Comparative method, coalescents, and the future Joe Felsenstein Depts. of Genome Sciences and of Biology, University of Washington Comparative method, coalescents, and the future p.1/28 Correlation of
More informationPopulation Structure and Genealogies
Population Structure and Genealogies One of the key properties of Kingman s coalescent is that each pair of lineages is equally likely to coalesce whenever a coalescent event occurs. This condition is
More informationMOLECULAR POPULATION GENETICS: COALESCENT METHODS BASED ON SUMMARY STATISTICS
MOLECULAR POPULATION GENETICS: COALESCENT METHODS BASED ON SUMMARY STATISTICS Daniel A. Vasco*, Keith A. Crandall* and Yun-Xin Fu *Department of Zoology, Brigham Young University, Provo, UT 8460, USA Human
More informationCoalescents. Joe Felsenstein. GENOME 453, Winter Coalescents p.1/39
Coalescents Joe Felsenstein GENOME 453, Winter 2007 Coalescents p.1/39 Cann, Stoneking, and Wilson Becky Cann Mark Stoneking the late Allan Wilson Cann, R. L., M. Stoneking, and A. C. Wilson. 1987. Mitochondrial
More informationAnalysis of geographically structured populations: Estimators based on coalescence
Analysis of geographically structured populations: Estimators based on coalescence Peter Beerli Department of Genetics, Box 357360, University of Washington, Seattle WA 9895-7360, Email: beerli@genetics.washington.edu
More informationSTAT 536: The Coalescent
STAT 536: The Coalescent Karin S. Dorman Department of Statistics Iowa State University November 7, 2006 Wright-Fisher Model Our old friend the Wright-Fisher model envisions populations moving forward
More informationPopulation genetics: Coalescence theory II
Population genetics: Coalescence theory II Peter Beerli August 27, 2009 1 The variance of the coalescence process The coalescent is an accumulation of waiting times. We can think of it as standard queuing
More informationCoalescent Theory. Magnus Nordborg. Department of Genetics, Lund University. March 24, 2000
Coalescent Theory Magnus Nordborg Department of Genetics, Lund University March 24, 2000 Abstract The coalescent process is a powerful modeling tool for population genetics. The allelic states of all homologous
More informationTREES OF GENES IN POPULATIONS
1 TREES OF GENES IN POPULATIONS Joseph Felsenstein Abstract Trees of ancestry of copies of genes form in populations, as a result of the randomness of birth, death, and Mendelian reproduction. Considering
More informationDISCUSSION: RECENT COMMON ANCESTORS OF ALL PRESENT-DAY INDIVIDUALS
Adv. Appl. Prob. 31, 1027 1035 (1999) Printed in Northern Ireland Applied Probability Trust 1999 DISCUSSION: RECENT COMMON ANCESTORS OF ALL PRESENT-DAY INDIVIDUALS It is a pleasure to be able to comment
More informationKinship and Population Subdivision
Kinship and Population Subdivision Henry Harpending University of Utah The coefficient of kinship between two diploid organisms describes their overall genetic similarity to each other relative to some
More informationPart I. Concepts and Methods in Bacterial Population Genetics COPYRIGHTED MATERIAL
Part I Concepts and Methods in Bacterial Population Genetics COPYRIGHTED MATERIAL Chapter 1 The Coalescent of Bacterial Populations Mikkel H. Schierup and Carsten Wiuf 1.1 BACKGROUND AND MOTIVATION Recent
More informationExercise 4 Exploring Population Change without Selection
Exercise 4 Exploring Population Change without Selection This experiment began with nine Avidian ancestors of identical fitness; the mutation rate is zero percent. Since descendants can never differ in
More informationIoanna Manolopoulou and Brent C. Emerson. October 7, Abstract
Phylogeographic Ancestral Inference Using the Coalescent Model on Haplotype Trees Ioanna Manolopoulou and Brent C. Emerson October 7, 2011 Abstract Phylogeographic ancestral inference is a question frequently
More informationCoalescence time distributions for hypothesis testing -Kapil Rajaraman 498BIN, HW# 2
Coalescence time distributions for hypothesis testing -Kapil Rajaraman (rajaramn@uiuc.edu) 498BIN, HW# 2 This essay will be an overview of Maryellen Ruvolo s work on studying modern human origins using
More informationWhere do evolutionary trees comes from?
Probabilistic models of evolutionary trees Joint work with Outline of talk Part 1: History, overview Part 2: Discrete models of tree shape Part 3: Continuous trees Part 4: Applications: phylogenetic diversity,
More informationResearch Article The Ancestry of Genetic Segments
International Scholarly Research Network ISRN Biomathematics Volume 2012, Article ID 384275, 8 pages doi:105402/2012/384275 Research Article The Ancestry of Genetic Segments R B Campbell Department of
More informationTópicos Depto. Ciencias Biológicas, UniAndes Profesor Andrew J. Crawford Semestre II
Tópicos Depto. Ciencias Biológicas, UniAndes Profesor Andrew J. Crawford Semestre 29 -II Lab Coalescent simulation using SIMCOAL 17 septiembre 29 Coalescent theory provides a powerful model
More informationYour mtdna Full Sequence Results
More information