Algorithms for Genetics: Basics of Wright Fisher Model and Coalescent Theory
Vineet Bafna, Harish Nagarajan and Nitin Udpa
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Bioinformatics Graduate Program, University of California San Diego, La Jolla, CA, USA

1 Disclaimer

Please note that a lot of the text and figures here are copied from Nordborg's excellent tutorial on the subject [6]. This writeup is to be used only for the class. Please do NOT use it for any purpose other than class preparation.

2 Wright Fisher Model of Evolution

When studying population genetic data, how can we decide whether we are seeing something unexpected? One way is to simulate populations evolving under neutral conditions. We can then check whether statistics computed from the observed data depart from their values under neutrality. Wright and Fisher introduced a simple model of populations that focuses on genealogical relationships. It rests on a number of simplifying assumptions.

2.1 Insights of the model

The Wright Fisher (WF) model simulates genetic drift. Each individual in a generation produces an effectively infinite and equal number of gametes, and the next generation is a random draw of 2N gametes from this pool. In other words, the allele frequencies in the next generation come from 2N draws from the allele frequencies of the current generation. Mutations are dropped randomly into the gamete pool at a rate of µ mutations per individual per generation. The transition probability is the probability that a population with i copies of a particular allele has j copies in the next generation. It is given by the binomial expression [4]:

$$T_{ij} = \binom{2N}{j}\left(\frac{i}{2N}\right)^{j}\left(1-\frac{i}{2N}\right)^{2N-j} \qquad (1)$$

2.2 Assumptions of the Wright Fisher Model

The WF model rests mainly on the following six simplifying assumptions.

1. Discrete, non-overlapping generations, which keep the gamete pools of different generations separate. Real generations do overlap to some extent. Even so, this assumption generally reduces computational complexity without a significant loss in accuracy.
2. Constant population size (2N haplotypes) across generations, which also simplifies computation. This is not accurate in regions of the world where the population has grown exponentially, but the simplification is generally appropriate.
3. Equal fitness of all individuals, another simplifying assumption. This is accurate if all mutations that occur are selectively neutral.
4. No geographical or social structure, with random mating. This is not really accurate for large populations, but it keeps the simulation computationally simple.
5. No recombination, again for computational simplicity. This is not always accurate. However, recombination rates over short genomic regions are low, so it is a reasonable approximation in many cases. It also makes lineages easy to trace, because each child is a direct copy of a single parent (except for mutations).
6. Infinite sites: each site can mutate at most once. This yields a simple tree. Once a site has mutated, every descendant of the individual carrying the mutation has the mutant form, and every individual not descended from it has the ancestral form. This is a reasonable assumption because few sites mutate more than once, particularly over a small number of generations.

2.3 Simulating populations under the WF model

To generate a small sample of n individuals from a population of 2N haplotypes, where n << N, the following strategy is used. First, the genealogy is simulated for T generations. Next, mutations are dropped at a fixed rate µ, each at a new site. The haplotypes are then generated and n individuals are sampled. The time complexity of generating a random population this way is O(NT). An example of the WF simulation is shown in Figure 1.

Figure 1: In this model, the genealogy is simulated for T generations of 2N individuals each. The green lines on the right panel indicate mutations being dropped at a rate µ.

3 Coalescent Theory

Coalescent theory provides an efficient way of simulating populations under the same assumptions. A sample of n << N individuals can be simulated using only O(n) random values corresponding to coalescent times, as against O(NT) for the WF model.

3.1 Insights of Coalescent Theory

One of the key insights of coalescent theory is that the genealogy can be separated from the allelic states. Under selective neutrality, the mutations or allelic states in a population have no effect on the fitness of its individuals. As a result, the genealogy can be generated randomly without any input from the genotypes. According to Nordborg, the state can be separated from the descent. As shown in Figure 1, the genealogical tree is created first. The allelic states of any group of individuals can then be generated by assigning an allelic state to their most recent common ancestor (MRCA) and dropping mutations along the branches of the genealogical tree that lead to them.

Coalescent theory also relies on the fact that much of the genealogy is irrelevant. The genealogy of a group of individuals can be modeled backwards in time, without worrying about members of previous generations that contributed no offspring to the sample. This is depicted in Figure 2. Further, as a consequence of selective neutrality, each individual in a generation can be viewed as picking its parent at random from the previous generation.

Figure 2: As one goes backwards in time, fewer individuals contribute to the current gamete pool as coalescent events occur.

The topology of the genealogy can be generated by repeatedly picking a random pair of lineages to coalesce. In other words, at each coalescence every pair of remaining lineages is equally likely to be the one that merges. The branch lengths, on the other hand, are determined by the coalescent times, which are exponential random variables independent of the topology. This is the third insight of coalescent theory: the topology of the genealogy is independent of the coalescent times.

Finally, there is no need to go back generation by generation to obtain coalescent times. Take k individuals (1 ≤ k ≤ n) from a WF population of 2N haplotypes. The probability that no pair of them coalesces in the previous generation (i.e., that all k pick distinct parents) is:
$$\Pr[\text{no coalescence one generation back}] = \prod_{i=0}^{k-1}\frac{2N-i}{2N} \qquad (2)$$
$$= \prod_{i=0}^{k-1}\left(1-\frac{i}{2N}\right) \qquad (3)$$
$$= 1-\frac{\binom{k}{2}}{2N}+O\!\left(\frac{1}{N^{2}}\right) \qquad (4)$$
$$\Pr[\text{no coalescence } t \text{ generations back}] \approx \left(1-\frac{\binom{k}{2}}{2N}\right)^{t} \approx e^{-\binom{k}{2}\frac{t}{2N}} \qquad (5)$$
$$= e^{-\binom{k}{2}\tau} \qquad (6)$$

where

$$\tau = \frac{t}{2N} \qquad (7)$$

is time measured in units of 2N generations. This is convenient, as it takes an unknown quantity (N) out of the equation. Equation (6) says that, while there are k lineages, the waiting time until the next coalescence is approximately exponentially distributed with rate $\binom{k}{2}$, so

$$E(\text{coalescence time for } k \text{ individuals}) = \frac{1}{\binom{k}{2}} \qquad (8)$$

It is now easy to see that coalescent theory (CT) offers an implicit, efficient algorithm for generating a random genealogy of n individuals:

1. Generate a random binary topology for n individuals.
2. For k = n, ..., 2, generate a random time t_k, exponentially distributed with parameter (rate) $\binom{k}{2}$.
3. For k = n, ..., 2, set the length of the period during which the genealogy has exactly k lineages to t_k.

Figure 3 illustrates this for the case of 6 individuals.

Figure 3: Genealogical trees for 6 individuals are randomly generated using an exponential random variable with parameter $\binom{k}{2}$, where k is the number of distinct lineages at that point.

3.2 Coalescent Properties

We can use CT to compute some parameters. Let T_MRCA denote the time to reach the most recent common ancestor of the sample. The expected value of T_MRCA is the sum of the expected coalescence times at each stage of coalescence (i.e., from n lineages down to 2):

$$E(T_{MRCA}) = \sum_{i=2}^{n}\frac{1}{\binom{i}{2}} \qquad (9)$$
$$= \sum_{i=2}^{n}\frac{2}{i(i-1)} \qquad (10)$$
$$= 2\sum_{i=2}^{n}\left(\frac{1}{i-1}-\frac{1}{i}\right) \qquad (11)$$
$$= 2\left(1-\frac{1}{n}\right) \qquad (12)$$

All times are in units of 2N generations. The expected duration of the last step (2 lineages coalescing into 1) is 1. Since E(T_MRCA) is always less than 2, this last step accounts for more than half of the expected time to the MRCA. In other words, for about half of the depth of the genealogy the sample has only two ancestral lineages. Mutations that arise on these two deep branches are typically shared by many individuals in the sample.

Let T_tot denote the sum of all branch lengths. Since there are i branches while the genealogy has i lineages,

$$E(T_{tot}) = \sum_{i=2}^{n} i\,\frac{1}{\binom{i}{2}} \qquad (13)$$
$$= \sum_{i=2}^{n}\frac{2}{i-1} \qquad (14)$$
$$\approx 2(\gamma+\log n) \qquad (15)$$

Here, γ ≈ 0.5772 is Euler's constant and log is the natural logarithm. Equation (15) implies that the total branch length grows only logarithmically with n. This limits the benefit of sampling more individuals. For example, doubling the sample size adds only about 2 log 2 ≈ 1.39 to the expected total branch length. Since the expected number of mutations is proportional to the total branch length, the extra samples reveal few new variant sites.

3.3 Simulating Populations

Simulating Populations of Constant Size

Once we have a genealogy, we can use it to simulate a population. Each allele travels from the root to a leaf (individual), possibly mutating on the way. Suppose that for the genomic region we are simulating, mutations occur at a fixed rate µ per generation. On any branch of length τ (in units of 2N generations), draw the number of mutations from a Poisson distribution with mean 2Nτµ. In practice, we usually make the infinite sites assumption that each mutation hits a new site, so each mutation is given a new site label. This set of mutations allows us to generate a population of variant sites. An example of this strategy is shown in Figure 4.
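The constant-size procedure of Sections 3.1 and 3.3 can be sketched in a few dozen lines. This is a minimal Python illustration under stated assumptions, not the authors' implementation:

- Time is in units of 2N generations.
- The mutation rate is supplied as θ = 4Nµ, so a branch of length τ receives a Poisson number of mutations with mean θτ/2 = 2Nτµ.
- The topology is built by merging a uniformly random pair of lineages at each step.
- `simulate_coalescent` and `poisson` are hypothetical helper names.

Python's standard library has no Poisson sampler, so one is built from unit exponential inter-arrival times.

```python
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Poisson draw: count unit-rate exponential arrivals before time `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_coalescent(n: int, theta: float, seed: int = 0):
    """Hypothetical helper: standard coalescent for n haplotypes plus
    infinite-sites mutations. Returns (haplotypes, tmrca), where haplotypes
    is an n x S list of 0/1 rows (S = number of mutated sites)."""
    rng = random.Random(seed)
    active = list(range(n))                 # node ids of the current lineages
    node_time = {i: 0.0 for i in range(n)}  # leaves sit at time 0
    parent = {}
    t, next_id = 0.0, n
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)   # waiting time ~ Exp(C(k, 2))
        a, b = rng.sample(active, 2)             # every pair equally likely
        for child in (a, b):
            active.remove(child)
            parent[child] = next_id
        node_time[next_id] = t
        active.append(next_id)
        next_id += 1
    root = active[0]

    # Infinite sites: every mutation on every branch gets a fresh site label.
    branch_sites, n_sites = {}, 0
    for node, par in parent.items():
        tau = node_time[par] - node_time[node]  # branch length, 2N generations
        m = poisson(rng, theta * tau / 2)       # mean 2N * tau * mu
        branch_sites[node] = range(n_sites, n_sites + m)
        n_sites += m

    # A leaf carries the mutations on every branch from itself up to the root.
    haplotypes = []
    for leaf in range(n):
        row, node = [0] * n_sites, leaf
        while node != root:
            for s in branch_sites[node]:
                row[s] = 1
            node = parent[node]
        haplotypes.append(row)
    return haplotypes, node_time[root]
```

Averaging the returned TMRCA over many seeds should approach 2(1 - 1/n) from Eq. (12). The average number of sites should approach (θ/2)E(T_tot) = θ(1 + 1/2 + ... + 1/(n-1)), using Eq. (14). Because no branch sits above the root, every simulated site is segregating.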
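Returning briefly to the forward-in-time model of Section 2, the binomial transition probability of Eq. (1) and the O(NT) genealogy simulation of Section 2.3 can also be sketched in Python. This is a hedged illustration, not the authors' code. It assumes that each child picks its parent uniformly at random (assumptions 1 to 5) and omits mutations. The names `wf_transition_prob`, `wf_genealogy` and `wf_drift` are hypothetical.

```python
import math
import random


def wf_transition_prob(i: int, j: int, N: int) -> float:
    """Eq. (1): Pr[i copies -> j copies] in one generation of 2N gametes."""
    two_n = 2 * N
    p = i / two_n                                   # current allele frequency
    return math.comb(two_n, j) * p**j * (1 - p) ** (two_n - j)


def wf_genealogy(N: int, T: int, seed: int = 0) -> list[list[int]]:
    """Hypothetical helper: parents[g][c] is the index of the generation-g
    parent of haplotype c in generation g + 1. Each child picks a parent
    uniformly at random, so the total work is O(NT)."""
    rng = random.Random(seed)
    two_n = 2 * N
    return [[rng.randrange(two_n) for _ in range(two_n)] for _ in range(T)]


def wf_drift(i: int, N: int, T: int, seed: int = 0) -> list[int]:
    """Hypothetical helper: allele counts over T generations, each drawn as
    2N gametes from the previous generation's allele frequency."""
    rng = random.Random(seed)
    two_n = 2 * N
    counts = [i]
    for _ in range(T):
        p = counts[-1] / two_n
        counts.append(sum(rng.random() < p for _ in range(two_n)))
    return counts
```

Tracing a sampled haplotype back through `wf_genealogy` recovers its ancestry. Two lineages pick the same parent in a given generation with probability 1/(2N), which is the k = 2 case of Eq. (4).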
Figure 4: Simulating populations by generating a coalescent topology and branch lengths, and dropping mutations at rate µ along every branch (mean µt mutations on a branch of length t). The numbers on each branch indicate the sites that have mutated on that branch.

Simulating Populations under exponential growth

Equation (8) shows that the expected time to coalesce is directly proportional to the population size N (as time is measured in units of 2N generations). Under exponential population growth, N increases as one goes forward in time. Coalescence is therefore slow near the present and fast in the distant past. As a result, compared with a tree created under a constant population size, the branches closer to the leaves are relatively longer and those closer to the root relatively shorter. The expected number of mutations on a branch is proportional to its length. One therefore expects a larger fraction of mutations to be unique to single individuals under exponential growth (see Figure 5 for an illustration). Table 1 illustrates this further with samples generated by Hudson's ms program, one of the most common tools for simulating coalescent populations.
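The distortion of branch lengths under growth can be seen by simulating the waiting times directly. This is a hedged sketch, not taken from ms, and it rests on stated assumptions:

- Going back in time, the population size is N(t) = N0 e^{-αt}, with t in units of 2N0 generations. ms itself measures time in units of 4N0 generations.
- With k lineages, the coalescence rate at time u is then C(k,2)e^{αu}.
- Inverting the cumulative rate from the current time t0 gives the next waiting time s = log(1 + αE e^{-αt0}/C(k,2))/α, where E ~ Exp(1).
- `growth_coalescent_times` is a hypothetical name.

```python
import math
import random


def growth_coalescent_times(n: int, alpha: float, seed: int = 0) -> list[float]:
    """Hypothetical helper: waiting times t_n, ..., t_2 (units of 2*N0
    generations) while the sample has k = n, ..., 2 lineages, assuming the
    population size going back in time is N(t) = N0 * exp(-alpha * t)."""
    rng = random.Random(seed)
    t0, waits = 0.0, []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2          # C(k, 2) at the present-day size N0
        e = rng.expovariate(1.0)        # unit exponential "hazard budget"
        if alpha == 0:
            s = e / rate                # constant size: Exp(C(k, 2))
        else:
            # Solve integral_{t0}^{t0+s} rate * exp(alpha * u) du = e for s.
            s = math.log1p(alpha * e * math.exp(-alpha * t0) / rate) / alpha
        waits.append(s)
        t0 += s
    return waits
```

Setting alpha = 0 recovers the constant-size coalescent. Over many seeds, the ratio waits[0] / sum(waits) should be larger for alpha > 0 than for alpha = 0. This reflects the relatively longer recent branches described above.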
Figure 5: Typical genealogical trees under constant size and exponential growth conditions

Table 1: Sample Populations Generated using Hudson's ms Program (panels: Constant Population Size; Exponential Growth)
3.4 Coalescent with Recombination

The coalescent with recombination is identical to the coalescent without recombination, except for one additional scaled parameter, ρ, the scaled recombination rate. Going back one generation with k lineages, one of the following happens:

- A lineage arises from a recombination event between two parental chromosomes, so it has 2 parents, each contributing part of its genome.
- Two lineages coalesce.
- Neither occurs (each lineage has a distinct parent).
- Multiple events occur, which is generally ignored due to its low probability.

Consequently, the evolutionary history can no longer be represented as a tree, but rather as an Ancestral Recombination Graph (ARG). The ARG is represented pictorially in Figure 6.

Figure 6: Pictorial representation of the Ancestral Recombination Graph
3.4.1 Generating sequences of the ARG

The first step in generating sequences is to create the ARG. To simulate the ARG, the branch lengths and topology can be calculated as follows. Assume there are k lineages in a generation, r is the per-generation recombination rate for the region, the population size is 2N, and the mutation rate is µ. Then, per generation,

$$\Pr[\text{a recombination event occurs}] = kr \qquad (16)$$
$$\Pr[\text{a coalescence event occurs}] = \frac{\binom{k}{2}}{2N} \qquad (17)$$

and, over τ units of 2N generations,

$$\Pr[\text{no lineage recombines and no pair coalesces}] \approx e^{-\left(2Nkr+\binom{k}{2}\right)\tau} \qquad (18)$$

$$\Pr[\text{recombination} \mid \text{coalescence or recombination}] = \frac{2Nkr}{2Nkr+\binom{k}{2}} \qquad (19)$$
$$= \frac{k\rho}{k\rho+2\binom{k}{2}} \qquad (20)$$
$$= \frac{\rho}{\rho+(k-1)} \qquad (21)$$

In time scaled in units of 2N generations, the number of lineages increases (through recombination) at rate 2Nkr and decreases (through coalescence) at rate $\binom{k}{2}$. The scaled recombination rate is defined as ρ = 4Nr, so that 2Nkr = kρ/2.

The simulation starts with k = n and iterates until k reaches 1. At each step, the waiting time to the next event is exponential with rate kρ/2 + $\binom{k}{2}$ (Eq. 18), and the event is a recombination with the probability given by Eq. (21). If the event is a recombination, a lineage and a position at which to recombine are chosen at random. Otherwise, a random pair of lineages is chosen to coalesce.

Once the ARG is simulated, the constituent coalescent trees are extracted, and the scaled mutation rate θ = 4Nµ is apportioned among them. At a recombination event, every position to the left of the randomly chosen breakpoint comes from the left parent, and every position to the right comes from the right parent. Each marginal tree therefore describes the history of one segment of the sequence. It receives a share of θ equal to the fraction of positions in that segment. The sequence is generated in three steps:

1. Split the sequence into segments (one per tree) at the recombination breakpoints.
2. Drop mutations on each tree.
3. Concatenate the resulting subsequences.

The process is summarized in Figure 7.
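The iteration described above can be sketched as a simple event loop. This is a hedged Python illustration, and `simulate_arg_events` is a hypothetical name. It follows only the number of lineages, the event types, and uniformly drawn breakpoints on a sequence scaled to [0, 1). A full ARG simulator would also record which lineage recombines, which pair coalesces, and which segments of ancestral material each lineage carries, so that the marginal trees can be extracted.

```python
import random


def simulate_arg_events(n: int, rho: float, seed: int = 0):
    """Hypothetical helper: event sequence of the coalescent with
    recombination. Time is in units of 2N generations and rho = 4Nr.
    Returns a list of (time, kind, lineages_after_event, position)."""
    rng = random.Random(seed)
    k, t, events = n, 0.0, []
    while k > 1:
        total_rate = k * rho / 2 + k * (k - 1) / 2   # recombination + coalescence
        t += rng.expovariate(total_rate)             # waiting time to next event
        if rng.random() < rho / (rho + k - 1):       # Eq. (21)
            pos = rng.random()                       # uniform breakpoint in [0, 1)
            k += 1                                   # one lineage -> two parents
            events.append((t, "rec", k, pos))
        else:
            k -= 1                                   # a random pair merges
            events.append((t, "coal", k, None))
    return events
```

With ρ = 0 the loop reduces to the n - 1 coalescences of the standard coalescent. With ρ > 0, every recombination adds a lineage, so the number of coalescences always exceeds the number of recombinations by n - 1.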
Figure 7: Generating the sequences of the ARG

4 Perfect Phylogeny

The perfect phylogeny algorithm can be used to reconstruct a genealogical tree from a sample generated by a coalescent process without recombination. It works as follows.

Under the infinite sites assumption, only two states are possible at any site: the mutated state and the ancestral state. They can be represented as 1 and 0 (assigned arbitrarily if the ancestral state is unknown). The sites are sorted in decreasing order of the number of individuals carrying the mutated state. The tree starts from a root carrying the ancestral state at all sites, with the individuals as leaves. Each site in turn then splits the individuals into those carrying the mutated state and those carrying the ancestral state. Additional sites create further dichotomies and thus locate the mutations on the tree more precisely.

Equivalently, a perfect phylogeny exists if and only if, for every pair of columns i and j of the sorted genotype matrix, the sets of rows with value 1 are either disjoint or one is a subset of the other. Implementing this test by comparing every pair of columns takes O(nm²) time for n individuals and m sites [2]. Given the genealogical tree, the branch lengths can be estimated using the expected coalescence times of Eq. (8). The perfect phylogeny algorithm is represented pictorially in Figure 8.
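The pairwise column test can be written in a few lines. This Python sketch makes two assumptions: the ancestral state is 0 at every site (the rooted case), and M is given as a list of 0/1 rows. `has_perfect_phylogeny` is a hypothetical name. Each column is converted to the set of rows carrying a 1. Each of the O(m²) column pairs is then checked for being disjoint or nested, and each set operation costs O(n).

```python
def has_perfect_phylogeny(M: list[list[int]]) -> bool:
    """Hypothetical helper: rooted perfect phylogeny test, assuming state 0
    is ancestral at every site. M has n rows (individuals), m columns (sites)."""
    n = len(M)
    m = len(M[0]) if n else 0
    # For each site, the set of individuals carrying the mutated state (1).
    ones = [frozenset(i for i in range(n) if M[i][j] == 1) for j in range(m)]
    for a in range(m):
        for b in range(a + 1, m):
            A, B = ones[a], ones[b]
            # Compatible iff the two sets are disjoint or nested.
            if A & B and not (A <= B or B <= A):
                return False
    return True
```

Sorting the columns is not needed for this test. Sorting only matters when the tree is built by adding the splits from the largest to the smallest.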
Figure 8: The steps involved in constructing the genealogy using the perfect phylogeny algorithm

Given a population and a pair of sites, the linkage disequilibrium (LD) between them can be measured as

$$D = P_{00} - P_{0\cdot}\,P_{\cdot 0} \qquad (22)$$

where $P_{00}$ is the frequency of haplotypes carrying 0 at both sites, and $P_{0\cdot}$ and $P_{\cdot 0}$ are the frequencies of 0 at the first and second site, respectively. The perfect phylogeny model assumes no recombination and no recurrent mutation, so all sites are completely linked. As a result, any pair of sites shows at most three of the four possible two-site haplotypes (00, 01, 10, 11). This corresponds to complete disequilibrium, |D'| = 1, where D' is D normalized by its maximum possible magnitude given the allele frequencies. This holds even when D itself is small.

4.1 Applications of Perfect Phylogeny

The perfect phylogeny algorithm builds a genealogical tree from members of a population, which is useful in several settings. For instance, the Genographic Project builds a genealogical tree from loci in human mitochondrial DNA and Y-chromosome DNA (neither of which recombines), sampled from many ethnic groups. This tree can be used to place the coalescence events of the groups in time, and thus to suggest migration patterns for the human species.

4.2 Linear time algorithm for perfect phylogeny

The perfect phylogeny algorithm described above has complexity O(nm²). In 1991, Dan Gusfield proposed a linear time algorithm for perfect phylogeny based on a graph theoretic approach [1]. Given an n × m binary genotype matrix M for a population, the algorithm is summarized below.
1. Treat each column of M as a binary number (with the most significant bit in row 1) and sort the columns in decreasing order. Delete any duplicate columns.
2. Store all cells with value 1 in a separate structure (called O in the paper), and create an associated matrix L. For every cell (i, j) in O, set L(i, j) = k, where k is the largest column index less than j such that (i, k) is in O. If j is the first column with a 1 in row i, set L(i, j) = 0.
3. A perfect phylogeny exists if and only if, for every column j, all cells (i, j) in O have the same value of L(i, j).

4.3 Special cases of the perfect phylogeny problem

In the unrooted case, where the ancestral state of each site is unknown, a root can still be determined if a perfect phylogeny exists. Under a perfect phylogeny, the (unrooted) tree does not change when the 1s and 0s in a column are interchanged. Therefore, if the values in each column are relabeled so that 0 is the majority state, the rooted algorithm yields the same tree.

With missing data, the perfect phylogeny problem becomes intractable: as discussed by Kimmel and Shamir [5], it is NP-hard. Halperin and Karp [3] describe a special, polynomial-time case: inputs that satisfy the rich data hypothesis, i.e., inputs with enough information to infer the missing haplotypes. In addition, if the root is known, the perfect phylogeny algorithm can be modified to handle missing data.

When a recurrent mutation occurs, the infinite sites assumption is violated. This likely makes reconstructing the genealogy computationally intractable, and thus impractical to solve exactly for realistic samples.

References

[1] D. Gusfield. Efficient algorithms for inferring evolutionary trees. Networks, 21:19-28.
[2] D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge, UK.
[3] E. Halperin and R. M. Karp. Perfect phylogeny and haplotype assignment. In Proceedings of the 8th RECOMB. ACM Press.
[4] D. L. Hartl and A. G. Clark. Principles of Population Genetics. Sinauer Associates, Inc., Sunderland, MA, USA.
[5] G. Kimmel and R. Shamir. The incomplete perfect phylogeny haplotype problem. J Bioinform Comput Biol, 3, Apr.
[6] M. Nordborg. Coalescent Theory. Chapter in Handbook of Statistical Genetics. John Wiley & Sons, Ltd.
More informationFrequent Inconsistency of Parsimony Under a Simple Model of Cladogenesis
Syst. Biol. 52(5):641 648, 2003 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150390235467 Frequent Inconsistency of Parsimony Under a Simple Model
More informationMODERN population genetics is data driven and
Copyright Ó 2009 by the Genetics Society of America DOI: 10.1534/genetics.108.092460 Note Extensions of the Coalescent Effective Population Size John Wakeley 1 and Ori Sargsyan Department of Organismic
More informationCONGEN. Inbreeding vocabulary
CONGEN Inbreeding vocabulary Inbreeding Mating between relatives. Inbreeding depression Reduction in fitness due to inbreeding. Identical by descent Alleles that are identical by descent are direct descendents
More informationPopulation Genetics. Joe Felsenstein. GENOME 453, Autumn Population Genetics p.1/70
Population Genetics Joe Felsenstein GENOME 453, Autumn 2013 Population Genetics p.1/70 Godfrey Harold Hardy (1877-1947) Wilhelm Weinberg (1862-1937) Population Genetics p.2/70 A Hardy-Weinberg calculation
More informationDISCUSSION: RECENT COMMON ANCESTORS OF ALL PRESENT-DAY INDIVIDUALS
Adv. Appl. Prob. 31, 1027 1035 (1999) Printed in Northern Ireland Applied Probability Trust 1999 DISCUSSION: RECENT COMMON ANCESTORS OF ALL PRESENT-DAY INDIVIDUALS It is a pleasure to be able to comment
More informationPhylogeny and Molecular Evolution
Phylogeny and Molecular Evolution Character Based Phylogeny Large Parsimony 1/50 Credit Ron Shamir s lecture notes Notes by Nir Friedman Dan Geiger, Shlomo Moran, Sagi Snir and Ron Shamir Durbin et al.
More informationGrowing the Family Tree: The Power of DNA in Reconstructing Family Relationships
Growing the Family Tree: The Power of DNA in Reconstructing Family Relationships Luke A. D. Hutchison Natalie M. Myres Scott R. Woodward Sorenson Molecular Genealogy Foundation (www.smgf.org) 2511 South
More informationResearch Article The Ancestry of Genetic Segments
International Scholarly Research Network ISRN Biomathematics Volume 2012, Article ID 384275, 8 pages doi:105402/2012/384275 Research Article The Ancestry of Genetic Segments R B Campbell Department of
More informationThe Coalescent Model. Florian Weber
The Coalescent Model Florian Weber 23. 7. 2016 The Coalescent Model coalescent = zusammenwachsend Outline Population Genetics and the Wright-Fisher-model The Coalescent on-constant population-sizes Further
More informationLaboratory 1: Uncertainty Analysis
University of Alabama Department of Physics and Astronomy PH101 / LeClair May 26, 2014 Laboratory 1: Uncertainty Analysis Hypothesis: A statistical analysis including both mean and standard deviation can
More informationPOPULATION GENETICS: WRIGHT FISHER MODEL AND COALESCENT PROCESS. Hailong Cui and Wangshu Zhang. Superviser: Prof. Quentin Berger
POPULATIO GEETICS: WRIGHT FISHER MODEL AD COALESCET PROCESS by Hailong Cui and Wangshu Zhang Superviser: Prof. Quentin Berger A Final Project Report Presented In Partial Fulfillment of the Requirements
More informationEstimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling
Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling Mary K. Kuhner, Jon Yamato, and Joseph Felsenstein Department of Genetics, University of Washington
More informationGENOMIC REARRANGEMENT ALGORITHMS
GENOMIC REARRANGEMENT ALGORITHMS KAREN LOSTRITTO Abstract. In this paper, I discuss genomic rearrangement. Specifically, I describe the formal representation of these genomic rearrangements as well as
More informationHalley Family. Mystery? Mystery? Can you solve a. Can you help solve a
Can you solve a Can you help solve a Halley Halley Family Family Mystery? Mystery? Who was the great grandfather of John Bennett Halley? He lived in Maryland around 1797 and might have been born there.
More informationRecap: Properties of Trees. Rooting an unrooted tree. Questions trees can address: Data for phylogeny reconstruction. Rooted vs unrooted trees:
Pairwise sequence alignment (global and local) Recap: Properties of rees Multiple sequence alignment global local ubstitution matrices atabase ing L equence statistics Leaf nodes contemporary taxa Internal
More informationObjective: Why? 4/6/2014. Outlines:
Objective: Develop mathematical models that quantify/model resemblance between relatives for phenotypes of a quantitative trait : - based on pedigree - based on markers Outlines: Causal model for covariances
More informationPopulation Genetics. Joe Felsenstein. GENOME 453, Autumn Population Genetics p.1/74
Population Genetics Joe Felsenstein GENOME 453, Autumn 2011 Population Genetics p.1/74 Godfrey Harold Hardy (1877-1947) Wilhelm Weinberg (1862-1937) Population Genetics p.2/74 A Hardy-Weinberg calculation
More informationCoalescent Theory for a Partially Selfing Population
Copyright 6 1997 by the Genetics Society of America T Coalescent Theory for a Partially Selfing Population Yun-xin FU Human Genetics Center, University of Texas, Houston, Texas 77225 Manuscript received
More informationGENEALOGICAL TREES, COALESCENT THEORY AND THE ANALYSIS OF GENETIC POLYMORPHISMS
GENEALOGICAL TREES, COALESCENT THEORY AND THE ANALYSIS OF GENETIC POLYMORPHISMS Noah A. Rosenberg and Magnus Nordborg Improvements in genotyping technologies have led to the increased use of genetic polymorphism
More informationAchieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters
Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters Scott Watson, Andrew Vardy, Wolfgang Banzhaf Department of Computer Science Memorial University of Newfoundland St John s.
More informationMeek DNA Project Group B Ancestral Signature
Meek DNA Project Group B Ancestral Signature The purpose of this paper is to explore the method and logic used by the author in establishing the Y-DNA ancestral signature for The Meek DNA Project Group
More informationA Genetic Algorithm-Based Controller for Decentralized Multi-Agent Robotic Systems
A Genetic Algorithm-Based Controller for Decentralized Multi-Agent Robotic Systems Arvin Agah Bio-Robotics Division Mechanical Engineering Laboratory, AIST-MITI 1-2 Namiki, Tsukuba 305, JAPAN agah@melcy.mel.go.jp
More information5 Inferring Population
5 Inferring Population History and Demography While population genetics was a very theoretical discipline originally, the modern abundance of population genetic data has forced the field to become more
More informationTRACK 1: BEGINNING DNA RESEARCH presented by Andy Hochreiter
TRACK 1: BEGINNING DNA RESEARCH presented by Andy Hochreiter 1-1: DNA: WHERE DO I START? Definition Genetic genealogy is the application of genetics to traditional genealogy. Genetic genealogy uses genealogical
More informationUsing Mitochondrial DNA (mtdna) for Genealogy Debbie Parker Wayne, CG, CGL SM
Using Mitochondrial DNA (mtdna) for Genealogy Debbie Parker Wayne, CG, CGL SM This is one article of a series on using DNA for genealogical research. There are several types of DNA tests offered for genealogical
More informationBottlenecks reduce genetic variation Genetic Drift
Bottlenecks reduce genetic variation Genetic Drift Northern Elephant Seals were reduced to ~30 individuals in the 1800s. Rare alleles are likely to be lost during a bottleneck Two important determinants
More informationFull Length Research Article
Full Length Research Article ON THE EXTINCTION PROBABILITY OF A FAMILY NAME *DZAAN, S. K 1., ONAH, E. S 2. & KIMBIR, A. R 2. 1 Department of Mathematics and Computer Science University of Mkar, Gboko Nigeria.
More informationShuffled Complex Evolution
Shuffled Complex Evolution Shuffled Complex Evolution An Evolutionary algorithm That performs local and global search A solution evolves locally through a memetic evolution (Local search) This local search
More informationReport on the VAN_TUYL Surname Project Y-STR Results 3/11/2013 Rory Van Tuyl
Report on the VAN_TUYL Surname Project Y-STR Results 3/11/2013 Rory Van Tuyl Abstract: Recent data for two descendants of Ott van Tuyl has been added to the project, bringing the total number of Gameren
More informationEvaluating the performance of likelihood methods for. detecting population structure and migration
Molecular Ecology (2004) 13, 837 851 doi: 10.1111/j.1365-294X.2004.02132.x Evaluating the performance of likelihood methods for Blackwell Publishing, Ltd. detecting population structure and migration ZAID
More informationDNA: Statistical Guidelines
Frequency calculations for STR analysis When a probative association between an evidence profile and a reference profile is made, a frequency estimate is calculated to give weight to the association. Frequency
More informationGEDmatch Home Page The upper left corner of your home page has Information about you and links to lots of helpful information. Check them out!
USING GEDMATCH Created March 2015 GEDmatch is a free, non-profit site that accepts raw autosomal data files from Ancestry, FTDNA, and 23andme. As such, it provides a large autosomal database that spans
More informationCoding for Efficiency
Let s suppose that, over some channel, we want to transmit text containing only 4 symbols, a, b, c, and d. Further, let s suppose they have a probability of occurrence in any block of text we send as follows
More informationUsing Y-DNA for Genealogy Debbie Parker Wayne, CG, CGL SM
Using Y-DNA for Genealogy Debbie Parker Wayne, CG, CGL SM This is one article of a series on using DNA for genealogical research. There are several types of DNA tests offered for genealogical purposes.
More informationVesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham
Towards the Automatic Design of More Efficient Digital Circuits Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham
More informationLecture 1: Introduction to pedigree analysis
Lecture 1: Introduction to pedigree analysis Magnus Dehli Vigeland NORBIS course, 8 th 12 th of January 2018, Oslo Outline Part I: Brief introductions Pedigrees symbols and terminology Some common relationships
More informationPopstats Parentage Statistics Strength of Genetic Evidence In Parentage Testing
Popstats Parentage Statistics Strength of Genetic Evidence In Parentage Testing Arthur J. Eisenberg, Ph.D. Director DNA Identity Laboratory UNT-Health Science Center eisenber@hsc.unt.edu PATERNITY TESTING
More informationVol. 5, No. 6 June 2014 ISSN Journal of Emerging Trends in Computing and Information Sciences CIS Journal. All rights reserved.
Optimal Synthesis of Finite State Machines with Universal Gates using Evolutionary Algorithm 1 Noor Ullah, 2 Khawaja M.Yahya, 3 Irfan Ahmed 1, 2, 3 Department of Electrical Engineering University of Engineering
More informationTwo-point linkage analysis using the LINKAGE/FASTLINK programs
1 Two-point linkage analysis using the LINKAGE/FASTLINK programs Copyrighted 2018 Maria Chahrour and Suzanne M. Leal These exercises will introduce the LINKAGE file format which is the standard format
More informationInference of population structure using dense haplotype data Daniel John Lawson 1, Garrett Hellenthal 2, Simon Myers,3 and Daniel Falush,4,
1 Inference of population structure using dense haplotype data Daniel John Lawson 1, Garrett Hellenthal 2, Simon Myers,3 and Daniel Falush,4, 1 Department of Mathematics, University of Bristol, Bristol,
More informationHuman origins and analysis of mitochondrial DNA sequences
Human origins and analysis of mitochondrial DNA sequences Science, February 7, 1992 L. Vigilant et al. [1] recently presented "the strongest support yet for the placement of [their] common mtdna [mitochondrial
More informationUsing Autosomal DNA for Genealogy Debbie Parker Wayne, CG, CGL SM
Using Autosomal DNA for Genealogy Debbie Parker Wayne, CG, CGL SM This is one article of a series on using DNA for genealogical research. There are several types of DNA tests offered for genealogical purposes.
More informationbaobabluna: the solution space of sorting by reversals Documentation Marília D. V. Braga
baobabluna: the solution space of sorting by reversals Documentation Marília D. V. Braga March 15, 2009 II Acknowledgments This work was funded by the European Union Programme Alβan (scholarship no. E05D053131BR),
More informationYour web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore
Your web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore Activitydevelop U SING GENETIC MARKERS TO CREATE L INEAGES How do
More informationUniversity of Washington, TOPMed DCC July 2018
Module 12: Comput l Pipeline for WGS Relatedness Inference from Genetic Data Timothy Thornton (tathornt@uw.edu) & Stephanie Gogarten (sdmorris@uw.edu) University of Washington, TOPMed DCC July 2018 1 /
More information