Coalescent Theory for a Partially Selfing Population


Copyright by the Genetics Society of America

Yun-Xin Fu

Human Genetics Center, University of Texas, Houston, Texas

Manuscript received October 16, 1996
Accepted for publication April 21, 1997

ABSTRACT

A coalescent theory for a sample of DNA sequences from a partially selfing diploid population and an algorithm for simulating such samples are developed in this article. Approximate formulas are given for the expectation and the variance of the number of segregating sites in a sample of k sequences from n individuals. Several new estimators of the important parameters $\theta = 4N\mu$ and the selfing rate s, where N and $\mu$ are, respectively, the effective population size and the mutation rate per sequence per generation, are proposed and their sampling properties are studied.

THE coalescent theory (KINGMAN 1982a,b; HUDSON 1983; TAJIMA 1983; see HUDSON 1991 for a review) is becoming the cornerstone for utilizing the information in a sample of DNA sequences from a population to study its evolution. Although a number of population genetics models have been studied in the framework of coalescence, it remains a common assumption that mating between individuals is at random, even though inbreeding is not rare in nature. A completely selfing population is an extreme example of inbreeding, while many plant populations exhibit partial self-fertilization, that is, a mixture of selfing and random mating (FRYXELL 1957; WILLSON 1984). The purpose of this article is to develop a coalescent theory for a sample from a partially selfing population.

I shall use WRIGHT'S (1969) definition of partial selfing throughout this article, which assumes that in a diploid population of N individuals each individual has probability s (the selfing rate) of being the offspring of a self-fertilization and probability 1 - s of being the offspring of a random mating.
Furthermore, an individual resulting from selfing has probability 1/N of being the offspring of any particular individual of the previous generation, and an individual resulting from random mating is formed by randomly selecting two sequences from the gene pool of 2N sequences of the previous generation. We also assume in this article that random genetic drift and partial selfing are the only evolutionary forces in action. In other words, there is no selection, no population subdivision or migration, and no recombination.

Although the coalescent theory is a relatively young branch of population genetics, the concept of identity by descent in classical population genetics theory has essentially the same spirit as coalescence. Indeed, SLATKIN (1992) tried to link the probability of identity by descent, the inbreeding coefficient, to coalescence time, and MILLIGAN (1996) utilized SLATKIN'S approach to derive an estimator of the selfing rate. However, identity by descent has a fundamental weakness: it deals with only two sequences at a time and, by doing so, wastes much of the information in a sample. The coalescent approach deals with the whole sample of sequences and allows efficient use of available information (FELSENSTEIN 1992; FU and LI 1993). Furthermore, coalescent algorithms provide highly efficient methods to simulate samples from a population. The coalescent theory and the simulation algorithm developed in this article enable us to find approximate formulas for the expectation and variance of the number of segregating sites in a sample, and consequently to explore estimators of the mutation rate (or population size) and the selfing rate.

Corresponding author: Yun-Xin Fu, Human Genetics Center, University of Texas at Houston, 6901 Bertner Ave., Houston, TX. E-mail: fu@hgc.sph.uth.tmc.edu

Genetics (August, 1997)
THE THEORY

Definitions: The essence of the coalescent approach is that one starts with the sequences in a sample and traces backward in generations until the most recent common ancestor of the sample is found. For a sample from a random mating population, one need keep track of only the number of ancestral sequences to characterize the coalescent process. For a partially selfing population, however, one must keep track of not only the number of ancestral sequences but also how the sequences are packed into ancestors, because two ancestral sequences in one individual have different fates from two ancestral sequences in different individuals.

Let the present generation of a population be designated as generation 0 and the time t generations earlier as generation t. Let k be the number of ancestral sequences and n be the number of ancestors into whom these k sequences are packed at generation t. We call (k, n) the state of coalescence at generation t. In particular, we refer to (k, n) at generation 0, i.e., the generation at which the sample was taken, as the sample configuration. For diploid organisms, as we assume in this paper, there are k - n ancestors both of whose sequences have descendants in the sample, and there are 2n - k ancestors with only one sequence that has descendants in the sample. Obviously we must have $n \le k \le 2n$.

An ancestor of the sequences in a sample (simply referred to as an ancestor hereafter) is an individual at a certain generation whose sequence(s) have descendant(s) in the sample. We distinguish two types of ancestors. An ancestor is a full ancestor if both of its sequences have descendant(s) in the sample; an ancestor is a half ancestor if only one of its sequences has descendant(s) in the sample. For convenience, we also treat the individuals in the sample as ancestors. It follows that when the state of coalescence at generation t is (k, n), there are k - n full ancestors and 2n - k half ancestors in that generation.

To describe the coalescent process for a partially selfing population, it is convenient to divide ancestral sequences into groups that will be called effective sequences. We define each ancestral sequence at generation t in an ancestor that resulted from random mating as one effective sequence, and the ancestral sequence(s) in each ancestor that resulted from selfing as one effective sequence. Therefore, an effective sequence may represent one or two ancestral sequences. According to the definition, the two sequences of a full ancestor are each an effective sequence if the full ancestor is derived from random mating, but are counted as only one effective sequence if the full ancestor is derived from selfing. The reason for counting them as only one effective sequence is that together they have the same probability of coming from a particular individual of the previous generation as a single sequence resulting from random mating.
For a random mating population, a coalescent event usually means that two ancestral sequences at a generation come from a single ancestral sequence at the previous generation (e.g., HUDSON 1991). For a partially selfing population, we need to broaden the definition of a coalescent event, and we recognize three different types of coalescent events. First, two effective sequences at generation t coalesce to a full ancestor at generation t + 1. That is, two effective sequences come from the two sequences of a full ancestor at the previous generation. Second, two effective sequences coalesce to a half ancestor at the previous generation. Third, the two sequences of a single effective sequence in an individual derived by selfing coalesce to a single ancestral sequence at the previous generation.

We note that when two effective sequences each represent a single sequence, the second type of coalescent event defined above corresponds to the traditional definition of a coalescent event, but the first type is not considered a coalescent event for a random mating population. The third type of coalescent event is unique to a partially selfing population because it occurs only when a full ancestor is the product of selfing. Each of the first two types of coalescent events involves two effective sequences, while the third type involves only one effective sequence. For brevity, a coalescent event in this article means a coalescent event of either type one or type two, unless the third type of coalescent event is stated explicitly.

Probability of the number of effective sequences: Suppose the state of coalescence at generation t is (k, n). Then we have k - n full ancestors at that generation. If i of the k - n full ancestors are results of selfing, the number of effective sequences is k - i according to our definition.
Because the probability that a full ancestor is the product of selfing is equal to the selfing rate s, the probability that there are i full ancestors resulting from selfing is given by the binomial probability

$$\binom{k-n}{i} s^{i} (1-s)^{k-n-i}, \qquad (1)$$

which is also the probability of having k - i effective sequences.

Change of state when there is no coalescent event: Each ancestral sequence of the sample at generation t has probability 1/N of coming from a particular individual at generation t + 1, regardless of whether the sequence is a result of random mating or a result of selfing. However, when the latter is true, the other allele of the same ancestor also comes from the same individual at generation t + 1. That is, each effective sequence has probability 1/N of coming from a particular individual of the previous generation. This result implies that once the effective sequences are determined, there is no need to consider whether an effective sequence is from selfing or from random mating. Consequently, the probability of no coalescence in one generation between effective sequences, given there are k - i effective sequences at generation t, is

$$\frac{N-1}{N}\cdot\frac{N-2}{N}\cdots\frac{N-(k-i)+1}{N} = 1 - \frac{(k-i)(k-i-1)}{2N} + o\!\left(\frac{1}{N}\right).$$

Neglecting terms of higher order, i.e., assuming the probability of more than one coalescent event is negligible, we have that the probability of no coalescent event between effective sequences is $1 - (k-i)(k-i-1)/(2N)$. The probability of one coalescent event is thus $(k-i)(k-i-1)/(2N)$.

The state (k, n) can change in one generation even when there is no coalescent event between effective sequences. The value of n will change because every full ancestor resulting from random mating changes to two half ancestors at generation t + 1; on the other hand, the value of k may change if a full ancestor resulting from selfing becomes a half ancestor (i.e., a coalescent event of the third type) at generation t + 1, which has probability 1/2. Given there are i full ancestors resulting from selfing, the probability that l of them become half ancestors in generation t + 1 is given by $\binom{i}{l}\left(\tfrac{1}{2}\right)^{i}$, and the number of ancestors at generation t + 1 becomes i + 2(k - n - i) + 2n - k = k - i. Therefore, conditioning on there being i full ancestors from selfing at generation t, the probability that there is no coalescence between effective sequences and that state (k, n) changes to (k - l, k - i) at generation t + 1 is

$$\left[1 - \frac{(k-i)(k-i-1)}{2N}\right]\binom{i}{l}\left(\frac{1}{2}\right)^{i} \qquad (2)$$

for $0 \le l \le i \le k - n$.

Change of state when there is a coalescent event: When a coalescent event occurs between two effective sequences, two randomly chosen effective sequences coalesce to an ancestor at generation t + 1. Because an effective sequence may represent either one or two sequences, the two effective sequences may represent two, three, or four sequences. Let $P_j$ ($2 \le j \le 4$) be the probability that they represent j sequences. Then, conditioning on there being i full ancestors resulting from selfing, $P_2$, $P_3$ and $P_4$ are equal to

$$\frac{(k-2i)(k-2i-1)}{(k-i)(k-i-1)}, \qquad \frac{2i(k-2i)}{(k-i)(k-i-1)}, \qquad \frac{i(i-1)}{(k-i)(k-i-1)},$$

respectively. The ancestor of the two effective sequences at generation t + 1 may be a half ancestor or a full ancestor. The former event has probability $\left(\tfrac{1}{2}\right)^{j-1}$, because each of the other j - 1 sequences has probability 1/2 of sharing the same ancestral sequence as the first sequence. The latter event has probability $1 - \left(\tfrac{1}{2}\right)^{j-1}$. We note that when there are i full ancestors resulting from selfing and there is a coalescent event, the number of ancestors at generation t + 1 becomes k - i - 1.

There are two ways the number of sequences can be reduced from k to k - l given that the two effective sequences represent j sequences. The first is that the two effective sequences coalesce to a half ancestor, which removes j - 1 sequences, and l - j + 1 of the i - j + 2 remaining full ancestors resulting from selfing (j - 2 are already involved in the coalescence between effective sequences) at generation t become half ancestors at generation t + 1. The probability of this event is

$$\left(\frac{1}{2}\right)^{j-1}\binom{i-j+2}{l-j+1}\left(\frac{1}{2}\right)^{i-j+2}.$$

The second is that the two effective sequences coalesce to a full ancestor, which removes j - 2 sequences, and l - j + 2 of the remaining full ancestors resulting from selfing at generation t become half ancestors at generation t + 1. The probability of this event is

$$\left[1 - \left(\frac{1}{2}\right)^{j-1}\right]\binom{i-j+2}{l-j+2}\left(\frac{1}{2}\right)^{i-j+2}.$$

Therefore, conditioning on there being i full ancestors resulting from selfing, the joint probability that there is a coalescent event and that the state (k, n) changes to (k - l, k - i - 1) is

$$\frac{(k-i)(k-i-1)}{2N}\sum_{j=2}^{4} P_j\left\{\left(\frac{1}{2}\right)^{j-1}\binom{i-j+2}{l-j+1} + \left[1-\left(\frac{1}{2}\right)^{j-1}\right]\binom{i-j+2}{l-j+2}\right\}\left(\frac{1}{2}\right)^{i-j+2}$$

$$= \frac{1}{2N}\left(\frac{1}{2}\right)^{i+1}\left\{(k-2i)(k-2i-1)\left[\binom{i}{l-1}+\binom{i}{l}\right] + 2i(k-2i)\left[3\binom{i-1}{l-1}+\binom{i-1}{l-2}\right] + i(i-1)\left[7\binom{i-2}{l-2}+\binom{i-2}{l-3}\right]\right\} \qquad (3)$$

for $0 \le l \le i + 1$ and $0 \le i \le k - n$.

Probability of a transition of state: Putting (1)-(3) together and noting that to change from n ancestors at generation t to k - i ancestors at generation t + 1 through a coalescent event there must be i - 1 full ancestors resulting from selfing at generation t, we have that the probability of changing state (k, n) at generation t to state (k - l, k - i) at generation t + 1 is

$$P[(k,n) \to (k-l,\,k-i)] = \binom{k-n}{i}s^{i}(1-s)^{k-n-i}\left[1-\frac{(k-i)(k-i-1)}{2N}\right]\binom{i}{l}\left(\frac{1}{2}\right)^{i}$$

$$+ \binom{k-n}{i-1}s^{i-1}(1-s)^{k-n-i+1}\,\frac{1}{2N}\left(\frac{1}{2}\right)^{i}\Big\{(k-2(i-1))(k-2(i-1)-1)\left[\binom{i-1}{l-1}+\binom{i-1}{l}\right]$$

$$+ 2(i-1)(k-2(i-1))\left[3\binom{i-2}{l-1}+\binom{i-2}{l-2}\right] + (i-1)(i-2)\left[7\binom{i-3}{l-2}+\binom{i-3}{l-3}\right]\Big\} \qquad (4)$$

for $0 \le l \le i$ and $0 \le i \le k - n + 1$, with the convention that $\binom{i}{j} = 0$ whenever i < j or i < 0 or j < 0.

The transition probability (4) and the sample configuration (k, n) completely specify the Markov chain for the coalescence process. When the number of ancestors changes from n to k - i (i = 0, ..., k - n + 1), the number of sequences can be any value between k - i and k. Therefore the number of states to which (k, n) can change in one generation is

$$\sum_{i=0}^{k-n+1}(i+1) = \frac{(k-n+2)(k-n+3)}{2}.$$

We can see that the number of possible states of the Markov chain is considerably larger than that for a random mating population. Among the states, (k, k) is of special importance because once the Markov chain enters this state, it tends to stay there: from (4), the probability that (k, k) remains unchanged in one generation is

$$1 - \frac{k(k-1)}{2N},$$

which is large unless k is very large. We also have

$$P[(k,n) \to (k,k)] = (1-s)^{k-n}\left[1 - \frac{k(k-1)}{2N}\right],$$

so the probability of entering the state (k, k) increases with n. The probability that state (k, n) remains unchanged in one generation is

$$\left(\frac{s}{2}\right)^{k-n}\left[1-\frac{n(n-1)}{2N}\right] + (k-n)\,s^{k-n-1}(1-s)\left(\frac{1}{2}\right)^{k-n}\frac{(k-2(k-n-1))(k-2(k-n-1)-1)}{2N},$$

which is smaller than $2^{n-k}$.

Simulation algorithm: A coalescent algorithm for simulating a genealogy of a sample starts with the sample configuration, then moves backward in time and changes state stochastically along the way until the most recent common ancestral sequence (MRCA) is found. Given that the sample configuration is (k, n), the selfing rate is s and the effective population size is N, the analysis in the previous sections suggests that we can simulate a sample genealogy by the following steps:

1. Go to step 3 if there is at least one full ancestor; otherwise go to step 2.
2. Simulate the time length before the state (m, m) changes, then change it to the state (m, m - 1) or the state (m - 1, m - 1) randomly and go to step 7.
3. Determine for each full ancestor whether it is an offspring from selfing, and thus determine each effective sequence.
4. Determine if there is a coalescent event between effective sequences.
5. If there is a coalescent event, randomly select two effective sequences and determine if they coalesce to a full ancestor or a half ancestor.
6. Determine for each full ancestor resulting from selfing whether a coalescent event of the third type occurs.
7. Update information and return to the first step if k > 1; stop otherwise.

We can see from the algorithm that most of the simulation cycles move one generation backward at a time unless the state is (m, m) (i.e., the number of ancestral sequences is equal to the number of ancestors). The time length t before the state (m, m) changes follows approximately the exponential distribution with density

$$\frac{m(m-1)}{2N}\exp\left[-\frac{m(m-1)}{2N}\,t\right].$$

When the state does change, it has equal probability of changing to the state (m, m - 1) and to the state (m - 1, m - 1). The above algorithm simulates only the path from the MRCA to the sequences in the sample. To simulate a sample of DNA sequences from a partially selfing population, one can superimpose mutations on each branch of a simulated genealogy, as in the coalescent algorithm for simulating DNA samples from a random mating population. A C subroutine that generates a genealogy of a sample of any configuration is available upon request.

Examining a few examples of simulated genealogies of a sample from a partially selfing population is instructive, and we give two examples in Figure 1. Figure 1a shows a simulated genealogy of a sample with (k, n) = (6, 3) from a population with a low selfing rate (s = 0.2), and Figure 1b from a population with a high selfing rate (s = 0.9). In the case of the low selfing rate, it happened that among the three full ancestors only one was an offspring of selfing, whose sequences coalesced to a single ancestral sequence at the previous generation. After this event, the rest of the genealogy back to the MRCA was similar to that of a sample of five sequences from a completely random mating population. In the case of the high selfing rate, it happened that all three full ancestors were from selfing. In one generation two of them became half ancestors and the third remained a full ancestor, which was also from selfing. After one more generation this full ancestor became a half ancestor, and from then on the genealogy looked similar to that of a sample of three sequences from a random mating population. It is interesting to see that the number of generations during which there were three ancestral sequences (the so-called three-coalescent time) in the case of the low selfing rate is considerably larger than in the case of the high selfing rate, and the same is true for the two-coalescent time. Although the stochastic nature of the coalescent process may account for the pattern, it turns out that this is expected, as we shall see later.
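The seven steps of the simulation algorithm can be sketched in Python as follows. This is a minimal illustration, not the C subroutine mentioned in the text. The function name `simulate_tree_length` and the `rng` parameter are hypothetical, and the sketch assumes that only the total branch length is wanted, so it tracks the numbers of full and half ancestors rather than the topology of the genealogy.

```python
import random


def simulate_tree_length(k, n, s, N, rng=random):
    """Return the total branch length (in generations) of one simulated
    genealogy for a sample of k sequences from n individuals.

    Hypothetical helper: only the counts of full and half ancestors are
    tracked, so the tree topology is not recorded.
    """
    full, half = k - n, 2 * n - k          # full and half ancestors in the sample
    length = 0.0
    while 2 * full + half > 1:             # step 7: stop when one sequence is left
        k_now = 2 * full + half
        if full == 0:
            # Steps 1-2: state (m, m). Draw the waiting time to the next change
            # from an exponential distribution with mean 2N / (m(m - 1)).
            length += k_now * rng.expovariate(k_now * (k_now - 1) / (2 * N))
            if rng.random() < 0.5:
                full, half = 1, k_now - 2  # -> (m, m - 1)
            else:
                half = k_now - 1           # -> (m - 1, m - 1)
            continue
        length += k_now                    # every lineage goes back one generation
        # Step 3: decide which full ancestors are offspring of selfing.
        doubles = sum(rng.random() < s for _ in range(full))
        singles = k_now - 2 * doubles      # effective sequences holding one sequence
        e = doubles + singles              # number of effective sequences
        new_full = new_half = 0
        # Step 4: is there a coalescent event between effective sequences?
        if rng.random() < e * (e - 1) / (2 * N):
            # Step 5: pick two effective sequences; together they hold j sequences.
            pair = rng.sample([2] * doubles + [1] * singles, 2)
            doubles -= pair.count(2)
            singles -= pair.count(1)
            j = sum(pair)
            if rng.random() < 0.5 ** (j - 1):
                new_half += 1              # all j sequences share one ancestral sequence
            else:
                new_full += 1
        # Step 6: third type of coalescent event, probability 1/2 for each
        # remaining full ancestor that resulted from selfing.
        merged = sum(rng.random() < 0.5 for _ in range(doubles))
        full = new_full + doubles - merged
        half = new_half + merged + singles
    return length
```

Averaging the returned value over many replicates, for example with `rng = random.Random(1)` for reproducibility, gives a Monte Carlo estimate of the mean total branch length for a given sample configuration.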

FIGURE 1. Simulated genealogies of a sample with (k, n) = (6, 3) from a partially selfing population with N = [illegible]: (a) s = 0.2 and (b) s = 0.9. Branch lengths are in terms of the numbers of generations.

THE NUMBER OF SEGREGATING SITES

Let $S_{k,n}$ be the number of segregating sites in a sample of configuration (k, n). For a random mating population, it does not matter how many individuals the k sequences are from, and it is well known (WATTERSON 1975) that

$$E(S_{k,n}) = a_k\theta \qquad (5)$$

$$\mathrm{Var}(S_{k,n}) = a_k\theta + b_k\theta^2, \qquad (6)$$

where $\theta = 4N\mu$, $\mu$ is the mutation rate per sequence per generation, and

$$a_k = \sum_{i=1}^{k-1}\frac{1}{i}, \qquad b_k = \sum_{i=1}^{k-1}\frac{1}{i^2}. \qquad (7)$$

In the case of a completely selfing population, the two sequences of a full ancestor always come from a single ancestor and the mean time for them to coalesce to a single ancestral sequence is two generations. Since the coalescence between two effective sequences takes many more generations, the mean number $T_{k,n}$ of generations in the sample genealogy (the mean total branch length) is mainly due to the times between coalescent events between effective sequences. Therefore, $T_{k,n}$ is approximately the mean time length of the genealogy of a sample of n sequences from a haploid population with effective population size N, that is,

$$T_{k,n} \approx 2N a_n. \qquad (8)$$

Assuming the number of mutations in each branch of the sample genealogy follows a Poisson distribution with mean equal to the product of the mutation rate $\mu$ and the branch length (in generations), we then have

$$E(S_{k,n}) = \tfrac{1}{2}a_n\theta, \qquad (9)$$

where $\theta = 4N\mu$. For a partially selfing population, $E(S_{k,n})$ must lie between (5) and (9). We note that the $T_{k,n}$ satisfy the recurrence equation

$$T_{k,n} = k + \sum_{i=0}^{k-n+1}\sum_{l=0}^{i} P[(k,n)\to(k-l,\,k-i)]\;T_{k-l,\,k-i}. \qquad (10)$$

In principle, one can solve these linear equations for the value of $T_{k,n}$. It is simple to do so in the case of two sequences because

$$T_{2,2} = \frac{N-1}{N}\,T_{2,2} + \frac{1}{2N}\,T_{2,1} + 2$$

and

$$T_{2,1} = (1-s)\frac{N-1}{N}\,T_{2,2} + \left[\frac{s}{2} + \frac{1-s}{2N}\right]T_{2,1} + 2,$$

from which we have

$$T_{2,1} = (1-s)(4N) + 4s$$

$$T_{2,2} = (1-s)(4N) + s(2N) + 2s.$$

Multiplying by $\mu$ and neglecting the terms $2s\mu$ and $4s\mu$, we have

$$E(S_{2,1}) = (1-s)\theta$$

$$E(S_{2,2}) = \left[\tfrac{1}{2}s + (1-s)\right]\theta,$$

which can be written as

$$E(S_{2,1}) = \left[\tfrac{1}{2}a_1 s + (1-s)a_2\right]\theta$$

$$E(S_{2,2}) = \left[\tfrac{1}{2}a_2 s + (1-s)a_2\right]\theta,$$

since $a_1 = 0$ and $a_2 = 1$.
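The closed forms for $T_{2,1}$ and $T_{2,2}$ can be checked numerically against the transition probability (4) and the recurrence (10). The sketch below is an illustration under stated assumptions: the names `binom`, `transition_prob` and `two_sequence_tree_lengths` are hypothetical, and `transition_prob` encodes Equation 4 as derived in the previous section, with a first term for no coalescent event (i selfed full ancestors) and a second term for one coalescent event (i - 1 selfed full ancestors).

```python
from math import comb


def binom(a, b):
    """Binomial coefficient, taken to be 0 when a < b, a < 0 or b < 0."""
    if a < 0 or b < 0 or a < b:
        return 0
    return comb(a, b)


def transition_prob(k, n, l, i, s, N):
    """P[(k, n) -> (k - l, k - i)] in one generation (Equation 4)."""
    no_coal = 0.0
    if 0 <= i <= k - n:
        # i selfed full ancestors, no coalescence between effective sequences,
        # l of the selfed full ancestors become half ancestors.
        no_coal = (comb(k - n, i) * s**i * (1 - s)**(k - n - i)
                   * (1 - (k - i) * (k - i - 1) / (2 * N))
                   * binom(i, l) * 0.5**i)
    coal = 0.0
    h = i - 1                      # selfed full ancestors when a coalescence occurs
    if 0 <= h <= k - n:
        m = k - 2 * h              # effective sequences holding a single sequence
        bracket = (m * (m - 1) * (binom(h, l - 1) + binom(h, l))
                   + 2 * h * m * (3 * binom(h - 1, l - 1) + binom(h - 1, l - 2))
                   + h * (h - 1) * (7 * binom(h - 2, l - 2) + binom(h - 2, l - 3)))
        coal = (comb(k - n, h) * s**h * (1 - s)**(k - n - h)
                * 0.5**(h + 1) * bracket / (2 * N))
    return no_coal + coal


def two_sequence_tree_lengths(s, N):
    """Solve recurrence (10) for (T_21, T_22); T_11 = 0 by definition."""
    a = transition_prob(2, 2, 0, 0, s, N)   # (2,2) -> (2,2)
    b = transition_prob(2, 2, 0, 1, s, N)   # (2,2) -> (2,1)
    c = transition_prob(2, 1, 0, 0, s, N)   # (2,1) -> (2,2)
    d = transition_prob(2, 1, 0, 1, s, N)   # (2,1) -> (2,1)
    # (1 - a) T22 - b T21 = 2  and  -c T22 + (1 - d) T21 = 2 (Cramer's rule).
    det = (1 - a) * (1 - d) - b * c
    t22 = 2 * ((1 - d) + b) / det
    t21 = 2 * ((1 - a) + c) / det
    return t21, t22
```

For any s and N the two returned values should agree, up to rounding, with $(1-s)(4N) + 4s$ and $(1-s)(4N) + s(2N) + 2s$. Summing `transition_prob` over all admissible (l, i) for a fixed (k, n) is a further sanity check, since the probabilities of leaving a state must add up to one.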

The two equations for $E(S_{2,1})$ and $E(S_{2,2})$, together with (5) and (9), suggest the following equation:

$$E(S_{k,n}) = \left[\tfrac{1}{2}a_n s + a_k(1-s)\right]\theta. \qquad (11)$$

To verify this equation analytically does not appear simple; we thus turn to computer simulations for help. In our simulations, we used the algorithm developed earlier to generate genealogies of a sample of a given configuration. For each simulated genealogy, we recorded the total branch length and obtained its mean value $T_{k,n}$ after many replicates. Note that for the purpose of verifying (11), there is no need to superimpose mutations on the genealogies, because if we show

$$\frac{T_{k,n}}{4N} = \tfrac{1}{2}a_n s + a_k(1-s), \qquad (12)$$

then (11) must be correct. Figure 2 shows examples of $T_{k,n}$ for several sample configurations and effective population sizes. We can see that (12) and simulations agree quite well, suggesting that (11) is an excellent approximation if not entirely correct.

FIGURE 2. Mean values of tree length ($T_{k,n}/(4N)$) with respect to the selfing rate s. (a) N = 500 with (k, n) = (10, 5), (10, 8) and (10, 10). (b) N = 5000 with (k, n) = (40, 20), (40, 30) and (40, 40). Points are simulation results and lines are given by Equation 12. 20,000 genealogies were simulated for each value of s for each sample configuration.

Since there is a linear relationship between $E(S_{k,n})$ and the selfing rate s, it is tempting to suggest

$$\mathrm{Var}(S_{k,n}) = \left[\tfrac{1}{2}a_n\theta + \tfrac{1}{4}b_n\theta^2\right]s + \left[a_k\theta + b_k\theta^2\right](1-s), \qquad (13)$$

but simulations show this equation overestimates the variance of $S_{k,n}$ for a partially selfing population. Since we assume the number of mutations in each branch of the sample genealogy follows a Poisson distribution, the variance of $S_{k,n}$ is equal to

$$\mathrm{Var}(S_{k,n}) = E\!\left(\frac{T_{k,n}}{4N}\right)\theta + \mathrm{Var}\!\left(\frac{T_{k,n}}{4N}\right)\theta^2,$$

where $T_{k,n}$ here denotes the total branch length of a genealogy. Because the mean tree length is a linear function of s, it is likely that the coefficient of $\theta^2$, $\mathrm{Var}(T_{k,n}/(4N))$, is a quadratic function of s. I find that subtracting $b_n s(1-s)\theta^2/4$ from (13) results in an excellent approximation to the variance (see Figure 3). Therefore, I suggest using the following equation:

$$\mathrm{Var}(S_{k,n}) = \left[\tfrac{1}{2}a_n s + a_k(1-s)\right]\theta + \left[\tfrac{1}{4}b_n s^2 + b_k(1-s)\right]\theta^2. \qquad (14)$$

Figure 3 also shows that $\mathrm{Var}(T_{k,n}/(4N))$ differs little among different values of n. This reflects the fact that the $b_n$ converge rapidly, so that the difference between $b_n$ and $b_k$ is rather small. We therefore have approximate formulas for the mean and variance of $S_{k,n}$. For practical purposes, (11) and (14) may be sufficiently accurate, but since the number of segregating sites in a sample is an important quantity, it will be useful to obtain exact results for $E(S_{k,n})$ and $\mathrm{Var}(S_{k,n})$ in the future.

ESTIMATION OF $\theta$ AND s

Equation 11 can be used for estimating the mutation parameter $\theta$ and the selfing rate s as follows. Let (i, j) and (l, m) be different sample configurations, i.e., different in at least one component. We then have

$$E(S_{i,j}) = \tfrac{1}{2}a_j\theta_1 + a_i\theta_2$$

$$E(S_{l,m}) = \tfrac{1}{2}a_m\theta_1 + a_l\theta_2,$$

where $\theta_1 = s\theta$ and $\theta_2 = (1-s)\theta$. Solving these equations yields

$$\theta_1 = \frac{a_l E(S_{i,j}) - a_i E(S_{l,m})}{\tfrac{1}{2}(a_j a_l - a_i a_m)}, \qquad \theta_2 = \frac{a_j E(S_{l,m}) - a_m E(S_{i,j})}{a_j a_l - a_i a_m}.$$

Since $\theta = \theta_1 + \theta_2$ we have

$$\theta = \frac{(a_l - \tfrac{1}{2}a_m)E(S_{i,j}) - (a_i - \tfrac{1}{2}a_j)E(S_{l,m})}{\tfrac{1}{2}(a_j a_l - a_i a_m)} \qquad (17)$$

$$s = \frac{a_l E(S_{i,j}) - a_i E(S_{l,m})}{(a_l - \tfrac{1}{2}a_m)E(S_{i,j}) - (a_i - \tfrac{1}{2}a_j)E(S_{l,m})}. \qquad (18)$$

FIGURE 3. Coefficient of $\theta^2$ in $\mathrm{Var}(S_{k,n})$ with respect to the selfing rate s. (a) (k, n) = (10, 5), (10, 8) and (10, 10). (b) (k, n) = (40, 20), (40, 30) and (40, 40). Points are simulation results, the straight lines are given by $b_n s/4 + b_k(1-s)$ and the curves are given by $b_k(1-s) + b_n s^2/4$. 20,000 genealogies were simulated for each value of s for each sample configuration.

The implication of these equations is that when a sample of configuration (k, n) is taken, one can resample the sample in two different ways, and when $E(S_{i,j})$ and $E(S_{l,m})$ are replaced by appropriate sample means, we obtain an estimator of $\theta$ and an estimator of s. A simple scheme is found by setting (i, j) = (2, 2) and (l, m) = (2, 1), i.e., only two sequences are considered at a time. Let $\bar{S}_{2,2}$ and $\bar{S}_{2,1}$ be, respectively, the average numbers of nucleotide differences between and within individuals. Then substituting $\bar{S}_{2,2}$ for $E(S_{2,2})$ and $\bar{S}_{2,1}$ for $E(S_{2,1})$, we have the following pair of estimators:

$$\hat{\theta}_m = 2\bar{S}_{2,2} - \bar{S}_{2,1} \qquad (19)$$

$$\hat{s}_m = \frac{2(\bar{S}_{2,2} - \bar{S}_{2,1})}{2\bar{S}_{2,2} - \bar{S}_{2,1}}. \qquad (20)$$

We note that $\hat{\theta}_m$ is identical to MILLIGAN'S (1996) estimator of $\theta$, while $\hat{s}_m$ differs from MILLIGAN'S (1996) estimator of s by a factor N/(N - 1). Since N/(N - 1) is usually very close to 1, $\hat{s}_m$ is essentially the same estimator as MILLIGAN'S. Because of (11), $\hat{\theta}_m$ is a nearly unbiased estimator.

An alternative scheme of estimation is to use the pair $S_{k,n}$ and $\bar{S}_{2,1}$, that is, the number of segregating sites in the sample and the mean number of nucleotide differences within individuals. From (17) and (18), the corresponding estimators are $[2S_{k,n} - (2a_k - a_n)\bar{S}_{2,1}]/a_n$ for $\theta$ and $2(S_{k,n} - a_k\bar{S}_{2,1})/[2S_{k,n} - (2a_k - a_n)\bar{S}_{2,1}]$ for s. It is easy to show using (11) that the former is nearly unbiased, and using simulated samples we found that it is indeed unbiased. However, our simulation study shows that the latter is not a very reliable estimator, so we shall not discuss it further.

One problem with the estimators $\hat{s}_m$ and $\hat{\theta}_m$ is that they do not guarantee positive values as they ought to. Table 1 shows the estimated probabilities of having negative values of $\hat{s}_m$ and $\hat{\theta}_m$ for several sample configurations. The chance of having a negative value for $\hat{\theta}_m$ is not high, but the same cannot be said for $\hat{s}_m$. Note that in the case of random mating, one may expect that $\hat{s}_m$ takes negative values about half of the time because the expected values of $\bar{S}_{2,2}$ and $\bar{S}_{2,1}$ are then both equal to $\theta$. Table 1 shows that $\hat{s}_m$ is more likely to be negative than positive. An examination of its empirical distribution shows that it is skewed to the left although its mean is zero.
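Equations 17 and 18 amount to solving two linear moment equations in $\theta_1 = s\theta$ and $\theta_2 = (1-s)\theta$. The following minimal sketch shows the calculation for an arbitrary pair of sample configurations. The function names `a` and `moment_estimates` are hypothetical, and the inputs are assumed to be sample means that stand in for the expectations in Equation 11.

```python
def a(k):
    """a_k = 1 + 1/2 + ... + 1/(k - 1), with a_1 = 0."""
    return sum(1.0 / i for i in range(1, k))


def moment_estimates(conf1, mean1, conf2, mean2):
    """Solve Equation 11 for (theta, s) from two sample configurations.

    conf1 = (i, j) and conf2 = (l, m) are (sequences, individuals);
    mean1 and mean2 are the observed means of S for those configurations.
    """
    (i, j), (l, m) = conf1, conf2
    det = a(j) * a(l) - a(i) * a(m)
    if det == 0:
        raise ValueError("the two configurations do not identify theta and s")
    theta1 = 2 * (a(l) * mean1 - a(i) * mean2) / det   # estimate of s * theta
    theta2 = (a(j) * mean2 - a(m) * mean1) / det       # estimate of (1 - s) * theta
    theta = theta1 + theta2
    return theta, theta1 / theta
```

With `conf1 = (2, 2)` and `conf2 = (2, 1)` the function reduces to the two-sequence scheme, returning `2 * mean1 - mean2` for $\theta$ and `2 * (mean1 - mean2) / (2 * mean1 - mean2)` for s. Nothing in the sketch constrains the results to the admissible range, which is the problem taken up in the text.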

TABLE 1

Percentage of samples resulting in negative $\hat{s}_m$ and $\hat{\theta}_m$ with N = 5000 and $\theta$ = 5

Sample configurations (k, n): (20, 10), (40, 20), (80, 40). Each row is based on 20,000 independently simulated samples.

In fact there is also no guarantee that $\hat{\theta}_m$ is positive because $\bar{S}_{2,2}$ can be smaller than half of $\bar{S}_{2,1}$, although the chance is small. Our simulations show that this does happen and when it happens, $\hat{s}_m$ can be substantially larger than 1. Since $\bar{S}_{2,2}$ is expected to be larger than $\bar{S}_{2,1}$ when there is partial selfing, having the inequality $\bar{S}_{2,2} < \bar{S}_{2,1}/2$ suggests that s = 0 instead of s = 1. Therefore, I propose to use the following estimator of s:

$$\hat{s} = \begin{cases}\hat{s}_m & \text{if } 0 \le \hat{s}_m \le 1\\ 0 & \text{otherwise.}\end{cases}$$

To remedy $\hat{\theta}_m$, we note that when $\hat{\theta}_m$ is negative, there must be at least one segregating site in the sample, which suggests that $\theta$ must be larger than 0. Therefore, we can use

$$\hat{\theta} = \begin{cases}\hat{\theta}_m & \text{if } \hat{\theta}_m > 0\\ S_{k,n}/a_k & \text{otherwise}\end{cases}$$

as an estimator of $\theta$, that is, estimating $\theta$ by $\hat{\theta}_m$ if $\hat{\theta}_m > 0$, and estimating $\theta$ by WATTERSON'S estimate otherwise. Equation 11 provides yet another estimator of $\theta$ as

$$\hat{\theta}_w = \frac{S_{k,n}}{\tfrac{1}{2}a_n\hat{s} + a_k(1-\hat{s})}.$$

Our simulations show that this estimator is also nearly unbiased. A variation of this estimator is to substitute $\hat{s}_m$ given by (20) for $\hat{s}$, and the performance of the resulting estimator is nearly the same as that of $\hat{\theta}_w$, although $\hat{s}_m$ may be less than 0 or larger than 1.

Figure 4 shows the sampling variances of the three estimators of $\theta$ for samples of size 20 and 60, respectively. It is clear that MILLIGAN'S estimator $\hat{\theta}_m$ has the largest variance among the three and $\hat{\theta}_w$ has the smallest variance; when s is small, $\hat{\theta}$ has smaller variance than $\hat{\theta}_m$, but the two converge when s is large, and the speed of convergence increases with $\theta$. Since all three estimators are nearly unbiased, the one with the smallest variance should be preferred. I thus recommend that $\hat{\theta}_w$ be the first choice for estimating $\theta$.

FIGURE 4. Variances of the three estimators of $\theta$ with respect to the selfing rate s. Curves with unfilled symbols correspond to samples of size 20, and curves with filled symbols correspond to samples of size 60. Curves with circles, squares and triangles correspond to $\hat{\theta}_w$, $\hat{\theta}$ and $\hat{\theta}_m$, respectively. Results for each parameter set are based on 20,000 independent samples from a population with N = [illegible].

Figure 5 shows the means and variances of the two estimators of s for several parameter sets. One can see (Figure 5a) that $\hat{s}_m$ is slightly biased downward. It is interesting to see (Figure 5c) that $\hat{s}$ is overall less biased than $\hat{s}_m$. The major improvement of $\hat{s}$ over $\hat{s}_m$ is the variance, which is significantly smaller than that of $\hat{s}_m$, particularly when s is small. Therefore, $\hat{s}$ is recommended as the first choice. Figure 5 also shows that increasing either the sample size or the value of $\theta$ reduces the variance of estimation.
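The estimators discussed in this section can be collected in one short routine. This is a sketch under stated assumptions rather than the author's code: the function names and the dictionary keys are hypothetical, the rule for the modified estimator of s (keep MILLIGAN'S value when it lies in [0, 1], otherwise use 0) is a reading of the verbal argument in the text, and the sample is assumed to have k >= 2 sequences from n >= 2 individuals so that no denominator vanishes.

```python
def a(k):
    """a_k = 1 + 1/2 + ... + 1/(k - 1), with a_1 = 0."""
    return sum(1.0 / i for i in range(1, k))


def estimate_theta_and_s(seg_sites, k, n, between, within):
    """Estimators of theta and s for a sample of k sequences from n individuals.

    seg_sites: number of segregating sites in the whole sample.
    between, within: mean numbers of nucleotide differences between and
    within individuals.
    """
    # Two-sequence (Milligan-type) estimators; s_m is undefined if theta_m is 0.
    theta_m = 2.0 * between - within
    s_m = 2.0 * (between - within) / theta_m if theta_m != 0 else float("nan")
    # Assumption: the modified estimator keeps s_m only when it is a valid rate.
    s_hat = s_m if 0.0 <= s_m <= 1.0 else 0.0
    # Fall back on Watterson's estimate when theta_m is not positive.
    theta_hat = theta_m if theta_m > 0 else seg_sites / a(k)
    # Watterson-type estimator based on Equation 11.
    theta_w = seg_sites / (0.5 * a(n) * s_hat + a(k) * (1.0 - s_hat))
    return {"theta_m": theta_m, "s_m": s_m, "s_hat": s_hat,
            "theta_hat": theta_hat, "theta_w": theta_w}
```

When the three inputs equal their expectations under Equation 11, all three estimators of $\theta$ coincide. They differ on real samples, where sampling noise can push the two-sequence estimators outside the admissible range.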

FIGURE 5. The mean and variance of estimators of s with respect to the selfing rate s. (a and b) Results of $\hat{s}_m$. (c and d) Results of $\hat{s}$. All the panels use the same legend as d; the parameter sets shown are (k, n) = (20, 10) with $\theta$ = 1, (k, n) = (60, 30) with $\theta$ = 1, (k, n) = (20, 10) with $\theta$ = 10, and (k, n) = (60, 30) with $\theta$ = [illegible]. Results for each parameter set are based on 20,000 independent samples from a population with N = [illegible].

Comparing the variances of $\hat{s}$ in the two cases (k, n) = (60, 30), $\theta$ = 1 and (k, n) = (20, 10), $\theta$ = 10, it is clear that increasing sample size is a more effective way of reducing the sampling variance of $\hat{s}$.

It should be pointed out that the variances of $\hat{s}_m$ shown in Figure 5 are substantially smaller than those shown in Figure 5 of MILLIGAN (1996). There are two reasons for this discrepancy. First, MILLIGAN'S results were based on independent samples of two sequences, which inflates the difference between sequences from different individuals relative to that in a larger sample, where sequences share common ancestry. Second, MILLIGAN (1996) appeared to use the same number of between- and within-individual pairs to compute $\bar{S}_{2,2}$ and $\bar{S}_{2,1}$ ($\gamma_b$ and $\gamma_w$ in MILLIGAN'S notation), and because there are many more between-individual pairs than within-individual pairs, doing so necessarily results in loss of information and thus inflates the variance of estimation. MILLIGAN (1996) recognized the limitations of his simulations and predicted that the true variance of $\hat{s}_m$ may be substantially smaller, as is indeed the case shown here.

Finally, we would like to examine how sample configurations affect the estimation of $\theta$ and s. Given the number (k) of sequences in a sample, one may obtain these k sequences from k/2 individuals, or k/2 + 1, ..., or k individuals. If one obtains the sample from k/2 individuals (assuming k is an even number), then both alleles of each individual would have been sequenced, while if the sample is from k individuals, each individual would have contributed only one sequence. Figure 6 shows how sample configurations affect the variances in the estimation of $\theta$ and s. Figure 6a shows that sampling both alleles of each individual is the best scheme for the purpose of estimating s. This appears logical because doing so increases the number of within-individual comparisons and thus reduces the variance of $\bar{S}_{2,1}$, which is much more variable than $\bar{S}_{2,2}$ unless s is close to 1. Figure 6b shows the variance of $\hat{\theta}_w$, and it is clear that it is better to sample the sequences from fewer individuals when s is small, obviously due to the smaller variance in the estimate of s. However, it is better to sample more individuals when s is close to 1, because $\hat{s}$ then has very small variance and increasing the number of individuals is equivalent to increasing the sample size. Overall, the differences among the variances of $\hat{\theta}_w$ are not as substantial as those among the variances of $\hat{s}$ for different sample configurations.

FIGURE 6. Relationship between the variance of an estimator and sample configuration (k, n) with k = 60, N = 5000 and $\theta$ = 5, for n = 30, n = 40 and n = 50. (a) Estimation of s: the variance of $\hat{s}$. (b) Estimation of $\theta$: the variance of $\hat{\theta}_w$. 10,000 independent samples were generated for each value of s and each value of n.

DISCUSSION

The coalescent approach is a powerful way to study the evolution of a population. The development of the coalescent theory and its applications have been centered around animal populations, in which random mating is often assumed. With the advent of fast and inexpensive techniques for obtaining DNA sequences, DNA samples from plant populations will become more abundant in the future. The coalescent theory for partially selfing populations in this article, as well as in NORDBORG and DONNELLY (1997), is a step forward in providing a powerful modern framework for studying plant populations.
The coalescent algorithm for generating samples of DNA sequences enables one to obtain efficiently a large number of simulated samples for an empirical study of a partially selfing population. We derived in this article the transition probabilities of sample configurations in two successive generations, and our simulation algorithm is based on these transition probabilities. A reviewer suggested that a simpler but approximate algorithm can be derived from the fact that coalescence in selfing individuals is rapid, so that the number of generations it requires can be neglected. The algorithm of NORDBORG and DONNELLY (1997) is essentially based on such an approach. However, deriving a simulation algorithm from transition probabilities of sample configurations may be necessary (and safer) when a more complex genetic model, for example one involving recombination and selection, is studied. Therefore, the framework established in this article should benefit further theoretical investigations of partially selfing populations.

The number of segregating sites in a sample is the simplest quantity observable, yet it is highly informative. The first fruit of the coalescent theory and the algorithm developed in this article is the equations for the mean and variance of the number of segregating sites. Given the complexity of these two quantities in a number of population genetic models, such as those involving recombination or population subdivision, it is a pleasant surprise that the mean and variance of the number of segregating sites are simple functions of θ and s, and that when s = 0 they reduce to WATTERSON'S (1975) classical results for a random mating population.

We took advantage of the mean and the variance of the number of segregating sites and the coalescent algorithm to develop two new estimators of θ and a new estimator of s. The best estimator of θ found in this article is the one that is analogous to WATTERSON'S (1975) estimator of θ for a random mating population.
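For orientation, the random mating case (s = 0) mentioned above can be sketched as follows. The code evaluates WATTERSON'S (1975) classical mean and variance of the number of segregating sites for k sequences, E(K) = θ a_k and Var(K) = θ a_k + θ² b_k with a_k = Σ 1/i and b_k = Σ 1/i² (i = 1, ..., k - 1), together with the corresponding estimator K / a_k. It does not reproduce this article's formulas for s > 0, and the function names are hypothetical.

```python
def harmonic_sums(k):
    """Return (a_k, b_k): sums of 1/i and 1/i^2 for i = 1, ..., k - 1."""
    a = sum(1.0 / i for i in range(1, k))
    b = sum(1.0 / (i * i) for i in range(1, k))
    return a, b


def watterson_moments(k, theta):
    """Mean and variance of the number of segregating sites when s = 0.

    Assumes a random mating population, no recombination, and the
    infinite-sites mutation model (WATTERSON 1975).
    """
    a, b = harmonic_sums(k)
    mean = theta * a
    variance = theta * a + theta ** 2 * b
    return mean, variance


def watterson_estimate(num_segregating, k):
    """Watterson-type estimate of theta from the observed number of segregating sites."""
    a, _ = harmonic_sums(k)
    return num_segregating / a


if __name__ == "__main__":
    mean, variance = watterson_moments(20, 5.0)
    print(f"k=20, theta=5: E(K)={mean:.3f}, Var(K)={variance:.3f}")
```

Because a_k grows only logarithmically with k, the mean number of segregating sites rises slowly as sequences are added; this sketch covers the s = 0 baseline only.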
Our new estimator of s, which is a modification of MILLIGAN'S estimator, not only provides an estimate that is meaningful but also has substantially smaller variance. As far as estimating the selfing rate is concerned, this study shows that one can be more optimistic than MILLIGAN (1996), who concluded that a relatively large number of individuals (>100) is needed to achieve reasonable accuracy in the estimation of s. We show that the selfing rate s can be estimated reasonably well even with a sample of 20 sequences from 10 individuals. Since it is found in this article that increasing sample size is a more effective way of reducing the variance in the estimate of s than increasing sequence length, I recommend that obtaining a large random sample be assigned a high priority when designing an experiment.

It should be noted that the estimators of θ and s developed in this article are based on a few summary statistics of a sample and thus do not make full use of the available information. Consequently they are unlikely to be the best estimators that can be developed, although they are easy to compute and reasonably accurate. Since significantly better estimators of θ than WATTERSON'S have been found (e.g., FU 1994a,b; KUHNER et al. 1995; GRIFFITHS and TAVARÉ 1995) in the case of random mating populations, developing more efficient estimators of θ and s for a sample from a partially selfing population by making full use of the information in the sample should be worthy of further effort.

I thank Dr. M. UYENOYAMA and a reviewer for their comments and suggestions. I also thank Drs. NORDBORG and DONNELLY for sending me a copy of their manuscript while this manuscript was being reviewed. This research was supported in part by National Institutes of Health grant R29 GM.

LITERATURE CITED

FELSENSTEIN, J., 1992 Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates. Genet. Res. 59.
FRYXELL, P. A., 1957 Mode of reproduction of higher plants. Bot. Rev. 23.
FU, Y. X., 1994a Estimating effective population size or mutation rate using the frequencies of mutations of various classes in a sample of DNA sequences. Genetics 138.
FU, Y. X., 1994b A phylogenetic estimator of effective population size or mutation rate. Genetics 136.
FU, Y. X., and W. H. LI, 1993 Maximum likelihood estimation of population parameters. Genetics 134.
GRIFFITHS, R. C., and S. TAVARÉ, 1995 Unrooted genealogical tree probabilities in the infinitely-many-sites model. Math. Biosci. 127.
HUDSON, R. R., 1983 Testing the constant-rate neutral allele model with protein sequence data. Evolution 37.
HUDSON, R. R., 1991 Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7.
KINGMAN, J. F. C., 1982a The coalescent. Stochastic Processes and Their Applications 13.
KINGMAN, J. F. C., 1982b On the genealogy of large populations. J. Appl. Probab. 19A.
KUHNER, M. K., J. YAMATO and J. FELSENSTEIN, 1995 Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140.
MILLIGAN, B. G., 1996 Estimating long-term mating systems using DNA sequence. Genetics 142.
NORDBORG, M., and P. DONNELLY, 1997 The coalescent process with selfing. Genetics 146.
SLATKIN, M., 1991 Inbreeding coefficients and coalescent times. Genet. Res. 58.
TAJIMA, F., 1983 Evolutionary relationship of DNA sequences in finite populations. Genetics 105.
WATTERSON, G. A., 1975 On the number of segregating sites. Theor. Popul. Biol. 7.
WILLSON, M. F., 1984 Mating patterns in plants, in Perspectives on Plant Population Ecology, edited by R. DIRZO and J. SARUKHAN. Sinauer Associates, Sunderland, MA.
WRIGHT, S., 1969 Evolution and the Genetics of Populations, Vol. 2: The Theory of Gene Frequencies. The University of Chicago Press, Chicago.

Communicating editor: M. K. UYENOYAMA


More information

Mitochondrial Eve and Y-chromosome Adam: Who do your genes come from?

Mitochondrial Eve and Y-chromosome Adam: Who do your genes come from? Mitochondrial Eve and Y-chromosome Adam: Who do your genes come from? 28 July 2010. Joe Felsenstein Evening At The Genome Mitochondrial Eve and Y-chromosome Adam: Who do your genes come from? p.1/39 Evolutionary

More information

NON-RANDOM MATING AND INBREEDING

NON-RANDOM MATING AND INBREEDING Instructor: Dr. Martha B. Reiskind AEC 495/AEC592: Conservation Genetics DEFINITIONS Nonrandom mating: Mating individuals are more closely related or less closely related than those drawn by chance from

More information

Bottlenecks reduce genetic variation Genetic Drift

Bottlenecks reduce genetic variation Genetic Drift Bottlenecks reduce genetic variation Genetic Drift Northern Elephant Seals were reduced to ~30 individuals in the 1800s. Rare alleles are likely to be lost during a bottleneck Two important determinants

More information

Theoretical Population Biology. An approximate likelihood for genetic data under a model with recombination and population splitting

Theoretical Population Biology. An approximate likelihood for genetic data under a model with recombination and population splitting Theoretical Population Biology 75 (2009) 33 345 Contents lists available at ScienceDirect Theoretical Population Biology journal homepage: www.elsevier.com/locate/tpb An approximate likelihood for genetic

More information

ESTIMATION OF THE NUMBER OF INDIVIDUALS FOUNDING COLONIZED POPULATIONS

ESTIMATION OF THE NUMBER OF INDIVIDUALS FOUNDING COLONIZED POPULATIONS ORIGINAL ARTICLE doi:1.1111/j.1558-5646.7.8.x ESTIMATION OF THE NUMBER OF INDIVIDUALS FOUNDING COLONIZED POPULATIONS Eric C. Anderson 1, and Montgomery Slatkin 3,4 1 Fisheries Ecology Division, Southwest

More information

Generalized Game Trees

Generalized Game Trees Generalized Game Trees Richard E. Korf Computer Science Department University of California, Los Angeles Los Angeles, Ca. 90024 Abstract We consider two generalizations of the standard two-player game

More information

The DNA Case for Bethuel Riggs

The DNA Case for Bethuel Riggs The DNA Case for Bethuel Riggs The following was originally intended as an appendix to Alvy Ray Smith, Edwardian Riggses of America I: Elder Bethuel Riggs (1757 1835) of Morris County, New Jersey, and

More information

CHAPTER 9 THE EFFECTS OF GAUGE LENGTH AND STRAIN RATE ON THE TENSILE PROPERTIES OF REGULAR AND AIR JET ROTOR SPUN COTTON YARNS

CHAPTER 9 THE EFFECTS OF GAUGE LENGTH AND STRAIN RATE ON THE TENSILE PROPERTIES OF REGULAR AND AIR JET ROTOR SPUN COTTON YARNS 170 CHAPTER 9 THE EFFECTS OF GAUGE LENGTH AND STRAIN RATE ON THE TENSILE PROPERTIES OF REGULAR AND AIR JET ROTOR SPUN COTTON YARNS 9.1 INTRODUCTION It is the usual practise to test the yarn at a gauge

More information

3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER 2007

3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER 2007 3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 53, NO 10, OCTOBER 2007 Resource Allocation for Wireless Fading Relay Channels: Max-Min Solution Yingbin Liang, Member, IEEE, Venugopal V Veeravalli, Fellow,

More information

Chapter 5 - Elementary Probability Theory

Chapter 5 - Elementary Probability Theory Chapter 5 - Elementary Probability Theory Historical Background Much of the early work in probability concerned games and gambling. One of the first to apply probability to matters other than gambling

More information

Probabilities and Probability Distributions

Probabilities and Probability Distributions Probabilities and Probability Distributions George H Olson, PhD Doctoral Program in Educational Leadership Appalachian State University May 2012 Contents Basic Probability Theory Independent vs. Dependent

More information

FIBER OPTICS. Prof. R.K. Shevgaonkar. Department of Electrical Engineering. Indian Institute of Technology, Bombay. Lecture: 22.

FIBER OPTICS. Prof. R.K. Shevgaonkar. Department of Electrical Engineering. Indian Institute of Technology, Bombay. Lecture: 22. FIBER OPTICS Prof. R.K. Shevgaonkar Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture: 22 Optical Receivers Fiber Optics, Prof. R.K. Shevgaonkar, Dept. of Electrical Engineering,

More information

TenMarks Curriculum Alignment Guide: EngageNY/Eureka Math, Grade 7

TenMarks Curriculum Alignment Guide: EngageNY/Eureka Math, Grade 7 EngageNY Module 1: Ratios and Proportional Relationships Topic A: Proportional Relationships Lesson 1 Lesson 2 Lesson 3 Understand equivalent ratios, rate, and unit rate related to a Understand proportional

More information

4.7 k V C 10 V I B. (b) V ma V. 3.3 k ma. (c)

4.7 k V C 10 V I B. (b) V ma V. 3.3 k ma. (c) 380 Chapter 6 Bipolar Junction Transistors (BJTs) Example 6.4 Consider the circuit shown in Fig. 6., which is redrawn in Fig. 6. to remind the reader of the convention employed throughout this book for

More information

Laboratory 1: Uncertainty Analysis

Laboratory 1: Uncertainty Analysis University of Alabama Department of Physics and Astronomy PH101 / LeClair May 26, 2014 Laboratory 1: Uncertainty Analysis Hypothesis: A statistical analysis including both mean and standard deviation can

More information

Pixel Response Effects on CCD Camera Gain Calibration

Pixel Response Effects on CCD Camera Gain Calibration 1 of 7 1/21/2014 3:03 PM HO M E P R O D UC T S B R IE F S T E C H NO T E S S UP P O RT P UR C HA S E NE W S W E B T O O L S INF O C O NTA C T Pixel Response Effects on CCD Camera Gain Calibration Copyright

More information

Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles?

Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles? Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles? Andrew C. Thomas December 7, 2017 arxiv:1107.2456v1 [stat.ap] 13 Jul 2011 Abstract In the game of Scrabble, letter tiles

More information

Advanced data analysis in population genetics Likelihood-based demographic inference using the coalescent

Advanced data analysis in population genetics Likelihood-based demographic inference using the coalescent Advanced data analysis in population genetics Likelihood-based demographic inference using the coalescent Raphael Leblois Centre de Biologie pour la Gestion des Populations (CBGP), INRA, Montpellier master

More information

Using Mitochondrial DNA (mtdna) for Genealogy Debbie Parker Wayne, CG, CGL SM

Using Mitochondrial DNA (mtdna) for Genealogy Debbie Parker Wayne, CG, CGL SM Using Mitochondrial DNA (mtdna) for Genealogy Debbie Parker Wayne, CG, CGL SM This is one article of a series on using DNA for genealogical research. There are several types of DNA tests offered for genealogical

More information

Combinatorics. Chapter Permutations. Counting Problems

Combinatorics. Chapter Permutations. Counting Problems Chapter 3 Combinatorics 3.1 Permutations Many problems in probability theory require that we count the number of ways that a particular event can occur. For this, we study the topics of permutations and

More information

Warning: software often displays unrooted trees like this:

Warning: software often displays unrooted trees like this: Warning: software often displays unrooted trees like this: /------------------------------ Chara /-------------------------- Chlorella /---------16 \---------------------------- Volvox +-------------------17

More information

Lecture 1: Introduction to pedigree analysis

Lecture 1: Introduction to pedigree analysis Lecture 1: Introduction to pedigree analysis Magnus Dehli Vigeland NORBIS course, 8 th 12 th of January 2018, Oslo Outline Part I: Brief introductions Pedigrees symbols and terminology Some common relationships

More information

Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham

Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham Towards the Automatic Design of More Efficient Digital Circuits Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham

More information