The History of Population Size from Whole Genomes
Alan R Rogers
October 1, 2018

Simulated gene genealogy of a sample of size 50 from a population of constant size
- Short terminal branches; long basal ones
- Large samples tell us about the recent past
- Not necessary for the ancient past

Effect of a population explosion
[Figure, three panels: Population Size (N); Genealogy; Mismatch Distribution. Horizontal axes: Mutational time before present; Site Differences.]
- Middle: genealogy of 50 individuals; dots are mutations
- 1 mutational difference per time unit
- Bottom: points = simulated data, line = theory
- Wave peaks at population expansion

Skyline Plot (Drummond et al 2005)
- Use mutations to estimate length of each interval
- Long intervals imply large population size
- Won't work with nuclear DNA: too few mutations per tree

Nuclear genome
- Huge amounts of data
- Recombination makes previous methods unusable
- But we can still use the site frequency spectrum
[Figure, two panels: Population size against Mutational time before present; Site frequency spectrum against Frequency of minor allele (0 to 1/2).]
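The two constant-size observations above (short terminal branches with long basal ones, and the shape of the site frequency spectrum) can be sketched numerically. This is a minimal illustration under the standard neutral coalescent for a constant-size population, not the code behind the slides. The function names and the parameter `theta` (the scaled mutation rate) are assumptions introduced here for illustration.

```python
from fractions import Fraction


def expected_interval_lengths(n):
    """Expected duration of each coalescent interval for a sample of n gene copies.

    The interval with k lineages has expected length 2 / (k * (k - 1)),
    measured in units of 2N generations (assumed: constant population of
    N diploid individuals, neutral coalescent).
    """
    return {k: Fraction(2, k * (k - 1)) for k in range(n, 1, -1)}


def expected_folded_sfs(n, theta=1.0):
    """Expected folded site frequency spectrum under constant population size.

    Entry i is the expected number of segregating sites whose minor allele
    is carried by i of the n sampled gene copies, for i = 1 .. n // 2.
    Uses the constant-size expectation theta / i for the unfolded spectrum.
    """
    sfs = {}
    for i in range(1, n // 2 + 1):
        value = theta * (1.0 / i + 1.0 / (n - i))
        if i == n - i:
            value /= 2.0  # classes i and n - i coincide; do not count twice
        sfs[i] = value
    return sfs


if __name__ == "__main__":
    intervals = expected_interval_lengths(50)
    height = sum(intervals.values())
    print("most recent interval (50 lineages):", float(intervals[50]))
    print("most ancient interval (2 lineages):", float(intervals[2]))
    print("share of tree height in the last interval:", float(intervals[2] / height))
    print("expected singletons vs. i = 25:", expected_folded_sfs(50)[1], expected_folded_sfs(50)[25])
```

Under these assumptions the interval with only two lineages makes up roughly half of the expected tree height. That is why adding samples mainly adds information about the recent past.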
∂a∂i: inferences from the site frequency spectrum
- (Gutenkunst et al 2009)

Stairway Plot uses spectrum
- 1000-Genomes data (Liu & Fu 2015)

Also: recombination is our friend
- Useful data began to appear in about 2000
- Crossovers shuffle DNA
- Each chromosome has many gene genealogies, which vary in length

Two (hypothetical) loci in a single diploid genome
[Figure: two gene trees of different depth, each labeled MRCA.]
- MRCA: most recent common ancestor
- Gene trees vary in length across the genome
- A mutation (marked in the figure) is more likely on a deep gene tree

MRCA varies along the chromosome
[Figure: MRCA (depth of gene tree) plotted against position on the Chromosome.]
- Circles: nucleotide sites that differ (are heterozygous) in a single diploid sample
- Heterozygous sites are denser where the gene tree is deep
- Population size affects the length of MRCA segments and the genetic variation within segments
- PSMC uses this pattern to estimate population history (Li and Durbin 2011)

PSMC is accurate from 30 ky to 3 my ago
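The link between heterozygous-site density and gene-tree depth can be illustrated with a back-of-the-envelope calculation. The sketch below is not PSMC, which fits a hidden Markov model along the genome. It only counts heterozygous sites in fixed windows and converts each count to a crude depth, using the expectation that a window of L sites with tree depth T generations carries about 2 * mu * L * T heterozygous sites. The function names, the window size, and the mutation rate `mu` are all hypothetical choices made for illustration.

```python
def window_het_counts(het_positions, chrom_length, window):
    """Count heterozygous sites in consecutive windows along a chromosome.

    het_positions: 0-based positions of heterozygous sites, each < chrom_length.
    Returns one count per window; the last window may be shorter than the rest.
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in het_positions:
        counts[pos // window] += 1
    return counts


def crude_tree_depth(het_count, window, mu):
    """Moment estimate of gene-tree depth (generations) for one window.

    Two lineages, each of length T, accumulate about 2 * mu * window * T
    differences, so T is estimated as het_count / (2 * mu * window).
    Assumes no recombination inside the window, which real data violate.
    """
    return het_count / (2.0 * mu * window)


if __name__ == "__main__":
    # Hypothetical toy data: six heterozygous sites on a 3000-site chromosome.
    hets = [10, 20, 30, 1500, 2500, 2600]
    mu = 1e-8  # assumed per-site, per-generation mutation rate (illustrative)
    for index, count in enumerate(window_het_counts(hets, 3000, 1000)):
        print(index, count, crude_tree_depth(count, 1000, mu))
```

Windows with more heterozygous sites get proportionally deeper estimates, which is the pattern the slide describes.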
PSMC estimates from autosomes
- 2 mya (origin of Homo); 200 kya (origin of modern humans); 20 kya (beginning of Holocene)

PSMC estimates from X chromosomes
- Eurasian/African split 150 kya
- African bottleneck short and shallow

MSMC: using multiple genomes

PSMC with Neanderthal as well as Denisova (Prüfer et al 2014)

PSMC gives misleading signal of decline in subdivided populations, even if population size is constant (Mazet et al 2014)
[Figure: 2N_E (between 2N and 2NK) plotted against Generations Ago (0, 1/m, 2/m, 3/m, 4/m).]
- Key: N, deme size; K, number of demes; m, migration rate

Population tree
[Figure: population tree with tips X, Y, N, D, labeled with parameters T_XYND, T_XY, T_ND, T_N, N_XYND, N_XY, N_ND, m_N.]
- Pops: X, Africa; Y, Europe; N, Neanderthal; D, Denisovan
Gene genealogies and nucleotide site patterns
[Figure: gene genealogy within population tree; tips X, Y, N, D; admixture m_N.]

        X  Y  N  D
  yn:   0  1  1  0
  ynd:  0  1  1  1

- Mutation on red branch → site pattern yn
- Blue branch → ynd
- 0, ancestral; 1, derived

Observed Site Pattern Frequencies
[Figure: frequencies of site patterns xy, xn, xd, yn, yd, nd, xyn, xyd, xnd, ynd. Axis: Site Pattern Frequency (fraction of nucleotide sites exhibiting each pattern).]
- X, Africa; Y, Europe; N, Neanderthal; D, Denisovan
- xy is common because X and Y share ancestry
- Ditto nd
- Goal: infer history from these data

The mystery of the 4000-year-old Denisovan
- We argued in 2017 for an early separation of Neanderthals and Denisovans and a bottleneck among their ancestors
- Mafessoni and Prüfer showed that results are different if one includes singleton site patterns
- However, the with-singleton analysis also implied an implausible 4 kya date for the Denisovan fossil
- How can this be explained?

Clues: an excess of site patterns d and xyn
[Figure: frequencies of site patterns x, y, n, d, xyn, xyd, xnd, ynd. Axis: Site Pattern Frequency.]
- Suggests hyperarchaic admixture into Denisovans (Prüfer et al 2014) or early modern admixture into Neanderthals (Kuhlwilm et al 2016)

Two ways to inflate d and xyn
[Figure: two population trees with tips X, Y, N, D, H; left, admixture H → D (m_H); right, admixture XY → N.]

        X  Y  N  D
  d:    0  0  0  1
  xyn:  1  1  1  0

- Red branch mutations generate site pattern d; blue generates xyn
- 0, ancestral allele; 1, derived
- X, Africa; Y, Eurasia; N, Neanderthal; D, Denisovan; H, Hyperarchaic; XY, population ancestral to X and Y

[Figure: estimates of m_N, m_H (axis: Admixture Fraction); T_XYNDH, T_XY, T_ND, T_AV, T_A, T_V, T_D (axis: 10^5 Years); 2N_XYND, 2N_XY, 2N_ND, 2N_AV, 2N_N (axis: Haploid Population Size (2N)).]
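The site-pattern labels used above (yn, ynd, d, xyn) follow a simple rule: a pattern is named by the populations that carry the derived allele. The sketch below tabulates pattern frequencies under a simplifying assumption that is not from the slides: exactly one haploid genome per population, with alleles already polarized as 0 (ancestral) or 1 (derived). The function names and the toy data are hypothetical.

```python
from collections import Counter

# Population order matches the slides: X, Y, N, D.
POPS = ("x", "y", "n", "d")


def site_pattern(alleles):
    """Return the site-pattern label for one nucleotide site.

    alleles: tuple of 0/1 values in the order X, Y, N, D.
    Sites where no sample or every sample carries the derived allele
    show no pattern and yield None.
    """
    label = "".join(pop for pop, allele in zip(POPS, alleles) if allele == 1)
    if label == "" or len(label) == len(POPS):
        return None
    return label


def site_pattern_frequencies(sites, include_singletons=True):
    """Fraction of nucleotide sites exhibiting each site pattern.

    With include_singletons=False, patterns carried by a single
    population (x, y, n, d) are skipped.
    """
    counts = Counter()
    for alleles in sites:
        label = site_pattern(alleles)
        if label is None:
            continue
        if not include_singletons and len(label) == 1:
            continue
        counts[label] += 1
    total = len(sites)
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy alignment of five sites.
    toy_sites = [(0, 1, 1, 0), (0, 1, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (0, 0, 0, 0)]
    print(site_pattern_frequencies(toy_sites))
    print(site_pattern_frequencies(toy_sites, include_singletons=False))
```

The `include_singletons` switch mirrors the distinction the slides draw between analyses with and without singleton site patterns.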
Implications of previous slide
- m_H, T_XYNDH: substantial H → D admixture and a hyperarchaic separation time of 1.6 mya
- XY → N admixture
- T_ND: early separation of Neanderthals and Denisovans: 747 kya
- 2N_ND: narrow bottleneck in Neanderthal-Denisovan ancestors
- 2N_AV, 2N_N: effective Neanderthal population was large early (2N_AV ≈ 21 k) but small later (2N_N ≈ 5 k)
- T_V, T_A, T_D: Vindija, Altai, and Denisovan fossil ages are 70 ky, 150 ky, and 100 ky

Summary (part 1)
- History of population size affects depth of gene trees, genetic variation, and length of MRCA segments
- We can use these facts to infer the history of population size
- Human population has varied in size over the past 3 my
- Bottleneck during last ice age, ending 20 kya
- African bottleneck was shorter and shallower
- Eurasian/African split 150 kya
- European/Asian split 20 kya

Summary (part 2)
Current consensus
- Neanderthals and Denisovans separated 450 kya, then declined to tiny population sizes (<1000 individuals)
Our view
- Archaics separated from moderns 750 kya, then endured a bottleneck of 5 ky
- Neanderthals & Denisovans separated shortly thereafter
- Neanderthal population was large early & small later