Supplementary Note: Analysis of Latino populations from GALA and MEC reveals genomic loci with biased local ancestry estimation


Bogdan Pasaniuc, Sriram Sankararaman, et al.

1 Relation between Error Rate in local ancestry estimation and MILANC

Consider a single SNP at which local ancestry is inferred. The error rate in the local ancestry at this SNP is drawn from some distribution $F$ with mean $\mu/6$ and standard deviation $\sigma/6$. Given a trio of individuals and assuming that the errors in inferring the local ancestry of each allele in this trio are independent, the probability of at least a single local ancestry error in this trio is denoted $\epsilon$. For small local ancestry error rates, the probability of at least one error among the six alleles of a trio is approximately six times the per-allele error rate, so the distribution of $\epsilon$ across SNPs has mean $\mu$ and standard deviation $\sigma$. Under the assumption of an uncorrelated error process across trios, the number of ancestry errors at this SNP for $n$ trios is given by $O \sim \mathrm{Bin}(n, \epsilon)$. Assume that a fraction $\alpha$ of these errors lead to Mendelian inconsistencies. Thus $M$, the MILANC, at this SNP follows $M \mid O \sim \mathrm{Bin}(O, \alpha)$ for some $\alpha$. We want to compute the population correlation $\rho = \rho(M, O)$.

Some results we will use:

$$E[M \mid O] = O\alpha, \qquad \mathrm{Var}[M \mid O] = O\alpha(1-\alpha),$$

$$E[O \mid \epsilon] = n\epsilon, \qquad \mathrm{Var}[O \mid \epsilon] = n\epsilon(1-\epsilon).$$

Then, using the tower property of expectation,

$$E[M \mid \epsilon] = E[E[M \mid O] \mid \epsilon] = E[O\alpha \mid \epsilon] = n\epsilon\alpha,$$

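and the variance decomposition,

$$\begin{aligned}
\mathrm{Var}[M \mid \epsilon] &= \mathrm{Var}[E[M \mid O] \mid \epsilon] + E[\mathrm{Var}[M \mid O] \mid \epsilon] \\
&= \mathrm{Var}[O\alpha \mid \epsilon] + E[O\alpha(1-\alpha) \mid \epsilon] \\
&= n\alpha^2\epsilon(1-\epsilon) + n\alpha(1-\alpha)\epsilon \\
&= n\alpha\epsilon(1-\alpha\epsilon).
\end{aligned}$$

Applying this again,

$$E[O] = E[E[O \mid \epsilon]] = E[n\epsilon] = nE[\epsilon] = n\mu$$

$$\begin{aligned}
\mathrm{Var}[O] &= \mathrm{Var}[E[O \mid \epsilon]] + E[\mathrm{Var}[O \mid \epsilon]] \\
&= \mathrm{Var}[n\epsilon] + E[n\epsilon(1-\epsilon)] \\
&= n^2\sigma^2 + n\left(\mu - (\mu^2 + \sigma^2)\right)
\end{aligned}$$

$$E[M] = E[E[M \mid \epsilon]] = E[n\epsilon\alpha] = n\alpha E[\epsilon] = n\alpha\mu$$

$$\begin{aligned}
\mathrm{Var}[M] &= \mathrm{Var}[E[M \mid \epsilon]] + E[\mathrm{Var}[M \mid \epsilon]] \\
&= \mathrm{Var}[n\epsilon\alpha] + E[n\alpha\epsilon(1-\alpha\epsilon)] \\
&= n^2\alpha^2\sigma^2 + n\alpha\left(\mu - \alpha(\mu^2 + \sigma^2)\right)
\end{aligned}$$

$$\begin{aligned}
E[MO] &= E[E[MO \mid O]] = E[O\,E[M \mid O]] = E[\alpha O^2] = \alpha E[O^2] \\
&= \alpha\left(n^2(\sigma^2 + \mu^2) + n\left(\mu - (\mu^2 + \sigma^2)\right)\right)
\end{aligned}$$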
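The moment formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original note: it assumes a hypothetical two-point distribution for $\epsilon$ (the values 0.02 and 0.10, the weights, `n = 12`, and `alpha = 0.3` are all illustrative choices), enumerates the joint law of $(O, M)$ exactly, and compares the result with the closed-form expressions.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_moments(n, alpha, eps_values, eps_probs):
    """Exact E[O], Var[O], E[M], Var[M], E[MO] by full enumeration.

    Model from the text: O | eps ~ Bin(n, eps) and M | O ~ Bin(O, alpha).
    eps takes the values eps_values with probabilities eps_probs.
    """
    e_o = e_o2 = e_m = e_m2 = e_mo = 0.0
    for eps, weight in zip(eps_values, eps_probs):
        for o in range(n + 1):
            p_o = weight * binom_pmf(o, n, eps)
            e_o += p_o * o
            e_o2 += p_o * o * o
            for m in range(o + 1):
                p = p_o * binom_pmf(m, o, alpha)
                e_m += p * m
                e_m2 += p * m * m
                e_mo += p * m * o
    return e_o, e_o2 - e_o**2, e_m, e_m2 - e_m**2, e_mo


def closed_form_moments(n, alpha, mu, sigma2):
    """The same five quantities from the closed-form expressions in the text."""
    second = mu**2 + sigma2  # E[eps^2]
    var_o = n**2 * sigma2 + n * (mu - second)
    var_m = n**2 * alpha**2 * sigma2 + n * alpha * (mu - alpha * second)
    e_mo = alpha * (n**2 * second + n * (mu - second))
    return n * mu, var_o, n * alpha * mu, var_m, e_mo


# Hypothetical inputs, chosen only for illustration.
N_TRIOS, ALPHA = 12, 0.3
EPS_VALUES, EPS_PROBS = [0.02, 0.10], [0.5, 0.5]
MU = sum(e * w for e, w in zip(EPS_VALUES, EPS_PROBS))
SIGMA2 = sum(w * (e - MU) ** 2 for e, w in zip(EPS_VALUES, EPS_PROBS))

exact = exact_moments(N_TRIOS, ALPHA, EPS_VALUES, EPS_PROBS)
closed = closed_form_moments(N_TRIOS, ALPHA, MU, SIGMA2)
for name, e, c in zip(["E[O]", "Var[O]", "E[M]", "Var[M]", "E[MO]"], exact, closed):
    print(f"{name}: exact={e:.6f} closed-form={c:.6f}")
```

Exact enumeration is used here instead of random simulation so that the agreement does not depend on sampling noise; it is only practical for small `n`.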

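The covariance is

$$\begin{aligned}
\mathrm{cov}(M, O) &= E[MO] - E[M]E[O] = E[MO] - \alpha n^2\mu^2 \\
&= \alpha\left(n^2\sigma^2 + n\left(\mu - (\mu^2 + \sigma^2)\right)\right).
\end{aligned}$$

Finally,

$$\begin{aligned}
\rho^2 &= \frac{\mathrm{cov}(M, O)^2}{\mathrm{Var}[O]\,\mathrm{Var}[M]} \\
&= \frac{\left(\alpha\left(n^2\sigma^2 + n\left(\mu - (\mu^2 + \sigma^2)\right)\right)\right)^2}{\left(n^2\sigma^2 + n\left(\mu - (\mu^2 + \sigma^2)\right)\right)\left(n^2\alpha^2\sigma^2 + n\alpha\left(\mu - \alpha(\mu^2 + \sigma^2)\right)\right)} \\
&= \frac{\alpha^2\left(n^2\sigma^2 + n\left(\mu - (\mu^2 + \sigma^2)\right)\right)}{n^2\alpha^2\sigma^2 + n\alpha\left(\mu - \alpha(\mu^2 + \sigma^2)\right)} \qquad (1)
\end{aligned}$$

Consider two cases.

1. $\sigma^2 = 0$: Then Equation 1 becomes

$$\rho^2 = \frac{\alpha^2 n\mu(1-\mu)}{n\alpha\mu(1-\alpha\mu)} = \frac{\alpha(1-\mu)}{1-\alpha\mu},$$

which does not depend on $n$.

2. $\sigma^2 > 0$: Then, dividing the numerator and denominator of Equation 1 by $n^2\alpha^2\sigma^2$,

$$\rho^2 = \left(1 + \frac{1}{n}\,\frac{\mu(1-\mu) - \sigma^2}{\sigma^2}\right)\left(1 + \frac{1}{n}\,\frac{\mu(1-\alpha\mu) - \alpha\sigma^2}{\alpha\sigma^2}\right)^{-1}.$$

As $n \to \infty$, we then have $\rho^2 \to 1$.

There are several assumptions underlying our result.

1. Independence of local ancestry estimation errors across the trios at a given SNP.
2. Independence of estimation errors across the alleles within a trio.

The above assumptions can be violated in practice. Assumption 1 will be violated due to more distant relatedness amongst the trios. The effect of violating assumption 1 is that the effective number of trios is less than $n$. Nevertheless, the asymptotic result will continue to hold, although the rate of convergence will be slower. Assumption 2 is likely to be violated when, for example, the offspring is expected to share ancestry with its parents, which increases the correlation of local ancestry errors. Violating assumption 2 can be considered equivalent to having a factor smaller than 6 for the error rates ($\mu/6$, $\sigma/6$). It will also change $\alpha$, the fraction of errors that lead to a MILANC. While this changes the quantitative value of the correlation, the qualitative result that the correlation tends to 1 for large sample sizes only when $\sigma^2 > 0$ still holds.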
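To make the two cases of Equation 1 concrete, here is a small Python sketch that evaluates $\rho^2$ directly from the closed form. The function name `rho_squared` and the parameter values are illustrative assumptions, not taken from the note.

```python
def rho_squared(n, mu, sigma2, alpha):
    """Squared population correlation between M and O from Equation 1.

    n: number of trios; mu, sigma2: mean and variance of the trio-level
    error probability across SNPs; alpha: fraction of errors that produce
    a Mendelian inconsistency.
    """
    second = mu**2 + sigma2  # E[eps^2]
    var_o = n**2 * sigma2 + n * (mu - second)
    var_m = n**2 * alpha**2 * sigma2 + n * alpha * (mu - alpha * second)
    # cov(M, O) = alpha * Var[O], so rho^2 = alpha^2 * Var[O] / Var[M].
    return alpha**2 * var_o / var_m


# Hypothetical parameter values, for illustration only.
MU, ALPHA = 0.05, 0.3

# Case 1: sigma^2 = 0. The value is alpha(1 - mu)/(1 - alpha*mu) for every n.
for n in (10, 1000, 10**6):
    print("sigma2 = 0     ", n, rho_squared(n, MU, 0.0, ALPHA))

# Case 2: sigma^2 > 0. The value increases toward 1 as n grows.
for n in (10, 1000, 10**6):
    print("sigma2 = 0.0004", n, rho_squared(n, MU, 0.0004, ALPHA))
```

With no variability in error rate across SNPs, adding trios does not improve the correlation; with any variability, MILANC becomes an increasingly faithful proxy for the true error count as the number of trios grows.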

The result does not, however, assume independence across loci. The reason is that we are considering the population correlation between the marginal MILANC rate and the marginal true error rate at a locus. Estimating this correlation will need to deal with non-independence but, here, the bigger problem is that the true error rates for empirical data are unknown.

Figure 1: Histogram of MILANC rates across the genome in 232 Mexican (left) and 257 Puerto Rican (right) trios, showing variability of the error rate across the genome. LAMP-LD was used for local ancestry estimation with the 600k reference panel. Similar results are obtained for other methods (Supplementary Note).

Figure 2: Histogram of MILANC across the genome in 232 Mexican (left) and 257 Puerto Rican (right) trios, using European (top) or Spanish (bottom) reference panels, showing variability of the error rate across the genome. LAMP-LD was used for local ancestry estimation with the 300k reference panel (see Main text).

Figure 3: MILANC across the genome in 232 Mexican (left) and 257 Puerto Rican (right) trios, using European (top) or Spanish (bottom) reference panels, showing variability of the error rate across the genome. Every point denotes the average across a 1 Mb region. LAMP-LD was used for local ancestry estimation with the 300k reference panel (see Main text). Horizontal lines denote 3 (4) standard deviations from the mean.
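The thresholds drawn in Figure 3 can be mimicked with a short Python sketch. This is only an illustration under stated assumptions: the input is a hypothetical list of per-window average MILANC rates, and the population standard deviation is used because the note does not say which estimator was applied.

```python
from statistics import mean, pstdev


def flag_outlier_windows(window_rates, n_sd=3.0):
    """Return indices of windows whose rate exceeds mean + n_sd * SD.

    window_rates: per-window average MILANC rates (hypothetical input).
    n_sd: number of standard deviations above the mean used as threshold.
    """
    center = mean(window_rates)
    spread = pstdev(window_rates)  # population SD; an assumption
    threshold = center + n_sd * spread
    return [i for i, rate in enumerate(window_rates) if rate > threshold]


# Made-up data: 50 unremarkable windows and one window with a high rate.
rates = [0.01] * 50 + [0.20]
print(flag_outlier_windows(rates, n_sd=3.0))
print(flag_outlier_windows(rates, n_sd=4.0))
```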

Figure 4: Histogram of MILANC across the genome in 232 Mexican (left) and 257 Puerto Rican (right) trios, using European (top) or Spanish (bottom) reference panels, showing variability of the error rate across the genome. WINPOP was used for local ancestry estimation with the 600k reference panel (see Main text).

Figure 5: MILANC in 232 Mexican (left) and 257 Puerto Rican (right) trios, computed when ancestral LD is not modeled (WINPOP) in the local ancestry estimation procedure. Horizontal lines denote 3 (4) standard deviations from the mean.

Figure 6: MILANC versus genome-wide African ancestry proportion, showing the correlation between the genome-wide amount of African ancestry and the error rate.

Figure 7: Histogram of MILANC rate in 366 Latino trio families of GALA when either standard reference panels (light color) or new reference panels built from the remaining 123 trios (darker color) are used. We observe a significant decrease in the average MILANC rate.

Figure 8: Average local ancestry in 3204 Latino samples of Mexican ancestry from the MEC data set. Red denotes results for cosmopolitan reference panels, while blue denotes results when the reference panels built from GALA trios are also provided. The increase in average African local ancestry at the HLA locus on chromosome 6 is the only genome-wide significant deviation in average local ancestry in this sample.

Figure 9: Average MILANC in 366 trios using different sets of reference panels for the European and Native American components (chromosomes 6 and 7). We note the significant reduction in MILANC at the chromosome 6 locus at 20 Mb, suggesting a much better representation of the true ancestral European and Native American ancestries at this locus. Results are binned into averages across 1 Mb regions for better visualization.

Figure 10: Deviation (in standard deviations) of the average local ancestry (European, Native American and African) in 3204 MEC samples versus the observed MILANC rate in the 232 Mexican trio families of GALA. Every dot denotes the average value across a contiguous 1 Mb genomic region. We observe a significant correlation of MILANC rate with decreases in Native American ancestry and increases in European ancestry, suggesting that when errors are made in local ancestry inference they are more likely to miscall true Native American chromosomes as European.

Figure 11: Deviation (in standard deviations) of the average local ancestry (European, Native American and African) in 3204 MEC samples versus the observed MILANC rate in the 232 Mexican trio families of GALA when the new reference panels are employed in addition to the cosmopolitan panels. Every dot denotes the average value across a contiguous 1 Mb genomic region. We observe a significant correlation of MILANC with decreases in Native American ancestry and increases in European ancestry, albeit with a much smaller effect than when the cosmopolitan reference panels are used.

Figure 12: Haploid error rate (number of alleles miscalled out of the total number of alleles) versus MILANC rate as obtained by LAMP-LD in 1,000 simulated Latinos (chromosome 1).

             YRI      AFR PR   AFR MEX
AFR MEX      2.43%    0.62%
AFR PR       1.64%             0.62%

             CEU      EUR PR   EUR MEX
EUR MEX      1.05%    0.43%
EUR PR       0.91%             0.43%

             NA       NA PR    NA MEX
NA MEX       1.40%    2.57%
NA PR        2.96%             2.57%

Table 1: $F_{ST}$ estimates between inferred ancestral segments in Mexicans and Puerto Ricans and different ancestral panels computed on the 600k set of SNPs.

Population       (European, Native American)   (European, African)   (Native American, African)
Mexicans         77%                           12%                   11%
Puerto Ricans    54%                           32%                   14%

Table 2: Proportion of trio families showing one MILANC clustered according to the pair of ancestries that are inconsistent between parent and child. LAMP-LD (CEU-600k) was used for local ancestry inference.
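Table 2 groups MILANCs by the pair of ancestries involved. As a rough Python sketch of how one trio might be checked at a single locus, assume each individual carries an unordered pair of ancestry labels and that a child is consistent when one of its alleles can be assigned to each parent. The label strings and function names are hypothetical, and the procedure actually used for the note may differ in detail.

```python
def is_mendelian_consistent(child, mother, father):
    """Check whether a child's ancestry pair is compatible with its parents.

    Each argument is a 2-tuple of ancestry labels (unordered, unphased).
    The child is consistent if one allele can come from the mother and
    the other from the father.
    """
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def milanc_rate(trios):
    """Fraction of trios with an inconsistency at one locus.

    trios: list of (child, mother, father) tuples of ancestry pairs.
    """
    inconsistent = sum(not is_mendelian_consistent(*trio) for trio in trios)
    return inconsistent / len(trios)


# Hypothetical calls at one locus for three trios.
example_trios = [
    (("EUR", "NAT"), ("EUR", "EUR"), ("NAT", "AFR")),  # consistent
    (("AFR", "AFR"), ("EUR", "EUR"), ("AFR", "NAT")),  # mother has no AFR allele
    (("NAT", "NAT"), ("NAT", "EUR"), ("NAT", "NAT")),  # consistent
]
print(milanc_rate(example_trios))
```

Only a subset of ancestry errors produces an inconsistency under this check, which is the role played by the fraction $\alpha$ in Section 1.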

Table 3: Correlation of MILANC rate to local ancestry across the genome. [Columns: method; correlation of MILANC rate to European, Native American, and African local ancestry. Rows: WINPOP, LAMP-LD, ALLOY, and PCAdmix, for GALA Mexican and for GALA Puerto Rican.]

Table 4: Robustness to window size. [Columns: window size; number of windows; correlation of MILANC rate to European, Native American, and African local ancestry. Rows: window sizes from 10 Kb up to the Mb scale, for GALA Mexican and for GALA Puerto Rican.] We report the correlation between MILANC and average local ancestry in the local ancestry estimates of LAMP-LD in the GALA study across various window sizes. Overall, the results are insensitive to window size, although we note a slight increase in correlation as window size increases across all ancestries, because averaging within a window removes part of the sampling noise in local ancestry and MILANC estimation. All results reported use a 1 Mb window size.
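The windowed correlation described for Table 4 can be sketched in Python as follows. This is a minimal illustration, assuming per-SNP values on a single chromosome with base-pair positions; the function names and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def window_means(positions, values, window_size):
    """Average per-SNP values within non-overlapping windows.

    positions: base-pair coordinates on one chromosome.
    Returns a dict mapping window index to the mean value in that window.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, val in zip(positions, values):
        window = pos // window_size
        sums[window] += val
        counts[window] += 1
    return {w: sums[w] / counts[w] for w in sums}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def windowed_correlation(positions, milanc, ancestry, window_size):
    """Correlate window-averaged MILANC with window-averaged local ancestry."""
    m = window_means(positions, milanc, window_size)
    a = window_means(positions, ancestry, window_size)
    keys = sorted(m)  # both dicts share the same windows
    return pearson([m[k] for k in keys], [a[k] for k in keys])


# Toy data: six SNPs spread over three 1 Mb windows.
pos = [0, 500_000, 1_000_000, 1_500_000, 2_000_000, 2_500_000]
milanc = [0.01, 0.03, 0.05, 0.07, 0.02, 0.04]
ancestry = [1 - 2 * v for v in milanc]  # perfectly anti-correlated by design
print(windowed_correlation(pos, milanc, ancestry, 1_000_000))
```

Repeating the call with different `window_size` values gives the kind of robustness check that the table summarizes.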

Detecting Heterogeneity in Population Structure Across the Genome in Admixed Populations

Detecting Heterogeneity in Population Structure Across the Genome in Admixed Populations Genetics: Early Online, published on July 20, 2016 as 10.1534/genetics.115.184184 GENETICS INVESTIGATION Detecting Heterogeneity in Population Structure Across the Genome in Admixed Populations Caitlin

More information

Figure S5 PCA of individuals run on the EAS array reporting Pacific Islander ethnicity, including those reporting another ethnicity.

Figure S5 PCA of individuals run on the EAS array reporting Pacific Islander ethnicity, including those reporting another ethnicity. Figure S1 PCA of European and West Asian subjects on the EUR array. A clear Ashkenazi cluster is observed. The largest cluster depicts the northwest southeast cline within Europe. A Those reporting a single

More information

Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms

Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms Magnus Nordborg University of Southern California The importance of history Genetic polymorphism data represent the outcome

More information

LASER server: ancestry tracing with genotypes or sequence reads

LASER server: ancestry tracing with genotypes or sequence reads LASER server: ancestry tracing with genotypes or sequence reads The LASER method Supplementary Data For each ancestry reference panel of N individuals, LASER applies principal components analysis (PCA)

More information

University of Washington, TOPMed DCC July 2018

University of Washington, TOPMed DCC July 2018 Module 12: Comput l Pipeline for WGS Relatedness Inference from Genetic Data Timothy Thornton (tathornt@uw.edu) & Stephanie Gogarten (sdmorris@uw.edu) University of Washington, TOPMed DCC July 2018 1 /

More information

Objective: Why? 4/6/2014. Outlines:

Objective: Why? 4/6/2014. Outlines: Objective: Develop mathematical models that quantify/model resemblance between relatives for phenotypes of a quantitative trait : - based on pedigree - based on markers Outlines: Causal model for covariances

More information

Kinship/relatedness. David Balding Professor of Statistical Genetics University of Melbourne, and University College London.

Kinship/relatedness. David Balding Professor of Statistical Genetics University of Melbourne, and University College London. Kinship/relatedness David Balding Professor of Statistical Genetics University of Melbourne, and University College London 2 Feb 2016 1 Ways to measure relatedness 2 Pedigree-based kinship coefficients

More information

Estimating Ancient Population Sizes using the Coalescent with Recombination

Estimating Ancient Population Sizes using the Coalescent with Recombination Estimating Ancient Population Sizes using the Coalescent with Recombination Sara Sheehan joint work with Kelley Harris and Yun S. Song May 26, 2012 Sheehan, Harris, Song May 26, 2012 1 Motivation Introduction

More information

Inbreeding and self-fertilization

Inbreeding and self-fertilization Inbreeding and self-fertilization Introduction Remember that long list of assumptions associated with derivation of the Hardy-Weinberg principle that I went over a couple of lectures ago? Well, we re about

More information

Inbreeding and self-fertilization

Inbreeding and self-fertilization Inbreeding and self-fertilization Introduction Remember that long list of assumptions associated with derivation of the Hardy-Weinberg principle that we just finished? Well, we re about to begin violating

More information

Population Structure. Population Structure

Population Structure. Population Structure Nonrandom Mating HWE assumes that mating is random in the population Most natural populations deviate in some way from random mating There are various ways in which a species might deviate from random

More information

Genome-Wide Association Exercise - Data Quality Control

Genome-Wide Association Exercise - Data Quality Control Genome-Wide Association Exercise - Data Quality Control The Rockefeller University, New York, June 25, 2016 Copyright 2016 Merry-Lynn McDonald & Suzanne M. Leal Introduction In this exercise, you will

More information

ville, VA Associate Editor: XXXXXXX Received on XXXXX; revised on XXXXX; accepted on XXXXX

ville, VA Associate Editor: XXXXXXX Received on XXXXX; revised on XXXXX; accepted on XXXXX Robust Relationship Inference in Genome Wide Association Studies Ani Manichaikul 1,2, Josyf Mychaleckyj 1, Stephen S. Rich 1, Kathy Daly 3, Michele Sale 1,4,5 and Wei- Min Chen 1,2,* 1 Center for Public

More information

Kenneth Nordtvedt. Many genetic genealogists eventually employ a time-tomost-recent-common-ancestor

Kenneth Nordtvedt. Many genetic genealogists eventually employ a time-tomost-recent-common-ancestor Kenneth Nordtvedt Many genetic genealogists eventually employ a time-tomost-recent-common-ancestor (TMRCA) tool to estimate how far back in time the common ancestor existed for two Y-STR haplotypes obtained

More information

Sensitive Detection of Chromosomal Segments of Distinct Ancestry in Admixed Populations

Sensitive Detection of Chromosomal Segments of Distinct Ancestry in Admixed Populations Sensitive Detection of Chromosomal Segments of Distinct Ancestry in Admixed Populations Alkes L. Price 1,2,3, Arti Tandon 3,4, Nick Patterson 3, Kathleen C. Barnes 5, Nicholas Rafaels 5, Ingo Ruczinski

More information

This page intentionally left blank

This page intentionally left blank Appendix E Labs This page intentionally left blank Dice Lab (Worksheet) Objectives: 1. Learn how to calculate basic probabilities of dice. 2. Understand how theoretical probabilities explain experimental

More information

TDT vignette Use of snpstats in family based studies

TDT vignette Use of snpstats in family based studies TDT vignette Use of snpstats in family based studies David Clayton April 30, 2018 Pedigree data The snpstats package contains some tools for analysis of family-based studies. These assume that a subject

More information

Chapter 6 Introduction to Statistical Quality Control, 6 th Edition by Douglas C. Montgomery. Copyright (c) 2009 John Wiley & Sons, Inc.

Chapter 6 Introduction to Statistical Quality Control, 6 th Edition by Douglas C. Montgomery. Copyright (c) 2009 John Wiley & Sons, Inc. 1 2 Learning Objectives Chapter 6 Introduction to Statistical Quality Control, 6 th Edition by Douglas C. Montgomery. 3 4 5 Subgroup Data with Unknown μ and σ Chapter 6 Introduction to Statistical Quality

More information

Algorithms for Genetics: Basics of Wright Fisher Model and Coalescent Theory

Algorithms for Genetics: Basics of Wright Fisher Model and Coalescent Theory Algorithms for Genetics: Basics of Wright Fisher Model and Coalescent Theory Vineet Bafna Harish Nagarajan and Nitin Udpa 1 Disclaimer Please note that a lot of the text and figures here are copied from

More information

Bioinformatics I, WS 14/15, D. Huson, December 15,

Bioinformatics I, WS 14/15, D. Huson, December 15, Bioinformatics I, WS 4/5, D. Huson, December 5, 204 07 7 Introduction to Population Genetics This chapter is closely based on a tutorial given by Stephan Schiffels (currently Sanger Institute) at the Australian

More information

Chapter 11. Sampling Distributions. BPS - 5th Ed. Chapter 11 1

Chapter 11. Sampling Distributions. BPS - 5th Ed. Chapter 11 1 Chapter 11 Sampling Distributions BPS - 5th Ed. Chapter 11 1 Sampling Terminology Parameter fixed, unknown number that describes the population Example: population mean Statistic known value calculated

More information

Gene coancestry in pedigrees and populations

Gene coancestry in pedigrees and populations Gene coancestry in pedigrees and populations Thompson, Elizabeth University of Washington, Department of Statistics Box 354322 Seattle, WA 98115-4322, USA E-mail: eathomp@uw.edu Glazner, Chris University

More information

BIOL Evolution. Lecture 8

BIOL Evolution. Lecture 8 BIOL 432 - Evolution Lecture 8 Expected Genotype Frequencies in the Absence of Evolution are Determined by the Hardy-Weinberg Equation. Assumptions: 1) No mutation 2) Random mating 3) Infinite population

More information

The genealogical history of a population The coalescent process. Identity by descent Distribution of pairwise coalescence times

The genealogical history of a population The coalescent process. Identity by descent Distribution of pairwise coalescence times The coalescent The genealogical history of a population The coalescent process Identity by descent Distribution of pairwise coalescence times Adding mutations Expected pairwise differences Evolutionary

More information

Ancestral Recombination Graphs

Ancestral Recombination Graphs Ancestral Recombination Graphs Ancestral relationships among a sample of recombining sequences usually cannot be accurately described by just a single genealogy. Linked sites will have similar, but not

More information

Chapter 3. The Normal Distributions. BPS - 5th Ed. Chapter 3 1

Chapter 3. The Normal Distributions. BPS - 5th Ed. Chapter 3 1 Chapter 3 The Normal Distributions BPS - 5th Ed. Chapter 3 1 Density Curves Example: here is a histogram of vocabulary scores of 947 seventh graders. The smooth curve drawn over the histogram is a mathematical

More information

Coalescence. Outline History. History, Model, and Application. Coalescence. The Model. Application

Coalescence. Outline History. History, Model, and Application. Coalescence. The Model. Application Coalescence History, Model, and Application Outline History Origins of theory/approach Trace the incorporation of other s ideas Coalescence Definition and descriptions The Model Assumptions and Uses Application

More information

Density Curves. Chapter 3. Density Curves. Density Curves. Density Curves. Density Curves. Basic Practice of Statistics - 3rd Edition.

Density Curves. Chapter 3. Density Curves. Density Curves. Density Curves. Density Curves. Basic Practice of Statistics - 3rd Edition. Chapter 3 The Normal Distributions Example: here is a histogram of vocabulary scores of 947 seventh graders. The smooth curve drawn over the histogram is a mathematical idialization for the distribution.

More information

Chapter 2: Genes in Pedigrees

Chapter 2: Genes in Pedigrees Chapter 2: Genes in Pedigrees Chapter 2-0 2.1 Pedigree definitions and terminology 2-1 2.2 Gene identity by descent (ibd) 2-5 2.3 ibd of more than 2 genes 2-14 2.4 Data on relatives 2-21 2.1.1 GRAPHICAL

More information

AFDAA 2012 WINTER MEETING Population Statistics Refresher Course - Lecture 3: Statistics of Kinship Analysis

AFDAA 2012 WINTER MEETING Population Statistics Refresher Course - Lecture 3: Statistics of Kinship Analysis AFDAA 2012 WINTER MEETING Population Statistics Refresher Course - Lecture 3: Statistics of Kinship Analysis Ranajit Chakraborty, PhD Center for Computational Genomics Institute of Applied Genetics Department

More information

Sampling distributions and the Central Limit Theorem

Sampling distributions and the Central Limit Theorem Sampling distributions and the Central Limit Theorem Johan A. Elkink University College Dublin 14 October 2013 Johan A. Elkink (UCD) Central Limit Theorem 14 October 2013 1 / 29 Outline 1 Sampling 2 Statistical

More information

Lecture 6: Inbreeding. September 10, 2012

Lecture 6: Inbreeding. September 10, 2012 Lecture 6: Inbreeding September 0, 202 Announcements Hari s New Office Hours Tues 5-6 pm Wed 3-4 pm Fri 2-3 pm In computer lab 3306 LSB Last Time More Hardy-Weinberg Calculations Merle Patterning in Dogs:

More information

Forward thinking: the predictive approach

Forward thinking: the predictive approach Coalescent Theory 1 Forward thinking: the predictive approach Random variation in reproduction causes random fluctuation in allele frequencies. Can describe this process as diffusion: (Wright 1931) showed

More information

ICMP DNA REPORTS GUIDE

ICMP DNA REPORTS GUIDE ICMP DNA REPORTS GUIDE Distribution: General Sarajevo, 16 th December 2010 GUIDE TO ICMP DNA REPORTS 1. Purpose of This Document 1. The International Commission on Missing Persons (ICMP) endeavors to secure

More information

SNP variant discovery in pedigrees using Bayesian networks. Amit R. Indap

SNP variant discovery in pedigrees using Bayesian networks. Amit R. Indap SNP variant discovery in pedigrees using Bayesian networks Amit R. Indap 1 1 Background Next generation sequencing technologies have reduced the cost and increased the throughput of DNA sequencing experiments

More information

ARTICLE A Genomewide Admixture Map for Latino Populations

ARTICLE A Genomewide Admixture Map for Latino Populations ARTICLE A Genomewide Admixture Map for Latino Populations Alkes L. Price, Nick Patterson, Fuli Yu, David R. Cox, Alicja Waliszewska, Gavin J. McDonald, Arti Tandon, Christine Schirmer, Julie Neubauer,

More information

Package EILA. February 19, Index 6. The CEU-CHD-YRI admixed simulation data

Package EILA. February 19, Index 6. The CEU-CHD-YRI admixed simulation data Type Package Title Efficient Inference of Local Ancestry Version 0.1-2 Date 2013-09-09 Package EILA February 19, 2015 Author James J. Yang, Jia Li, Anne Buu, and L. Keoki Williams Maintainer James J. Yang

More information

Methods of Parentage Analysis in Natural Populations

Methods of Parentage Analysis in Natural Populations Methods of Parentage Analysis in Natural Populations Using molecular markers, estimates of genetic maternity or paternity can be achieved by excluding as parents all adults whose genotypes are incompatible

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Quality control of FALS discovery cohort.

Nature Genetics: doi: /ng Supplementary Figure 1. Quality control of FALS discovery cohort. Supplementary Figure 1 Quality control of FALS discovery cohort. Exome sequences were obtained for 1,376 FALS cases and 13,883 controls. Samples were excluded in the event of exome-wide call rate

More information

2 The Wright-Fisher model and the neutral theory

2 The Wright-Fisher model and the neutral theory 0 THE WRIGHT-FISHER MODEL AND THE NEUTRAL THEORY The Wright-Fisher model and the neutral theory Although the main interest of population genetics is conceivably in natural selection, we will first assume

More information

Comparative method, coalescents, and the future. Correlation of states in a discrete-state model

Comparative method, coalescents, and the future. Correlation of states in a discrete-state model Comparative method, coalescents, and the future Joe Felsenstein Depts. of Genome Sciences and of Biology, University of Washington Comparative method, coalescents, and the future p.1/28 Correlation of

More information

Ancient Admixture in Human History

Ancient Admixture in Human History Genetics: Published Articles Ahead of Print, published on September 7, 2012 as 10.1534/genetics.112.145037 Ancient Admixture in Human History Nick Patterson 1, Priya Moorjani 2, Yontao Luo 3, Swapan Mallick

More information

Population Structure and Genealogies

Population Structure and Genealogies Population Structure and Genealogies One of the key properties of Kingman s coalescent is that each pair of lineages is equally likely to coalesce whenever a coalescent event occurs. This condition is

More information

Population Genetics using Trees. Peter Beerli Genome Sciences University of Washington Seattle WA

Population Genetics using Trees. Peter Beerli Genome Sciences University of Washington Seattle WA Population Genetics using Trees Peter Beerli Genome Sciences University of Washington Seattle WA Outline 1. Introduction to the basic coalescent Population models The coalescent Likelihood estimation of

More information

Simulated gene genealogy of a sample of size 50 from a population of constant size. The History of Population Size from Whole Genomes.

Simulated gene genealogy of a sample of size 50 from a population of constant size. The History of Population Size from Whole Genomes. Simulated gene genealogy of a sample of size 50 from a population of constant size The History of Population Size from Whole Genomes Alan R Rogers October 1, 2018 Short terminal branches; long basal ones

More information

Comparative method, coalescents, and the future

Comparative method, coalescents, and the future Comparative method, coalescents, and the future Joe Felsenstein Depts. of Genome Sciences and of Biology, University of Washington Comparative method, coalescents, and the future p.1/36 Correlation of

More information

Coalescent Theory. Magnus Nordborg. Department of Genetics, Lund University. March 24, 2000

Coalescent Theory. Magnus Nordborg. Department of Genetics, Lund University. March 24, 2000 Coalescent Theory Magnus Nordborg Department of Genetics, Lund University March 24, 2000 Abstract The coalescent process is a powerful modeling tool for population genetics. The allelic states of all homologous

More information

Section 6.4. Sampling Distributions and Estimators

Section 6.4. Sampling Distributions and Estimators Section 6.4 Sampling Distributions and Estimators IDEA Ch 5 and part of Ch 6 worked with population. Now we are going to work with statistics. Sample Statistics to estimate population parameters. To make

More information

The History of African Gene Flow into Southern Europeans, Levantines, and Jews

The History of African Gene Flow into Southern Europeans, Levantines, and Jews The History of African Gene Flow into Southern Europeans, Levantines, and Jews Priya Moorjani 1,2 *, Nick Patterson 2, Joel N. Hirschhorn 1,2,3, Alon Keinan 4, Li Hao 5, Gil Atzmon 6, Edward Burns 6, Harry

More information

Chapter 11. Sampling Distributions. BPS - 5th Ed. Chapter 11 1

Chapter 11. Sampling Distributions. BPS - 5th Ed. Chapter 11 1 Chapter 11 Sampling Distributions BPS - 5th Ed. Chapter 11 1 Sampling Terminology Parameter fixed, unknown number that describes the population Statistic known value calculated from a sample a statistic

More information

Walter Steets Houston Genealogical Forum DNA Interest Group January 6, 2018

Walter Steets Houston Genealogical Forum DNA Interest Group January 6, 2018 DNA, Ancestry, and Your Genealogical Research- Segments and centimorgans Walter Steets Houston Genealogical Forum DNA Interest Group January 6, 2018 1 Today s agenda Brief review of previous DIG session

More information

CONGEN. Inbreeding vocabulary

CONGEN. Inbreeding vocabulary CONGEN Inbreeding vocabulary Inbreeding Mating between relatives. Inbreeding depression Reduction in fitness due to inbreeding. Identical by descent Alleles that are identical by descent are direct descendents

More information

Chapter 11. Sampling Distributions. BPS - 5th Ed. Chapter 11 1

Chapter 11. Sampling Distributions. BPS - 5th Ed. Chapter 11 1 Chapter 11 Sampling Distributions BPS - 5th Ed. Chapter 11 1 Sampling Terminology Parameter fixed, unknown number that describes the population Statistic known value calculated from a sample a statistic

More information

Theoretical Population Biology. An approximate likelihood for genetic data under a model with recombination and population splitting

Theoretical Population Biology. An approximate likelihood for genetic data under a model with recombination and population splitting Theoretical Population Biology 75 (2009) 33 345 Contents lists available at ScienceDirect Theoretical Population Biology journal homepage: www.elsevier.com/locate/tpb An approximate likelihood for genetic

More information

DNA: UNLOCKING THE CODE

DNA: UNLOCKING THE CODE DNA: UNLOCKING THE CODE Connecting Cousins for Genetic Genealogy Bryant McAllister, PhD Associate Professor of Biology University of Iowa bryant-mcallister@uiowa.edu Iowa Genealogical Society April 9,

More information

Population genetics: Coalescence theory II

Population genetics: Coalescence theory II Population genetics: Coalescence theory II Peter Beerli August 27, 2009 1 The variance of the coalescence process The coalescent is an accumulation of waiting times. We can think of it as standard queuing

More information

Image Filtering in Spatial domain. Computer Vision Jia-Bin Huang, Virginia Tech
