Population Structure

Nonrandom Mating
Hardy-Weinberg equilibrium (HWE) assumes that mating is random in the population.
Most natural populations deviate in some way from random mating.
There are various ways in which a species might deviate from random mating.
We will focus on the two most common departures from random mating:
- inbreeding
- population subdivision (substructure)

Nonrandom Mating: Inbreeding
Inbreeding occurs when individuals are more likely to mate with relatives than with randomly chosen individuals in the population.
Inbreeding increases the probability that offspring are homozygous, so the number of homozygous individuals at genetic markers in the population increases.
This increase in homozygosity can have a detrimental effect and lead to lower fitness in some species.
For some species the decrease in fitness is dramatic, with complete infertility or inviability after only a few generations of brother-sister mating.

Nonrandom Mating: Population Subdivision
In a subdivided population, individuals will appear to be inbred because there are more homozygotes than expected under the assumption of random mating.
Wahlund effect: a reduction in observed heterozygosity (increased homozygosity) that arises when discrete subpopulations with different allele frequencies are pooled, even though they do not interbreed as a single randomly mating unit.

Wright's F Statistics
Sewall Wright invented a set of measures called F statistics for departures from HWE in subdivided populations.
F stands for fixation index, where fixation refers to increased homozygosity.
$F_{IS}$ is also known as the inbreeding coefficient. It is the correlation of uniting gametes relative to gametes drawn at random from within a subpopulation (Individual within the Subpopulation).
$F_{ST}$ is a measure of population substructure and is most useful for examining the overall genetic divergence among subpopulations. It is defined as the correlation of gametes within subpopulations relative to gametes drawn at random from the entire population (Subpopulation within the Total population).

Wright's F Statistics
$F_{IT}$ is not often used. It is the overall inbreeding coefficient of an individual relative to the total population (Individual within the Total population).

Genotype Frequencies for Inbred Individuals
Consider a bi-allelic genetic marker with alleles A and a.
Let $p$ be the frequency of allele A and $q = 1 - p$ the frequency of allele a in the population.
Consider an individual with inbreeding coefficient $F$. What are the genotype frequencies for this individual at the marker?

Genotype     AA    Aa    aa
Frequency    ?     ?     ?

Generalized Hardy-Weinberg Deviations
The table below gives genotype frequencies at a marker when the HWE assumption does not hold:

Genotype     AA                   Aa              aa
Frequency    $p^2(1 - F) + pF$    $2pq(1 - F)$    $q^2(1 - F) + qF$

where $q = 1 - p$.
The $F$ parameter describes the deviation of the genotype frequencies from the HWE frequencies.
When $F = 0$, the genotype frequencies are in HWE.
The parameters $p$ and $F$ are sufficient to describe genotype frequencies at a single locus with two alleles.
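The generalized frequencies above can be evaluated directly. The following is a minimal sketch, not part of the original slides; the function name `genotype_frequencies` and the example values $p = 0.3$ and $F = 0.25$ are illustrative assumptions.

```python
def genotype_frequencies(p, F):
    """Return (AA, Aa, aa) frequencies for allele frequency p and coefficient F.

    Hypothetical helper: implements the generalized Hardy-Weinberg table,
    with q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    f_AA = p * p * (1.0 - F) + p * F
    f_Aa = 2.0 * p * q * (1.0 - F)
    f_aa = q * q * (1.0 - F) + q * F
    return f_AA, f_Aa, f_aa


# Illustrative values only: F = 0 recovers HWE, F > 0 shifts mass to homozygotes.
hwe = genotype_frequencies(0.3, 0.0)
inbred = genotype_frequencies(0.3, 0.25)
print("HWE:   ", hwe)
print("F=0.25:", inbred)
```

For any $p$ and $F$ the three frequencies sum to 1, and a positive $F$ lowers the heterozygote frequency by the factor $(1 - F)$.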

$F_{ST}$ for Subpopulations
Example in Gillespie (2004): consider a population with two equal-sized subpopulations. Assume that there is random mating within each subpopulation.
Let $p_1 = 1/4$ and $p_2 = 3/4$. The table below gives the allele and genotype frequencies.

                  Allele    Genotype
                  A         AA      Aa     aa
Freq. Subpop 1    1/4       1/16    3/8    9/16
Freq. Subpop 2    3/4       9/16    3/8    1/16

Are the subpopulations in HWE?
What are the genotype frequencies for the entire population?
What should the genotypic frequencies be if the population is in HWE at the marker?

$F_{ST}$ for Subpopulations
Fill in the table below. Are there too many homozygotes in this population?

                              Allele    Genotype
                              A         AA      Aa     aa
Freq. Subpop 1                1/4       1/16    3/8    9/16
Freq. Subpop 2                3/4       9/16    3/8    1/16
Freq. Population              1/2       5/16    3/8    5/16
Hardy-Weinberg Frequencies    1/2       1/4     1/2    1/4

To obtain a measure of the excess in homozygosity from what we would expect under HWE, solve
$2pq(1 - F_{ST}) = 3/8$
What is $F_{ST}$?
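The two-subpopulation table can be checked with exact arithmetic. This is a small sketch, assuming equal subpopulation weights as in the example; the helper name `hwe_genotypes` is hypothetical and not from the slides.

```python
from fractions import Fraction


def hwe_genotypes(p):
    """Return HWE genotype frequencies (AA, Aa, aa) for allele frequency p."""
    q = 1 - p
    return (p * p, 2 * p * q, q * q)


# Allele frequencies of A in the two equal-sized subpopulations.
p1, p2 = Fraction(1, 4), Fraction(3, 4)

# Random mating within each subpopulation gives HWE frequencies there.
sub1 = hwe_genotypes(p1)
sub2 = hwe_genotypes(p2)

# Pooled population: average the two subpopulations with equal weights.
pooled = tuple((a + b) / 2 for a, b in zip(sub1, sub2))

# Frequencies expected if the pooled population were itself in HWE.
p = (p1 + p2) / 2
expected = hwe_genotypes(p)

# Solve 2pq(1 - F_ST) = observed heterozygote frequency for F_ST.
fst = 1 - pooled[1] / expected[1]

print("pooled:  ", pooled)
print("expected:", expected)
print("F_ST:    ", fst)
```

Using `Fraction` keeps the sixteenths and eighths exact, so the values can be compared entry by entry with the table.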

$F_{ST}$ for Subpopulations
With $p = q = 1/2$ we have $2pq = 1/2$, so $1 - F_{ST} = 3/4$ and the excess homozygosity requires that $F_{ST} = 1/4$.
For the previous example, the allele frequency distribution for the two subpopulations is given.
At the population level, it is often difficult to determine whether excess homozygosity in a population is due to inbreeding, to subpopulations, or to other causes.
European populations with relatively subtle population structure (e.g., ancestry from northwest and southeast Europe) typically have an $F_{ST}$ value around 0.01.
$F_{ST}$ values that range from 0.1 to 0.3 have been observed for the most divergent populations (Cavalli-Sforza et al. 1994).

$F_{ST}$ for Subpopulations
Nelis et al. (PLoS ONE, 2009) looked at the genetic structure of various populations and obtained pairwise $F_{ST}$ values for the four HapMap sample populations:

Europeans (CEU) - Africans (YRI):    0.153
Europeans (CEU) - Japanese (JPT):    0.111
Europeans (CEU) - Chinese (CHB):     0.110
Africans (YRI) - Chinese (CHB):      0.190
Africans (YRI) - Japanese (JPT):     0.192
Chinese (CHB) - Japanese (JPT):      0.007

$F_{ST}$ for Subpopulations
$F_{ST}$ can be generalized to populations with an arbitrary number of subpopulations.
The idea is to find an expression for $F_{ST}$ in terms of the allele frequencies in the subpopulations and the relative sizes of the subpopulations.
Consider a single population and let $r$ be the number of subpopulations.
Let $p$ be the frequency of the A allele in the population, and let $p_i$ be the frequency of A in subpopulation $i$, where $i = 1, \ldots, r$.
$F_{ST}$ is often defined as
$$F_{ST} = \frac{\sigma_p^2}{p(1 - p)},$$
where $\sigma_p^2$ is the variance of the $p_i$'s with $E(p_i) = p$.

$F_{ST}$ for Subpopulations
Let the relative contribution of subpopulation $i$ be $c_i$, where $\sum_{i=1}^{r} c_i = 1$.

Genotype            AA                            Aa                                aa
Freq. Subpop i      $p_i^2$                       $2 p_i q_i$                       $q_i^2$
Freq. Population    $\sum_{i=1}^{r} c_i p_i^2$    $\sum_{i=1}^{r} c_i 2 p_i q_i$    $\sum_{i=1}^{r} c_i q_i^2$

where $q_i = 1 - p_i$.
In the population, we want to find the value $F_{ST}$ such that
$$2pq(1 - F_{ST}) = \sum_{i=1}^{r} c_i 2 p_i q_i$$
Rearranging terms:
$$F_{ST} = \frac{2pq - \sum_{i=1}^{r} c_i 2 p_i q_i}{2pq}$$
Now $2pq = 1 - p^2 - q^2$ and $\sum_{i=1}^{r} c_i 2 p_i q_i = 1 - \sum_{i=1}^{r} c_i (p_i^2 + q_i^2)$.

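$F_{ST}$ for Subpopulations
So we can show that
$$
\begin{aligned}
F_{ST} &= \frac{\sum_{i=1}^{r} c_i (p_i^2 + q_i^2) - p^2 - q^2}{2pq} \\
&= \frac{\left[\sum_{i=1}^{r} c_i p_i^2 - p^2\right] + \left[\sum_{i=1}^{r} c_i q_i^2 - q^2\right]}{2pq} \\
&= \frac{\mathrm{Var}(p_i) + \mathrm{Var}(q_i)}{2pq} = \frac{2\,\mathrm{Var}(p_i)}{2p(1 - p)} = \frac{\mathrm{Var}(p_i)}{p(1 - p)} = \frac{\sigma_p^2}{p(1 - p)}
\end{aligned}
$$
Here $p = \sum_{i=1}^{r} c_i p_i$, so each bracketed term is a variance, and $\mathrm{Var}(q_i) = \mathrm{Var}(p_i)$ because $q_i = 1 - p_i$.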
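The equivalence of the heterozygosity-deficit form and the variance form can be checked numerically. The sketch below is illustrative only: both function names and the three-subpopulation weights and frequencies are made-up assumptions, and $p$ is taken to be the weighted mean $\sum_i c_i p_i$.

```python
from fractions import Fraction


def fst_from_heterozygosity(c, p_sub):
    """F_ST from 2pq(1 - F_ST) = sum_i c_i * 2 * p_i * q_i."""
    p = sum(ci * pi for ci, pi in zip(c, p_sub))
    h_total = 2 * p * (1 - p)
    h_within = sum(ci * 2 * pi * (1 - pi) for ci, pi in zip(c, p_sub))
    return (h_total - h_within) / h_total


def fst_from_variance(c, p_sub):
    """F_ST as Var(p_i) / (p(1 - p)), with the variance weighted by c_i."""
    p = sum(ci * pi for ci, pi in zip(c, p_sub))
    var_p = sum(ci * pi * pi for ci, pi in zip(c, p_sub)) - p * p
    return var_p / (p * (1 - p))


# Hypothetical weights (summing to 1) and subpopulation allele frequencies.
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
freqs = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

print(fst_from_heterozygosity(weights, freqs))
print(fst_from_variance(weights, freqs))
```

With exact fractions the two functions return identical values, which is what the derivation predicts for any weights that sum to 1.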

Estimating $F_{ST}$
Let $n$ be the total number of sampled individuals from the population and let $n_i$ be the number of sampled individuals from subpopulation $i$.
Let $\hat{p}_i$ be the allele frequency estimate of the A allele for the sample from subpopulation $i$.
Let $\hat{p} = \sum_i \frac{n_i}{n} \hat{p}_i$.
A simple $F_{ST}$ estimate is
$$\hat{F}_{ST1} = \frac{s^2}{\hat{p}(1 - \hat{p})},$$
where $s^2$ is the sample variance of the $\hat{p}_i$'s.
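A minimal sketch of this simple estimator follows. The slides do not say which denominator the sample variance uses, so the unweighted, unbiased form (dividing by $r - 1$, as `statistics.variance` does) is an assumption here, as are the function name `fst_simple` and the sample sizes.

```python
import statistics


def fst_simple(p_hat, n):
    """Simple F_ST estimate: s^2 / (p_hat_bar * (1 - p_hat_bar)).

    p_hat: per-subpopulation allele frequency estimates.
    n:     per-subpopulation sample sizes (used only for the pooled frequency).
    Assumption: s^2 is the unweighted sample variance with an r - 1 denominator.
    """
    total = sum(n)
    p_bar = sum(ni * pi for ni, pi in zip(n, p_hat)) / total
    s2 = statistics.variance(p_hat)  # needs at least two subpopulations
    return s2 / (p_bar * (1 - p_bar))


# Hypothetical samples of 50 individuals from each of two subpopulations.
estimate = fst_simple([0.25, 0.75], [50, 50])
print(estimate)
```

With only two subpopulations the choice of denominator matters a great deal: swapping in `statistics.pvariance` (which divides by $r$) would halve the variance term for this input.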

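Estimating $F_{ST}$
Weir and Cockerham (1984) developed an estimate based on the method of moments. Their estimate is
$$\hat{F}_{ST2} = \frac{MSA - MSW}{MSA + (n_c - 1)\,MSW},$$
where
$$MSA = \frac{1}{r - 1} \sum_{i=1}^{r} n_i (\hat{p}_i - \hat{p})^2,$$
$$MSW = \frac{1}{\sum_i (n_i - 1)} \sum_{i=1}^{r} n_i \hat{p}_i (1 - \hat{p}_i),$$
and
$$n_c = \frac{1}{r - 1} \left( \sum_i n_i - \frac{\sum_i n_i^2}{\sum_i n_i} \right).$$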
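The moment estimator can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the function name `fst_weir_cockerham` is hypothetical, the inputs are per-subpopulation allele frequency estimates and sample sizes, and $n_c$ uses the standard Weir and Cockerham form with the $1/(r - 1)$ factor.

```python
def fst_weir_cockerham(p_hat, n):
    """Method-of-moments F_ST estimate from MSA, MSW and n_c.

    p_hat: per-subpopulation allele frequency estimates.
    n:     per-subpopulation sample sizes.
    """
    r = len(n)
    if r < 2 or len(p_hat) != r:
        raise ValueError("need matching inputs for at least two subpopulations")
    total = sum(n)
    # Pooled allele frequency, weighted by sample size.
    p_bar = sum(ni * pi for ni, pi in zip(n, p_hat)) / total
    # Mean square among subpopulations.
    msa = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p_hat)) / (r - 1)
    # Mean square within subpopulations.
    msw = sum(ni * pi * (1 - pi) for ni, pi in zip(n, p_hat)) / sum(ni - 1 for ni in n)
    # Sample-size term; assumed to include the 1 / (r - 1) factor.
    n_c = (total - sum(ni * ni for ni in n) / total) / (r - 1)
    return (msa - msw) / (msa + (n_c - 1) * msw)


# Hypothetical samples of 50 individuals from each of two subpopulations.
diverged = fst_weir_cockerham([0.25, 0.75], [50, 50])
identical = fst_weir_cockerham([0.5, 0.5], [50, 50])
print(diverged, identical)
```

Unlike the simple ratio, this estimate can be slightly negative when the subpopulation frequencies are identical, because the within-subpopulation sampling variance is subtracted from the among-subpopulation term.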

GAW 14 COGA Data
The Collaborative Study on the Genetics of Alcoholism (COGA) provided genome screen data for locating regions on the genome that influence susceptibility to alcoholism.
There were a total of 1,009 individuals from 143 pedigrees, with each pedigree containing at least 3 affected individuals.
Individuals labeled as white, non-Hispanic were considered.
Self-kinship and inbreeding coefficients were estimated using the genome-screen data.

COGA Data
[Figure: histogram of estimated self-kinship coefficients (x-axis: Estimated Self Kinship Coefficient, y-axis: Frequency; mean = 0.511) and histogram of estimated inbreeding coefficients (x-axis: Estimated Inbreeding Coefficient, y-axis: Frequency; mean = 0.011).]

References
Nelis M, Esko T, Mägi R, Zimprich F, Zimprich A, et al. (2009). Genetic Structure of Europeans: A View from the North-East. PLoS ONE, 4, e5472. doi:10.1371/journal.pone.0005472.
Weir BS, Cockerham CC (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358-1370.