A hidden Markov model to estimate inbreeding from whole genome sequence data

Tom Druet & Mathieu Gautier
Unit of Animal Genomics, GIGA-R, University of Liège, Belgium
Centre de Biologie pour la Gestion des Populations, INRA, France

Introduction
- Controlling inbreeding in livestock species or in small populations
  - Recessive defects, inbreeding depression, etc.
- Genomic data
  - Observation of realized inbreeding
  - Pedigree sometimes unavailable

Genomic inbreeding F
- Estimation with a genomic relationship matrix (GRM)
  - Reference population
  - Independent SNPs
  - Global estimate
- Runs of homozygosity (ROH)
  - Parameter definitions
  - Allele frequencies not used
  - Inappropriate for low-fold sequencing

Hidden Markov models
- Model the genome as a mosaic of IBD (inbred) and non-IBD segments (e.g., Leutenegger et al., 2003, AJHG)

10020110102111100200202021211012110210110120101210011


Emission probabilities
Probability of the genotype given the IBD status (emission probability):

Genotype    IBD    Non-IBD
A_i A_i     p_i    p_i^2
A_i A_j     ε      2 p_i p_j
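The emission table above can be sketched in Python for a single biallelic SNP. This is a minimal illustration, not the authors' code: the function name, the 0/1/2 genotype coding (copies of the alternate allele) and the default value of ε are assumptions made for the example. The probabilities are taken exactly as printed on the slide.

```python
def emission_probs(genotype, p_alt, eps=0.001):
    """Return (P(genotype | IBD), P(genotype | non-IBD)) for one biallelic SNP.

    genotype: 0, 1 or 2 copies of the alternate allele (assumed coding).
    p_alt:    frequency of the alternate allele in the population.
    eps:      probability of seeing a heterozygote inside an IBD segment
              (illustrative default, not a value from the slides).
    """
    p_ref = 1.0 - p_alt
    if genotype == 0:      # homozygous A_i A_i with A_i = reference allele
        return p_ref, p_ref ** 2
    if genotype == 2:      # homozygous A_i A_i with A_i = alternate allele
        return p_alt, p_alt ** 2
    if genotype == 1:      # heterozygous A_i A_j
        return eps, 2.0 * p_ref * p_alt
    raise ValueError("genotype must be 0, 1 or 2")


if __name__ == "__main__":
    for g in (0, 1, 2):
        print(g, emission_probs(g, p_alt=0.3))
```

A homozygous genotype is always more probable under IBD than under non-IBD (p_i against p_i^2), which is what lets long homozygous stretches pull the model towards the IBD state.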

Transition probabilities
- Probability of no coancestry change between adjacent markers: e^(-α), where α is the transition rate (it depends on the recombination rate and the time to the common ancestor)
- Probability that a new coancestry is IBD: F
- Probability that a new coancestry is non-IBD: 1 - F

Transition matrix (rows: state at the current marker; columns: state at the next marker):

            IBD                        Non-IBD
IBD         e^(-α) + (1 - e^(-α)) F    (1 - e^(-α)) (1 - F)
Non-IBD     (1 - e^(-α)) F             e^(-α) + (1 - e^(-α)) (1 - F)
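As a quick check of the transition matrix, the sketch below builds it from α and F. The function name and the example values are assumptions for illustration. Each row sums to one, and (F, 1 - F) is the stationary distribution of the chain, so F can be read as the long-run proportion of the genome in the IBD state.

```python
import math


def transition_matrix(alpha, F):
    """2x2 transition matrix between IBD (index 0) and non-IBD (index 1).

    alpha: transition rate between two adjacent markers.
    F:     probability that a new coancestry is IBD.
    """
    stay = math.exp(-alpha)   # no coancestry change
    change = 1.0 - stay       # coancestry change, new state drawn with F / 1 - F
    return [
        [stay + change * F, change * (1.0 - F)],   # from IBD
        [change * F, stay + change * (1.0 - F)],   # from non-IBD
    ]


if __name__ == "__main__":
    T = transition_matrix(alpha=0.05, F=0.1)
    for row in T:
        print(row, "sum =", sum(row))
    # One step of the chain started from (F, 1 - F) returns (F, 1 - F).
    pi = [0.1, 0.9]
    print([sum(pi[i] * T[i][j] for i in range(2)) for j in range(2)])
```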

Extension to WGS data
Replace genotypes in the emission probabilities:
- Use genotype likelihoods or phred scores, which incorporate the uncertainty on genotype calls (from the VCF):
  P(Data | IBD) = p_i P(A_i A_i | Data) + p_j P(A_j A_j | Data) + ε P(A_i A_j | Data)
- Use allele counts (allele depth, AD), with ε included:
  P(AD | IBD) = p_i P(AD | A_i A_i) + p_j P(AD | A_j A_j)
- Recent implementations:
  - BCFtools/RoH (Narasimhan et al., Bioinformatics, 2016)
  - ngsF-HMM (Vieira et al., Bioinformatics, 2016)
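A minimal sketch of the genotype-likelihood version of the emission probabilities follows, assuming a biallelic site and treating the VCF `PL` field (phred-scaled, ordered 0/0, 0/1, 1/1) as the per-genotype terms. The function names are hypothetical. The non-IBD expression (Hardy-Weinberg weights) is not on the slide and is an assumption made so that both states are covered.

```python
def pl_to_likelihoods(pl):
    """Convert phred-scaled genotype likelihoods (VCF PL) to a linear scale."""
    return [10.0 ** (-value / 10.0) for value in pl]


def emission_from_likelihoods(lik, p_alt, eps=0.001):
    """Return (P(Data | IBD), P(Data | non-IBD)) for one biallelic SNP.

    lik:   likelihoods for genotypes (ref/ref, ref/alt, alt/alt).
    p_alt: alternate allele frequency.
    eps:   weight of the heterozygous genotype under IBD (illustrative default).
    """
    l_rr, l_ra, l_aa = lik
    p_ref = 1.0 - p_alt
    # IBD: weights p_i, p_j for the homozygotes and eps for the heterozygote.
    ibd = p_ref * l_rr + p_alt * l_aa + eps * l_ra
    # Non-IBD (assumption): Hardy-Weinberg genotype frequencies as weights.
    non_ibd = p_ref ** 2 * l_rr + 2.0 * p_ref * p_alt * l_ra + p_alt ** 2 * l_aa
    return ibd, non_ibd


if __name__ == "__main__":
    # A confident ref/ref call and an uncertain low-coverage call.
    print(emission_from_likelihoods(pl_to_likelihoods((0, 30, 60)), p_alt=0.5))
    print(emission_from_likelihoods(pl_to_likelihoods((0, 3, 20)), p_alt=0.5))
```

When one genotype has likelihood 1 and the others 0, the expressions reduce to the hard-call emission probabilities, so array genotypes and sequence data can go through the same code path.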

Limitation
- Assumes a single inbreeding event (one ancestor)
- Still a single reference population
- In livestock species, inbreeding is complex
  - Many common ancestors over many generations
  - Variable Ne over time (including bottlenecks)

Mixture of inbreeding classes
- Mixture of several IBD and non-IBD classes with different ages (G)
- Emission probabilities unchanged
- Transition probabilities follow the same principle
- Each distribution has its own mixing proportion

10020110102111100200202020200012110210110120101220011
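One way to read "same principle" for the transition probabilities is sketched below. It is an assumed generalization, not a formula taken from the slides: each class k keeps its state with probability e^(-α_k), and otherwise a new class j is drawn with its mixing proportion. The function name and the example rates and proportions are hypothetical.

```python
import math


def mixture_transition_matrix(rates, mixing):
    """K x K transition matrix for a mixture of IBD / non-IBD classes.

    rates:  per-interval transition rate alpha_k of each class (assumed input).
    mixing: mixing proportion of each class, summing to one.
    """
    n_classes = len(rates)
    matrix = []
    for k in range(n_classes):
        stay = math.exp(-rates[k])
        # After a coancestry change, class j is entered with probability mixing[j].
        row = [(1.0 - stay) * mixing[j] for j in range(n_classes)]
        row[k] += stay  # no coancestry change: remain in class k
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Two IBD classes of different ages plus one non-IBD class (illustrative values).
    T = mixture_transition_matrix(rates=[0.5, 0.1, 0.01], mixing=[0.05, 0.05, 0.90])
    for row in T:
        print([round(x, 4) for x in row], "sum =", round(sum(row), 6))
```

With two classes sharing one rate α and mixing proportions (F, 1 - F), this reduces to the two-state transition matrix of the single-distribution model.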

Testing with simulations
- One distribution (one age), 500 individuals, medians

Estimated F ~ simulated F
- Simulated F = 0.05 and G = 64

Two simulated distributions
- Simulated ages: G1 = 16 and G2 = 256
- Mixture of 10 predefined classes (9 IBD, 1 non-IBD)

Summary of simulations
- Simulations with varying age, number of distributions, type of markers, low-fold sequencing data, and errors
- Assessed on estimated age, mixing proportion (1 distribution), global F, local F, population and individual estimates, and the estimation of K
- Better results with more recent (younger) inbreeding, larger F, more markers, higher MAF, higher coverage, and larger age differences

Belgian Blue cattle (634 bulls)
[Figure: proportion of inbreeding per age class, and total F]

WGS data (high coverage, 114x)
Sire x MGS (maternal grandsire) mating: expected 25% at G3

Chr   Length (Mb)   #het SNPs   #SNPs    Prop. het
2     92.385886     23          192567   1.2e-4
1     51.469735     0           117044   0
21    46.047682     1           107278   9.3e-6
16    44.281690     0           81934    0
2     34.592319     13          80042    1.6e-4
4     33.943960     4           84630    4.7e-5
4     32.406205     0           64784    0
20    30.317150     6           70982    8.4e-5
10    27.445232     2           62643    3.2e-5
23    26.648470     1           74953    1.3e-5

BBB WGS (10-15x)
Longest IBD segments for one sire

       BovineHD                 WGS called genotypes                    WGS likelihoods
Chr    Length   #Het   #SNPs    Length   #Het   #SNPs    Prop. het      Length
9      94.6     2      23298    84.6     375    182480   0.0025         94.6
22     46.4     1      11834    34.1     82     69465    0.0012         45.2
13     34.0     0      7031     31.3     141    59879    0.0023         34.1
20     20.6     0      5418     20.5     127    48748    0.0026         20.7
8      16.2     0      3331     9.3      41     19566    0.0021         16.2

BBB WGS (10-15x)
[Figure: distribution across IBD classes, called genotypes vs genotype likelihoods]

Whole genome sequence
[Figure: 50 sequenced Belgian Blue sires]

Conclusions
- The model uses all the information
  - Sequence of genotypes, allele frequencies, error rates
- The model classifies inbreeding into different age classes
  - Better than just one (opens perspectives)
- The model estimates local and global inbreeding
- The model can work with genotyping arrays and sequence data
  - With different allelic spectra
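The local and global estimates listed in the conclusions can be illustrated with a forward-backward pass over the two-state model (one IBD class, one non-IBD class). This is a sketch under simplifying assumptions, not the authors' implementation: F and α are given rather than estimated, α is the same between all adjacent markers, genotypes are hard calls coded 0/1/2, and the function name and example values are hypothetical.

```python
import math


def posterior_ibd(genotypes, p_alt, F, alpha, eps=0.001):
    """Posterior probability of the IBD state at each marker (local inbreeding)."""
    def emit(g, p):
        q = 1.0 - p
        # (P(g | IBD), P(g | non-IBD)) for g = 0, 1, 2 alternate alleles.
        return (q, eps, p)[g], (q * q, 2.0 * p * q, p * p)[g]

    stay = math.exp(-alpha)
    change = 1.0 - stay
    T = [[stay + change * F, change * (1.0 - F)],
         [change * F, stay + change * (1.0 - F)]]
    E = [emit(g, p) for g, p in zip(genotypes, p_alt)]
    n = len(E)

    # Forward pass, rescaled at every marker to avoid numerical underflow.
    cur = [F * E[0][0], (1.0 - F) * E[0][1]]
    total = sum(cur)
    fwd = [[x / total for x in cur]]
    for m in range(1, n):
        prev = fwd[-1]
        cur = [E[m][j] * sum(prev[i] * T[i][j] for i in range(2)) for j in range(2)]
        total = sum(cur)
        fwd.append([x / total for x in cur])

    # Backward pass with the same kind of rescaling.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for m in range(n - 2, -1, -1):
        cur = [sum(T[i][j] * E[m + 1][j] * bwd[m + 1][j] for j in range(2))
               for i in range(2)]
        total = sum(cur)
        bwd[m] = [x / total for x in cur]

    post = []
    for m in range(n):
        ibd = fwd[m][0] * bwd[m][0]
        non = fwd[m][1] * bwd[m][1]
        post.append(ibd / (ibd + non))
    return post


if __name__ == "__main__":
    genos = [1, 1, 1] + [0] * 50 + [1, 1, 1]   # a long homozygous run
    local = posterior_ibd(genos, [0.5] * len(genos), F=0.1, alpha=0.05)
    print("local F at the middle marker:", round(local[28], 4))
    print("global F (mean of local F):", round(sum(local) / len(local), 4))
```

Averaging the per-marker posteriors is one simple way to obtain a genome-wide value from the local estimates.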