Introduction to Coalescent Models. Biostatistics 666 Lecture 4

Last Lecture
- Linkage equilibrium
  - Expected state for distant markers
- Linkage disequilibrium
  - Association between neighboring alleles
  - Expected to decrease with distance
- Measures of linkage disequilibrium
  - $D$, $D'$ and $\Delta^2$ (or $r^2$)

Previously
- DNA sequence variation
- Types of DNA variants
- Allele frequencies
- Genotype frequencies
- Hardy-Weinberg equilibrium

Making predictions
- What allele frequencies do we expect?
- How much variation in a gene?
- How are neighboring variants related?

Simple Approach: Simulation
1. N starting sequences
2. Sample N offspring sequences
   - Apply mutations according to µ
3. Increment time
4. If enough time has passed
   - Generate final sample
   - Stop
5. Otherwise, return to step 1
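The forward simulation steps can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `simulate_forward`, the representation of a sequence as a set of mutation IDs, and the simplification that a sequence gains at most one new mutation per generation (with probability µ) are all assumptions made for the example.

```python
import random


def simulate_forward(n_seq, mu, generations, seed=0):
    """Forward-in-time simulation of a constant-size haploid population.

    Each sequence is a frozenset of mutation IDs (assumed representation).
    Every mutation receives a fresh ID, so no site is ever hit twice.
    """
    rng = random.Random(seed)
    # Step 1: N identical starting sequences (no mutations yet).
    population = [frozenset() for _ in range(n_seq)]
    next_id = 0
    # Steps 3-5: advance time until enough generations have passed.
    for _ in range(generations):
        offspring = []
        # Step 2: sample N offspring, each copying a random parent.
        for _ in range(n_seq):
            child = rng.choice(population)
            # Assumption: at most one new mutation per sequence per generation.
            if rng.random() < mu:
                child = child | {next_id}
                next_id += 1
            offspring.append(child)
        population = offspring
    # Step 4: the final sample is the last generation.
    return population
```

The cost grows with both population size and elapsed time, which is one reason a backward-in-time view of only the sampled sequences is attractive.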

Simulating a Population
[Figure: sequences plotted against time]

Today
- Introduce coalescent approach
- Framework for studying genetic variation
- Provides intuition on patterns of variation
- Provides analytical solutions

Aim
- Gene genealogies: descriptions of relatedness between sequences
- Analogous to phylogenetic trees for species
- The shape of the genealogy depends on population history, selection, etc.
- Together with mutation rate, genealogy predicts DNA variation

Genealogy
- History of a particular set of sequences
- Describes their relatedness
- Specifies divergence times
- Includes only a subset of the population
- Most Recent Common Ancestor (MRCA)

Coalescent approach
- Generate genealogy for a sample of sequences
- Introduces computational and analytical convenience
- Instead of proceeding forward through time, go backwards!

History of the Population

Genealogy of Final Population

Levels of Complexity
- History of the population
  - Includes sequences that are extinct
- History of all modern sequences
  - Includes sequences that we haven't sampled
- History of a subset of modern sequences
  - Minimalist approach!

Parameters we will focus on
- Mutation rate (µ)
- Population size
  - Haploid population (N chromosomes)
  - Diploid population (2N chromosomes)
- Time (t)
- Sample size (n)
- Recombination rate (r)

Other Parameters
- Selection
  - For gene of interest
  - For neighboring gene
- Demographic parameters
  - Migration
  - Population structure
  - Population growth

Mutation Model
- The mutation process is complex
  - Rate depends on surrounding sequence
  - Reverse mutations are possible
- Two simple models are popular
  - Infinite alleles: every mutation generates a different allele
  - Infinite sites: every mutation occurs at a different site

Mutation Model
- Focus on infinite sites model
  - Mutation rate in genomic DNA is ~$10^{-8}$ / bp
  - Recurrent mutations should be very rare
- Scaled mutation rate parameter, e.g.:
  - 1000 bp sequence
  - $10^{-8}$ mutations per base pair per generation
  - µ = $10^{-5}$ per sequence per generation

Neutral Variants
- Variants that do not affect fitness
- Accumulate inexorably through time
- Lost through genetic drift
- Do not affect genealogy

Example: Modeling Accumulation of Mutations
- Population of identical sequences
- Sample one descendant after t generations
- How many mutations have accumulated?
  - Hint: depends on mutation rate µ and time t
- Tougher questions
  - How many mutations have been fixed?
  - How much variation in the total population?
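One way to work the single-descendant question numerically is sketched below. It assumes that mutations along one lineage arrive independently at rate µ per generation, so the count after t generations is approximately Poisson with mean µt. The function names are hypothetical and chosen only for this example.

```python
import math


def expected_mutations(mu, t):
    """Expected number of mutations on a single lineage after t generations."""
    return mu * t


def prob_k_mutations(mu, t, k):
    """Poisson approximation to the probability of exactly k mutations.

    Assumption: mutations occur independently at rate mu per generation.
    """
    lam = mu * t
    return math.exp(-lam) * lam ** k / math.factorial(k)
```

For example, `expected_mutations(1e-5, 1000)` uses the per-sequence rate from the mutation model slide and a hypothetical t of 1000 generations. Neither function takes the population size as an argument.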

So far
- Divergence of a single sequence
- Accumulation of mutations
  - Depends on time t
  - Depends on mutation rate µ
  - Does not depend on population size N
  - Does not depend on population growth
- Next: a pair of sequences!

A tougher example
- Sample of two sequences
  - 100 bp each
  - How many differences are expected?
- Population of size N = 1000
- Mutation rate
  - µ = $10^{-8}$ / bp / generation
  - µ = $10^{-6}$ / 100 bp / generation

Genealogy of two sequences
[Figure: genealogy joining Sequence 1 and Sequence 2 at their MRCA, at time T(2)]
- Mutations between MRCA and Sequence 1?

Genealogy of two sequences
[Figure: genealogy joining Sequence 1 and Sequence 2 at their MRCA, at time T(2)]
- Total mutations in genealogy?

Number of mutations S
- Distributed as Poisson, conditional on total tree length
- $E(S) = \mu E(T_{tot})$
- $\mathrm{Var}(S) = E[\mathrm{Var}(S \mid T)] + \mathrm{Var}[E(S \mid T)] = \mu E(T_{tot}) + \mu^2 \mathrm{Var}(T_{tot})$
- $T_{tot}$ is the total length of all branches

Estimating T(2)
- Probability that two sequences have distinct ancestors in the previous generation:
  $P(2) = \frac{N-1}{N} = 1 - \frac{1}{N}$
- Probability of distinct ancestors for t generations is $P(2)^t$

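Probability of MRCA at time t+1
$P(2)^t \, (1 - P(2)) = \left(\frac{N-1}{N}\right)^t \frac{1}{N} = \frac{1}{N}\left(1 - \frac{1}{N}\right)^t \approx \frac{1}{N} e^{-t/N}$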
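The quality of the exponential approximation for a pair of sequences can be checked numerically. The sketch below compares the exact geometric probability with $\frac{1}{N} e^{-t/N}$ for a haploid population of N chromosomes. The function names are hypothetical.

```python
import math


def prob_mrca_exact(t, N):
    """Exact probability that two sequences first share an ancestor
    t + 1 generations ago: (1 - 1/N)^t * (1/N)."""
    return (1 - 1 / N) ** t / N


def prob_mrca_approx(t, N):
    """Exponential approximation (1/N) * exp(-t/N), intended for large N."""
    return math.exp(-t / N) / N


def expected_pairwise_differences(N, mu):
    """Expected differences between two sequences in a haploid population.

    The mean of the geometric distribution above is N generations, and
    mutations can occur on both branches, so E(S) = mu * 2N.
    """
    return 2 * N * mu
```

With the values from the tougher example (N = 1000, µ = $10^{-6}$ per 100 bp per generation, treated as haploid), `expected_pairwise_differences(1000, 1e-6)` evaluates $2N\mu$.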

For n > 2
- Coalescence occurs when two sequences have a common ancestor
- For simplicity, consider the possibility of multiple simultaneous coalescent events to be negligible
- Requirements for no coalescence:
  - Pick one ancestor for sequence 1
  - Pick a distinct ancestor for sequence 2
  - Pick yet another ancestor for sequence 3

Estimating P(n)
- Probability that n sequences have n distinct ancestors in the previous generation:
  $P(n) = \prod_{i=1}^{n-1} \frac{N-i}{N} \approx 1 - \binom{n}{2}\frac{1}{N}$
- Assume:
  - N is large
  - n is small
  - Terms of order $N^{-2}$ can be ignored
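A short numerical check shows how little is lost by dropping the terms of order $N^{-2}$. This is an illustrative sketch with hypothetical function names. It evaluates the exact product and the first-order approximation side by side.

```python
from math import comb


def prob_no_coalescence_exact(n, N):
    """Exact P(n): product over i = 1..n-1 of (N - i) / N."""
    p = 1.0
    for i in range(1, n):
        p *= (N - i) / N
    return p


def prob_no_coalescence_approx(n, N):
    """First-order approximation 1 - C(n, 2) / N, for large N and small n."""
    return 1 - comb(n, 2) / N
```

For n = 2 the two expressions coincide, since no higher-order terms exist. The gap widens as n grows relative to N.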

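Probability of Coalescence at Time t+1
$P(n)^t \, (1 - P(n)) \approx \left(1 - \binom{n}{2}\frac{1}{N}\right)^t \binom{n}{2}\frac{1}{N} \approx \binom{n}{2}\frac{1}{N} \, e^{-\binom{n}{2} t / N}$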

Time to next coalescent event
- Use an exponential distribution to approximate time to next coalescent event
- Decay rate: $\lambda = \binom{n}{2} / N$
- Mean: $1/\lambda = N / \binom{n}{2}$
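The exponential approximation makes waiting times easy to compute and to draw at random. The sketch below is one possible rendering in generations for a haploid population of size N. The function names are hypothetical, and the sampling relies on `random.expovariate` from the Python standard library.

```python
import random
from math import comb


def coalescence_rate(n, N):
    """Decay rate lambda = C(n, 2) / N, per generation."""
    return comb(n, 2) / N


def mean_waiting_time(n, N):
    """Mean time to the next coalescent event, 1 / lambda, in generations."""
    return N / comb(n, 2)


def sample_waiting_time(n, N, rng):
    """Draw one waiting time from the exponential approximation."""
    return rng.expovariate(coalescence_rate(n, N))
```

A caller would pass its own generator, for example `sample_waiting_time(5, 1000, random.Random(1))`, so that repeated runs are reproducible.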

T(j)
- For convenience, measure time to next coalescent event in units of:
  - N generations for haploids
  - 2N generations for diploids
- $E[T(j)] = 1 / \binom{j}{2}$
- How would you calculate time to MRCA of n sequences?
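One answer to the closing question is to add up the expected waiting times while n, n−1, ..., 2 lineages remain. The sketch below does this in the scaled time units defined above. The function names are hypothetical. The sum telescopes to $2(1 - 1/n)$.

```python
from math import comb


def expected_t(j):
    """E[T(j)]: expected time during which j lineages remain,
    in units of N (haploid) or 2N (diploid) generations."""
    return 1 / comb(j, 2)


def expected_tmrca(n):
    """Expected time to the MRCA of n sequences: sum of E[T(j)] for j = 2..n."""
    return sum(expected_t(j) for j in range(2, n + 1))
```

Because the result never exceeds 2 time units, adding more sequences to the sample barely changes the expected time to the MRCA.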

Total Time in Tree
- Sum of all the branch lengths
- Total evolutionary time available, e.g. for mutations to occur
- $E(T_{tot}) = \sum_{j=2}^{n} j \, E[T(j)] = \sum_{j=2}^{n} \frac{2}{j-1} = 2 \sum_{i=1}^{n-1} \frac{1}{i}$
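The total branch length can be computed either directly from the definition or from the closed form. The sketch below does both, in scaled time units, so the two can be compared. The function names are hypothetical.

```python
from math import comb


def expected_total_length(n):
    """E(T_tot) from the definition: while j lineages remain, j branches
    each grow by T(j), so sum j * E[T(j)] for j = 2..n."""
    return sum(j / comb(j, 2) for j in range(2, n + 1))


def expected_total_length_closed_form(n):
    """Closed form: 2 * sum of 1/i for i = 1..n-1."""
    return 2 * sum(1 / i for i in range(1, n))
```

Unlike the expected time to the MRCA, this quantity keeps growing with n, although only at a harmonic (roughly logarithmic) rate.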

$T_{MRCA}$ vs. $T_{tot}$
[Figure: two panels, relative time to MRCA and relative sum of branch lengths, each plotted against number of sequences (0 to 100)]

Number of Segregating Sites
- Commonly named S
- Total number of mutations in genealogy
  - Assuming no recurrent mutation
- A function of the total length of the genealogy, $T_{tot}$

Expected number of mutations
- Factor N for haploids, 2N for diploids
- Population geneticists define $\theta = 4N\mu$ (for diploids)
  - For gene mappers, θ is usually the recombination rate
  - Population geneticists use r for recombination rates
- $E(S) = 2N\mu \, E(T_{tot}) = 4N\mu \sum_{i=1}^{n-1} \frac{1}{i} = \theta \sum_{i=1}^{n-1} \frac{1}{i}$
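The expectation is straightforward to evaluate for any sample size. The following sketch assumes the diploid definition $\theta = 4N\mu$ given above. The function names are hypothetical.

```python
def harmonic_sum(n):
    """Sum of 1/i for i = 1..n-1."""
    return sum(1 / i for i in range(1, n))


def theta_diploid(N, mu):
    """Scaled mutation rate theta = 4 * N * mu for a diploid population."""
    return 4 * N * mu


def expected_segregating_sites(theta, n):
    """E(S) = theta * sum of 1/i for i = 1..n-1."""
    return theta * harmonic_sum(n)
```

For instance, `expected_segregating_sites(theta_diploid(10_000, 1e-4), 20)` evaluates E(S) for a sample of 20 sequences when N = 10,000 and µ = $10^{-4}$.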

E(S) as a function of n
[Figure: expected number of segregating sites plotted against sample size (n = 2 to 20)]
- Parameters: N = 10,000 individuals, µ = $10^{-4}$, θ = 4

More about S
- Very large variance
- $\mathrm{Var}(S) = \theta \sum_{i=1}^{n-1} \frac{1}{i} + \theta^2 \sum_{i=1}^{n-1} \frac{1}{i^2}$
- Most of the variance is contributed by early coalescent events (i.e. with small n)
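The two terms of the variance can be evaluated separately to see which one dominates. This is a minimal sketch with a hypothetical function name. It returns the mutation (Poisson) term and the genealogy term along with their sum.

```python
def variance_segregating_sites(theta, n):
    """Var(S) split into its two components.

    Returns (poisson_term, genealogy_term, total), where
    poisson_term   = theta   * sum of 1/i   for i = 1..n-1
    genealogy_term = theta^2 * sum of 1/i^2 for i = 1..n-1
    """
    poisson_term = theta * sum(1 / i for i in range(1, n))
    genealogy_term = theta ** 2 * sum(1 / i ** 2 for i in range(1, n))
    return poisson_term, genealogy_term, poisson_term + genealogy_term
```

The sum of $1/i^2$ converges as n grows, so the genealogy term levels off while the Poisson term keeps increasing slowly.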

Var(S) as a function of n
[Figure: variance in number of segregating sites plotted against sample size (n = 2 to 20)]
- Parameters: N = 10,000 individuals, µ = $10^{-4}$, θ = 4

Inferences about θ
- Could be estimated from S
  - Divide by expected length of genealogy
  - $\hat{\theta} = S \big/ \sum_{i=1}^{n-1} \frac{1}{i}$
- Could then be used to:
  - Estimate N, if mutation rate µ is known
  - Estimate µ, if population size N is known
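The estimator and its two uses can be written down in a few lines. The sketch below assumes the diploid relation $\theta = 4N\mu$ when converting back to N or µ. The function names are hypothetical. This estimator is commonly known as Watterson's estimator.

```python
def theta_from_segregating_sites(S, n):
    """Estimate theta as S divided by the sum of 1/i for i = 1..n-1."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return S / sum(1 / i for i in range(1, n))


def estimate_population_size(theta, mu):
    """Solve theta = 4 * N * mu for N (diploid assumption), given mu."""
    return theta / (4 * mu)


def estimate_mutation_rate(theta, N):
    """Solve theta = 4 * N * mu for mu (diploid assumption), given N."""
    return theta / (4 * N)
```

For example, a sample of n = 3 sequences with S = 6 segregating sites gives `theta_from_segregating_sites(6, 3)`, which divides 6 by 1 + 1/2.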

$\mathrm{Var}(\hat{\theta})$ as a function of n
[Figure: variance in estimate of θ plotted against sample size (n = 2 to 50)]
- Parameters: N = 10,000 individuals, µ = $10^{-4}$, θ = 4

Alternative Estimator for θ
- Count pairwise differences between sequences
- Compute average number of differences
- $\tilde{\theta} = \binom{n}{2}^{-1} \sum_{i=1}^{n} \sum_{j=i+1}^{n} S_{ij}$
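The pairwise estimator can be computed directly from aligned sequences. In the sketch below, sequences are assumed to be equal-length strings, $S_{ij}$ is taken to be the number of positions at which sequences i and j differ, and the function names are hypothetical.

```python
from itertools import combinations


def count_differences(a, b):
    """S_ij: number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def average_pairwise_differences(sequences):
    """Average of S_ij over all C(n, 2) pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(count_differences(a, b) for a, b in pairs) / len(pairs)
```

For the toy sample `["AAAA", "AAAT", "AATT"]`, the three pairwise counts are 1, 2 and 1, so the average is 4/3.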

Today
- Probability of coalescence events
- Length of genealogy and its branches
- Expected number of mutations
- Simple estimates of θ

Recommended Reading
- Richard R. Hudson (1990). Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology, Vol. 7. D. Futuyma and J. Antonovics (Eds). Oxford University Press, New York.