Molecular Evolution
Docsity.com

Outline
• Evolutionary Tree Reconstruction
• “Out of Africa” hypothesis
• Did we evolve from Neanderthals?
• Distance-Based Phylogeny
• Neighbor Joining Algorithm
• Additive Phylogeny
• Least Squares Distance Phylogeny
• UPGMA
• Character-Based Phylogeny
• Small Parsimony Problem
• Fitch and Sankoff Algorithms
• Large Parsimony Problem
• Evolution of Wings
• HIV Evolution
• Evolution of Human Repeats

Evolutionary Tree of Bears and Raccoons
[Figure: evolutionary tree of bears and raccoons, including the red panda; time axis in millions of years ago]
Evolutionary Trees: DNA-based Approach
• In the 1960s, Emile Zuckerkandl and Linus Pauling brought the reconstruction of evolutionary relationships with DNA into the spotlight
• In the first few years after Zuckerkandl and Pauling proposed using DNA for evolutionary studies, the possibility of reconstructing evolutionary trees by DNA analysis was hotly debated
• Today it is a dominant approach to studying evolution.

Out of Africa Hypothesis
• Around the time the giant panda riddle was solved, a DNA-based reconstruction of the human evolutionary tree led to the Out of Africa Hypothesis, which claims that our most recent common ancestor lived in Africa roughly 200,000 years ago

mtDNA Analysis Supports the “Out of Africa” Hypothesis
• The African origin of humans was inferred from:
  • The African population was the most diverse (its sub-populations had more time to diverge)
  • The evolutionary tree separated one group of Africans from a group containing all five populations.
  • The tree was rooted on the branch between the groups of greatest difference.

Evolutionary Tree of Humans (mtDNA)
The evolutionary tree separates one group of Africans from a group containing all five populations (Vigilant, Stoneking, Harpending, Hawkes, and Wilson, 1991).

Evolutionary Tree of Humans (microsatellites)
• Neighbor joining tree for 14 human populations genotyped with 30 microsatellite loci.

Rooted and Unrooted Trees
In an unrooted tree, the position of the root (“oldest ancestor”) is unknown.
Otherwise, unrooted trees are like rooted trees.

Distances in Trees
• Edges may have weights reflecting:
  • The number of mutations on the evolutionary path from one species to another
  • A time estimate for the evolution of one species into another
• In a tree T, we often compute dij(T), the length of the path between leaves i and j; dij(T) is called the tree distance between i and j

Distance in Trees: an Example
d1,4 = 12 + 13 + 14 + 17 + 12 = 68 (the sum of the edge weights on the path between leaves 1 and 4)

Fitting Distance Matrix
• Given n species, we can compute the n x n distance matrix Dij
• The evolution of these species' genes is described by a tree that we don't know.
• We need an algorithm to construct a tree that best fits the distance matrix Dij
• Fitting means Dij = dij(T), where dij(T) is the length of the path between i and j in an (unknown) tree T, and Dij is the (known) edit distance between species i and j

Reconstructing a 3-Leaved Tree
• Tree reconstruction for any 3 x 3 matrix is straightforward
• We have 3 leaves i, j, k and a center vertex c
Observe:
dic + djc = Dij
dic + dkc = Dik
djc + dkc = Djk
Solving this system gives dic = (Dij + Dik - Djk)/2, djc = (Dij + Djk - Dik)/2, and dkc = (Dik + Djk - Dij)/2.

Additive Distance Matrices
Matrix D is ADDITIVE if there exists a tree T with dij(T) = Dij, and NON-ADDITIVE otherwise.

Distance-Based Phylogeny Problem
• Goal: Reconstruct an evolutionary tree from a distance matrix
• Input: n x n distance matrix Dij
• Output: weighted tree T with n leaves fitting D
• If D is additive, this problem has a solution and there is a simple algorithm to solve it

Using Neighboring Leaves to Construct the Tree
• Find neighboring leaves i and j with parent k
• Remove the rows and columns of i and j
• Add a new row and column corresponding to k, where the distance from k to any other leaf m can be computed as:
Dkm = (Dim + Djm - Dij)/2
• Compress i and j into k, and iterate the algorithm for the rest of the tree

Finding Neighboring Leaves
• The closest leaves aren't necessarily neighbors
• In the example tree, i and j are neighbors, but dij = 13 > djk = 12
• Finding a pair of neighboring leaves is a nontrivial problem!

Neighbor Joining Algorithm
• In 1987, Naruya Saitou and Masatoshi Nei developed the neighbor joining algorithm for phylogenetic tree reconstruction
• It finds a pair of leaves that are close to each other but far from the other leaves, and thereby implicitly finds a pair of neighboring leaves
• Advantages: it works well for additive matrices and for many non-additive ones, and it does not rely on the flawed molecular clock assumption

Degenerate Triples
• A degenerate triple is a set of three distinct elements 1 ≤ i, j, k ≤ n where Dij + Djk = Dik
• Element j in a degenerate triple i, j, k lies on the evolutionary path from i to k (or is attached to this path by an edge of length 0).

Finding Degenerate Triples
• If there is no degenerate triple, all hanging edges are reduced by the same amount δ, so that all pairwise distances in the matrix are reduced by 2δ.
• Eventually this process collapses one of the leaves (when δ equals the length of the shortest hanging edge), forming a degenerate triple i, j, k and reducing the size of the distance matrix D.
• The attachment point for j can be recovered in the reverse transformations by saving Dij for each collapsed leaf.

Reconstructing Trees for Additive Distance Matrices
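The definitions of tree distance dij(T), of a tree fitting a matrix (Dij = dij(T)), and the closed-form 3-leaf solution can be checked mechanically. The following is a minimal Python sketch written for illustration, not taken from the slides; the adjacency-map tree format and the function names `tree_distance`, `fits` and `three_leaf_limbs` are hypothetical choices, and the small example tree is made up.

```python
from itertools import combinations


def tree_distance(tree, u, v):
    """d_uv(T): length of the unique path between vertices u and v of a weighted tree.

    tree: hypothetical adjacency map {vertex: {neighbor: edge_weight}}.
    """
    stack = [(u, None, 0)]  # (vertex, vertex we came from, distance so far)
    while stack:
        node, came_from, dist = stack.pop()
        if node == v:
            return dist
        for nbr, w in tree[node].items():
            # A tree has no cycles, so skipping the vertex we came from avoids revisits.
            if nbr != came_from:
                stack.append((nbr, node, dist + w))
    raise ValueError("u and v are not connected")


def fits(tree, leaves, D):
    """True if D[i][j] == d_ij(T) for every pair of leaves, i.e. T fits D."""
    return all(tree_distance(tree, a, b) == D[i][j]
               for (i, a), (j, b) in combinations(enumerate(leaves), 2))


def three_leaf_limbs(Dij, Dik, Djk):
    """Solve d_ic + d_jc = Dij, d_ic + d_kc = Dik, d_jc + d_kc = Djk."""
    return ((Dij + Dik - Djk) / 2, (Dij + Djk - Dik) / 2, (Dik + Djk - Dij) / 2)


# Hypothetical 3-leaf tree with center vertex c.
tree = {"i": {"c": 3}, "j": {"c": 4}, "k": {"c": 5}, "c": {"i": 3, "j": 4, "k": 5}}
D = [[0, 7, 8], [7, 0, 9], [8, 9, 0]]
print(fits(tree, ["i", "j", "k"], D))  # True
print(three_leaf_limbs(7, 8, 9))       # (3.0, 4.0, 5.0)
```

Feeding `tree_distance` a path whose edge weights are 12, 13, 14, 17 and 12 reproduces d1,4 = 68 from the example above. `fits` uses exact equality because the example weights are integers; with floating-point distances a tolerance would be needed.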
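The slides describe neighbor joining only informally: join a pair of leaves that are close to each other but far from all other leaves. A common way to make this precise, used in standard presentations of Saitou and Nei's method, is to join the pair minimizing Q(i, j) = (n - 2)·Dij - Ri - Rj, where Ri is the sum of row i of D, and then collapse i and j into a new vertex k with Dkm = (Dim + Djm - Dij)/2, the same rule as on the neighboring-leaves slide. The sketch below assumes that formulation; the function name, the internal vertex names v0, v1, … and the first-pair tie-breaking are hypothetical choices.

```python
def neighbor_joining(D, labels):
    """Neighbor joining in the standard Q-criterion formulation (a sketch).

    D: symmetric list-of-lists distance matrix; labels: leaf names.
    Returns the edges (u, v, length) of an unrooted tree.
    """
    D = [row[:] for row in D]  # work on a copy
    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        R = [sum(row) for row in D]  # total distance from each vertex
        # Close to each other but far from everything else: minimize Q(i, j).
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - R[p[0]] - R[p[1]])
        delta = (R[i] - R[j]) / (n - 2)
        limb_i = (D[i][j] + delta) / 2
        limb_j = D[i][j] - limb_i
        k = f"v{next_id}"
        next_id += 1
        edges += [(nodes[i], k, limb_i), (nodes[j], k, limb_j)]
        # Collapse i and j into k: D_km = (D_im + D_jm - D_ij) / 2.
        keep = [m for m in range(n) if m not in (i, j)]
        new_row = [(D[i][m] + D[j][m] - D[i][j]) / 2 for m in keep]
        D = [[D[a][b] for b in keep] for a in keep]
        for row, d in zip(D, new_row):
            row.append(d)
        D.append(new_row + [0.0])
        nodes = [nodes[m] for m in keep] + [k]
    edges.append((nodes[0], nodes[1], D[0][1]))
    return edges


# Hypothetical additive matrix from a 4-leaf tree: limbs A=2, B=3, C=5, D=6, inner edge 4.
M = [[0, 5, 11, 12], [5, 0, 12, 13], [11, 12, 0, 11], [12, 13, 11, 0]]
print(neighbor_joining(M, ["A", "B", "C", "D"]))
# [('A', 'v0', 2.0), ('B', 'v0', 3.0), ('C', 'v1', 5.0), ('D', 'v1', 6.0), ('v0', 'v1', 4.0)]
```

On this additive example the sketch recovers the generating tree's edge lengths exactly; for non-additive input it still returns a tree, but some computed edge lengths can come out negative.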
Character-Based Tree Reconstruction
• A better technique:
• Character-based reconstruction algorithms use the n x m alignment matrix (n = number of species, m = number of characters) directly, instead of a distance matrix.
• GOAL: determine which character strings at the internal nodes would best explain the character strings of the n observed species

Parsimony and Tree Reconstruction
[Figure: two labeled trees over the strings ACCC, ACCA, ACCG, ATCC and ATCG; the less parsimonious labeling has score 6, the more parsimonious one has score 5]
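The parsimony score of a tree whose vertices are all labeled is the total number of character changes along its edges, that is, the sum of Hamming distances between the labels at the two ends of each edge. A minimal Python sketch; the tree, the labels and the function names are hypothetical and are not the ones drawn in the figure.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("labels must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(edges, label):
    """Total number of character changes over all edges of a labeled tree."""
    return sum(hamming(label[u], label[v]) for u, v in edges)


# Hypothetical rooted tree: r -> x, y; x -> l1, l2; y -> l3, l4.
edges = [("r", "x"), ("r", "y"), ("x", "l1"), ("x", "l2"), ("y", "l3"), ("y", "l4")]
leaves = {"l1": "ATCG", "l2": "ATCC", "l3": "ACCA", "l4": "ACCG"}

labeling_1 = {**leaves, "r": "ACCC", "x": "ATCC", "y": "ACCG"}
labeling_2 = {**leaves, "r": "ACCC", "x": "ACCC", "y": "ACCC"}
print(parsimony_score(edges, labeling_1))  # 4
print(parsimony_score(edges, labeling_2))  # 5
```

Only the internal labels differ between the two labelings, yet the first is more parsimonious; choosing internal labels that minimize this score is exactly the Small Parsimony Problem.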
Character-Based Tree Reconstruction (cont'd)
[Figure: (a) Parsimony Score = 3; (b) Parsimony Score = 2]
Figure 10.16 If we label a tree's leaves with characters (in this case, eyebrows and mouth, each with two states), and choose labels for each internal vertex, we implicitly create a parsimony score for the tree. By changing the labels in (a) we are able to create a tree with a better parsimony score in (b).
Small Parsimony Problem
• Input: Tree T with each leaf labeled by an m-character string.
• Output: Labeling of the internal vertices of T that minimizes the parsimony score.
• We can assume that every leaf is labeled by a single character: the characters in the string are independent, so the problem can be solved for each character separately and the scores added.

HIV Transmission
• Researchers took multiple samples from the patient, the woman, and controls (unrelated HIV-positive people)
• In every reconstruction, the woman's sequences were found to have evolved from the patient's sequences, indicating a close relationship between the two
• The nesting of the victim's sequences within the patient's sequences indicated that the direction of transmission was from patient to victim
• This was the first time phylogenetic analysis was used as evidence in a court case (Metzker et al., 2002)

Evolutionary Tree Leads to Conviction
[Figure: evolutionary tree of HIV sequences from the patient and the victim; sequence labels include Patient, V1.BCM.RT, V2.BCM.RT, V1.MIC.RT and V2.MIC.RT]
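For the Small Parsimony Problem stated above, the outline names Fitch's algorithm. Below is a minimal Python sketch of Fitch's algorithm on a rooted binary tree, run one character (column) at a time and summed over columns, as the independence assumption permits. The dictionary-based tree format, the function names and the example leaves are hypothetical; ties in the top-down pass are broken by taking the alphabetically smallest candidate.

```python
def fitch(tree, root, leaf_char):
    """Fitch's algorithm for a single character on a rooted binary tree.

    tree: {internal_vertex: (left_child, right_child)}; leaves are not keys.
    leaf_char: {leaf: character}.
    Returns (minimum parsimony score, {vertex: assigned character}).
    """
    sets = {}
    score = 0
    labels = {}

    def up(v):
        # Bottom-up pass: candidate state set of every vertex.
        nonlocal score
        if v not in tree:
            sets[v] = {leaf_char[v]}
            return
        left, right = tree[v]
        up(left)
        up(right)
        common = sets[left] & sets[right]
        if common:
            sets[v] = common
        else:
            sets[v] = sets[left] | sets[right]
            score += 1  # the children disagree: one change is unavoidable here

    def down(v, parent_char):
        # Top-down pass: keep the parent's state if allowed, else any candidate.
        char = parent_char if parent_char in sets[v] else min(sets[v])
        labels[v] = char
        for child in tree.get(v, ()):
            down(child, char)

    up(root)
    down(root, None)
    return score, labels


def small_parsimony(tree, root, leaf_strings):
    """Run Fitch on each column independently; return total score and internal labels."""
    length = len(next(iter(leaf_strings.values())))
    total = 0
    internal = {v: [] for v in tree}
    for pos in range(length):
        score, labels = fitch(tree, root, {leaf: s[pos] for leaf, s in leaf_strings.items()})
        total += score
        for v in tree:
            internal[v].append(labels[v])
    return total, {v: "".join(chars) for v, chars in internal.items()}


tree = {"r": ("x", "y"), "x": ("l1", "l2"), "y": ("l3", "l4")}
leaves = {"l1": "ATCG", "l2": "ATCC", "l3": "ACCA", "l4": "ACCG"}
print(small_parsimony(tree, "r", leaves))  # (3, {'r': 'ACCG', 'x': 'ATCG', 'y': 'ACCG'})
```

Fitch's algorithm counts every mismatch as one change; weighting different substitutions differently is what the Sankoff algorithm, also listed in the outline, adds.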
Minimum Spanning Trees
• The first algorithm for finding an MST was developed in 1926 by Otakar Borůvka. Its purpose was to minimize the cost of electrical coverage in Moravia.
• The Problem
  • Connect all of the cities while using the least amount of electrical wire possible, which minimizes the cost.
• We will see how building an MST can be used to study the evolution of Alu repeats

Prim's Algorithm Example
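Prim's algorithm grows a single tree from a start vertex, always adding the cheapest edge that leaves the current tree. A minimal Python sketch using a binary heap (`heapq`); the adjacency-map format, the function name and the small example graph are hypothetical.

```python
import heapq


def prim_mst(graph, start):
    """Prim's algorithm on an undirected, connected graph.

    graph: {u: {v: weight}} with every edge listed in both directions.
    Returns the MST edges (u, v, weight) in the order they are added.
    """
    in_tree = {start}
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    mst = []
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue  # both ends are already in the tree: edge no longer crosses the cut
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in graph[v].items():
            if x not in in_tree:
                heapq.heappush(frontier, (wx, v, x))
    return mst


graph = {"A": {"B": 4, "C": 1}, "B": {"A": 4, "C": 2, "D": 5},
         "C": {"A": 1, "B": 2, "D": 8}, "D": {"B": 5, "C": 8}}
print(prim_mst(graph, "A"))  # [('A', 'C', 1), ('C', 'B', 2), ('B', 'D', 5)]
```

Stale heap entries are skipped lazily instead of being updated in place, which keeps the code short at the cost of a larger heap.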
Why Does Prim's Algorithm Construct a Minimum Spanning Tree?
• Proof:
• This proof applies to a graph whose edge weights are distinct
• Let e be any edge that Prim's algorithm chose to connect the set of vertices already in the tree to the remaining vertices. Suppose that Prim's algorithm is flawed and some minimum spanning tree instead connects these two sets via some other edge f (adding e to that tree creates a cycle, which must cross between the two sets through such an edge f)
• Since Prim's algorithm selected e as the cheapest edge between the two sets, cost(e) < cost(f)
• Replacing f with e therefore gives a spanning tree whose cost is lower by exactly cost(f) - cost(e)
• This contradicts the minimality of the tree that avoided e, so the MST can't be formed without using edge e

Minimum Spanning Tree As An Evolutionary Tree
[Figure: minimum spanning tree of Alu subfamilies, grouped into AluJ, AluS and AluY subfamilies; labeled nodes include 1: AluJo, 2: AluSx, 3: AluSq, 4: AluSp, 5: AluY, 6: AluYa5]
The evolutionary tree of the 31 Repbase Update subfamilies, defined as their Minimum Spanning Tree (Kruskal 1956).
The 14 leaves in this tree imply at least 14 Alu source elements.
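The caption builds the tree with Kruskal's algorithm and then reads off its leaves. A minimal Python sketch of both steps: Kruskal scans candidate edges by increasing weight and keeps an edge only if it joins two different components (tracked with union-find), and the leaves are the degree-1 vertices. The subfamily names X1 to X4 and their distances are hypothetical, not Repbase data, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def kruskal_mst(names, dist):
    """Kruskal's algorithm on a complete graph.

    dist: {frozenset({a, b}): weight} for every pair of names.
    Returns the MST edges (a, b, weight).
    """
    parent = {v: v for v in names}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = sorted((dist[frozenset((a, b))], a, b) for a, b in combinations(names, 2))
    mst = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # a and b are in different components: keep the edge
            parent[ra] = rb
            mst.append((a, b, w))
    return mst


def mst_leaves(mst):
    """Vertices of degree 1 in the tree."""
    degree = Counter()
    for a, b, _ in mst:
        degree[a] += 1
        degree[b] += 1
    return sorted(v for v, d in degree.items() if d == 1)


names = ["X1", "X2", "X3", "X4"]
pairs = [(("X1", "X2"), 1), (("X1", "X3"), 4), (("X1", "X4"), 5),
         (("X2", "X3"), 2), (("X2", "X4"), 3), (("X3", "X4"), 6)]
dist = {frozenset(p): w for p, w in pairs}
mst = kruskal_mst(names, dist)
print(mst)              # [('X1', 'X2', 1), ('X2', 'X3', 2), ('X2', 'X4', 3)]
print(mst_leaves(mst))  # ['X1', 'X3', 'X4']
```

On the real data, counting leaves this way is the step behind the caption's conclusion that 14 leaves imply at least 14 source elements.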