Lecture Notes on Introduction to Tree Models | BSC 5936, Study notes of Biology

Material Type: Notes; Class: ST:TEACH/LEARN SCIEN; Subject: BIOLOGICAL SCIENCES; University: Florida State University; Term: Fall 2005;

Typology: Study notes

Pre 2010

Uploaded on 08/31/2009

Tree Models [JF:33]
Fredrik Ronquist
October 24, 2005

1 Introduction

A phylogenetic model used in statistical inference contains two essential components: the model of the tree and the model of the evolutionary process occurring on that tree. Much of our discussion of phylogenetic models thus far has focused on the latter component, the model of character evolution. In this lecture, we will move our attention to the tree part of the model.

2 The non-clock tree model

The non-clock model is the standard model used in phylogenetic inference. As mentioned in one of the previous lectures, it can be considered as a branch-breaking model which allows the evolutionary rate to be different for each branch in the tree. Because of this, the non-clock tree model has a large number of free parameters, one for each branch in the tree. If there are n tips in the tree, there are 2n − 3 branches and hence 2n − 3 branch length parameters in the tree model. This means that, for a typical phylogenetic problem, the largest number of free parameters comes from the branch lengths. For instance, consider a GTR model applied to a non-clock tree with 100 tips. There are 8 free parameters in the GTR model (3 free parameters from the stationary state frequencies and 5 from the substitution rates) but an astonishing 197 branch length parameters.

In standard maximum likelihood analysis of phylogeny, branch lengths are optimized (the likelihood is maximized over them) rather than integrated out of the model. This is somewhat surprising given that there are so many branch length parameters; both from a statistical robustness perspective and from an efficiency perspective it would appear preferable to use integrated likelihood to remove them.
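
The parameter counts quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function and constant names are hypothetical, and it assumes an unrooted, fully resolved (bifurcating) tree, which is the case the 2n − 3 branch count corresponds to.

```python
def branch_count(n_tips):
    """Number of branches in an unrooted, fully resolved tree (assumed shape)."""
    if n_tips < 2:
        raise ValueError("need at least two tips")
    return 2 * n_tips - 3


# GTR: 3 free stationary frequencies + 5 free substitution rates.
GTR_FREE_PARAMS = 3 + 5


def total_free_params(n_tips):
    """Free parameters of GTR on a non-clock tree: model plus branch lengths."""
    return GTR_FREE_PARAMS + branch_count(n_tips)


print(branch_count(100))       # 197
print(total_free_params(100))  # 205
```

For 100 tips, the branch lengths account for 197 of the 205 free parameters, which is why they dominate the parameter count.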
BSC5936-Fall 2005-PB,FR Computational Evolutionary Biology

The reason integrated likelihood is not used is that the ‘obvious’ parametrization of branch lengths does not allow them to be integrated out (Fig. 1). If we focus on just one branch length, there is typically a maximum likelihood peak at a relatively short length and then the likelihood decreases as the length increases. However, there is a lower bound: the likelihood over the branch length in focus can never be lower than the probability of drawing a sequence at random, that is, $(1/4)^n$ under the Jukes-Cantor model, where n is the length of the sequence. This means that the integrated likelihood over all possible branch length values from 0 to infinity is not bounded; it is infinite. Rather than imposing some arbitrary upper limit on branch lengths, most likelihoodists prefer to bite the bullet and optimize branch lengths instead.

Figure 1: Likelihood as a function of branch length.

In Bayesian phylogenetics, we need to formulate a prior that allows the posterior to be marginalized over branch lengths. In early applications, a typical solution was to assign branch lengths a uniform prior between 0 and some arbitrary cut-off value that was outside the plausible range of branch length values, for instance 10 or 100. However, this occasionally resulted in strange posteriors because so much of the prior was on extremely long branches. Therefore, Bayesian phylogeneticists today typically use exponential priors on branch lengths. The exponential prior goes from 0 to infinity but it dies off fast enough to produce a bounded integral. As a result, the integral of the likelihood times the prior is bounded as well.

The exponential branch length prior may seem far from vague. However, if we choose a different parametrization, the perspective becomes different. In particular, consider the possibility of measuring branch length in terms of change probability instead of in terms of the total amount of expected change per site.
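
To make the change-probability parametrization concrete, here is a minimal Python sketch for the Jukes-Cantor model. It assumes that branch length v is measured in expected substitutions per site, so that the probability that a site differs between the two ends of a branch is 3/4 (1 − e^(−4v/3)). The function names are hypothetical and are not taken from any phylogenetics package.

```python
import math


def change_probability(v):
    """JC probability that a site differs across a branch of length v."""
    return 0.75 * (1.0 - math.exp(-4.0 * v / 3.0))


def branch_length(p):
    """Inverse map: branch length for a change probability 0 <= p < 3/4."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def induced_density(p, rate):
    """Density on p implied by an Exponential(rate) prior on branch length.

    Change of variables: f(p) = rate * exp(-rate * v(p)) * dv/dp,
    with dv/dp = 1 / (1 - 4p/3).
    """
    return rate * (1.0 - 4.0 * p / 3.0) ** (0.75 * rate - 1.0)


def conditional_likelihood(p, n_sites, n_diff):
    """Probability of the sequence at one end of a branch given the other end.

    Each unchanged site contributes (1 - p); each changed site contributes
    p / 3, since a change goes to one specific other nucleotide.
    """
    return (1.0 - p) ** (n_sites - n_diff) * (p / 3.0) ** n_diff


for p in (0.1, 0.4, 0.7):
    print(p, induced_density(p, rate=4.0 / 3.0))
```

On this scale the branch-length axis is the finite interval from 0 to 3/4, and the conditional likelihood at the upper end equals (1/4)^n, the lower bound mentioned earlier in this section. Evaluating `induced_density` at several values of p and for different rates is one way to check which exponential rate gives a flat density on the change-probability scale.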
With the right parameter choice, the exponential prior is actually uniform on the change probability. For the Jukes-Cantor model, there is only one change probability, and it goes from 0 to 3/4. A uniform prior on the change probability can easily be integrated out; that [...]

4 The birth-death process

The prior on clock trees described above arises from a special case of the birth-death model, which assumes that species evolve by a birth and death process in which the birth and death rates are constant through time. Assume that the instantaneous birth rate is $\lambda$ and the death rate is $\mu$. Then the probability that a lineage leaves at least one descendant after time t under this model, that is, the probability that it survives until time t, is:

$$s(t) = \mathrm{Prob}(n > 0 \mid t) = \frac{\lambda - \mu}{\lambda - \mu e^{(\mu - \lambda)t}}$$

where n is the number of descendants. The probability that there will be exactly one descendant after time t is:

$$p_1(t) = \mathrm{Prob}(n = 1 \mid t) = s(t)^2 e^{(\mu - \lambda)t}$$

We can now calculate the probability of a set of relative speciation times $b = \{b_1, b_2, b_3, \ldots, b_{s-2}\}$ given the birth rate $\lambda$, the death rate $\mu$, and the tree height T for a set of s species as:

$$p(b \mid T, \lambda, \mu) \propto \prod_i \frac{\lambda p_1(b_i)}{\nu_T}$$

where

$$\nu_T = 1 - s(T) e^{(\mu - \lambda)T} = \int_0^T \lambda p_1(t)\, dt$$

If we set $\lambda = \mu$, that is, a situation where the birth and death rates are equal, we get

$$p(b \mid T, \lambda, \mu) \propto \prod_i \frac{1 + \mu}{(1 + \mu b_i)^2}$$

If, instead, we set $\mu = 0$ (no extinction) we get

$$p(b \mid T, \lambda, \mu) \propto 1$$

that is, the prior probability distribution we discussed in the previous section on clock trees. In other words, the uniform prior probability distribution on branching times is the one resulting from a pure birth model.

The birth-death model can also be combined with a sampling probability, that is, we can accommodate the common situation that not all tips are present in the sample we are analyzing. See Yang and Rannala (1997) for details.
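
The quantities s(t), p1(t) and ν_T defined in this section are simple to evaluate numerically. The following Python sketch implements them as written, handles the case of equal birth and death rates through its limit, and assumes complete sampling of the extant species. The function names are hypothetical, and the closing check is only a numerical sanity test of the integral identity for ν_T.

```python
import math


def survival_prob(t, lam, mu):
    """s(t): probability that a lineage has at least one descendant after time t."""
    if math.isclose(lam, mu):
        return 1.0 / (1.0 + mu * t)  # limit of the general formula as lam -> mu
    return (lam - mu) / (lam - mu * math.exp((mu - lam) * t))


def p_one(t, lam, mu):
    """p1(t): probability of exactly one descendant after time t."""
    if math.isclose(lam, mu):
        return 1.0 / (1.0 + mu * t) ** 2
    return survival_prob(t, lam, mu) ** 2 * math.exp((mu - lam) * t)


def nu(T, lam, mu):
    """nu_T: normalizing constant, equal to the integral of lam * p1 over [0, T]."""
    if math.isclose(lam, mu):
        return mu * T / (1.0 + mu * T)
    return 1.0 - survival_prob(T, lam, mu) * math.exp((mu - lam) * T)


def node_time_density(b, T, lam, mu):
    """One factor of the product over speciation times: lam * p1(b) / nu_T."""
    return lam * p_one(b, lam, mu) / nu(T, lam, mu)


# Sanity check: midpoint-rule integral of lam * p1(t) over [0, T] versus nu_T.
lam, mu, T = 2.0, 1.0, 1.5
steps = 10_000
h = T / steps
integral = sum(lam * p_one((i + 0.5) * h, lam, mu) for i in range(steps)) * h
print(integral, nu(T, lam, mu))
```

With equal rates and T = 1, `node_time_density(b, 1.0, mu, mu)` reproduces the factor (1 + µ)/(1 + µb)^2. With `mu = 0`, the same function evaluates λe^(−λb)/(1 − e^(−λT)) under the formulas as written here, which is close to constant in b when λT is small.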
Felsenstein claims that Yang and Rannala’s sampling model is different from randomly sampling n individuals from the survivors of the birth and death process but I am uncertain whether this statement is correct.

There are several interesting aspects of the birth-death model. One is that it allows us to estimate birth and death rates by looking at patterns of branch lengths. Many deep nodes suggest a group that has a high extinction rate; many shallow nodes indicate a high speciation rate.

5 The coalescent model

The coalescent model is very similar to the birth-death model but it differs in several small details. It will be covered in detail in coming lectures.

6 Relaxed clock tree models

Despite the large reduction in the number of parameters, it is often possible to reject the clock model in favor of the non-clock model with real data because of significant rate heterogeneity. In these situations, one can explore the middle ground of relaxed clock models. They allow some variation in evolutionary rates across branches but not as extreme as the non-clock model. We have covered these models previously and they will not be discussed further here.

7 Tree balance and speciation rate

There has been ample discussion in the literature about tree balance, that is, how asymmetric trees are, and what it tells us about evolution (Fig. 4). The Yule process tends to generate symmetric trees and several empirical studies suggest that real trees are more asymmetric than expected under this model. Several other models have been explored in the search for a satisfactory explanation of the preponderance of unbalanced trees but none is entirely satisfactory. It seems clear, though, that an important reason for the lack of balance is that speciation and extinction rates are inherited and can vary across the tree.
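
One way to explore how balanced Yule trees are is to simulate topologies and score their imbalance. The sketch below is an illustration under stated assumptions, not a method from these notes: it grows a pure-birth topology by repeatedly splitting a uniformly chosen tip, and it uses the Colless index (the sum over internal nodes of the absolute difference in tip counts between the two daughter clades) as the imbalance score. The nested-list tree representation and the function names are hypothetical.

```python
import random


def yule_tree(n_tips, rng):
    """Grow a pure-birth topology: split a uniformly chosen tip until n_tips exist.

    A tip is an empty list; an internal node is a list [left, right].
    """
    root = []
    tips = [root]
    while len(tips) < n_tips:
        tip = tips.pop(rng.randrange(len(tips)))
        left, right = [], []
        tip.extend([left, right])  # the chosen tip becomes an internal node
        tips.extend([left, right])
    return root


def colless(tree):
    """Return (number of tips, Colless index) for a nested-list tree."""
    if not tree:
        return 1, 0
    n_left, c_left = colless(tree[0])
    n_right, c_right = colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


rng = random.Random(2005)
scores = [colless(yule_tree(20, rng))[1] for _ in range(1000)]
print(sum(scores) / len(scores))  # mean Colless index over 1000 simulated trees
```

The simulated mean can be compared with the extremes for the same number of tips: 0 for a perfectly balanced tree (when the tip count allows it) and (n − 1)(n − 2)/2 for a fully pectinate (comb-shaped) tree, and with the score computed from an empirical tree.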
As one example of such rate variation, several studies have shown that the origination of key ecological traits is associated with increased numbers of extant species. Despite the apparent misfit between the Yule process and some of the patterns we see in real trees, it is nevertheless a useful null model and exploratory tool in examining speciation and extinction patterns. In this respect, it resembles the clock model.

Figure 4: Tree balance

8 Study Questions

1. How many free parameters are there in a non-clock model for n taxa? In a clock model for n taxa?
2. Why is integrated likelihood rarely used to deal with branch length parameters even though they are nuisance parameters?
3. Describe an appropriate prior for Bayesian analysis of clock trees.
4. What are the advantages of an exponential prior over a uniform prior on branch lengths?
5. What is tree balance?