Phylogenetics: Character-Based Methods for Tree Construction - Prof. Drena Leigh Dobbs
#31 - Phylogenetics Character-Based Methods 11/05/07 BCB 444/544 Fall 07 Dobbs
BCB 444/544 F07 ISU Terribilini

BCB 444/544 Lecture 31
Phylogenetics – Character-Based Methods
#31_Nov05

Required Reading (before lecture)
• Fri Oct 30 - Lecture 30: Phylogenetics – Distance-Based Methods
  • Chp 11 - pp 142 – 169
• Mon Nov 5 - Lecture 31: Phylogenetics – Parsimony and ML
  • Chp 11 - pp 142 – 169
• Wed Nov 7 - Lecture 32: Machine Learning
• Fri Nov 9 - Lecture 33: Functional and Comparative Genomics
  • Chp 17 and Chp 18

Assignments & Announcements
• Mon Oct 29 - HW#5
  • HW#5 = Hands-on exercises with phylogenetics and tree-building software
  • Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 Only: New Homework Assignment
• 544 Extra#2
  • Due: √ PART 1 - ASAP; PART 2 - meeting prior to 5 PM Fri Nov 2
  • Part 1 - Brief outline of Project, email to Drena & Michael; after response/approval, then:
  • Part 2 - More detailed outline of project
    • Read a few papers and summarize status of problem
    • Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Nov 7 Wed - BBMB Seminar 4:10 in 1414 MBB
  • Sharon Roth Dent, MD Anderson Cancer Center
  • Role of chromatin and chromatin modifying proteins in regulating gene expression
• Nov 8 Thurs - BBMB Seminar 4:10 in 1414 MBB
  • Jianzhi George Zhang, U. Michigan
  • Evolution of new functions for proteins
• Nov 9 Fri - BCB Faculty Seminar 2:10 in 102 SciI
  • Amy Andreotti, ISU
  • Something about NMR

Chp 11 – Phylogenetic Tree Construction Methods and Programs
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs
• Distance-Based Methods
• Character-Based Methods
• Phylogenetic Tree Evaluation
• Phylogenetic Programs

Tree Construction
• Two main categories of tree building methods
  • Distance-based
    • Overall similarity between sequences
  • Character-based
    • Consider the entire MSA

Summary of Distance-Based Methods
• Clustering-based methods:
  • Computationally very fast and can handle large datasets that other methods cannot
  • Not guaranteed to find the best tree
• Optimality-based methods:
  • Better overall accuracies
  • Computationally slow
• All distance-based methods lose all sequence information and cannot infer the most likely state at an internal node

Character-Based Methods
• Based directly on the sequence characters in the MSA rather than overall distances
• Count mutational events accumulated on sequences
• Evolutionary dynamics of each character can be studied and ancestral sequences inferred
• Two popular approaches
  • Parsimony
  • Maximum Likelihood (ML)

Parsimony
• Parsimony is based on Occam’s Razor – the simplest explanation is most likely correct
• Goal: Find the tree that allows evolution of the sequences with the fewest changes
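
To make "fewest changes" concrete, the following is a minimal Python sketch that counts the smallest number of equally weighted changes one alignment column needs on a fixed rooted binary tree, using the set intersection/union rule commonly attributed to Fitch. The nested-tuple tree encoding, the function name `fitch_score`, and the example column are assumptions made for illustration; they do not come from the lecture.

```python
def fitch_score(tree, states):
    """Return (candidate_state_set, min_changes) for one site on a rooted binary tree.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)  [assumed encoding]
    states : dict mapping leaf name -> observed character at this site
    """
    if isinstance(tree, str):
        # Leaf: the only candidate state is the observed character, no changes yet.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:
        # The children share a candidate state: no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: take the union and count one additional change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-taxon tree ((s1, s2), (s3, s4)) and one alignment column.
tree = (("s1", "s2"), ("s3", "s4"))
column = {"s1": "A", "s2": "G", "s3": "C", "s4": "T"}
root_set, changes = fitch_score(tree, column)
print(changes)
```

Summing `changes` over all columns of the alignment gives the score of that one tree; comparing such scores across candidate trees is what a tree search would then do.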

Parsimony
• Parsimony score of a tree: The smallest (weighted) number of steps required by the tree
• Two parsimony problems:
  • Large Parsimony problem: Find the tree with the lowest parsimony score
  • Small Parsimony problem: Given a tree, find its parsimony score
• Use the small parsimony problem to solve the large parsimony problem

Algorithms for Small Parsimony
• Fitch’s algorithm:
  • Based on set operations
  • Evolutionary steps have the same weight
• Sankoff’s algorithm:
  • Based on dynamic programming
  • Allows steps to have different weights
• Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site

Branch and Bound
• One way to find a reasonable upper bound quickly:
  • Use UPGMA or NJ to build a complete tree
  • Calculate the parsimony score of this tree and use it as an upper bound in our search: the most parsimonious tree cannot score higher, so any partial tree that already exceeds this score can be discarded

Heuristic Search
• Shortcuts have been designed to reduce the search space
• Idea: Build a tree quickly (by NJ or some other fast method) and rearrange parts of it to explore some of the possible trees
• Branch swapping
  • Nearest neighbor interchange
  • Subtree pruning and regrafting
  • Tree bisection and reconnection

Nearest-Neighbor Interchange

Subtree Pruning and Regrafting

Tree Bisection and Reconnection

Stepwise Addition – Another Heuristic
• A greedy method
• Start with 3 taxon tree
• Add one taxon at a time
• Keep only the best tree found so far
• No guarantee of optimality, but may provide a good starting point for a search

Maximum Likelihood Method
• ML is based on a Markov model of evolution
  • Observed: The character states at the leaves (the sequences of the sampled species)
  • Hidden: The ancestral states
  • Transition probabilities: The mutation probabilities
• Assumptions:
  • Only mutations are allowed
  • Sites are independent

Models of Evolution at a Site
• Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$,
  where $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$
• Branches may have different lengths

The Probability of an Assignment
[Figure: four-leaf tree with leaves A, G, C, T. The root X = T has children Y = G (above leaves A and G) and Z = T (above leaves C and T).]
$\mathrm{Probability} = m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$

Ancestral Reconstruction: Most Likely Assignment
[Figure: the same tree with leaves A, G, C, T and unknown internal states X (root), Y, Z.]
$L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
• Compute using Viterbi algorithm

Likelihood of a Tree
[Figure: the same tree with leaves A, G, C, T and unknown internal states X (root), Y, Z.]
$L = \sum_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
• Compute using forward algorithm

Maximum Likelihood Comments
• ML is robust
• ML converges to the correct answer as more data is added (it is statistically consistent when the assumed model of evolution is correct)
• Can be put in a Bayesian statistical framework to obtain a distribution of possible phylogenies
• ML can be slow

Phylogenetic Tree Evaluation
• Bootstrapping
• Jackknifing
• Bayesian Simulation
• Statistical difference tests (are two trees significantly different?)
  • Kishino-Hasegawa Test (paired t-test)
  • Shimodaira-Hasegawa Test ($\chi^2$ test)

Bootstrapping
• A bootstrap sample is obtained by sampling sites randomly with replacement
  • Obtain a data matrix with the same number of taxa and number of characters as the original one
• Construct trees for samples
• For each branch in the original tree, compute the fraction of bootstrap samples in which that branch appears
  • Assigns a bootstrap support value to each branch
• Idea: If a grouping has a lot of support, it will be supported by at least some positions in most of the bootstrap samples

Bootstrapping Comments
• Bootstrapping doesn’t really assess the accuracy of a tree; it only indicates the consistency of the data
• To get reliable statistics, the tree needs to be rebuilt from 500 – 1000 bootstrap samples. This is a big problem if your tree took a few days to construct

Jackknifing
• Another resampling technique
• Randomly delete half of the sites in the dataset
• Construct a new tree with this smaller dataset, see how often taxa are grouped
• Advantage – sites aren’t duplicated
• Disadvantage – again really only measuring consistency of the data

Bayesian Simulation
• Using a Bayesian ML method to produce a tree automatically calculates the probability of many trees during the search
• Most trees sampled in the Bayesian ML search are near an optimal tree
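
The resampling procedure from the Bootstrapping slide can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the alignment is a dict of equal-length strings, the names `bootstrap_sample`, `bootstrap_support` and `closest_pair_clade` are hypothetical, and `closest_pair_clade` is a toy stand-in for a real tree-building method (it reports only the least-different pair of taxa as a single clade).

```python
import random


def bootstrap_sample(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    alignment : dict mapping taxon name -> aligned sequence (all the same length)
    Returns a new alignment with the same taxa and the same number of sites.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def bootstrap_support(alignment, infer_clades, n_replicates=100, seed=0):
    """Fraction of replicates in which each clade of the original tree reappears.

    infer_clades : caller-supplied function (an assumption of this sketch) that
                   builds a tree from an alignment and returns its clades as a
                   set of frozensets of taxon names.
    """
    rng = random.Random(seed)
    original = infer_clades(alignment)
    counts = {clade: 0 for clade in original}
    for _ in range(n_replicates):
        replicate = infer_clades(bootstrap_sample(alignment, rng))
        for clade in original:
            if clade in replicate:
                counts[clade] += 1
    return {clade: counts[clade] / n_replicates for clade in original}


def closest_pair_clade(alignment):
    """Toy stand-in for tree building: group the pair with the fewest differences."""
    taxa = sorted(alignment)

    def hamming(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))

    pairs = [(hamming(a, b), a, b) for i, a in enumerate(taxa) for b in taxa[i + 1:]]
    _, a, b = min(pairs)  # ties are broken alphabetically by taxon name
    return {frozenset((a, b))}


# Hypothetical four-taxon alignment, eight sites each.
toy = {"s1": "AAAAAAAA", "s2": "AAAAAAAC", "s3": "CCCCGGGG", "s4": "CCCCGGTT"}
support = bootstrap_support(toy, closest_pair_clade, n_replicates=200)
print(support)
```

In practice `infer_clades` would wrap whichever parsimony, distance, or ML program built the original tree, which is why each replicate costs as much as the original analysis.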