Phylogenetics: Character-Based Methods for Tree Construction - Prof. Drena Leigh Dobbs, Assignments of Bioinformatics

An overview of character-based methods for phylogenetic tree construction. It covers the basics of the parsimony and maximum likelihood approaches, algorithms such as Fitch's and Sankoff's, and methods for finding the most parsimonious trees, such as branch and bound. The document also discusses the challenges of finding the true tree and introduces various heuristics and programs for phylogenetic analysis.

Typology: Assignments

Pre 2010

Uploaded on 09/02/2009

#31 - Phylogenetics: Character-Based Methods, 11/05/07
BCB 444/544 Fall 07, ISU (Dobbs, Terribilini)

BCB 444/544 Lecture 31
Phylogenetics: Character-Based Methods
#31_Nov05

Required Reading (before lecture)
• Fri Oct 30 - Lecture 30: Phylogenetics, Distance-Based Methods
  • Chp 11, pp 142-169
• Mon Nov 5 - Lecture 31: Phylogenetics, Parsimony and ML
  • Chp 11, pp 142-169
• Wed Nov 7 - Lecture 32: Machine Learning
• Fri Nov 9 - Lecture 33: Functional and Comparative Genomics
  • Chp 17 and Chp 18

Assignments & Announcements
• Mon Oct 29 - HW#5
  • HW#5 = Hands-on exercises with phylogenetics and tree-building software
  • Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 Only: New Homework Assignment
544 Extra#2
Due: √ PART 1 - ASAP; PART 2 - meeting prior to 5 PM Fri Nov 2
• Part 1 - Brief outline of project, emailed to Drena & Michael. After response/approval, then:
• Part 2 - More detailed outline of project
  • Read a few papers and summarize status of problem
  • Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Nov 7 Wed - BBMB Seminar, 4:10 in 1414 MBB
  • Sharon Roth Dent, MD Anderson Cancer Center
  • Role of chromatin and chromatin modifying proteins in regulating gene expression
• Nov 8 Thurs - BBMB Seminar, 4:10 in 1414 MBB
  • Jianzhi George Zhang, U. Michigan
  • Evolution of new functions for proteins
• Nov 9 Fri - BCB Faculty Seminar, 2:10 in 102 SciI
  • Amy Andreotti, ISU
  • Something about NMR

Chp 11: Phylogenetic Tree Construction Methods and Programs
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs
• Distance-Based Methods
• Character-Based Methods
• Phylogenetic Tree Evaluation
• Phylogenetic Programs

Tree Construction
• Two main categories of tree building methods
  • Distance-based
    • Overall similarity between sequences
  • Character-based
    • Consider the entire MSA

Summary of Distance-Based Methods
• Clustering-based methods:
  • Computationally very fast and can handle large datasets that other methods cannot
  • Not guaranteed to find the best tree
• Optimality-based methods:
  • Better overall accuracies
  • Computationally slow
• All distance-based methods reduce the alignment to pairwise distances, so site-level sequence information is lost and the most likely state at an internal node cannot be inferred

Character-Based Methods
• Based directly on the sequence characters in the MSA rather than overall distances
• Count mutational events accumulated on sequences
• Evolutionary dynamics of each character can be studied and ancestral sequences inferred
• Two popular approaches
  • Parsimony
  • Maximum Likelihood (ML)

Parsimony
• Parsimony is based on Occam's Razor: the simplest explanation is most likely correct
• Goal: Find the tree that allows evolution of the sequences with the fewest changes

Parsimony
• Parsimony score of a tree: The smallest (weighted) number of steps required by the tree
• Two parsimony problems:
  • Large Parsimony problem: Find the tree with the lowest parsimony score
  • Small Parsimony problem: Given a tree, find its parsimony score
• Use the small parsimony problem to solve the large parsimony problem

Algorithms for Small Parsimony
• Fitch's algorithm:
  • Based on set operations
  • Evolutionary steps have the same weight
• Sankoff's algorithm:
  • Based on dynamic programming
  • Allows steps to have different weights
• Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site

[Slides 13-24 are missing from this partial preview.]

Branch and Bound
• One way to find a reasonable upper bound quickly:
  • Use UPGMA or NJ to build a complete tree
  • Calculate the parsimony score of this tree and use it as an upper bound in our search: the most parsimonious tree cannot score worse, so any partial tree that already exceeds this score can be discarded

Heuristic Search
• Shortcuts have been designed to reduce the search space
• Idea: Build a tree quickly (by NJ or some other fast method) and rearrange parts of it to explore some of the possible trees
• Branch swapping
  • Nearest neighbor interchange
  • Subtree pruning and regrafting
  • Tree bisection and reconnection

Nearest-Neighbor Interchange (figure slide)

Subtree Pruning and Regrafting (figure slide)

Tree Bisection and Reconnection (figure slide)

Stepwise Addition: Another Heuristic
• A greedy method
• Start with a 3-taxon tree
• Add one taxon at a time
• Keep only the best tree found so far
• No guarantee of optimality, but may provide a good starting point for a search

Maximum Likelihood Method
• ML is based on a Markov model of evolution
  • Observed: The species labeling the leaves
  • Hidden: The ancestral states
  • Transition probabilities: The mutation probabilities
• Assumptions:
  • Only mutations are allowed
  • Sites are independent

Models of Evolution at a Site
• Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$, where $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$
• Branches may have different lengths

The Probability of an Assignment
(Figure: a four-leaf tree with leaves A, G, C, T. The root is assigned T; its two children are assigned G, the parent of leaves A and G, and T, the parent of leaves C and T.)
Probability $= m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$

Ancestral Reconstruction: Most Likely Assignment
(Figure: the same tree with unknown ancestral states. X is the root; Y is the parent of leaves A and G; Z is the parent of leaves C and T.)
$L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
Compute using the Viterbi algorithm

Likelihood of a Tree
(Figure: the same tree, with root X and internal nodes Y and Z.)
$L = \sum_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
Compute using the forward algorithm

Maximum Likelihood Comments
• ML is robust
• ML converges to the correct answer as more data is added
• Can be put in a Bayesian statistical framework to obtain a distribution of possible phylogenies
• ML can be slow

Phylogenetic Tree Evaluation
• Bootstrapping
• Jackknifing
• Bayesian Simulation
• Statistical difference tests (are two trees significantly different?)
  • Kishino-Hasegawa Test (paired t-test)
  • Shimodaira-Hasegawa Test ($\chi^2$ test)

Bootstrapping
• A bootstrap sample is obtained by sampling sites randomly with replacement
• Obtain a data matrix with the same number of taxa and number of characters as the original one
• Construct trees for the samples
• For each branch in the original tree, compute the fraction of bootstrap samples in which that branch appears
  • Assigns a bootstrap support value to each branch
• Idea: If a grouping has a lot of support, it will be supported by at least some positions in most of the bootstrap samples

Bootstrapping Comments
• Bootstrapping doesn't really assess the accuracy of a tree; it only indicates the consistency of the data
• To get reliable statistics, bootstrapping needs to be done on your tree 500-1000 times. This is a big problem if your tree took a few days to construct

Jackknifing
• Another resampling technique
• Randomly delete half of the sites in the dataset
• Construct a new tree with this smaller dataset and see how often taxa are grouped
• Advantage: sites aren't duplicated
• Disadvantage: again, really only measuring consistency of the data

Bayesian Simulation
• Using a Bayesian ML method to produce a tree automatically calculates the probability of many trees during the search
• Most trees sampled in the Bayesian ML search are near an optimal tree
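
The bootstrapping procedure described in the slides (sample sites with replacement, rebuild the tree, and count how often each original branch recurs) can be sketched roughly as follows. This is a minimal illustration and not the course's implementation. The names `bootstrap_alignment`, `bootstrap_support` and `closest_pair_split` are hypothetical, and `closest_pair_split` is only a toy stand-in for a real tree builder such as NJ or a parsimony search: it reports just the single closest pair of taxa as a "branch".

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    alignment: dict mapping taxon name -> aligned sequence (all equal length).
    Returns a dict with the same taxa and the same number of sites.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols)
            for taxon, seq in alignment.items()}


def closest_pair_split(alignment):
    """Toy stand-in for a tree builder (assumption, not a real method):
    returns a set holding the single closest pair of taxa."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    pair = min(combinations(sorted(alignment), 2),
               key=lambda p: hamming(alignment[p[0]], alignment[p[1]]))
    return {frozenset(pair)}


def bootstrap_support(alignment, build_splits, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates that recover each original grouping.

    build_splits: callback mapping an alignment to a set of groupings
    (frozensets of taxon names); hypothetical interface for illustration.
    """
    rng = random.Random(seed)
    original = build_splits(alignment)
    counts = {split: 0 for split in original}
    for _ in range(n_replicates):
        replicate = build_splits(bootstrap_alignment(alignment, rng))
        for split in original:
            if split in replicate:
                counts[split] += 1
    return {split: counts[split] / n_replicates for split in original}


if __name__ == "__main__":
    aln = {"A": "ACGTACGT", "B": "ACGTACGT",
           "C": "TTGTAAGA", "D": "TTGAAAGA"}
    for split, value in bootstrap_support(aln, closest_pair_split).items():
        print(sorted(split), value)
```

Because every replicate calls the tree builder again, the cost is roughly the number of replicates times the cost of one tree search, which is why the slides flag 500-1000 replicates as a problem for slow methods.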
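
As a companion to the "Algorithms for Small Parsimony" slide, here is a small sketch of the set-based, equal-weight counting that the slide attributes to Fitch's algorithm, applied to a rooted binary tree. The tree encoding (nested 2-tuples of taxon names) and the function names `fitch_site` and `parsimony_score` are assumptions made for illustration; they do not come from the lecture.

```python
def fitch_site(tree, states):
    """Bottom-up Fitch pass for one alignment site.

    tree: a taxon name (leaf) or a 2-tuple (left_subtree, right_subtree).
    states: dict mapping taxon name -> character observed at this site.
    Returns (candidate_state_set, number_of_changes) for the subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_site(tree[0], states)
    right_set, right_changes = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch counts over all sites (equal weights)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


if __name__ == "__main__":
    aln = {"t1": "AA", "t2": "GA", "t3": "CC", "t4": "TC"}
    tree_a = (("t1", "t2"), ("t3", "t4"))
    tree_b = (("t1", "t3"), ("t2", "t4"))
    print(parsimony_score(tree_a, aln))  # 4
    print(parsimony_score(tree_b, aln))  # 5
```

Scoring two candidate topologies this way is the small parsimony problem solved twice. A large parsimony search (exhaustive, branch and bound, or heuristic) repeats this scoring over many trees and keeps the lowest total.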