#30 - Phylogenetics Distance-Based Methods 11/02/07 BCB 444/544 Fall 07 Dobbs
BCB 444/544 F07 ISU Terribilini #30- Phylogenetics - Distance-Based Methods 11/02/07

BCB 444/544
Lecture 30
Phylogenetics: Distance-Based Methods
#30_Nov02

Required Reading (before lecture)
Wed Oct 31 - Lecture 29
Phylogenetics Basics
• Chp 10 - pp 127 - 141
Thurs Nov 1 - Lab 9
Gene & Regulatory Element Prediction
Fri Nov 2 - Lecture 30
Phylogenetics: Distance-Based Methods
• Chp 11 - pp 142 - 169
Mon Nov 5 - Lecture 31
Phylogenetics: Parsimony and ML
• Chp 11 - pp 142 - 169

Assignments & Announcements
Mon Oct 29 - HW#5
HW#5 = Hands-on exercises with phylogenetics and tree-building software
Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 "Team" Projects
Last week of classes will be devoted to Projects
• Written reports due: Mon Dec 3 (no class that day)
• Oral presentations (20-30') will be: Wed-Fri Dec 5, 6, 7
• 1 or 2 teams will present during each class period
See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment
544 Extra#2
Due: √ PART 1 - ASAP
PART 2 - meeting prior to 5 PM Fri Nov 2
Part 1 - Brief outline of Project, email to Drena & Michael; after response/approval, then:
Part 2 - More detailed outline of project
• Read a few papers and summarize status of problem
• Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for
Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI
• Bob Jernigan, BBMB, ISU
• Control of Protein Motions by Structure

Chp 10 - Phylogenetics
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 10 Phylogenetics Basics
• Evolution and Phylogenetics
• Terminology
• Gene Phylogeny vs. Species Phylogeny
• Forms of Tree Representation
• Why Finding a True Tree is Difficult
• Procedure of Building a Phylogenetic Tree

Tree Building Procedure
• Choose molecular markers
• Perform MSA
• Choose a model of evolution
• Determine tree building method
• Assess tree reliability

Choice of Molecular Markers
• Very closely related organisms - use nucleic acid sequences, which show more differences
• For individuals within a species - use the faster-mutating noncoding regions of mtDNA
• More distantly related species - use slowly evolving nucleic acid sequences, such as ribosomal RNA, or protein sequences
• Very distantly related species - use highly conserved protein sequences

Multiple Sequence Alignment
• Most critical step in tree building - cannot build a correct tree without a correct alignment
• Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one
• Most alignments need manual editing
• Make sure important functional residues align
• Align secondary structure elements
• Use full alignment or just parts

Automatic Editing of Alignments
• Rascal and NorMD: correct
alignment errors, remove potentially unrelated or highly divergent sequences
• Gblocks: detect and eliminate poorly aligned positions and divergent regions

How do we measure divergence between sequences?
• Simple measure: count the number of substitutions observed between the sequences in the MSA
• Problem: the number of observed substitutions may not represent the number of evolutionary events that actually occurred

Clustering-Based Methods
• E.g., UPGMA and Neighbor-Joining
• A cluster is a set of taxa
• Interspecies distances translate into intercluster distances
• Clusters are repeatedly merged
• "Closest" clusters merged first
• Distances are recomputed after merging

UPGMA
• UPGMA: Unweighted Pair Group Method Using Arithmetic Average
• Uses molecular clock assumption: all taxa evolve at a constant rate and are equally distant from the root (ultrametric tree)
• This assumption is usually wrong
• So why use UPGMA?
• Very fast

UPGMA Example
[Figure: worked UPGMA example (four slides)]

Neighbor Joining
• Idea: Find a pair of taxa that are close to each other but far from other taxa
• Implicitly finds a pair of neighboring taxa
• No molecular clock assumption

Neighbor Joining
• NJ corrects for unequal evolutionary rates between sequences by using a conversion step
• The conversion step requires calculation of "r-values" and "transformed r-values"

Neighbor Joining
The r-value for a sequence is:
$$r_i = \sum_{j} d_{ij}$$
The sum of the distances between sequence i and all other sequences

Neighbor Joining
The transformed r-value for a sequence is:
$$r'_i = \frac{r_i}{n - 2}$$
Where n is the number of taxa
Transformed r-values are used to determine the distance of a taxon to the nearest node

Neighbor Joining
The converted distance between two sequences is:
$$d'_{ij} = d_{ij} - \frac{1}{2}\left(r_i + r_j\right)$$
These converted distances are used in building the tree

Neighbor Joining
The final equation we need is for computing the distance from a new cluster to each of its taxa.
Assume taxa i and j were merged into a cluster u. The distance from taxon i to cluster u is:
$$d_{iu} = \frac{d_{ij} + \left(r'_i - r'_j\right)}{2}$$

Neighbor Joining Example
      A      B      C
B    0.40
C    0.35   0.45
D    0.60   0.70   0.55

Neighbor Joining Example
• Initialize tree into a star shape with all taxa connected to the center
• Step 1: Compute r-values and transformed r-values for all taxa
$$r_A = d_{AB} + d_{AC} + d_{AD} = 0.4 + 0.35 + 0.6 = 1.35$$
$$r'_A = \frac{r_A}{4 - 2} = \frac{1.35}{2} = 0.675$$

Neighbor Joining Example
• Step 2: Compute converted distances
$$d'_{AB} = d_{AB} - \frac{1}{2}\left(r_A + r_B\right) = 0.4 - \frac{1}{2}\left(1.35 + 1.55\right) = -1.05$$

Neighbor Joining Example
• Step 3: Fill out converted distance matrix
      A      B      C
B   -1.05
C   -1.00  -1.00
D   -1.00  -1.00  -1.05

Neighbor Joining Example
• Step 4: Create a node by merging closest taxa
• In this example, the converted distance between A and B is the same as the converted distance between C and D
• We can pick either pair to start with
• Let's pick A and B and create a node called U
[Figure: star tree of A, B, C, D with A and B joined at new node U; the branch lengths A-U and B-U are still unknown]

Neighbor Joining Example
• Step 5: Compute branch lengths
• Use the equation for computing the distance from a taxon to a node
$$d_{AU} = \frac{d_{AB} + \left(r'_A - r'_B\right)}{2} = \frac{0.4 + \left(0.675 - 0.775\right)}{2} = 0.15$$
[Figure: node U with branch lengths A-U = 0.15 and B-U = 0.25]
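
Steps 1-5 of the example can be checked with a short script. The following is a minimal Python sketch, not the course's reference code: it applies the r-value, transformed r-value, converted distance and branch length formulas exactly as stated on the slides to the four-taxon example matrix. The names `DIST`, `TAXA`, `dist`, `r_values`, `transformed_r_values`, `converted_distances` and `branch_length` are hypothetical, chosen only for this illustration, and the script covers only the first merge (A and B into U), not a full tree-building loop.

```python
from itertools import combinations

# Pairwise distances from the lecture's four-taxon example matrix.
DIST = {
    ("A", "B"): 0.40, ("A", "C"): 0.35, ("A", "D"): 0.60,
    ("B", "C"): 0.45, ("B", "D"): 0.70, ("C", "D"): 0.55,
}
TAXA = ["A", "B", "C", "D"]


def dist(i, j):
    """Look up the symmetric distance between taxa i and j."""
    return DIST[(i, j)] if (i, j) in DIST else DIST[(j, i)]


def r_values(taxa):
    """r_i: sum of the distances from taxon i to every other taxon."""
    return {i: sum(dist(i, j) for j in taxa if j != i) for i in taxa}


def transformed_r_values(r, n):
    """r'_i = r_i / (n - 2), where n is the number of taxa."""
    return {i: r_i / (n - 2) for i, r_i in r.items()}


def converted_distances(taxa, r):
    """d'_ij = d_ij - (r_i + r_j) / 2, as written on the slides.

    With n = 4 this equals d_ij - (r_i + r_j) / (n - 2).
    """
    return {(i, j): dist(i, j) - 0.5 * (r[i] + r[j])
            for i, j in combinations(taxa, 2)}


def branch_length(i, j, r_prime):
    """d_iu = [d_ij + (r'_i - r'_j)] / 2 for taxon i joined to new node u."""
    return (dist(i, j) + (r_prime[i] - r_prime[j])) / 2


r = r_values(TAXA)                              # Step 1
r_prime = transformed_r_values(r, len(TAXA))    # Step 1
conv = converted_distances(TAXA, r)             # Steps 2-3

# Step 4: pick the pair with the smallest converted distance. Rounding keeps
# floating-point noise from deciding the A-B versus C-D tie; min() then
# returns the first tied pair in iteration order.
pair = min(conv, key=lambda p: round(conv[p], 9))

# Step 5: branch lengths from each merged taxon to the new node U.
d_au = branch_length(pair[0], pair[1], r_prime)
d_bu = branch_length(pair[1], pair[0], r_prime)

print("merged pair:", pair)
print(f"branch lengths: {d_au:.2f}, {d_bu:.2f}")
```

The second branch length comes from the same formula with i and j swapped, which is why the two values add up to the original distance between the merged taxa.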