Phylogenetic Trees in Bioinformatics: Building and Analyzing Evolutionary Relationships (Study notes, Computer Science)

Covers the concepts of phylogenetic trees in bioinformatics, focusing on methods for determining the evolutionary relationships between organisms based on their features. Topics include constructing rooted trees, minimizing state changes with Sankoff's algorithm, and clustering sequences using UPGMA and neighbor-joining. The document also touches on maximum likelihood methods and on tree analysis and display.

Typology: Study notes (pre-2010)
Uploaded on 02/13/2009 by koofers-user-1cz

CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 14: Phylogenetic trees
CMSC423 Fall 2008

Phylogeny questions
• Given several organisms and a set of features (usually sequence, but also morphological: wing shape/color...)
• A. Given a phylogenetic tree, figure out what the ancestors looked like (what are the features of the internal nodes?)
• B. Find the phylogenetic tree that best describes the common evolutionary heritage of the organisms
[Figure: organisms described by features such as "wings, feathers, teeth" and "claws, no wings, fur", an unknown ancestor (?), and example trees with character states A, B, C]

Example
[Figure: example tree labeled with binary (0/1) character states]

Sankoff's algorithm
• At each node v in the tree store s(v,t): the best parsimony score for the subtree rooted at v if the character stored at v is t
• At a leaf, s(v,t) = 0 if t is the observed character and ∞ otherwise
• Traverse the tree in post-order and update s(v,t) as follows:
  – assume node v has children u and w
  – $s(v,t) = \min_i \{ s(u,i) + \mathrm{score}(i,t) \} + \min_j \{ s(w,j) + \mathrm{score}(j,t) \}$
• The character at the root is the one that minimizes s(root, t)
• Note: this solves the weighted version. For the unweighted version, set score(i,i) = 0 and score(i,j) = 1 for any i ≠ j

Trees as clustering
• Start with a distance matrix: the distance (e.g. alignment distance) between any two sequences (leaves)
• Intuitively, we want to cluster together the most similar sequences
• UPGMA (Unweighted Pair Group Method using Arithmetic averages):
  – Build a pairwise distance matrix (e.g.
    from a multiple alignment)
  – Pick the pair of sequences that are closest to each other and cluster them: create an internal node that has the two sequences as children
  – Repeat, including the newly created internal nodes in the distance matrix
  – Key element: we must be able to quickly compute the distance between clusters (internal nodes), defined as the average distance between their members:
    $D(cl_1, cl_2) = \frac{1}{|cl_1|\,|cl_2|} \sum_{p \in cl_1} \sum_{q \in cl_2} D(p, q)$

Trees as clustering
• Note that both UPGMA and neighbor-joining (NJ) assume the distance matrix is additive: D(i,j) equals the total branch length of the tree path between i and j, so D(i,j) + D(j,k) = D(i,k) whenever node j lies on the path from i to k. This is usually not exactly true, but close
• UPGMA makes the stronger assumption that distances are ultrametric (all leaves equally far from the root, i.e. a molecular clock)
• Also, NJ can be proven to reconstruct the correct tree whenever the distance matrix is additive!
• However, simple alignment distance is not a good evolutionary distance metric

Maximum likelihood
• For every branch S->T of length t, compute P(T|S,t): the likelihood that sequence S evolved into sequence T in time t
• Find the tree that maximizes the likelihood
• Note that the likelihood of a given tree can be computed with an algorithm similar to Sankoff's
• However, there is no simple way to find the best tree given the sequences; most approaches use heuristic search techniques
• Often, start with the NJ tree, then "tweak" it to improve the likelihood

Tree analysis & display

Drawing trees
• Trees are easy to draw: you just need to figure out how much space the leaves will take
• Step 1: calculate how much space each node will take (the number of leaves below the current node)
• Step 2: spread out the nodes according to their number of leaves
• There are many ways of optimizing the layout, e.g. for width or area
• For large trees:
  – 3D displays (there's more room in 3D)
  – interactive displays (expand/contract nodes as needed)

Analysis example
• Build a multiple alignment (e.g. MUSCLE, ClustalW)
• Clean up the alignment
  – manual editing
  – filters (pre-defined structure information)
• Build a tree
  – PAUP: parsimony & others
  – PHYLIP: maximum likelihood
  – Tree-Puzzle: maximum likelihood
  – etc...
    (many packages)
• Integrated system
  – ARB: www.arb-home.de

Antibiotic resistance in Staphylococcus aureus
• Green boxes: individual strains in the phylogenetic tree
• Red diamonds, yellow triangle: acquisition of resistance
• Hexagon: loss of resistance
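The Sankoff recurrence from the lecture can be sketched in Python as follows. This is a minimal illustration assuming a rooted, strictly binary tree and a score table indexed by (child character, parent character); the `Node` class and the `unit_score` helper are hypothetical names introduced here. Leaves start at 0 for the observed character and infinity otherwise, and the root character is the one with the minimum score.

```python
import math
from typing import Dict, List, Optional, Tuple


class Node:
    """Hypothetical rooted binary tree node; leaves carry an observed character."""

    def __init__(self, char: Optional[str] = None,
                 left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.char = char
        self.left = left
        self.right = right


Score = Dict[Tuple[str, str], float]


def sankoff(v: Node, alphabet: List[str], score: Score) -> Dict[str, float]:
    """Return s(v, t) for every character t, filled in post-order."""
    if v.left is None and v.right is None:
        # Leaf: 0 for the observed character, infinity for all others.
        return {t: 0.0 if t == v.char else math.inf for t in alphabet}
    s_u = sankoff(v.left, alphabet, score)   # child u
    s_w = sankoff(v.right, alphabet, score)  # child w
    return {
        t: min(s_u[i] + score[(i, t)] for i in alphabet)
        + min(s_w[j] + score[(j, t)] for j in alphabet)
        for t in alphabet
    }


def unit_score(alphabet: List[str]) -> Score:
    """Unweighted version: score(i, i) = 0 and score(i, j) = 1 for i != j."""
    return {(i, j): 0.0 if i == j else 1.0 for i in alphabet for j in alphabet}


# Usage: leaves A, C, A, B on the tree ((A, C), (A, B)).
alphabet = ["A", "B", "C"]
root = Node(left=Node(left=Node("A"), right=Node("C")),
            right=Node(left=Node("A"), right=Node("B")))
s_root = sankoff(root, alphabet, unit_score(alphabet))
best = min(s_root, key=s_root.get)  # root character with the minimum score
print(s_root, best)  # {'A': 2.0, 'B': 3.0, 'C': 3.0} A
```

With unit scores the example needs two changes and labels the root A; the weighted version only changes the `score` dictionary. Recovering the characters at the other internal nodes would additionally require remembering which i and j achieved each minimum.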
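A sketch of UPGMA clustering follows, assuming leaf distances are given as a dictionary keyed by unordered pairs (`frozenset`) and ignoring branch lengths (the tree comes back as nested tuples). To compute cluster distances quickly, it uses the size-weighted update D(k, x∪y) = (|x|·D(k,x) + |y|·D(k,y)) / (|x| + |y|), which yields the same value as the lecture's average-distance formula for D(cl1, cl2); `average_distance` implements that formula literally for comparison. Both function names are illustrative, and ties are broken by dictionary order, an arbitrary choice.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Union

Tree = Union[str, tuple]          # leaf name, or a pair of subtrees
Dist = Dict[FrozenSet[str], float]


def average_distance(cl1: List[str], cl2: List[str], dist: Dist) -> float:
    """D(cl1, cl2): mean of D(p, q) over all p in cl1 and q in cl2."""
    total = sum(dist[frozenset((p, q))] for p in cl1 for q in cl2)
    return total / (len(cl1) * len(cl2))


def upgma(names: List[str], dist: Dist) -> Tree:
    """Merge the closest pair of clusters until one remains (no branch lengths)."""
    trees: Dict[int, Tree] = dict(enumerate(names))   # cluster id -> subtree
    size = {i: 1 for i in trees}                       # cluster id -> leaf count
    D = {frozenset((i, j)): dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        x, y = sorted(min(D, key=D.get))  # closest pair of clusters
        new = next_id
        next_id += 1
        for k in trees:
            if k not in (x, y):
                # Size-weighted update: equals the average over all member pairs.
                D[frozenset((k, new))] = (
                    size[x] * D[frozenset((k, x))] + size[y] * D[frozenset((k, y))]
                ) / (size[x] + size[y])
        # Drop every entry that mentions the two merged clusters.
        D = {p: d for p, d in D.items() if x not in p and y not in p}
        trees[new] = (trees.pop(x), trees.pop(y))  # new internal node
        size[new] = size[x] + size[y]
    return next(iter(trees.values()))


names = ["A", "B", "C", "D"]
dist = {frozenset(p): d for p, d in [
    (("A", "B"), 2), (("A", "C"), 6), (("A", "D"), 10),
    (("B", "C"), 6), (("B", "D"), 10), (("C", "D"), 10)]}
tree = upgma(names, dist)
print(tree)  # ('D', ('C', ('A', 'B')))
```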
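The additivity assumption behind UPGMA and NJ can be checked with the four-point condition: for every four leaves i, j, k, m, the two largest of the sums D(i,j)+D(k,m), D(i,k)+D(j,m), D(i,m)+D(j,k) must be equal. The sketch below assumes the input is already a metric (satisfies the triangle inequality) and compares with a small tolerance; `is_additive` and `make_dist` are hypothetical helper names, and the example matrices are made up for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Dist = Dict[FrozenSet[str], float]


def is_additive(names: List[str], dist: Dist, tol: float = 1e-9) -> bool:
    """Four-point condition, assuming `dist` is already a metric."""
    def d(a: str, b: str) -> float:
        return dist[frozenset((a, b))]

    for i, j, k, m in combinations(names, 4):
        sums = sorted([d(i, j) + d(k, m), d(i, k) + d(j, m), d(i, m) + d(j, k)])
        if abs(sums[2] - sums[1]) > tol:  # the two largest sums must tie
            return False
    return True


def make_dist(pairs: Dict[Tuple[str, str], float]) -> Dist:
    return {frozenset(p): float(v) for p, v in pairs.items()}


tree_like = make_dist({("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
                       ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10})
not_tree = make_dist({("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 4,
                      ("B", "D"): 4, ("A", "D"): 4, ("B", "C"): 3})
print(is_additive(list("ABCD"), tree_like))  # True
print(is_additive(list("ABCD"), not_tree))   # False
```

Real alignment distances rarely pass this test exactly, which matches the lecture's "usually not true but close"; the size of the mismatch between the two largest sums is one way to see how far off a matrix is.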
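The lecture does not specify a substitution model for P(T|S,t). As an assumption, the sketch below uses the Jukes-Cantor (JC69) model for DNA, treats alignment columns as independent, and measures the branch length t in expected substitutions per site. It returns a log-likelihood to avoid underflow on long sequences; the function names are hypothetical.

```python
import math


def jc69_site_prob(a: str, b: str, t: float) -> float:
    """P(b | a, t) under Jukes-Cantor; t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def branch_log_likelihood(S: str, T: str, t: float) -> float:
    """log P(T | S, t) for aligned, equal-length sequences with independent sites."""
    if len(S) != len(T):
        raise ValueError("S and T must be aligned (same length)")
    total = 0.0
    for a, b in zip(S, T):
        p = jc69_site_prob(a, b, t)
        total += math.log(p) if p > 0 else -math.inf  # p = 0 only when t = 0 and a != b
    return total


print(branch_log_likelihood("ACGTACGT", "ACGTACGA", 0.1))
```

Maximizing this over t for a single pair gives a branch length estimate; for a whole tree, the same per-branch probabilities feed into a tree-wide computation.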
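The remark that a tree's likelihood can be computed "with an algorithm similar to Sankoff's" can be illustrated by Felsenstein's pruning algorithm: the same post-order traversal, but summing probabilities over child states instead of minimizing costs. This sketch assumes JC69 (redefined here so the block is self-contained), a rooted binary tree whose branch lengths are stored on the parent, a single alignment column, and uniform base frequencies at the root; the `Node` dataclass and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

BASES = "ACGT"


def jc69(a: str, b: str, t: float) -> float:
    """P(b | a, t) under Jukes-Cantor, t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


@dataclass
class Node:
    base: Optional[str] = None        # observed base (leaves only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    t_left: float = 0.0               # branch length to the left child
    t_right: float = 0.0              # branch length to the right child


def partial(v: Node) -> Dict[str, float]:
    """L(v, x): probability of the leaf bases below v, given base x at v."""
    if v.left is None and v.right is None:
        return {x: 1.0 if x == v.base else 0.0 for x in BASES}
    lu, lw = partial(v.left), partial(v.right)
    # Same post-order shape as Sankoff, with sums of products instead of min of sums.
    return {
        x: sum(jc69(x, y, v.t_left) * lu[y] for y in BASES)
        * sum(jc69(x, z, v.t_right) * lw[z] for z in BASES)
        for x in BASES
    }


def site_likelihood(root: Node) -> float:
    """Weight L(root, x) by a uniform root distribution (1/4 per base)."""
    L = partial(root)
    return sum(0.25 * L[x] for x in BASES)


tree = Node(left=Node("A"),
            right=Node(left=Node("A"), right=Node("G"), t_left=0.1, t_right=0.2),
            t_left=0.1, t_right=0.05)
print(site_likelihood(tree))
```

For a full alignment, one would sum the logarithms of the per-column likelihoods; searching over tree shapes is the hard part the lecture attributes to heuristics.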
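The two tree-drawing steps can be sketched as two passes over a tree stored as nested tuples, a representation assumed here for brevity. A post-order pass records the number of leaves under each node; a pre-order pass then gives each node a vertical band with that many rows and places the node at the band's centre, using depth as the x coordinate. Keying the counts by `id()` is a shortcut for this sketch, and `leaf_counts` and `place` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple, Union

Tree = Union[str, tuple]  # leaf name, or a tuple of child subtrees
Coord = Tuple[str, int, float]


def leaf_counts(tree: Tree, counts: Dict[int, int]) -> int:
    """Step 1 (post-order): store the number of leaves under each node, keyed by id()."""
    if isinstance(tree, str):
        n = 1
    else:
        n = sum(leaf_counts(child, counts) for child in tree)
    counts[id(tree)] = n
    return n


def place(tree: Tree, counts: Dict[int, int], depth: int = 0, top: int = 0,
          coords: Optional[List[Coord]] = None) -> List[Coord]:
    """Step 2 (pre-order): a node owns `counts` rows starting at `top`; centre it there."""
    if coords is None:
        coords = []
    n = counts[id(tree)]
    label = tree if isinstance(tree, str) else "*"  # "*" marks an internal node
    coords.append((label, depth, top + (n - 1) / 2))
    if not isinstance(tree, str):
        row = top
        for child in tree:
            place(child, counts, depth + 1, row, coords)
            row += counts[id(child)]  # next child starts below this one's band
    return coords


tree = ("D", ("C", ("A", "B")))
counts: Dict[int, int] = {}
leaf_counts(tree, counts)
for label, x, y in place(tree, counts):
    print(label, x, y)
```

Because every leaf gets exactly one row, the drawing's height equals the number of leaves; optimizing for width or area, as the lecture mentions, would change how the bands are assigned.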