Lecture 8: Bioinformatic Algorithms - Sequence Alignment and Suffix Trees, Study notes of Computer Science

Part of the lecture notes for CMSC423: Bioinformatic Algorithms, Databases and Tools, Fall 2008. It covers exact and inexact sequence alignment and the use of suffix trees for matching and substring searches. The document also discusses suffix links and their applications, such as finding repeats and longest common substrings.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

koofers-user-e86-1

CMSC423 Fall 2008
CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 8

Sequence alignment:
• exact alignment
• inexact alignment
• dynamic programming, gapped alignment

Suffix trees for matching
• Suffix trees use O(n) space
• Suffix trees can be constructed in O(n) time
• Is CAT part of ATCATG?
• Match from the root, character by character
• If the query runs out, a match has been found
• Otherwise (a character cannot be matched before the query runs out), there is no match
• Intuition: CAT is the prefix of some suffix

[Figure: suffix tree for ATCATG$. Edge labels with their (start,end) positions in the string: AT 1,2; T 2,2; CATG$ 3,7; G$ 6,7; $ 7,7. Leaves are numbered 1 to 7 by the start position of the suffix they represent.]

Why do we care?
• Suffix trees are used for
  – mapping reads to a genome (e.g. personal genomics)
  – comparing genomes (comparative genomics)
  – finding repeats
  – identifying genome signatures
• Exact matching: what to expect on exams
  – build a suffix tree for a string
  – answer some questions about one of the algorithms, e.g. for the Z algorithm: is it necessary that j be the farthest-reaching Z value, or can it be any Z value extending past i?
  – do something with the help of some of the algorithms (e.g. look for repeats that occur exactly twice, etc.)

Suffix arrays
• Suffix trees are expensive: > 20 bytes / base
• Suffix arrays: lexicographically sort all suffixes
• Can quickly find the correct suffix through binary search
• Note: much less space, but longer running time (incurs a log(n) factor)

Sorted suffixes of ATCATG, with start positions:

Suffix    Position
ATCATG    1
ATG       4
CATG      3
G         6
TCATG     2
TG        5

Suffix arrays and compression
• Burrows-Wheeler transform (BWT) of BANANA

Rotations of BANANA$    Sorted rotations    Last column
BANANA$                 $BANANA             A
ANANA$B                 A$BANAN             N
NANA$BA                 ANA$BAN             N
ANA$BAN                 ANANA$B             B
NA$BANA                 BANANA$             $
A$BANAN                 NA$BANA             A
$BANANA                 NANA$BA             A

• The last column holds the character before each sorted suffix; read top to bottom it gives the BWT, ANNB$AA, which is then compressed
• Note: the occurrences of each character appear in the same relative order in the last column as in the first column
• Useful for matching within the BWT
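The suffix array and BWT slides can be tied together with a small sketch. This is an illustration under stated assumptions, not the lecture's own code: the function names `build_suffix_array`, `contains`, and `bwt` are hypothetical, the suffix array is built by naively sorting suffixes (simple but much slower than the specialised constructions used on real genomes), and positions are 0-based, whereas the slides count from 1.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the 0-based start positions of the suffixes of text, sorted.

    Naive construction: sort the positions by the suffix that starts there.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text: str, sa: List[int], pattern: str) -> bool:
    """Binary search the suffix array for a suffix that starts with pattern."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare the pattern with the first len(pattern) characters only.
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # lo is the first suffix whose prefix is >= pattern; check it is a match.
    return lo < len(sa) and text.startswith(pattern, sa[lo])


def bwt(text: str) -> str:
    """BWT of text, assuming text ends in a unique, smallest sentinel ('$')."""
    sa = build_suffix_array(text)
    # Take the character before each sorted suffix; for the suffix starting
    # at position 0, index -1 wraps around to the sentinel.
    return "".join(text[i - 1] for i in sa)


if __name__ == "__main__":
    genome = "ATCATG"
    sa = build_suffix_array(genome)
    print([i + 1 for i in sa])          # 1-based positions, as on the slide
    print(contains(genome, sa, "CAT"))  # is CAT part of ATCATG?
    print(bwt("BANANA$"))
```

Each comparison in the binary search looks at up to len(pattern) characters, which is where the extra log(n) factor relative to a suffix tree lookup comes from. Reading the BWT off the suffix array works because, with a unique smallest sentinel, sorting suffixes and sorting rotations give the same order.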
Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved