Sequence Alignment: Inexact Alignment and Dynamic Programming, Study notes of Computer Science

An overview of sequence alignment, focusing on inexact alignment using dynamic programming and gapped alignment. It covers global and local alignment, the concept of gap penalties, and the use of affine gap penalties. The document also discusses the running times and space requirements of these algorithms, as well as the sources of alignment scores.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-ghb

CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 9
Sequence alignment: inexact alignment, dynamic programming, gapped alignment, heuristics

Play around with alignments
• USC alignment library (seqaln) http://www.mhoenicka.de/software/cygwinports/seqaln.html

Global alignment recap
Example: aligning AGCGTAG with GTCAGAC
Value(A,A) = 10   (match)
Value(A,G) = -5   (mismatch)
Value(A,-) = -4   (character aligned to a gap)
Score[i,j] is the best score for aligning the first i characters of S1 with the first j characters of S2 (strings indexed from 0). It is the maximum of:
1. Score[i-1, j-1] + Value(S1[i-1], S2[j-1])   (S1[i-1] and S2[j-1] aligned)
2. Score[i-1, j] + Value(S1[i-1], -)   (S1[i-1] aligned to a gap)
3. Score[i, j-1] + Value(-, S2[j-1])   (S2[j-1] aligned to a gap)
with Score[0,0] = 0, Score[i,0] = -4i and Score[0,j] = -4j (a prefix aligned entirely to gaps).
[Figure: dynamic programming table for this example; the bottom-right cell holds the optimal score, 19]
Optimal alignment (score 19):
AG-C-GTAG
-GTCAG-AC

Running times
• All these algorithms run in O(mn), i.e., quadratic time, for sequences of lengths m and n
• Note: this is significantly worse than exact matching, which runs in linear time
• On Wednesday we'll talk about speed-up opportunities
• BTW, how much space is needed?
  – If we only need the best score (not the alignment itself): O(min(m,n)), because each row of the table depends only on the previous row
  – If we need the best alignment itself: an elegant divide-and-conquer algorithm (Hirschberg's) gives a linear-space solution

Where do the alignment scores come from?
• PAM matrices
  – PAM1: based on the frequency of mutations between closely related proteins (within 1 "evolutionary step")
  – PAM2: ... within 2 evolutionary steps
  – ...
  – PAM250: commonly used
• BLOSUM matrices
  – Based on the frequency of mutations between proteins that are x% similar
  – BLOSUM100: based on proteins that are exactly the same (e.g., score(A,A) is defined but not score(A,G))
  – BLOSUM62: commonly used
• Gap scores are usually determined empirically

[Figure: BLOSUM62 substitution matrix]

Heuristics
• What if we limit the number of differences allowed? E.g., we expect the sequences to be very similar.
• Compute a 'banded' alignment: stay within k cells of the diagonal, where k is the number of differences allowed.
• An alignment with at most k differences cannot stray more than k cells from the diagonal, because every step off the diagonal requires a gap.
• What if we do not know k? Do binary search to find it.
• O(km) running time and space
[Figure: a band of width k on each side of the diagonal of the dynamic programming table]

Exclusion methods
• Assume P (length n) must match T (length m) with at most k errors. Find places in T where P cannot match.
• Split P into k+1 chunks of size floor(n/(k+1)).
• If P matches T with at most k errors, then at least one chunk matches with no errors (k errors cannot touch all k+1 chunks).
• Use any exact matching algorithm to find places where a chunk matches T, then run dynamic programming in that vicinity.
• Running time: O(m) on average
[Figure: an exact match of one chunk of the Pattern in the Text anchors a putative alignment of the whole pattern]
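Returning to the global alignment recap: the recurrence can be sketched in Python as below. This is a minimal illustration, assuming the example scores from that slide (match +10, mismatch -5, gap -4) apply to every character pair; the names `value` and `global_align` are hypothetical. It fills the full (m+1) × (n+1) table, so it takes O(mn) time and space, then traces back from Score[m,n] to recover one optimal alignment.

```python
from typing import Tuple

# Hypothetical scores, copied from the recap slide's example.
MATCH, MISMATCH, GAP = 10, -5, -4


def value(a: str, b: str) -> int:
    """Score of aligning a with b; '-' stands for a gap."""
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def global_align(s1: str, s2: str) -> Tuple[int, str, str]:
    """Global alignment by dynamic programming: O(mn) time and space."""
    m, n = len(s1), len(s2)
    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = score[i - 1][0] + value(s1[i - 1], "-")
    for j in range(1, n + 1):
        score[0][j] = score[0][j - 1] + value("-", s2[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + value(s1[i - 1], s2[j - 1]),  # aligned pair
                score[i - 1][j] + value(s1[i - 1], "-"),  # s1[i-1] vs gap
                score[i][j - 1] + value("-", s2[j - 1]),  # s2[j-1] vs gap
            )
    # Trace back from score[m][n], preferring diagonal, then up, then left.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + value(s1[i - 1], s2[j - 1]):
            top.append(s1[i - 1])
            bottom.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + value(s1[i - 1], "-"):
            top.append(s1[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(s2[j - 1])
            j -= 1
    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = global_align("AGCGTAG", "GTCAGAC")
print(best)  # 19
print(top)
print(bottom)
```

When several alignments tie for the best score, the traceback's preference order (diagonal, then S1 character against a gap, then S2 character against a gap) decides which one is printed, so it need not be the exact alignment drawn on the slide.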
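The O(min(m,n)) space bound for computing only the best score holds because row i of the table depends only on row i-1. A minimal sketch, assuming the same example scores (match +10, mismatch -5, gap -4) and a symmetric scoring scheme, so the shorter string can always be laid along the row; `global_score_linear_space` is an illustrative name. It returns the score but cannot reconstruct the alignment. Recovering the alignment in linear space needs the divide-and-conquer algorithm mentioned on the slide.

```python
def global_score_linear_space(s1: str, s2: str, match: int = 10,
                              mismatch: int = -5, gap: int = -4) -> int:
    """Best global alignment score using two rows of min(m, n) + 1 cells.

    Swapping the inputs is safe only because this scoring is symmetric
    (same gap cost on both sides, and score(a, b) == score(b, a)).
    """
    if len(s2) > len(s1):
        s1, s2 = s2, s1  # keep the shorter string along the row
    prev = [j * gap for j in range(len(s2) + 1)]  # row 0: s2 prefix vs gaps
    for i in range(1, len(s1) + 1):
        curr = [i * gap] + [0] * len(s2)  # column 0: s1 prefix vs gaps
        for j in range(1, len(s2) + 1):
            pair = match if s1[i - 1] == s2[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + pair,  # aligned pair
                          prev[j] + gap,  # s1[i-1] vs gap
                          curr[j - 1] + gap)  # s2[j-1] vs gap
        prev = curr  # row i-1 is no longer needed
    return prev[-1]


print(global_score_linear_space("AGCGTAG", "GTCAGAC"))  # 19
```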
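The banded heuristic can be sketched as follows, under one simplifying assumption: "differences" are counted as unit-cost edits (edit distance) rather than with the Value scores used earlier. Only cells with |i - j| <= k are computed, so the work is O(km). For the unknown-k case, this sketch doubles k until the band succeeds. That doubling search is a swapped-in technique, not the binary search named on the slide. Both function names are hypothetical.

```python
import math
from typing import Optional


def banded_edit_distance(s1: str, s2: str, k: int) -> Optional[int]:
    """Edit distance of s1 and s2 if it is at most k, otherwise None.

    Only cells with |i - j| <= k are filled, so time is O(k * m).
    """
    m, n = len(s1), len(s2)
    if abs(m - n) > k:
        return None  # the final cell (m, n) lies outside the band
    prev = {j: j for j in range(min(n, k) + 1)}  # row 0, inside the band
    for i in range(1, m + 1):
        curr = {}
        for j in range(max(0, i - k), min(n, i + k) + 1):
            if j == 0:
                curr[j] = i  # s1[:i] against the empty prefix: i deletions
                continue
            # Cells outside the band are treated as unreachable (infinite).
            curr[j] = min(
                prev.get(j - 1, math.inf) + (s1[i - 1] != s2[j - 1]),  # match/substitution
                prev.get(j, math.inf) + 1,  # s1[i-1] vs gap
                curr.get(j - 1, math.inf) + 1,  # s2[j-1] vs gap
            )
        prev = curr
    d = prev[n]
    return d if d <= k else None


def edit_distance_unknown_k(s1: str, s2: str) -> int:
    """Doubling search for k (a stand-in for the slide's binary search)."""
    k = max(1, abs(len(s1) - len(s2)))
    while True:
        d = banded_edit_distance(s1, s2, k)
        if d is not None:
            return d
        k *= 2  # terminates: once k >= max(m, n) the band covers the whole table


print(banded_edit_distance("kitten", "sitting", 3))  # 3
print(banded_edit_distance("kitten", "sitting", 2))  # None
print(edit_distance_unknown_k("kitten", "sitting"))  # 3
```

The band is safe because an alignment with at most k edits never leaves it, so whenever the true distance is at most k the banded computation returns exactly that distance.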
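The exclusion method amounts to a filter followed by a verification step. A minimal sketch, assuming the following: errors are unit-cost edits; the last chunk absorbs the remainder when k+1 does not divide n; Python's `str.find` stands in for "any exact matching algorithm"; and "dynamic programming in that vicinity" is a semi-global edit-distance DP (free start and end in the text) over the implied occurrence plus k extra characters on each side. That window is wide enough because, if a chunk at offset lo in P is matched exactly at position pos in T, any occurrence of P with at most k errors that uses this hit starts within k of pos - lo. Function names are hypothetical.

```python
from typing import List


def min_errors_in_window(p: str, window: str) -> int:
    """Fewest edits needed to match all of p against some substring of window."""
    prev = [0] * (len(window) + 1)  # row 0 is all zeros: the match may start anywhere
    for i in range(1, len(p) + 1):
        curr = [i] + [0] * len(window)
        for j in range(1, len(window) + 1):
            curr[j] = min(prev[j - 1] + (p[i - 1] != window[j - 1]),
                          prev[j] + 1,
                          curr[j - 1] + 1)
        prev = curr
    return min(prev)  # the match may end anywhere


def exclusion_search(p: str, t: str, k: int) -> List[int]:
    """Implied start positions s of p in t that verify with at most k errors."""
    n = len(p)
    size = n // (k + 1)  # floor(n / (k + 1))
    if size == 0:
        raise ValueError("pattern too short to split into k + 1 non-empty chunks")
    starts = set()
    for c in range(k + 1):
        lo = c * size
        hi = n if c == k else lo + size  # the last chunk absorbs any remainder
        chunk = p[lo:hi]
        pos = t.find(chunk)  # str.find stands in for any exact matcher
        while pos != -1:
            starts.add(pos - lo)  # where p would start if this chunk is exact
            pos = t.find(chunk, pos + 1)
    hits = []
    for s in sorted(starts):
        window = t[max(0, s - k): min(len(t), s + n + k)]
        if min_errors_in_window(p, window) <= k:  # verify with DP nearby
            hits.append(s)
    return hits


print(exclusion_search("GATTACA", "CCCGATCACACCC", 1))  # [3]
print(exclusion_search("GATTACA", "CCCGACCACACCC", 1))  # []
```

Only the windows around exact chunk hits are examined with dynamic programming; the rest of T is excluded without any DP work.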