CMSC423 Fall 2008
CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 10: inexact alignment, dynamic programming, gapped alignment

Intuition
• What is the best way to align strings S1 and S2?
• Just look at the last character for now: what is it aligned to?
[Figure: three cases for the last column of the alignment, labeled with S1[n] and S2[m]]
AG-C-GTAG
-GTCAG-A-

How do you output the result?
• Goal: produce the "nice" strings with gaps shown in the examples
• Idea: create the strings backwards, starting from the right
• As you follow backtrack pointers:
  – if you follow a diagonal pointer: add a character to both output strings (the aligned versions of the original strings)
  – if you move up: add a character to the string represented on the y axis and a gap character to the string represented on the x axis
  – if you move left: add a character to the string represented on the x axis and a gap character to the string represented on the y axis
• When you reach (0,0), output the two aligned strings

Local vs. global alignment
• Can we change the algorithm to allow S1 to be a substring of S2?
ACAGTTGACCCGTGCAT
----TG-CC-G------
• Key idea: gaps at the ends of S2 are free
• Simply change the first row in the DP table to 0s
• The answer is no longer Score[n, m], but rather the largest value in the last row

Sub-string alignment
[Table: DP scores for sub-string alignment; the first row is all 0s and the first column is -2, -4, -6. Sequences shown: GATGC, AGCGTAG, CGT]

Various flavors of alignment
• The alignment problem is also called "edit distance": how many changes do you have to make to a string to convert it into another one?
• Edit distance is also called Levenshtein distance
• Local alignment: Smith-Waterman
• Global alignment: Needleman-Wunsch

Gap penalties

How much do we pay for gaps?
• In the edit-distance/alignment framework:
  Cost(n gaps in a row) = n * Cost(gap)
• This doesn't work for, e.g., RNA-DNA alignments:
ACAGTTCGACTAGAGGACCTAGACCACTCTGT
    TTCGA----------TAGACCAC
• Affine gap penalties:
  Cost(n gaps in a row) = Cost(gap open) + n * Cost(gap)
• The gap opening penalty is high and the gap extension penalty is low (once we start a gap we might as well pile more gaps on top)
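
The two gap cost formulas above can be compared with a minimal Python sketch. The function names and the penalty values (2 per gap for the linear model, 5 to open and 1 per gap for the affine model) are assumptions chosen for illustration, not values from the lecture.

```python
def linear_gap_cost(n, gap=2):
    """Cost(n gaps in a row) = n * Cost(gap). The value of gap is assumed."""
    return n * gap


def affine_gap_cost(n, gap_open=5, gap=1):
    """Cost(n gaps in a row) = Cost(gap open) + n * Cost(gap).

    gap_open and gap are assumed values; a run of length 0 costs nothing.
    """
    return 0 if n == 0 else gap_open + n * gap


def gap_runs(aligned):
    """Return the lengths of the maximal runs of '-' in an aligned string."""
    runs, current = [], 0
    for ch in aligned:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# The RNA-DNA example contains one long gap of ten positions.
rna = "TTCGA" + "-" * 10 + "TAGACCAC"
one_long_gap = gap_runs(rna)   # a single run
ten_short_gaps = [1] * 10      # the same number of gap characters, scattered

for name, cost in (("linear", linear_gap_cost), ("affine", affine_gap_cost)):
    print(name,
          sum(cost(n) for n in one_long_gap),
          sum(cost(n) for n in ten_short_gaps))
```

Under the linear model both layouts cost the same, so the algorithm has no reason to prefer one long gap. Under the affine model the single long gap is much cheaper than ten scattered ones, which is the behavior wanted for alignments like the RNA-DNA example.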
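
The table-filling and backtracking steps from the earlier slides, together with the sub-string variant, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `align`, the `substring` flag, and the scores (match 10, mismatch -2, gap -2) are hypothetical choices, S1 is placed on the y axis (rows) and S2 on the x axis (columns), and backtrack pointers are recomputed from the scores rather than stored.

```python
def align(s1, s2, match=10, mismatch=-2, gap=-2, substring=False):
    """Align s1 (rows, y axis) against s2 (columns, x axis).

    substring=False: global alignment, answer taken from score[n][m].
    substring=True:  the first row is all 0s and the answer is the largest
                     value in the last row, so s1 may match a substring of s2.
    Returns (best score, aligned s1, aligned s2). Scores are assumed values.
    """
    n, m = len(s1), len(s2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    if not substring:
        for j in range(1, m + 1):
            score[0][j] = j * gap

    def sub(i, j):
        # Score for aligning s1[i-1] with s2[j-1].
        return match if s1[i - 1] == s2[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # diagonal
                              score[i - 1][j] + gap,            # up
                              score[i][j - 1] + gap)            # left

    # Pick the cell where backtracking starts.
    i = n
    j = max(range(m + 1), key=lambda c: score[n][c]) if substring else m
    best = score[i][j]

    out1, out2 = [], []  # built backwards, reversed at the end
    # In sub-string mode the unaligned tail of s2 is paired with free gaps.
    for c in range(m, j, -1):
        out1.append("-")
        out2.append(s2[c - 1])

    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out1.append(s1[i - 1])   # diagonal: a character in both strings
            out2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(s1[i - 1])   # up: character from s1, gap in s2
            out2.append("-")
            i -= 1
        else:
            out1.append("-")         # left: gap in s1, character from s2
            out2.append(s2[j - 1])
            j -= 1

    return best, "".join(reversed(out1)), "".join(reversed(out2))


print(align("ACGT", "AGT"))
print(align("CGT", "AGCGTAG", substring=True))
```

Recomputing the pointers keeps the sketch short; an implementation that stores one pointer per cell would follow the slides more literally and behave the same way, apart from how ties between equally good moves are broken.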