CMSC423 Fall 2008
CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 12: chaining algorithms, multiple alignment

Jobs
• Applied Predictive Technologies
  – looking for the best students
  – focus on databases (forwarded by Daniel Hackner), not bioinformatics

Chaining in 1D
• Sort the endpoints (starts and ends) of the intervals
• For every interval j, store V[j]: the best score of a chain ending in j
• MAX: the highest V[j] seen so far among intervals whose right end has already been processed
• Process the endpoints in increasing order of x coordinate
• If we encounter the left end (start) of interval j:
  – V[j] = weight(j) + MAX
• If we encounter the right end (end) of interval j:
  – MAX = max{V[j], MAX}
• Running time? O(n log n): sorting the 2n endpoints dominates, and the sweep itself is linear

Chaining in 2-D
• Easy to do in O(n²) time (n = number of boxes)
• View alignments as "boxes"
• All boxes in a chain must follow each other in "diagonal" order: for any two boxes in a chain, neither their x ranges nor their y ranges may overlap, and the earlier box must come first in both coordinates
• Similar to the 1-D approach, except that at each step we must check whether the current box can extend any of the previously built chains
• V[j] = max over all previous boxes k of {V[k] + weight(j)}, or just weight(j) if no box precedes j
• A more complex algorithm achieves O(n log n) running time

Multiple sequence alignment

But... here's a solution
• Dynamic programming solution, e.g. for 3 sequences
• Score(i, j, k): score of the optimal alignment of s1[1..i], s2[1..j], s3[1..k]
  – do DP as usual
• Score(i, j, k) = max { Score(i-1, j-1, k-1) + match(s1[i], s2[j], s3[k]), ... }
[Figure: DP cube with axes s1, s2, s3]

But... it's expensive
• 3 sequences: need to fill in the cube, O(n³)
• k sequences: k-dimensional cube with O(n^k) cells (space); each cell examines up to 2^k - 1 predecessor cells, so time is O(2^k n^k)
• There are tricks that can help, similar to AI techniques for reducing the search space
• Basic idea: if we can estimate the optimal score, we can prune the search space.
• Note: these are just heuristics, not guaranteed to work faster

Alternative: approximation algorithm
• Can we efficiently compute a multiple alignment whose score is not too bad?
• The Star method:
  – build all k(k-1)/2 = O(k²) pairwise alignments (O(k²n²) time)
  – pick the sequence s_c that is closest to all other sequences: the sum over i of D(s_c, s_i) is minimal over all choices of s_c
  – iteratively align each sequence to s_c
• Theorem: assuming the pairwise scoring function is a distance that satisfies the triangle inequality, the sum-of-pairs score of the star alignment is at most twice the optimal multiple alignment score
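
The procedures on these slides (1-D chaining, quadratic 2-D chaining, the three-sequence cube DP, and choosing the star center) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the course's reference code: the function names `chain_1d`, `chain_2d`, `pair_score`, `align3_score` and `star_center` are hypothetical; intervals are treated as closed, so two intervals sharing an endpoint overlap; boxes are given as `(x1, x2, y1, y2, weight)`; and because the slides leave `match` unspecified, the DP uses an assumed sum-of-pairs scheme (match +1, mismatch -1, character against gap -1, gap against gap 0). `chain_2d` is the simple O(n²) version, not the O(n log n) algorithm mentioned on the slide.

```python
from itertools import product


def chain_1d(intervals):
    """Best total weight of a chain of pairwise non-overlapping intervals.

    intervals: list of (start, end, weight). Assumption: intervals are closed,
    so two intervals that share an endpoint overlap.
    """
    events = []
    for j, (start, end, _) in enumerate(intervals):
        # At equal x, kind 0 (start) sorts before kind 1 (end), so an interval
        # ending at x is not yet counted in MAX when one starting at x is seen.
        events.append((start, 0, j))
        events.append((end, 1, j))
    events.sort()
    V = [0] * len(intervals)
    best = 0  # MAX: highest V[k] over intervals whose right end was passed
    for _, kind, j in events:
        if kind == 0:
            V[j] = intervals[j][2] + best
        else:
            best = max(best, V[j])
    return best


def chain_2d(boxes):
    """O(n^2) chaining of boxes given as (x1, x2, y1, y2, weight).

    Box k can precede box j only if k ends before j starts in both x and y.
    """
    # Any valid predecessor of j also starts before j in x, so visiting boxes
    # by left x coordinate guarantees predecessors are processed first.
    order = sorted(range(len(boxes)), key=lambda j: boxes[j][0])
    V = {}
    for j in order:
        x1, _, y1, _, weight = boxes[j]
        preds = [V[k] for k in V if boxes[k][1] < x1 and boxes[k][3] < y1]
        V[j] = weight + max(preds, default=0)
    return max(V.values(), default=0)


def pair_score(a, b):
    """Assumed pairwise scores: match +1, mismatch -1, gap -1, gap/gap 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def align3_score(s1, s2, s3):
    """Optimal sum-of-pairs score of three sequences via the O(n^3) cube DP."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    NEG = float("-inf")
    S = [[[NEG] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    S[0][0][0] = 0
    # product() yields (i, j, k) in lexicographic order, so every
    # predecessor cell (i-di, j-dj, k-dk) is filled before (i, j, k).
    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        if i == j == k == 0:
            continue
        best = NEG
        # Each of the 7 nonzero moves consumes one character from a subset
        # of the sequences; the others contribute a gap to this column.
        for di, dj, dk in product((0, 1), repeat=3):
            if (di, dj, dk) == (0, 0, 0) or di > i or dj > j or dk > k:
                continue
            c1 = s1[i - 1] if di else "-"
            c2 = s2[j - 1] if dj else "-"
            c3 = s3[k - 1] if dk else "-"
            column = pair_score(c1, c2) + pair_score(c1, c3) + pair_score(c2, c3)
            best = max(best, S[i - di][j - dj][k - dk] + column)
        S[i][j][k] = best
    return S[n1][n2][n3]


def star_center(D):
    """Index c minimizing sum_i D[c][i] for a k x k pairwise distance matrix D."""
    return min(range(len(D)), key=lambda c: sum(D[c]))
```

For example, `chain_1d([(1, 4, 3), (5, 8, 2), (3, 6, 4)])` returns 5: the first two intervals chain together, while the third overlaps both and scores only 4 on its own. Sorting by left x coordinate is enough for the quadratic 2-D version because a box that can precede j must end, and therefore start, before j starts.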