
Dynamic Programming in Computational Biology: RNA Structure & Alignment, Study notes of Algorithms and Programming

These notes present dynamic programming solutions for computing the secondary structure of RNA and for the sequence alignment problem. They cover the definition of RNA secondary structure, the computational problem, and the dynamic programming solution, and they explain edit distance and the similarity metric used in sequence alignment. They are part of the CS 473ug course taught by Viswanathan.

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

koofers-user-p7c

CS 473ug: Algorithms
Mahesh Viswanathan
vmahesh@cs.uiuc.edu
3232 Siebel Center
University of Illinois, Urbana-Champaign
Spring 2008

Part I: Dynamic Programming in Computational Biology
Outline: RNA Secondary Structure; Sequence Alignment; Space Efficient Sequence Alignment

RNA
Ribonucleic acid (RNA) is a key component in many cellular processes.
Single-stranded RNA can be viewed as a string over four bases: A (adenine), C (cytosine), G (guanine), and U (uracil).
Secondary structure: the single strand loops back on itself and forms bonds between pairs of bases.

RNA Molecule
Figure: RNA secondary structure for ACGUCGAUUCGAGCGAAUCGUAACGAUACGAGCAUAGCGGCUAGAC

Secondary Structure Definition
A secondary structure on an RNA string $B = b_1 b_2 \ldots b_n$ is a set of pairs $S = \{(i, j) : 1 \le i < j \le n\}$ satisfying:
1. [Watson-Crick] No base appears in more than one pair, and each pair is either $\{A, U\}$ or $\{C, G\}$.
2. [No sharp turns] The ends of each pair are separated by at least four intervening bases, i.e., $i < j - 4$.
3. [Non-crossing] If $(i, j)$ and $(k, \ell)$ are both pairs, then we cannot have $i < k < j < \ell$.

Examples of Secondary Structure
Figure: a pairing on AUGGGGCAU that has a sharp turn
Figure: a pairing on AGUUGGCCAU that has crossing pairs
Figure: a correct secondary structure on AUGUGGCCAU

Free Energy
Example: consider the RNA molecule AGGAUCGCCU, shown folded into two different candidate structures. Which structure is exhibited?
Observation: RNA assumes the secondary structure of minimum free energy.
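As an illustrative sketch (not part of the original notes), the three conditions of the Secondary Structure Definition can be checked mechanically. The Python function below uses a hypothetical name, `is_secondary_structure`, and assumes 1-indexed positions as in the notes, with each pair written as `(i, j)` where `i < j`.

```python
from typing import Iterable

# Watson-Crick pairs allowed by the definition: {A, U} and {C, G}.
WATSON_CRICK = {frozenset("AU"), frozenset("CG")}


def is_secondary_structure(b: str, pairs: Iterable[tuple[int, int]]) -> bool:
    """Check the three conditions of the definition for pairs on string b.

    Positions are 1-indexed; each pair is (i, j) with i < j.
    """
    pairs = sorted(pairs)
    seen = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(b)):
            return False
        # [Watson-Crick] no base in more than one pair ...
        if i in seen or j in seen:
            return False
        seen.update((i, j))
        # ... and each pair is {A, U} or {C, G}
        if frozenset((b[i - 1], b[j - 1])) not in WATSON_CRICK:
            return False
        # [No sharp turns] i < j - 4
        if not i < j - 4:
            return False
    # [Non-crossing] no two pairs (i, j), (k, l) with i < k < j < l
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

For example, on GAAAAACU the pairs (1, 7) and (2, 8) are each valid on their own, but together they cross, so the function rejects the set.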
Defining Subproblems
For $b_1 \ldots b_n$, let $\mathrm{Opt}(j)$ be the maximum number of base pairs in a secondary structure on $b_1 \ldots b_j$.
Case: $b_n$ is not paired in the optimal structure. Then the optimal structure is an optimal structure for $b_1 \ldots b_{n-1}$, i.e., $\mathrm{Opt}(n) = \mathrm{Opt}(n-1)$.
Case: $b_n$ is paired with $b_i$ in the optimal structure. By the non-crossing condition, this results in two subproblems: find an optimal structure on $b_1 \ldots b_{i-1}$ (this is $\mathrm{Opt}(i-1)$) and on $b_{i+1} \ldots b_{n-1}$ (???). The second subproblem does not start at $b_1$, so it is not of the form $\mathrm{Opt}(\cdot)$; subproblems must be indexed by both endpoints.

Subproblem Formulation
Notation: let $\mathrm{Opt}(i, j)$ be the maximum number of base pairs in a secondary structure on $b_i b_{i+1} \ldots b_j$.
Relationship of subproblems: to compute $\mathrm{Opt}(i, j)$ there are three cases.
Case $j - i \le 4$: $\mathrm{Opt}(i, j) = 0$, since no sharp turns are allowed.
Case $b_j$ is not paired: $\mathrm{Opt}(i, j) = \mathrm{Opt}(i, j - 1)$.
Case $b_j$ is paired with $b_k$ for some $i \le k < j - 4$ (where $\{b_k, b_j\}$ is a Watson-Crick pair): $\mathrm{Opt}(i, j) = 1 + \mathrm{Opt}(i, k - 1) + \mathrm{Opt}(k + 1, j - 1)$.
$\mathrm{Opt}(i, j)$ is the maximum over these cases and over all valid $k$.
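The recurrence for $\mathrm{Opt}(i, j)$ can be evaluated bottom-up in order of increasing interval length. The following Python sketch (hypothetical name `max_base_pairs`, with a 1-indexed table) assumes the maximum-number-of-pairs objective used in these notes rather than a free-energy model, and returns only the optimal count, not the structure itself.

```python
def max_base_pairs(b: str) -> int:
    """Return Opt(1, n): the maximum number of base pairs in a secondary
    structure on b = b_1 ... b_n (1-indexed, as in the notes)."""
    n = len(b)
    watson_crick = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    # opt[i][j] holds Opt(i, j). Entries with j - i <= 4, including empty
    # ranges such as opt[i][i - 1], stay 0 (no sharp turns).
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(6, n + 1):            # intervals with j - i >= 5
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = opt[i][j - 1]              # case: b_j not paired
            for k in range(i, j - 4):         # case: b_j paired with b_k, i <= k < j - 4
                if (b[k - 1], b[j - 1]) in watson_crick:
                    best = max(best, 1 + opt[i][k - 1] + opt[k + 1][j - 1])
            opt[i][j] = best
    return opt[1][n]
```

The three nested loops account for the $O(n^3)$ running time. Recovering the pairs themselves would need a traceback over `opt`, which this sketch omits.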
The Algorithm
Let M[i,j] store Opt(i, j), and treat an empty range M[i,i-1] as 0.
for k = 0 to 4
    for i = 1 to n-k
        M[i,i+k] = 0
for k = 5 to n-1
    for i = 1 to n-k
        M[i,i+k] = max(M[i,i+k-1],
                       max over 0 <= j < k-4 such that {b[i+j], b[i+k]} is {A,U} or {C,G}
                           of (1 + M[i,i+j-1] + M[i+j+1,i+k-1]))
Running time is $O(n^3)$: there are $O(n^2)$ entries, and each is computed in $O(n)$ time.

Edit Distance
Definition: the edit distance between two words X and Y is the minimum number of letter insertions, letter deletions, and letter substitutions required to obtain Y from X.
Example: the edit distance between FOOD and MONEY is at most 4:
FOOD → MOOD → MOND → MONED → MONEY

Edit Distance: Alternate View
Alignment: place the words one on top of the other, with gaps in the first word indicating insertions and gaps in the second word indicating deletions.
    F O O _ D
    M O N E Y
Formally, an alignment is a set M of pairs (i, j) such that each index appears in at most one pair and there is no "crossing", i.e., no two pairs (i, j) and (i', j') with i < i' and j > j'. In the example above, M = {(1, 1), (2, 2), (3, 3), (4, 5)}.
The cost of an alignment is the number of mismatched columns; here it is 4 (F/M, O/N, _/E, D/Y).
Edit Distance Problem
Problem: given two words, find the edit distance between them, i.e., an alignment of smallest cost.

Applications
Spell-checkers and dictionaries
Unix diff
DNA sequence alignment ... but for this we need a new metric.

Similarity Metric
Definition: for two strings X and Y, the cost of an alignment M is the sum of:
[Gap penalty] For each gap in the alignment (each position of X or Y left unmatched by M), we incur a cost $\delta$.
[Mismatch cost] For each pair of letters p and q that are matched in M, we incur a cost $\alpha_{pq}$; typically $\alpha_{pp} = 0$.
Edit distance is the special case $\delta = 1$, $\alpha_{pq} = 1$ for $p \ne q$, and $\alpha_{pp} = 0$.
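To make the similarity metric concrete, here is a small Python sketch that scores a given matching M. The names `alignment_cost` and `unit_alpha` are illustrative; `alpha` is passed as a function of two letters and, together with `delta = 1`, defaults to the edit-distance costs.

```python
from typing import Callable, Iterable


def unit_alpha(p: str, q: str) -> float:
    """Edit-distance mismatch cost: alpha_pp = 0, alpha_pq = 1 for p != q."""
    return 0.0 if p == q else 1.0


def alignment_cost(x: str, y: str, matching: Iterable[tuple[int, int]],
                   delta: float = 1.0,
                   alpha: Callable[[str, str], float] = unit_alpha) -> float:
    """Cost of an alignment given as 1-indexed pairs (i, j) matching x_i with y_j."""
    pairs = sorted(matching)
    xs = [i for i, _ in pairs]
    ys = [j for _, j in pairs]
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("an index appears in more than one pair")
    # Sorted by i, a non-crossing alignment must have strictly increasing j.
    if any(j1 >= j2 for j1, j2 in zip(ys, ys[1:])):
        raise ValueError("crossing pairs")
    if any(not (1 <= i <= len(x) and 1 <= j <= len(y)) for i, j in pairs):
        raise ValueError("index out of range")
    mismatch = sum(alpha(x[i - 1], y[j - 1]) for i, j in pairs)
    unmatched = (len(x) - len(pairs)) + (len(y) - len(pairs))  # one gap each
    return mismatch + delta * unmatched
```

With the defaults, `alignment_cost("FOOD", "MONEY", {(1, 1), (2, 2), (3, 3), (4, 5)})` gives 4.0, the cost of the FOOD/MONEY alignment from the alternate view of edit distance.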
Problem Structure
Observation: let $X = x_1 x_2 \cdots x_m$ and $Y = y_1 y_2 \cdots y_n$. If $(m, n) \notin M$, then either the mth position of X or the nth position of Y remains unmatched (if $x_m$ were matched to some $y_j$ and $y_n$ to some $x_i$ with $i < m$ and $j < n$, the pairs $(m, j)$ and $(i, n)$ would cross).
Case $x_m$ and $y_n$ are matched: pay the mismatch cost $\alpha_{x_m y_n}$ plus the cost of aligning $x_1 \cdots x_{m-1}$ and $y_1 \cdots y_{n-1}$.
Case $x_m$ is unmatched: pay the gap penalty $\delta$ plus the cost of aligning $x_1 \cdots x_{m-1}$ and $y_1 \cdots y_n$.
Case $y_n$ is unmatched: pay the gap penalty $\delta$ plus the cost of aligning $x_1 \cdots x_m$ and $y_1 \cdots y_{n-1}$.
Subproblems and Recurrence
Optimal costs: let $\mathrm{Opt}(i, j)$ be the optimal cost of aligning $x_1 \cdots x_i$ and $y_1 \cdots y_j$. Then
$$\mathrm{Opt}(i, j) = \min\bigl(\alpha_{x_i y_j} + \mathrm{Opt}(i-1, j-1),\ \delta + \mathrm{Opt}(i-1, j),\ \delta + \mathrm{Opt}(i, j-1)\bigr),$$
with base cases $\mathrm{Opt}(i, 0) = i\delta$ and $\mathrm{Opt}(0, j) = j\delta$.

Dynamic Programming Solution
for all i: M[i,0] = iδ
for all j: M[0,j] = jδ
for i = 1 to m
    for j = 1 to n
        M[i,j] = min(α[x_i, y_j] + M[i-1,j-1], δ + M[i-1,j], δ + M[i,j-1])
Analysis: running time is $O(mn)$; space used is $O(mn)$.

Sequence Alignment in Practice
Typically the DNA sequences that are aligned are about $10^5$ letters long!
So about $10^{10}$ operations and $10^{10}$ bytes of storage are needed.
The killer is the 10 GB of storage.
Can we reduce the space requirements?
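The Dynamic Programming Solution pseudocode can be sketched in Python as follows. The function name `align` and the traceback are illustrative additions: the traceback walks back from $(m, n)$ and recovers one optimal matching from the full table, which is one way to do the reconstruction that the notes leave as an exercise.

```python
from typing import Callable


def unit_alpha(p: str, q: str) -> float:
    """Edit-distance mismatch cost: 0 for equal letters, 1 otherwise."""
    return 0.0 if p == q else 1.0


def align(x: str, y: str, delta: float = 1.0,
          alpha: Callable[[str, str], float] = unit_alpha):
    """Return (Opt(m, n), matching) using the full (m+1) x (n+1) table M."""
    m, n = len(x), len(y)
    M = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta                  # align x_1..x_i with the empty string
    for j in range(n + 1):
        M[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i - 1][j],
                          delta + M[i][j - 1])
    # Walk back from (m, n), re-checking which case produced each entry.
    # The recomputed sums are the same floating-point values min() chose from.
    matching, i, j = [], m, n
    while i > 0 and j > 0:
        if M[i][j] == alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1]:
            matching.append((i, j))          # x_i matched with y_j
            i, j = i - 1, j - 1
        elif M[i][j] == delta + M[i - 1][j]:
            i -= 1                           # x_i unmatched
        else:
            j -= 1                           # y_j unmatched
    matching.reverse()
    return M[m][n], matching
```

The table dominates the cost: $O(mn)$ time and $O(mn)$ space, which is the storage problem described above.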
Optimizing Space
Recall $M(i, j) = \min\bigl(\alpha_{x_i y_j} + M(i-1, j-1),\ \delta + M(i-1, j),\ \delta + M(i, j-1)\bigr)$.
Entries in the jth column depend only on the (j − 1)st column and on earlier entries in the jth column.
So store only the current column and the previous column, reusing space: $N(i, 0)$ stores $M(i, j-1)$ and $N(i, 1)$ stores $M(i, j)$. This uses $O(m)$ space.

Analyzing Space Efficiency
From the m × n matrix M we can construct the actual alignment (exercise).
The matrix N computes the cost of an optimal alignment, but gives no way to construct the actual alignment.
Space-efficient computation of the alignment?
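A Python sketch of the two-column idea follows (hypothetical name `opt_cost_two_columns`); the lists `prev` and `cur` play the roles of $N(\cdot, 0)$ and $N(\cdot, 1)$. As noted above, this yields only the optimal cost, not the alignment.

```python
from typing import Callable


def unit_alpha(p: str, q: str) -> float:
    """Edit-distance mismatch cost: 0 for equal letters, 1 otherwise."""
    return 0.0 if p == q else 1.0


def opt_cost_two_columns(x: str, y: str, delta: float = 1.0,
                         alpha: Callable[[str, str], float] = unit_alpha) -> float:
    """Opt(m, n) storing only columns j - 1 (prev) and j (cur) of M: O(m) space."""
    m = len(x)
    prev = [i * delta for i in range(m + 1)]      # column 0: M(i, 0) = i * delta
    for j in range(1, len(y) + 1):
        cur = [j * delta] + [0.0] * m             # M(0, j) = j * delta
        for i in range(1, m + 1):
            cur[i] = min(alpha(x[i - 1], y[j - 1]) + prev[i - 1],  # M(i-1, j-1)
                         delta + cur[i - 1],                       # M(i-1, j)
                         delta + prev[i])                          # M(i, j-1)
        prev = cur                                # column j becomes "previous"
    return prev[m]
```

Only two lists of length $m + 1$ are live at any time, so memory drops from $O(mn)$ to $O(m)$ while the running time stays $O(mn)$.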
Shortest Paths and Sequence Alignment
Proposition: let $f(i, j)$ denote the length of the shortest path from $(0, 0)$ to $(i, j)$ in the edit distance graph (the grid graph on nodes $(i, j)$, $0 \le i \le m$, $0 \le j \le n$, with an edge of cost $\alpha_{x_i y_j}$ from $(i-1, j-1)$ to $(i, j)$ and edges of cost $\delta$ from $(i-1, j)$ and from $(i, j-1)$ to $(i, j)$). Then $f(i, j) = \mathrm{Opt}(i, j)$.
Proof. By induction on $i + j$.
The claim clearly holds for the base case $i + j = 0$.
Assume the proposition holds whenever $i' + j' < i + j$.
The last edge on a shortest path to $(i, j)$ originates from $(i-1, j-1)$, $(i-1, j)$, or $(i, j-1)$ (only the edges that exist are considered when $i = 0$ or $j = 0$). Hence
$$f(i, j) = \min\bigl(\alpha_{x_i y_j} + f(i-1, j-1),\ \delta + f(i-1, j),\ \delta + f(i, j-1)\bigr) = \min\bigl(\alpha_{x_i y_j} + \mathrm{Opt}(i-1, j-1),\ \delta + \mathrm{Opt}(i-1, j),\ \delta + \mathrm{Opt}(i, j-1)\bigr) = \mathrm{Opt}(i, j).$$
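To illustrate the proposition, the sketch below runs Dijkstra's algorithm on the edit distance graph. This is a swapped-in general shortest-path method, not something the notes use, and it assumes all costs are non-negative, as Dijkstra requires. The name `grid_shortest_path` is hypothetical. On the examples it agrees with the dynamic programming values, as $f(i, j) = \mathrm{Opt}(i, j)$ predicts.

```python
import heapq
from typing import Callable


def unit_alpha(p: str, q: str) -> float:
    """Edit-distance mismatch cost: 0 for equal letters, 1 otherwise."""
    return 0.0 if p == q else 1.0


def grid_shortest_path(x: str, y: str, delta: float = 1.0,
                       alpha: Callable[[str, str], float] = unit_alpha) -> float:
    """Dijkstra from (0, 0) to (m, n) in the edit distance graph (costs >= 0)."""
    m, n = len(x), len(y)
    dist = {(0, 0): 0.0}
    heap = [(0.0, 0, 0)]
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[(i, j)]:
            continue                                  # stale heap entry
        if (i, j) == (m, n):
            return d
        steps = []
        if i < m and j < n:                           # diagonal edge: match x_{i+1}, y_{j+1}
            steps.append((i + 1, j + 1, alpha(x[i], y[j])))
        if i < m:                                     # x_{i+1} unmatched
            steps.append((i + 1, j, delta))
        if j < n:                                     # y_{j+1} unmatched
            steps.append((i, j + 1, delta))
        for a, b, w in steps:
            nd = d + w
            if nd < dist.get((a, b), float("inf")):
                dist[(a, b)] = nd
                heapq.heappush(heap, (nd, a, b))
    raise ValueError("unreachable")                   # (m, n) is always reachable
```

This uses $O(mn)$ memory and extra logarithmic time for the heap, so it illustrates the graph view rather than replacing the dynamic program.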
Alignment Corresponding to Paths
A path in the graph determines an alignment:
If the path follows the edge from $(i-1, j-1)$ to $(i, j)$, then match $x_i$ with $y_j$.
If the path takes the edge from $(i-1, j)$ to $(i, j)$, then $x_i$ is left unmatched (aligned with a gap in Y).
If the path takes the edge from $(i, j-1)$ to $(i, j)$, then $y_j$ is left unmatched (aligned with a gap in X).

Computing Length of Shortest Paths
Figure: the edit distance graph, a grid from $(0, 0)$ to $(m, n)$; node $(i, j)$ has incoming edges of cost $\delta$ from $(i-1, j)$ and from $(i, j-1)$, and of cost $\alpha_{x_i y_j}$ from $(i-1, j-1)$. Thus $f(i, j)$ depends only on values in the previous column and earlier entries of the current column.
$f$ can be computed in $O(mn)$ time and $O(m + n)$ space. What about the actual paths?

Properties of f(·, ·) and g(·, ·) I
Let $g(i, j)$ denote the length of the shortest path from $(i, j)$ to $(m, n)$.
Proposition: the length of the shortest path from $(0, 0)$ to $(m, n)$ that passes through $(i, j)$ is $f(i, j) + g(i, j)$.
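The following Python sketch computes a column of $f$ with two columns of storage, and obtains $g$ by running the same computation on the reversed suffixes $x_{i+1} \cdots x_m$ and $y_{j+1} \cdots y_n$. Using reversal is an assumption of this sketch: an alignment of two strings corresponds to an alignment of their reversals with the same cost. The function names are hypothetical.

```python
from typing import Callable, List


def unit_alpha(p: str, q: str) -> float:
    """Edit-distance mismatch cost: 0 for equal letters, 1 otherwise."""
    return 0.0 if p == q else 1.0


def f_column(x: str, y: str, delta: float = 1.0,
             alpha: Callable[[str, str], float] = unit_alpha) -> List[float]:
    """Return [f(i, len(y)) for i = 0..len(x)] using only two columns of storage."""
    prev = [i * delta for i in range(len(x) + 1)]          # column 0
    for j in range(1, len(y) + 1):
        cur = [j * delta] + [0.0] * len(x)
        for i in range(1, len(x) + 1):
            cur[i] = min(alpha(x[i - 1], y[j - 1]) + prev[i - 1],  # edge from (i-1, j-1)
                         delta + cur[i - 1],                       # edge from (i-1, j)
                         delta + prev[i])                          # edge from (i, j-1)
        prev = cur
    return prev


def g_column(x: str, y: str, col: int, delta: float = 1.0,
             alpha: Callable[[str, str], float] = unit_alpha) -> List[float]:
    """Return [g(i, col) for i = 0..len(x)]: shortest path from (i, col) to (m, n).

    Such a path aligns x_{i+1..m} with y_{col+1..n}; running f_column on the
    reversed strings gives the same costs, indexed from the other end.
    """
    m = len(x)
    back = f_column(x[::-1], y[col:][::-1], delta, alpha)
    return [back[m - i] for i in range(m + 1)]
```

For example, with `x = "FOOD"`, `y = "MONEY"` and any column `col`, the minimum over `i` of `f_column(x, y[:col])[i] + g_column(x, y, col)[i]` is 4.0, the optimal alignment cost, since every path from $(0, 0)$ to $(m, n)$ crosses that column.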
Properties of f(·, ·) and g(·, ·) II
Proposition: if the index $q$ minimizes $f(q, n/2) + g(q, n/2)$, then some shortest path from $(0, 0)$ to $(m, n)$ goes through $(q, n/2)$.
Proof. Let $\ell^*$ be the length of a shortest path from $(0, 0)$ to $(m, n)$. By the previous proposition, some path through $(q, n/2)$ has length $f(q, n/2) + g(q, n/2)$, so $\ell^* \le f(q, n/2) + g(q, n/2)$.
Every path from $(0, 0)$ to $(m, n)$ crosses column $n/2$; let a shortest path go through $(p, n/2)$. Then $\ell^* = f(p, n/2) + g(p, n/2)$.
Since $q$ minimizes $f(\cdot, n/2) + g(\cdot, n/2)$, we get $\ell^* = f(p, n/2) + g(p, n/2) \ge f(q, n/2) + g(q, n/2)$.
Hence $\ell^* = f(q, n/2) + g(q, n/2)$, and the shortest path through $(q, n/2)$ is a shortest path from $(0, 0)$ to $(m, n)$.
Recursive Algorithm
1. Compute $f(\cdot, n/2)$ and $g(\cdot, n/2)$.
2. Find $q$ that minimizes $f(q, n/2) + g(q, n/2)$.
3. Recursively compute the path from $(0, 0)$ to $(q, n/2)$.
4. Recursively compute the path from $(q, n/2)$ to $(m, n)$.
Space analysis:
Step 1 uses $O(m + n)$ space.
Each recursive call (by the inductive assumption) uses $O(m + n)$ space.
Since the space is reused across calls, the total space used is $O(m + n)$.

Running Time Analysis
Step 1 takes $O(mn)$ time.
Step 3 takes $T(q, n/2)$ time and step 4 takes $T(m - q, n/2)$ time, so
$$T(m, n) \le cmn + T(q, n/2) + T(m - q, n/2), \qquad T(m, 2) \le cm, \qquad T(2, n) \le cn.$$
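Putting the pieces together, here is a Python sketch of the recursive algorithm (this divide-and-conquer scheme is commonly known as Hirschberg's algorithm). It splits at column $\lfloor n/2 \rfloor$, chooses $q$ by minimizing $f(q, n/2) + g(q, n/2)$, and recurses on the two halves; base cases with at most two rows or columns use a small full table, mirroring $T(m, 2)$ and $T(2, n)$. All names are hypothetical, and the sketch slices strings for readability; passing index bounds instead would avoid the extra copies and keep the working space closer to the $O(m + n)$ bound.

```python
from typing import Callable, List, Tuple

Alpha = Callable[[str, str], float]


def unit_alpha(p: str, q: str) -> float:
    """Edit-distance mismatch cost: 0 for equal letters, 1 otherwise."""
    return 0.0 if p == q else 1.0


def f_column(x: str, y: str, delta: float, alpha: Alpha) -> List[float]:
    """[f(i, len(y)) for i = 0..len(x)], keeping only two columns."""
    prev = [i * delta for i in range(len(x) + 1)]
    for j in range(1, len(y) + 1):
        cur = [j * delta] + [0.0] * len(x)
        for i in range(1, len(x) + 1):
            cur[i] = min(alpha(x[i - 1], y[j - 1]) + prev[i - 1],
                         delta + cur[i - 1], delta + prev[i])
        prev = cur
    return prev


def small_alignment(x: str, y: str, delta: float, alpha: Alpha) -> List[Tuple[int, int]]:
    """Full-table DP with traceback; only called when len(x) <= 2 or len(y) <= 2,
    so the table has O(m + n) entries."""
    m, n = len(x), len(y)
    M = [[(i + j) * delta if i == 0 or j == 0 else 0.0 for j in range(n + 1)]
         for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i - 1][j], delta + M[i][j - 1])
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if M[i][j] == alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1]:
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif M[i][j] == delta + M[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def hirschberg(x: str, y: str, delta: float = 1.0, alpha: Alpha = unit_alpha,
               i0: int = 0, j0: int = 0) -> List[Tuple[int, int]]:
    """Optimal matching as 1-indexed pairs, shifted by the offsets (i0, j0)."""
    m, n = len(x), len(y)
    if m <= 2 or n <= 2:                                   # base cases
        return [(i + i0, j + j0) for i, j in small_alignment(x, y, delta, alpha)]
    mid = n // 2
    f = f_column(x, y[:mid], delta, alpha)                 # f(i, mid)
    back = f_column(x[::-1], y[mid:][::-1], delta, alpha)  # g(i, mid) = back[m - i]
    q = min(range(m + 1), key=lambda i: f[i] + back[m - i])
    # Step 3: (0, 0) -> (q, mid); step 4: (q, mid) -> (m, n).
    return (hirschberg(x[:q], y[:mid], delta, alpha, i0, j0)
            + hirschberg(x[q:], y[mid:], delta, alpha, i0 + q, j0 + mid))
```

Because the two recursive calls cover index ranges that lie before and after $(q, n/2)$, concatenating their matchings yields a non-crossing alignment whose cost is $f(q, n/2) + g(q, n/2)$, the optimum.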
Solving the Recurrence
Guess that $T(m, n) \le kmn$, and prove by induction that the guess is correct.
The guess holds for the base cases when $k \ge c$.
Assume the claim holds for all $(m', n')$ with $m'n' < mn$. Then
$$T(m, n) \le cmn + T(q, n/2) + T(m - q, n/2) \le cmn + \frac{kqn}{2} + \frac{k(m - q)n}{2} = \left(c + \frac{k}{2}\right)mn.$$
If $k = 2c$, then $(c + k/2)mn = kmn$, so the claim holds in the inductive step. Hence $T(m, n) = O(mn)$.