CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 20: RNA folding

RNA folding
• The function of an RNA molecule depends on how it folds, which is determined by nucleotide base-pairing

From multiple alignment to structure
• Find columns in the alignment where mutations are correlated
• Mutual information - how correlated are the columns?

Example alignment:
  GCCUUCGGGC
  GACAUCGGUC
  GGCUUTGGCC
  (......)

$$M_{i,j} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}$$

where the logarithm is base 2 (which gives the 0 to 2 range below) and
  M_{i,j} = mutual information between columns i and j
  f_{x_i x_j} = frequency of each of the 16 pairs of nucleotides at columns i and j
  f_{x_i} = frequency of each of the 4 nucleotides at column i
  f_{x_j} = frequency of each of the 4 nucleotides at column j

Mutual information
• Ranges from 0 to 2 for a 4-letter alphabet
• Correlated columns have high mutual information
• Advantages:
  – Don't need to know how the RNA folds - pseudo-knots should “pop” out of the alignment
• Disadvantages:
  – Need many sequences in the alignment (to compute frequencies)
  – The aligned sequences must be sufficiently divergent (conserved columns provide no information)

Nussinov's algorithm
• Assumes no pseudo-knots
• Dynamic programming approach – maximize the number of pairings
• S – string of nucleotides representing the RNA molecule
• Sub-problem – F[i,j] – score of folding just S[i..j]
• Initial values: F[i-1,i] = F[i,i] = F[i, i+1] = 0
• Recurrence: F[i,j] is the maximum of
  – F[i+1, j] (S[i] left unpaired)
  – F[i, j-1] (S[j] left unpaired)
  – F[i+1, j-1] + 1 (if S[i] and S[j] are paired)
  – max_k F[i,k] + F[k+1,j] (the fold splits into two independent sub-folds)

Example: two folds of GGGAAAUCC with 3 base pairs each
  GGGAAAUCC
  ((.(..)))
  .((..()))

[Figure: dynamic programming matrix F for S = GGGAAAUCC, with rows and columns labeled by the nucleotides of S; the largest entry is 3]

A better objective function
• Find the RNA fold that minimizes the Gibbs free energy
• Zuker's algorithm keeps track of:
  – Stacking energy - f(# of base-pairs in a stem)
  – Loop energy - f(length of loop)
  – Bulge energy - f(length of bulge)
  – Dangle energy - f(length of dangle)
• Computation is done with an extension of the traditional (Nussinov) dynamic programming approach
• One extension: compute sub-optimal folds – during backtracking, try multiple paths
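The mutual information score between two alignment columns, described earlier in these notes, can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `mutual_information`, the 0-based column indices, and the choice of base-2 logarithm (which gives the 0 to 2 range for a 4-letter alphabet) are assumptions made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j (0-based).

    `alignment` is a list of equal-length aligned sequences.
    Frequencies are estimated directly from the alignment; pairs that
    never occur contribute nothing to the sum.
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    col_i_counts = Counter(seq[i] for seq in alignment)
    col_j_counts = Counter(seq[j] for seq in alignment)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        f_ab = count / n              # joint frequency of the pair (a, b)
        f_a = col_i_counts[a] / n     # frequency of a in column i
        f_b = col_j_counts[b] / n     # frequency of b in column j
        mi += f_ab * log2(f_ab / (f_a * f_b))
    return mi


# Toy alignment (hypothetical data): two perfectly covarying columns.
covarying = ["GC", "CG", "AU", "UA"]
# Toy alignment (hypothetical data): fully conserved columns.
conserved = ["GC", "GC", "GC", "GC"]
```

With the toy data, `mutual_information(covarying, 0, 1)` reaches the maximum of 2 bits, while `mutual_information(conserved, 0, 1)` is 0, which illustrates why conserved columns provide no information. With only a handful of sequences, as in the three-sequence example alignment, the frequency estimates are too coarse to be trusted, which is the first disadvantage listed in the notes.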
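The Nussinov dynamic program can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the names `nussinov_fill` and `nussinov_traceback`, the restriction to Watson-Crick pairs (G-C and A-U), and the `min_loop` parameter are all choices made for this example. `min_loop=1` matches the initial values F[i,i] = F[i,i+1] = 0 given in the notes, so adjacent bases never pair.

```python
# Assumption: only Watson-Crick pairs count; G-U wobble pairs are ignored.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def nussinov_fill(seq, min_loop=1):
    """Fill F, where F[i][j] is the max number of pairs in seq[i..j]."""
    n = len(seq)
    F = [[0] * n for _ in range(n)]
    # Process sub-strings by increasing length; shorter spans stay 0.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(F[i + 1][j], F[i][j - 1])      # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, F[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                  # bifurcation
                best = max(best, F[i][k] + F[k + 1][j])
            F[i][j] = best
    return F


def nussinov_traceback(seq, F, min_loop=1):
    """Recover one optimal fold in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # span too short to contain a pair
        if F[i][j] == F[i + 1][j]:
            stack.append((i + 1, j))
        elif F[i][j] == F[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and F[i][j] == F[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if F[i][j] == F[i][k] + F[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


seq = "GGGAAAUCC"
F = nussinov_fill(seq)
fold = nussinov_traceback(seq, F)
```

For the example sequence GGGAAAUCC, `F[0][len(seq) - 1]` is 3, in agreement with the three base pairs in the example folds. The traceback follows a single path and so returns only one optimal fold; exploring several branches where more than one case matches is the idea behind enumerating alternative or sub-optimal folds.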