Inexact Alignment: Dynamic Programming for Bioinformatics, Study notes of Computer Science

Part of the lecture notes for CMSC423: Bioinformatic Algorithms, Databases and Tools, Fall 2008. It covers inexact alignment, focusing on dynamic programming and gapped alignment. It motivates inexact alignment by the redundancy of the genetic code and by the need to account for gaps when aligning ESTs and for sequencing errors. It also includes an example alignment of hemoglobin sequences and introduces the longest common subsequence (LCS). The document then explains the dynamic programming approach to sequence alignment, where mis-alignments are no longer free, and presents the recurrences and the dynamic programming table.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

koofers-user-z0f

CMSC423: Bioinformatic Algorithms, Databases and Tools, Fall 2008

Lecture 9: inexact alignment, dynamic programming, gapped alignment

Inexact alignment

HBB_HUMAN      FFESFGDLSTPDAVMGNPKVKAHGKKVL-----GAFSDGLAHLDNLKGTF
HBB_HORSE      FFDSFGDLSNPGAVMGNPKVKAHGKKVL-----HSFGEGVHHLDNLKGTF
HBA_HUMAN      YFPHF-DLS-----HGSAQVKGHGKKVA-----DALTNAVAHVDDMPNAL
HBA_HORSE      YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVGHLDDLPGAL
MYG_PHYCA      KFDRFKHLKTEAEMKASEDLKKHGVTVL-----TALGAILKKKGHHEAEL
GLB5_PETMA     FFPKFKGLTTADQLKKSADVRWHAERII-----NAVNDAVASMDDTEKMS
LGB2_LUPLU     LFSFLKGTSEVP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATL
                *  :   .       . .:: *.  :       :.   :

Several hemoglobins. From http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/multiple.html

Warm-up: Longest Common Subsequence

• Given two strings of letters, identify the longest string of letters that occurs, in the same order, in both strings

AG C GTAG
 G C G A
 GTCAG A

• Find the longest chain of 1s, moving to the right and down

     G  T  C  A  G  A
A             1     1
G    1           1
C          1
G    1           1
T       1
A             1     1
G    1           1

Dynamic programming

• Idea: re-use previously computed information
• LCS[i,j]: length of the longest common subsequence of strings S1[1..i], S2[1..j]

LCS[i,j] is the maximum of:
1. LCS[i-1, j-1] + 1 if S1[i] = S2[j], else LCS[i-1, j-1]
2. LCS[i-1, j]
3. LCS[i, j-1]

Goal: find LCS[m,n]

The recurrences

AG-C-GTAG
-GTCAG-A-

Score[i,j] is the maximum of:
1. Score[i-1, j-1] + Value[S1[i], S2[j]] (S1[i], S2[j] aligned)
   AG-C-G    AG-C-G
   -GTCAG    -GTCAT
2. Score[i-1, j] + Value[S1[i], -] (S1[i] aligned to gap)
   AG-C-GT
   -GTCAG-
3. Score[i, j-1] + Value[-, S2[j]] (S2[j] aligned to gap)
   AG-C-
   -GTCA

The dynamic programming table

Score[i,j] is the maximum of:
1. Score[i-1, j-1] + Value[S1[i], S2[j]] (S1[i], S2[j] aligned)
2. Score[i-1, j] + Value[S1[i], -] (S1[i] aligned to gap)
3. Score[i, j-1] + Value[-, S2[j]] (S2[j] aligned to gap)

Value (A, A) = 10
Value (A, G) = -5
Value (A, -) = -2

Partially filled table (rows: S1 = AGCGTAG, columns: S2 = GTCAGA):

        -    G    T    C    A    G    A
   -    0   -2   -4   -6   -8  -10  -12
   A   -2   -4   -6   -8
   G   -4    8    6    4
   C   -6    6    4   16
   G   -8
   T  -10
   A  -12
   G  -14

Note: we only look at 3 adjacent boxes
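The two recurrences above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function names `lcs_length`, `value` and `alignment_table` are hypothetical, and the scoring function assumes that the three Value entries on the slide generalize to every match (10), every mismatch (-5) and every gap (-2). Python strings are 0-based, so S1[i] in the slides corresponds to `s1[i - 1]` here.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence of s1 and s2."""
    m, n = len(s1), len(s2)
    # table[i][j] holds LCS[i, j] for the prefixes s1[:i] and s2[:j];
    # row 0 and column 0 stand for the empty prefix and stay 0.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = table[i - 1][j - 1] + (1 if s1[i - 1] == s2[j - 1] else 0)
            table[i][j] = max(diagonal, table[i - 1][j], table[i][j - 1])
    return table[m][n]


def value(a: str, b: str, match: int = 10, mismatch: int = -5, gap: int = -2) -> int:
    """Score of aligning a with b, where '-' denotes a gap.

    Assumption: one score per kind of column, generalized from the
    slide's Value(A, A), Value(A, G) and Value(A, -).
    """
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def alignment_table(s1: str, s2: str) -> list[list[int]]:
    """Fill the Score table for a global gapped alignment of s1 and s2."""
    m, n = len(s1), len(s2)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    # Border cells: a prefix aligned entirely against gaps.
    for i in range(1, m + 1):
        score[i][0] = score[i - 1][0] + value(s1[i - 1], "-")
    for j in range(1, n + 1):
        score[0][j] = score[0][j - 1] + value("-", s2[j - 1])
    # Interior cells: each one looks only at its 3 adjacent boxes.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + value(s1[i - 1], s2[j - 1]),  # aligned
                score[i - 1][j] + value(s1[i - 1], "-"),            # s1[i] to gap
                score[i][j - 1] + value("-", s2[j - 1]),            # s2[j] to gap
            )
    return score


if __name__ == "__main__":
    s1, s2 = "AGCGTAG", "GTCAGA"
    print(lcs_length(s1, s2))
    for row in alignment_table(s1, s2):
        print(" ".join(f"{cell:4d}" for cell in row))
```

The bottom-right cell of the returned table is the score of the best global alignment. Recovering the alignment itself would additionally require remembering, for each cell, which of the three options produced the maximum, which this sketch leaves out.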