Week 3 – DNA sequence alignment

GMS6181 Genomics and Bioinformatics

Pairwise sequence alignment

• May be the single most important problem in bioinformatics
• Several types of "mismatches":
    Substitutions
    Insertions
    Deletions
• Matches/mismatches are typically quantified through a score-based method. For instance, a gap may incur a penalty of 5 but a substitution a penalty of only 1.
• The best alignment is the one that maximizes the score, i.e. minimizes the penalties.
• Testing all possible pairwise alignments and then selecting the one with the highest score would in most cases be computationally too demanding.
• Therefore, we need an algorithm that is guaranteed to find the best alignment without exhaustively searching the whole alignment space. This can be done by dynamic programming.

Dynamic programming

• Guaranteed to find the best alignment (Needleman and Wunsch, 1970)
• First time dynamic programming was used for the analysis of biological information
• Utilizes an iterative matrix method of calculation
• As the iterations occur, a score value is computed for each cell from pre-defined scores for matches, mismatches and gaps:

\[
S(i,j) = \max
\begin{cases}
S(i-1,j-1) + c(i,j) \\
S(i-1,j) + c(i,-) \\
S(i,j-1) + c(-,j)
\end{cases}
\]

  Here c(i,j) is the match or mismatch score for the residues at row i and column j, and c(i,-) and c(-,j) are gap penalties.

Pairwise sequence alignment (Needleman & Wunsch algorithm)
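To make the score-based idea concrete, here is a minimal Python sketch (not part of the original slides) that scores one already-aligned pair of sequences column by column. The function name `alignment_score` and its parameters are illustrative. The default values are the match, mismatch and gap scores used in this lecture's worked example (+3, -3, -5), and `-` is assumed to mark a gap.

```python
def alignment_score(row_a, row_b, match=3, mismatch=-3, gap=-5):
    """Sum the per-column scores of two already-aligned strings.

    Assumption: '-' marks a gap and both rows have the same length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # a residue aligned against a gap
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total


# Two candidate alignments of AGAGT and ACAT
print(alignment_score("AGAGT", "ACA-T"))  # 1
print(alignment_score("AGAGT", "-ACAT"))  # -11
```

Comparing candidates this way shows why the first alignment is preferred. Dynamic programming reaches the same best score without enumerating every candidate.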
• For example, assume the following scores have been given for an alignment of the sequences ACAT and AGAGT:

    Match = +3    Mismatch = -3    Gap = -5

Step 1: Fill in gap values

    i
    5  T   -25
    4  G   -20
    3  A   -15
    2  G   -10
    1  A    -5
    0  -     0    -5   -10   -15   -20
             -     A     C     A     T
    j        0     1     2     3     4

Step 2: Fill other values

• For each remaining cell, select the highest of the three possible events given by the recurrence for S(i,j).
• For cell S(1,1):

    S(0,0) + c(A,A) = 0 + (+3) = +3
    S(0,1) + c(A,-) = (-5) + (-5) = -10
    S(1,0) + c(-,A) = (-5) + (-5) = -10

  so S(1,1) = max(+3, -10, -10) = +3.
• For cell S(3,1), with S(1,1) = +3 and S(2,1) = -2 already filled in:

    S(2,0) + c(A,A) = (-10) + (+3) = -7
    S(2,1) + c(A,-) = (-2) + (-5) = -7
    S(3,0) + c(-,A) = (-15) + (-5) = -20

  so S(3,1) = -7.

• Continuing in the same way fills the whole matrix. The last cell is S(5,4):

    S(4,3) + c(T,T) = (-2) + (+3) = +1
    S(4,4) + c(T,-) = (0) + (-5) = -5
    S(5,3) + c(-,T) = (-7) + (-5) = -12

  so S(5,4) = +1.

    i
    5  T   -25   -17   -15    -7    +1
    4  G   -20   -12   -10    -2     0
    3  A   -15    -7    -5    +3    -2
    2  G   -10    -2     0    -5   -10
    1  A    -5    +3    -2    -7   -12
    0  -     0    -5   -10   -15   -20
             -     A     C     A     T
    j        0     1     2     3     4

• The score of this alignment, considering the penalty values assigned, is +1.
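The matrix above can be reproduced with a short Python sketch. It is an illustrative implementation under the scores of this example, not code from the course. The function name `needleman_wunsch` and the tie-breaking order (diagonal, then up, then left) are assumptions. After filling the matrix, the function walks back from the bottom-right cell, at each step moving to whichever neighbour produced the cell's score, to read off one optimal alignment.

```python
def needleman_wunsch(rows, cols, match=3, mismatch=-3, gap=-5):
    """Global alignment of `rows` (index i) against `cols` (index j).

    Returns (S, aligned_rows, aligned_cols), where S[i][j] uses the
    same indexing as the slides.
    """
    n, m = len(rows), len(cols)
    S = [[0] * (m + 1) for _ in range(n + 1)]

    # Step 1: the first column and first row hold cumulative gap penalties.
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    def c(i, j):
        # Match or mismatch score for the residues at row i, column j.
        return match if rows[i - 1] == cols[j - 1] else mismatch

    # Step 2: every other cell takes the highest of the three events.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + c(i, j),  # align the two residues
                          S[i - 1][j] + gap,          # row residue against a gap
                          S[i][j - 1] + gap)          # column residue against a gap

    # Walk back from S[n][m] to S[0][0], preferring the diagonal on ties.
    out_rows, out_cols = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + c(i, j):
            out_rows.append(rows[i - 1])
            out_cols.append(cols[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_rows.append(rows[i - 1])
            out_cols.append("-")
            i -= 1
        else:
            out_rows.append("-")
            out_cols.append(cols[j - 1])
            j -= 1
    return S, "".join(reversed(out_rows)), "".join(reversed(out_cols))


S, top, bottom = needleman_wunsch("AGAGT", "ACAT")
print(S[5][4])  # 1
print(top)      # AGAGT
print(bottom)   # ACA-T
```

The nested loops visit each of the (n+1) × (m+1) cells once, so the work grows with the product of the two sequence lengths.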
• Working backwards from S(5,4), each cell on the path is re-examined to see which of the three events produced its score.

• Cell S(4,3):

    S(3,2) + c(G,A) = (-5) + (-3) = -8
    S(3,3) + c(G,-) = (+3) + (-5) = -2
    S(4,2) + c(-,A) = (-10) + (-5) = -15

  so S(4,3) = -2, reached from S(3,3): the G in row 4 is aligned against a gap.

• Cell S(3,3):

    S(2,2) + c(A,A) = (0) + (+3) = +3
    S(2,3) + c(A,-) = (-5) + (-5) = -10
    S(3,2) + c(-,A) = (-5) + (-5) = -10

  so S(3,3) = +3, reached diagonally from S(2,2): A is aligned with A.

• Cell S(2,2):

    S(1,1) + c(G,C) = (+3) + (-3) = 0
    S(1,2) + c(G,-) = (-2) + (-5) = -7
    S(2,1) + c(-,C) = (-2) + (-5) = -7

  so S(2,2) = 0, reached diagonally from S(1,1): G is aligned with C (a mismatch).

• Together with the diagonal steps into S(5,4) and S(1,1), this path gives the alignment

    AGAGT
    ACA-T

  with score 3 - 3 + 3 - 5 + 3 = +1.

BLAST

Pairwise sequence alignment – global vs. local methods

• Global pairwise alignment methods such as the one proposed by Needleman & Wunsch always find the optimal solution. However, they are computationally too demanding, which makes them essentially impractical for comparing long sequences or for comparing one sequence against a large set of sequences in a database.
• One alternative is to use a local alignment method, which attempts to find local regions of high similarity.
• The most widely recognized local alignment search tool is BLAST (Altschul et al. 1990)
• BLAST is a heuristic alignment search method, i.e. finding the best local alignment is not guaranteed, but the final result is usually very close to it.
• Generates an expectation value (E-value): the number of hits with a score at least as good that would be expected by chance. For instance, an E-value of 1 indicates that one such hit is expected to be found in the database by chance alone.
• E-values take into consideration three factors:
    1. The bit score (more on that later)
    2. Length of query
    3. Length of database

Bit score

• Calculated from the Raw Score (R):

\[
R = aI + bX - cO - dG
\]

    I = number of identities, a = reward per identity
    X = number of mismatches, b = penalty per mismatch
    O = number of gaps, c = penalty for opening a gap
    G = number of gap extensions, d = penalty for extending a gap

• Example sequence shown on the slide (FASTA format):

    >gi|87042769|gb|DQ370117.1| Pinus nelsonii cinnamyl alcohol dehydrogenase (CAD) gene
    GCAGGATATTGTCATTGCTTTCAGTTATTAGATTTCAATGGCATGAAATGAATGCAAATTGAGTAGATAA
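The raw score formula can be checked with a few lines of Python. This is only a sketch: the function name `raw_score`, the counts and the parameter values below are made up for illustration and are not taken from the slides or from any particular BLAST configuration. Because the formula adds the term bX, the mismatch penalty b is assumed to be entered as a negative number, while c and d are entered as positive magnitudes that the formula subtracts.

```python
def raw_score(identities, mismatches, gap_opens, gap_extensions,
              a, b, c, d):
    """Raw score R = a*I + b*X - c*O - d*G.

    Assumptions: b is a negative number (it is added), while c and d
    are positive magnitudes (they are subtracted).
    """
    return (a * identities
            + b * mismatches
            - c * gap_opens
            - d * gap_extensions)


# Hypothetical alignment: 60 identities, 5 mismatches,
# 1 gap opening and 2 gap extensions.
R = raw_score(60, 5, 1, 2, a=1, b=-3, c=5, d=2)
print(R)  # 36
```

With these illustrative values, R = 60 - 15 - 5 - 4 = 36.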