Dynamic Programming in Sequence Alignment: A Comprehensive Approach, Study notes of Biology

An in-depth exploration of dynamic programming in sequence alignment. It covers scoring matrices, the total score of an alignment, local and global alignment, edit graphs, and pseudo-code. The document also discusses the importance of dynamic programming in bioinformatics and its application in alignment algorithms such as Needleman-Wunsch (global alignment) and Smith-Waterman (local alignment).

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

koofers-user-yw1

Partial preview of the text

Sequence Alignment and Dynamic Programming

J. R. Quine
Department of Mathematics

Sources

Part of this lecture is taken from Calculating the Secrets of Life, Chapter 3: "Seeing conserved signals: using algorithms to detect similarities between biosequences," by Eugene W. Myers. You can print low-resolution copies of each page directly from this link. Another reference for this material is Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, by Richard Durbin et al.

Sequence Similarity

We are looking for similarities between nucleotide or amino acid sequences. Some possible implications of sequence similarity are:

  The proteins have a common evolutionary origin.
  The proteins have a similar function.

Improving the Score Using Gaps (Without Penalty)

  1. ATTACG
     ATATCG

  2. ATTA-CG
     A-TATCG

  3. AT-TACG
     ATAT-CG

The scores (the number of matching columns) are 4, 5, and 5, respectively.

Important Concepts

  Scoring (or substitution) matrix $\delta$. The simplest is the unit scoring matrix, as in dot matrix techniques.
  Total score of an alignment, $\sum_i \delta(a_i, b_i)$.
  Local alignment and global alignment.
  Edit graph. This helps to formalize the ideas of dynamic programming.

Rule for Scoring Each Vertex in the Edit Graph

$$
S(i,j) = \max\{\, S(i-1,j-1) + \delta(a_i, b_j),\; S(i-1,j) + \delta(a_i, -),\; S(i,j-1) + \delta(-, b_j) \,\}
$$

S Scores in the Edit Graph

[Figure: the vertex S(i, j) has incoming edges from S(i-1, j-1), S(i-1, j), and S(i, j-1), weighted by δ(a_i, b_j), δ(a_i, -), and δ(-, b_j) respectively.]

The Completed Edit Graph

[Figure: completed graph.]

Dynamic Programming

Dynamic programming is a general computational paradigm of wide applicability. A problem can be solved by dynamic programming if the final answer can be determined by computing a tableau of answers to progressively larger subproblems.

Alignment Algorithms

  Needleman-Wunsch: global alignment
  Smith-Waterman: local alignment

Who Invented Dynamic Programming?

From Introduction to Computational Biology (Maps, Sequences and Genomes) by Michael S. Waterman.

Homework Problem

Calculate the dynamic programming matrix and an optimal alignment for the DNA sequences GAATTC and GATTA, scoring

  +2 for a match
  -1 for a mismatch
  -2 for a gap (2 is the gap penalty)

(Note: Do this twice, once for each of the two ways to score a gap at the beginning or end of a sequence.)

Amino Acid Sequences

For proteins we work with strings from a 20+ letter alphabet.

  A  Ala  Alanine
  R  Arg  Arginine
  N  Asn  Asparagine
  D  Asp  Aspartic acid
  C  Cys  Cysteine
  Q  Gln  Glutamine
  E  Glu  Glutamic acid
  G  Gly  Glycine
  H  His  Histidine
  I  Ile  Isoleucine
  L  Leu  Leucine

Scoring Matrices for Amino Acids

We want a scoring matrix, or substitution matrix, S for amino acids. A scoring matrix should reflect amino acid properties: a better score should result if amino acids with similar properties are aligned. So S(a, b) should be positive if the residues are very similar and negative if they are very dissimilar.

Amino Acid Properties

Amino acids with aliphatic hydrophobic side chains
  Properties: The hydrophobic side chains of these amino acids will not form hydrogen bonds or ionic bonds with other groups. These hydrophobic amino acids tend to be buried in the centre of proteins, away from the surrounding aqueous environment.
  Amino acids: Ala, Val, Leu, Ile, Met, Pro, Phe, Trp.

Amino acids with uncharged but polar side chains
  Properties: The side chains of these amino acids are uncharged at physiological pH.
  Amino acids: Ser, Tyr, Asn, Gln, Cys.

Amino acids with acidic side chains
  Properties: These have a carboxylic acid group in their side chain and are very hydrophilic.
  Amino acids: Asp, Glu.

Amino acids with basic side chains
  Properties: The positive charge on these side chains makes them hydrophilic, and they are likely to be found at the protein surface.
  Amino acids: Lys, Arg, His.

Neutral side chain
  Properties: The single hydrogen atom side chain has no strong hydrophobic or hydrophilic properties.
  Amino acids: Gly.

Scoring Matrix

Should the scoring matrix be decided by scientists with a knowledge of biochemistry, or should it be computed from an analysis of the current database of sequences? Bioinformatics or biochemistry?

Homework Problem

Amino acids D, E, and K are all charged; V, I, and L are all hydrophobic. What is the average BLOSUM50 score within the charged group of three? Within the hydrophobic group? Between the two groups?

Three Alignments: Good, Less Good, and Ugly

To illustrate problems in evaluating the significance of an alignment, the figure below shows examples of three pairwise alignments, all to the same region of the human alpha globin protein sequence (SWISS-PROT database identifier HBA_HUMAN). The central line in each alignment indicates identical positions with letters and similar positions with a plus sign. (Similar pairs of residues are those which have a positive score in the substitution matrix used to score the alignment.)

[Figure 2.1: Three sequence alignments to a fragment of human alpha globin (HBA_HUMAN). (a) Clear similarity to human beta globin (HBB_HUMAN). (b) Alignment to LGB2_LUPLU. (c) Alignment to F11G11.2, a nematode glutathione S-transferase homologue.]

Third Alignment (c), Ugly

(c) shows an alignment with a similar number of identities or conservative changes as (b). However, in this case we are looking at a spurious alignment to a protein, a nematode glutathione S-transferase homologue, that has a completely different structure and function.

Role of Statistics

We would like some probabilistic measure, an alignment score, indicating how closely two sequences are related. We want the score to tell us whether there is a significant relationship between the sequences or whether what looks good is just a random occurrence.

A warm-up: the birthday paradox. The probability that n random people all have different birthdays is

$$
\frac{365!}{(365-n)!\,365^{n}}
$$

This is approximately 1/2 for n = 23.

Log Odds

The score for the maximal scoring alignment between two amino acid sequences using the BLOSUM substitution matrix is based on a log odds system. The theory does not take account of gaps. Some possibilities for scoring gaps are given by the linear gap penalty and the gap extension penalty.
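The rule for scoring each vertex in the edit graph can be turned into a short program. The sketch below is one possible reading of it, not the lecture's own code: the names `make_delta` and `alignment_score` are hypothetical, the scoring function is a simple match/mismatch/gap scheme, and the global variant assumes that gaps at the beginning or end of a sequence are penalized like any other gap. Setting `local=True` floors every entry at 0 and returns the best entry in the tableau, in the style of Smith-Waterman.

```python
from typing import Callable

GAP = "-"


def make_delta(match: int, mismatch: int, gap: int) -> Callable[[str, str], int]:
    """Build a simple scoring function delta(a, b); '-' denotes a gap."""
    def delta(a: str, b: str) -> int:
        if a == GAP or b == GAP:
            return gap
        return match if a == b else mismatch
    return delta


def alignment_score(a: str, b: str, delta: Callable[[str, str], int],
                    local: bool = False) -> int:
    """Fill the tableau S(i, j) and return the optimal alignment score.

    local=False: global alignment (Needleman-Wunsch style); end gaps are
                 penalized (an assumption, other conventions exist).
    local=True:  local alignment (Smith-Waterman style); entries are
                 floored at 0 and the best entry anywhere is returned.
    """
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Boundary: a prefix aligned entirely against gaps.
        for i in range(1, n + 1):
            S[i][0] = S[i - 1][0] + delta(a[i - 1], GAP)
        for j in range(1, m + 1):
            S[0][j] = S[0][j - 1] + delta(GAP, b[j - 1])
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + delta(a[i - 1], b[j - 1]),  # diagonal edge
                S[i - 1][j] + delta(a[i - 1], GAP),           # gap in b
                S[i][j - 1] + delta(GAP, b[j - 1]),           # gap in a
            )
            if local:
                S[i][j] = max(S[i][j], 0)
                best = max(best, S[i][j])
    return best if local else S[n][m]


# Unit scoring with free gaps, as in the ATTACG / ATATCG example.
unit = make_delta(match=1, mismatch=0, gap=0)
print(alignment_score("ATTACG", "ATATCG", unit))  # 5
```

With the unit scoring matrix and no gap penalty, the optimal score of 5 agrees with the two gapped alignments of ATTACG and ATATCG shown earlier. The homework scoring scheme corresponds to `make_delta(match=2, mismatch=-1, gap=-2)`.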
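The birthday-paradox formula from the Role of Statistics section is easy to check numerically. The sketch below assumes 365 equally likely birthdays and uses a hypothetical helper name, `prob_all_different`; it evaluates the formula as a product of ratios so that the large factorials are never formed.

```python
from math import prod


def prob_all_different(n: int, days: int = 365) -> float:
    """Probability that n people all have different birthdays.

    Equal to days! / ((days - n)! * days**n), written as a product
    of the factors (days - k) / days for k = 0, ..., n - 1.
    """
    return prod((days - k) / days for k in range(n))


p = prob_all_different(23)
print(f"{p:.4f}")  # 0.4927
```

So for n = 23 the probability of all birthdays being different is slightly below 1/2, which means a shared birthday is slightly more likely than not.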
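For reference, the log odds score and the two gap penalties mentioned under Log Odds are commonly written as follows. The notation is an assumption borrowed from the Durbin et al. reference cited in the Sources, not taken from the slides themselves: $p_{ab}$ is the probability of seeing residues $a$ and $b$ aligned in related sequences, $q_a$ and $q_b$ are background residue frequencies, $g$ is the gap length, $d$ is the gap-open penalty, and $e$ is the gap-extension penalty.

```latex
% Log odds score for aligning residues a and b
s(a, b) = \log \frac{p_{ab}}{q_a \, q_b}

% Linear gap penalty for a gap of length g
\gamma(g) = -g\,d

% Affine gap penalty: opening cost d, extension cost e per additional position
\gamma(g) = -d - (g - 1)\,e
```

The ungapped alignment score is then the sum of $s(a_i, b_i)$ over aligned columns, which is the total score $\sum_i \delta(a_i, b_i)$ with $\delta = s$. A positive $s(a, b)$ means the pair is seen more often in related sequences than chance would predict.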