Dynamic Programming in Bioinformatics: Sequence Alignment - Prof. Drena Leigh Dobbs, Lab Reports of Bioinformatics

A lecture note from a bioinformatics course at Iowa State University (ISU) in fall 2007. It covers dynamic programming as applied to sequence alignment in bioinformatics. Topics include the motivation for sequence alignment, the differences between orthologs and paralogs, sequence similarity versus identity, and the goal of sequence alignment. It also introduces the concept of a scoring function and the difference between global and local alignment. Adapted from various sources, including Brown and Caragea (2007) and slides from Altman, Fernandez-Baca, Batzoglou, Craven, Hunter, and Page.

Typology: Lab Reports

Pre 2010

Uploaded on 09/02/2009

#5 - Dynamic Programming, 8/29/07, BCB 444/544 Fall 07 (ISU), Dobbs

Slide 1
BCB 444/544 Lecture 5: Dynamic Programming (#5_Aug29)

Slide 2: Required Reading (before lecture)
Mon Aug 27 - for Lecture #4, Pairwise Sequence Alignment
• Chp 3 - pp 31-41
Wed Aug 29 - for Lecture #5, Dynamic Programming
• Eddy: What is Dynamic Programming? 2004 Nature Biotechnol 22:909
Thurs Aug 30 - Lab #2: Databases, ISU Resources & Pairwise Sequence Alignment
Fri Aug 31 - for Lecture #6, Scoring Matrices and Alignment Statistics
• Chp 3 - pp 41-49

Slide 3: Review: Chp 2 - Biological Databases
• Xiong: Chp 2 Introduction to Biological Databases
  • What is a Database?
  • Types of Databases
  • Biological Databases
  • Pitfalls of Biological Databases
  • Information Retrieval from Biological Databases

Slide 4: Types of Databases
3 major types of electronic databases:
1. Flat files - simple text files
  • no organization to facilitate retrieval
2. Relational - data organized as tables ("relations")
  • shared features among tables allow rapid search
3.
Object-oriented - data organized as "objects"
  • objects associated hierarchically

Slide 5: Examples of Biological Databases
1- Primary
• DNA sequences
  • GenBank - USA
  • European Molecular Biology Lab - EMBL
  • DNA Data Bank of Japan - DDBJ
• Structures (Protein, DNA, RNA)
  • PDB - Protein Data Bank
  • NDB - Nucleic Acid Database

Slide 6: Examples of Biological Databases
2- Secondary
• Protein sequences
  • Swiss-Prot, TrEMBL, PIR
  • these recently combined into UniProt
3- Specialized
• Species-specific (or "taxonomic" specific)
  • FlyBase, WormBase, AceDB, PlantDB
• Molecule-specific, disease-specific
See: http://www.oxfordjournals.org/nar/database/c/

Slide 7: BEWARE!
SUMMARY: #2 - Biological Databases
Who was that Icelandic fellow?

Slide 8: Chp 3 - Sequence Alignment
SECTION II: SEQUENCE ALIGNMENT
Xiong: Chp 3 Pairwise Sequence Alignment
• Evolutionary Basis
• Sequence Homology versus Sequence Similarity
• Sequence Similarity versus Sequence Identity
• Methods
• Scoring Matrices
• Statistical Significance of Sequence Alignment
Adapted from Brown and Caragea, 2007, with some slides from: Altman, Fernandez-Baca, Batzoglou, Craven, Hunter, Page.

Slide 9: Motivation for Sequence Alignment
"Sequence comparison lies at the heart of bioinformatics analysis."
Jin Xiong
Sequence comparison is important for drawing functional & evolutionary inferences about new genes/proteins.
Pairwise sequence alignment is fundamental; it is used to:
• Search for common patterns of characters
• Establish pairwise correspondence between related sequences
Pairwise sequence alignment is the basis for:
• Database searching (e.g., BLAST)
• Multiple sequence alignment (MSA)

Slide 10: Homology
Homology has a very specific meaning in evolutionary & computational biology, & the term is often used incorrectly.
For us: Homology = similarity due to descent from a common evolutionary ancestor
But, HOMOLOGY ≠ SIMILARITY
When 2 sequences share a sufficiently high degree of sequence similarity (or identity), we may infer that they are homologous.
We can infer homology from similarity (can't prove it!)

Slide 11: Orthologs vs Paralogs
2 types of homologous sequences:
• Orthologs - "same genes" in different species
  • result of common ancestry
  • corresponding proteins have "same" functions (e.g., human α-globin & mouse α-globin)
• Paralogs - "similar genes" within a species
  • result of gene duplication events
  • proteins may (or may not) have similar functions (e.g., human α-globin & human β-globin)
[Figure: gene tree. A is the parent gene; speciation leads to B & C; duplication leads to C'. B and C are orthologous; C and C' are paralogous.]

Slide 12: Sequence Homology vs Similarity
• Homologous sequences - sequences that share a common evolutionary ancestry
• Similar sequences - sequences that have a high percentage of aligned residues with similar physicochemical properties (e.g., size, hydrophobicity, charge)
IMPORTANT:
• Sequence homology:
  • An inference about a common ancestral relationship, drawn when two sequences share a high enough degree of sequence similarity
  • Homology is qualitative
• Sequence
similarity:
  • The direct result of observation from a sequence alignment
  • Similarity is quantitative; can be described using percentages

Slide 25: Global vs Local Alignment
Which should be used when? Both are important, but it is critical to use the right method for a given task!
Global alignment:
• Good for: aligning closely related sequences of similar length
• Not good for: divergent sequences or sequences with different lengths
Local alignment:
• Good for: searching for conserved patterns (domains or motifs) in DNA or protein sequences
• Not good for: generating an alignment of closely related sequences
Global and local alignments are fundamentally similar; they differ only in the optimization strategy used to align similar residues.

Slide 26: Alignment Algorithms
3 major methods for pairwise sequence alignment:
1. Dot matrix analysis
2. Dynamic programming
3. Word or k-tuple methods (later, in Chp 4)

Slide 27: Dot Matrix Method (Dot Plots)
• Place 1 sequence along top row of matrix
• Place 2nd sequence along left column of matrix
• Plot a dot each time there is a match between an element of the row sequence and an element of the column sequence
• For proteins, usually use more sophisticated scoring schemes than "identical match"
• Diagonal lines indicate areas of match
• Contiguous diagonal lines reveal alignment; "breaks" = gaps (indels)
[Figure: example dot plot of two short DNA sequences]

Slide 28: Interpretation of Dot Plots
When comparing 2 sequences:
• Diagonal lines of dots indicate regions of similarity between 2 sequences
• Reverse diagonals (perpendicular to diagonal) indicate inversions
• What do similar patterns mean when comparing a sequence with itself (reverse complement)?
• e.g.: Reverse diagonals crossing diagonals (X's) indicate palindromes
Exploring Dot Plots

Slide 29: Dot Matrix Variations
Compare 2 sequences
• Identify matching regions
  • Identities for DNA seqs
  • Similarities for protein seqs
Compare sequence with itself
• Identify repeated regions
• Identify inverted repeats
• Identify palindromes
For long sequences?
• Too many dots! Noisy!
• Instead of one dot per "residue," plot one dot per "window" of n matching residues to reduce noise

Slide 30: Strengths & Weaknesses of Dot Plots
Strengths:
• Fast and easy
• Allows direct visual identification of regions of similarity
• Repeats, inversions, etc. are readily apparent
• Displays all possible matches
Weaknesses:
• Doesn't generate full alignment - user must "connect the diagonals"
• No statistical assessment of quality of alignment (score)
• Impractical and noisy for long sequences
• Difficult to scale up to multiple alignment

Slide 31: Dynamic Programming
For pairwise sequence alignment
Idea: Display one sequence above another with spaces inserted in both to reveal similarity
A: C A T - T C A - C
   |   |     | |   |
B: C - T C G C A G C

Slide 32: Global alignment: Scoring
CTGTCG-CTGCACG
-TGC-CG-TG----
Reward for matches: α
Mismatch penalty: β
Space/gap penalty: γ
Score = αw - βx - γy
w = #matches, x = #mismatches, y = #spaces

Slide 33: Global alignment: Scoring
Reward for matches: 10
Mismatch penalty: 2
Space/gap penalty: 5
  C   T   G   T   C   G   -   C   T   G   C
  -   T   G   C   -   C   G   -   T   G   -
 -5  10  10  -2  -5  -2  -5  -5  10  10  -5
Total = 11

Slide 34: Optimum Alignment
• Score of an alignment is a measure of its quality
• Optimum alignment problem: Given a pair of
sequences X and Y, find an alignment (global or local) with maximum score

Slide 35: Alignment algorithms
• Global: Needleman-Wunsch
• Local: Smith-Waterman
• Both NW and SW use dynamic programming
• Variations:
  • Gap penalty functions
  • Scoring matrices

Slide 36: Dynamic Programming (DP)
• As a computer science concept - formalized in the early 1950s by Bellman at RAND Corporation
"Frequently, however, there are only a polynomial number of subproblems… If we keep track of the solution to each subproblem solved, and simply look up the answer when needed, we obtain a polynomial-time algorithm."
  Aho, Hopcroft, Ullman
• Reported to biologists for sequence alignment problems by Needleman & Wunsch, 1970

Slide 37: Key Idea
Score of the best possible alignment that ends at a given pair of positions (i,j) in two sequences is the score of the best alignment previous to those two positions PLUS the score for aligning those two positions
Next best alignment = previous best + local best

Slide 38: Problem Formulation and Notations
Given two sequences (strings):
• X = x1x2…xN of length N (example: x = AGC, N = 3)
• Y = y1y2…yM of length M (example: y = AAAC, M = 4)
Construct a matrix with (N+1) x (M+1) elements, where
S(i,j) = score of best alignment of x[1..i] = x1x2…xi with y[1..j] = y1y2…yj
Example: S(2,3) = score of best alignment of AG (x1x2) to AAA (y1y2y3)
[Figure: score matrix labeled with x1 x2 x3 and y1 y2 y3 y4]

Slide 39: Dynamic Programming
4 components:
1. Recursive definition for optimal score
2. Matrix for storing optimal scores of subproblems
3. Bottom-up approach for filling the matrix, by solving smallest subproblems first
4.
Traceback of path through matrix to recover the optimal alignment(s) that gave the optimal score

Slide 40: Global Alignment: Algorithm
S(i,j) = score of optimal alignment of x[1..i] and y[1..j]
x[1..i] = prefix of length i of x
y[1..j] = prefix of length j of y

Slide 41: Calculating Score of Optimum Alignment
Initial conditions:
$$S(i,0) = -i\,\gamma, \qquad S(0,j) = -j\,\gamma$$
Recursive definition: for 1 ≤ i ≤ N, 1 ≤ j ≤ M, S(i,j) satisfies the following relationship, where σ(x_i, y_j) is the score for aligning x_i with y_j and γ is the gap penalty:
$$S(i,j) = \max \begin{cases} S(i-1,j-1) + \sigma(x_i, y_j) \\ S(i-1,j) - \gamma \\ S(i,j-1) - \gamma \end{cases}$$

Slide 42: Computing the best current score
[Figure: score matrix with indices 0..N and 0..M, S(0,0) = 0 in one corner and S(N,M) in the opposite corner; cell S(i,j) is computed from its neighbors S(i-1,j-1), S(i-1,j), and S(i,j-1)]
Initialization:
$$S(0,0) = 0, \qquad S(i,0) = -i\,\gamma, \qquad S(0,j) = -j\,\gamma$$
Recursion:
$$S(i,j) = \max \begin{cases} S(i-1,j-1) + \sigma(x_i, y_j) \\ S(i-1,j) - \gamma \\ S(i,j-1) - \gamma \end{cases}$$
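
The four DP components listed above (recursive definition, score matrix, bottom-up fill, traceback) can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation. The function names `score_alignment` and `needleman_wunsch` are hypothetical, the default scores (match 10, mismatch penalty 2, gap penalty 5) are taken from the scoring example on slide 33, and the traceback tie-breaking order (diagonal first, then up, then left) is an assumption, so only one of possibly several optimal alignments is returned.

```python
def score_alignment(a, b, match=10, mismatch=2, gap=5):
    """Score two gapped rows: match*w - mismatch*x - gap*y (slide 32)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for p, q in zip(a, b):
        if p == "-" or q == "-":
            total -= gap          # space/gap column
        elif p == q:
            total += match        # matching column
        else:
            total -= mismatch     # mismatching column
    return total


def needleman_wunsch(x, y, match=10, mismatch=2, gap=5):
    """Global alignment by DP; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)

    def sigma(i, j):
        # Score for aligning x_i with y_j (1-based, as on the slides).
        return match if x[i - 1] == y[j - 1] else -mismatch

    # (N+1) x (M+1) matrix; S[i][j] = best score for x[1..i] vs y[1..j].
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -i * gap        # initial condition S(i,0)
    for j in range(1, m + 1):
        S[0][j] = -j * gap        # initial condition S(0,j)

    # Bottom-up fill: every cell depends only on already-filled neighbors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sigma(i, j),
                          S[i - 1][j] - gap,
                          S[i][j - 1] - gap)

    # Traceback from S(N,M) to S(0,0). Tie-breaking order is an assumption.
    row_x, row_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sigma(i, j):
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] - gap:
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(row_x)), "".join(reversed(row_y))


if __name__ == "__main__":
    score, ax, ay = needleman_wunsch("AGC", "AAAC")
    print(score)
    print(ax)
    print(ay)
```

With the slide 38 example (x = AGC, y = AAAC) and these scores, the optimal score S(3,4) works out to 13, and re-scoring the returned rows with `score_alignment` gives the same value. The matrix fill takes time and memory proportional to N × M, which is the polynomial cost the Aho, Hopcroft, and Ullman quote refers to.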