Inexact Alignment Dynamic Programming, and Gapped Alignment | CMSC 423, Study notes of Computer Science

Material Type: Notes; Class: BIOINFO ALGS, DB, TOOLS; Subject: Computer Science; University: University of Maryland; Term: Fall 2008;


Uploaded on 07/30/2009

koofers-user-mot
Partial preview of the text

CMSC423 Fall 2008
CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 9: inexact alignment, dynamic programming, gapped alignment

Recap

Local alignment recap
[Figure: example gapped alignment; sequence labels GATGC, AGCGTAG, GTCAGAC]
Value(A,A) = 10
Value(A,G) = -5
Value(A,-) = -2
Score[i,j] is the maximum of:
0. 0 (start a new local alignment here)
1. Score[i-1, j-1] + Value(S1[i], S2[j]) (S1[i] aligned to S2[j])
2. Score[i-1, j] + Value(S1[i], -) (S1[i] aligned to a gap)
3. Score[i, j-1] + Value(-, S2[j]) (S2[j] aligned to a gap)

Alignment scores

Where do the alignment scores come from?
• PAM matrices
– PAM1: based on the frequency of mutations between closely related proteins (within 1 "evolutionary step")
– PAM2: ... within 2 evolutionary steps
– ... PAM250: commonly used
• BLOSUM matrices
– Frequency of mutations between proteins that are x% similar
– BLOSUM100: based on proteins that are exactly the same (e.g. score(A,A) is defined but not score(A,G))
– BLOSUM62: commonly used
• Gap scores are usually determined empirically

Heuristics
• What if we limit the number of differences allowed? E.g., we expect the sequences to be very similar.
• Compute a 'banded' alignment: stay within k differences of the diagonal.
• The optimal alignment cannot stray too far from the diagonal.
• What if we do not know k? Do a binary search to find it.
[Figure: band of width k on either side of the diagonal]
O(km) running time and space

Exclusion methods
• Assume P must match T with at most k errors. Find places in T where P cannot match.
• Split P into k+1 chunks of size floor(n/(k+1)), where n is the length of P.
• If P matches T with at most k errors, then at least one chunk matches with no errors.
• Use any exact matching algorithm to find places where a chunk matches T, then run dynamic programming in that vicinity.
• Running time: O(m) on average

Exclusion methods
[Figure: an exact match of one chunk of the pattern within the text, and the putative alignment around it. Labels: Exact match, Putative alignment, Text, Pattern]

Chaining approach
• Extends the FASTA idea
• Search for exact matches
• Find the longest consistent chain of exact matches
• Fill in the gaps in the chain using Smith-Waterman
• This is the approach used by MUMmer (Delcher et al.)
• MUM: maximal unique match (see mummer.sourceforge.net)

Chaining in 1-D
• Input: multiple overlapping intervals on a line
• Output: highest-weight set of non-overlapping intervals
• Weight could be the length of the interval, its Smith-Waterman score, etc.
• Sort the endpoints (starts, ends) of the intervals
• For every interval j, store V[j], the best score of a chain ending in j
• MAX stores the highest V[j] seen so far
• Process endpoints in increasing order of x coordinate
• If we encounter the left end (start) of interval j: V[j] = weight(j) + MAX
• If we encounter the right end (end) of interval j: MAX = max{V[j], MAX}
• Running time?
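The sweep described in the Chaining in 1-D slide can be sketched in Python roughly as follows. This is a minimal sketch, not the lecture's own code: the function name `chain_1d`, the `(start, end, weight)` tuple layout, the back-pointer list `prev`, and the tie rule (a start is processed before an end at the same coordinate, so intervals that merely touch count as overlapping) are all assumptions made for illustration.

```python
def chain_1d(intervals):
    """Best-weight chain of non-overlapping intervals on a line.

    intervals: list of (start, end, weight) tuples (hypothetical layout).
    Returns (best_score, chain), where chain lists interval indices
    in left-to-right order.
    """
    # Build the endpoint events. kind 0 = start, kind 1 = end, so that at
    # equal x a start sorts before an end (assumed tie rule: touching
    # intervals are treated as overlapping).
    events = []
    for j, (start, end, _weight) in enumerate(intervals):
        events.append((start, 0, j))
        events.append((end, 1, j))
    events.sort()

    V = [0] * len(intervals)        # V[j]: best score of a chain ending in j
    prev = [None] * len(intervals)  # back-pointers, added here for traceback
    best = 0                        # MAX on the slide
    best_j = None                   # interval that achieved MAX

    for _x, kind, j in events:
        if kind == 0:
            # Left end of j: extend the best chain that has already ended.
            V[j] = intervals[j][2] + best
            prev[j] = best_j
        elif V[j] > best:
            # Right end of j: j may now be followed by later intervals.
            best = V[j]
            best_j = j

    # Walk the back-pointers to recover the chain itself.
    chain = []
    j = best_j
    while j is not None:
        chain.append(j)
        j = prev[j]
    chain.reverse()
    return best, chain
```

For example, `chain_1d([(0, 3, 3), (2, 5, 4), (4, 7, 3), (6, 9, 2)])` returns `(6, [0, 2])`. In this sketch the sort dominates the cost, so it runs in O(n log n) time for n intervals, which is one possible answer to the slide's closing question.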