Inexact Alignment Dynamic Programming, and Gapped Alignment | CMSC 423, Study notes of Computer Science

University of Maryland Computer Science

Material Type: Notes; Class: BIOINFO ALGS, DB, TOOLS; Subject: Computer Science; University: University of Maryland; Term: Fall 2008;

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-mot 🇺🇸


CMSC423 Fall 2008
CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 9: inexact alignment, dynamic programming, gapped alignment

Recap

Local alignment recap
[Figure: example local alignment]
Value(A,A) = 10
Value(A,G) = -5
Value(A,-) = -2
Score[i,j] is the maximum of:
0. 0
1. Score[i-1, j-1] + Value(S1[i], S2[j]) (S1[i] and S2[j] aligned)
2. Score[i-1, j] + Value(S1[i], -) (S1[i] aligned to a gap)
3. Score[i, j-1] + Value(-, S2[j]) (S2[j] aligned to a gap)

Alignment scores

Where do the alignment scores come from?
• PAM matrices
– PAM1: based on the frequency of mutations between closely related proteins (within 1 "evolutionary step")
– PAM2: ... within 2 evolutionary steps
– ...
– PAM250: commonly used
• BLOSUM matrices
– Based on the frequency of mutations between proteins that are x% similar
– BLOSUM100: based on proteins that are exactly the same (e.g. score(A,A) is defined but not score(A,G))
– BLOSUM62: commonly used
• Gap scores are usually determined empirically

Heuristics
• What if we limit the number of differences allowed, e.g. because we expect the sequences to be very similar?
• Compute a 'banded' alignment: stay within k cells of the diagonal, where k is the number of differences allowed.
• The optimal alignment cannot stray more than k cells from the diagonal.
• What if we do not know k? Do a binary search to find it.
[Figure: band extending k cells on either side of the diagonal]
O(km) running time and space

Exclusion methods
• Assume P (of length n) must match T (of length m) with at most k errors. Find the places in T where P cannot match.
• Split P into k+1 chunks of size floor(n/(k+1)).
• If P matches T with at most k errors, then at least one chunk matches with no errors.
• Use any exact matching algorithm to find places where a chunk matches T, then run dynamic programming in that vicinity.
• Running time: O(m) on average
[Figure: Exclusion methods, showing a Pattern chunk with an Exact match in the Text and the Putative alignment around it]

Chaining approach
• Extends the FASTA idea
• Search for exact matches
• Find the longest consistent chain of exact matches
• Fill in the gaps in the chain using Smith-Waterman
• This is the approach used by MUMmer (Delcher et al.)
• MUM: maximal unique match (see mummer.sourceforge.net)

Chaining in 1-D
• Input: multiple overlapping intervals on a line
• Output: highest-weight set of non-overlapping intervals
• The weight could be the length of the interval, its Smith-Waterman score, etc.
• Sort the endpoints (starts and ends) of the intervals
• For every interval j, store V[j]: the best score of a chain ending in j
• MAX: the highest V[j] seen so far
• Process the endpoints in increasing order of x coordinate
• On the left end (start) of interval j: V[j] = weight(j) + MAX
• On the right end (end) of interval j: MAX = max{V[j], MAX}
• Running time?
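The sweep on the "Chaining in 1-D" slide can be sketched in Python as below. This is a minimal sketch, not the course's implementation: it assumes closed intervals given as (start, end, weight) tuples with non-negative weights, and the function name chain_1d and the prev array (used only to recover which intervals form the best chain) are hypothetical additions. As for the slide's closing question: sorting the 2n endpoints costs O(n log n) and the sweep itself is O(n), so the procedure runs in O(n log n) time for n intervals.

```python
from typing import List, Tuple

START, END = 0, 1  # event kinds; START < END, so starts sort first at equal x


def chain_1d(intervals: List[Tuple[float, float, float]]) -> Tuple[float, List[int]]:
    """Highest-weight set of pairwise non-overlapping closed intervals.

    intervals: (start, end, weight) tuples. Returns (best total weight,
    indices of the chosen intervals in left-to-right order).
    """
    events = []
    for j, (start, end, _weight) in enumerate(intervals):
        events.append((start, START, j))
        events.append((end, END, j))
    events.sort()  # by x; at ties starts come first, so touching intervals never chain

    V = [0.0] * len(intervals)     # V[j]: best score of a chain ending in interval j
    prev = [-1] * len(intervals)   # predecessor of j in that chain (for traceback)
    best, best_j = 0.0, -1         # MAX from the slide, plus the interval achieving it

    for _x, kind, j in events:
        if kind == START:          # left end: extend the best chain already closed
            V[j] = intervals[j][2] + best
            prev[j] = best_j
        elif V[j] > best:          # right end: V[j] becomes available to later intervals
            best, best_j = V[j], j

    chain = []
    j = best_j
    while j != -1:                 # follow predecessors back to the first interval
        chain.append(j)
        j = prev[j]
    return best, chain[::-1]
```

For example, chain_1d([(0, 5, 5), (3, 8, 5), (6, 10, 4), (11, 12, 1)]) returns (10.0, [0, 2, 3]): intervals 0, 2 and 3 are pairwise disjoint and outweigh any chain through interval 1.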

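Returning to the earlier slides, the Smith-Waterman recurrence from the local alignment recap, the banded heuristic, and the exclusion (pigeonhole) filter can be sketched together in Python. This is a sketch under stated assumptions, each also noted in the comments: scoring generalizes the slide's example values (match 10, every mismatch -5, every gap -2); the band keeps only cells with |i - j| <= k but, for readability, still allocates the full matrix, so it saves time but not the O(km) space the slide mentions; the verification step uses unit-cost edit distance (each substitution, insertion, or deletion is one error) rather than the Value scores; and all function names are hypothetical.

```python
from typing import List, Optional, Tuple


def value(a: str, b: str) -> int:
    """Assumed scoring: the slide's example values, applied to all pairs."""
    if a == "-" or b == "-":
        return -2                       # like Value(A,-)
    return 10 if a == b else -5         # like Value(A,A) and Value(A,G)


def local_align_score(s1: str, s2: str, band: Optional[int] = None) -> int:
    """Best local alignment score. With band=k, only cells with |i - j| <= k are filled."""
    n, m = len(s1), len(s2)
    # score[i][j] covers the first i chars of s1 and first j of s2 (s1[i-1] is S1[i]).
    # Unfilled cells stay 0, the local-alignment floor, so they never inflate the result.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        lo, hi = (1, m) if band is None else (max(1, i - band), min(m, i + band))
        for j in range(lo, hi + 1):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + value(s1[i - 1], s2[j - 1]),  # aligned pair
                score[i - 1][j] + value(s1[i - 1], "-"),            # S1[i] vs gap
                score[i][j - 1] + value("-", s2[j - 1]),            # S2[j] vs gap
            )
            best = max(best, score[i][j])
    return best


def exclusion_candidates(pattern: str, text: str, k: int) -> List[int]:
    """Split pattern into k+1 chunks; an occurrence with at most k errors
    leaves at least one chunk error-free. Return implied start positions."""
    n = len(pattern)
    size = n // (k + 1)
    if size == 0:
        raise ValueError("pattern must be longer than k")
    candidates = set()
    for c in range(k + 1):
        offset = c * size
        chunk = pattern[offset:(n if c == k else offset + size)]  # last chunk takes the remainder
        pos = text.find(chunk)
        while pos != -1:                # every exact occurrence of this chunk
            candidates.add(pos - offset)
            pos = text.find(chunk, pos + 1)
    return sorted(candidates)


def edit_distance_in(pattern: str, window: str) -> int:
    """Fewest unit-cost edits to match all of pattern to some substring of window."""
    prev = [0] * (len(window) + 1)      # empty pattern matches anywhere at no cost
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(window)
        for j in range(1, len(window) + 1):
            cost = 0 if pattern[i - 1] == window[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)


def approximate_matches(pattern: str, text: str, k: int) -> List[Tuple[int, int]]:
    """Run DP near each candidate; return text windows (lo, hi) holding a match with <= k edits."""
    hits = []
    for cand in exclusion_candidates(pattern, text, k):
        # at most k indels, so the true match lies within k positions of cand
        lo, hi = max(0, cand - k), min(len(text), cand + len(pattern) + k)
        if edit_distance_in(pattern, text[lo:hi]) <= k:
            hits.append((lo, hi))
    return hits
```

With the band, local_align_score("AACGT", "ACGT", band=0) finds only 10 because the shifted match lies one diagonal off the main one, while band=1 recovers the full score of 40.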