Notes on Restriction Mapping Algorithms - Bioinformatics | CAP 5510, Study notes of Computer Science

University of Central Florida (UCF), Computer Science

Material Type: Notes; Class: BIOINFORMATICS; Subject: Computer Applications; University: University of Central Florida; Term: Fall 2009;

Typology: Study notes

2009/2010

Uploaded on 02/24/2010

koofers-user-z6m 🇺🇸


Restriction Mapping Algorithms
• In this presentation, we give algorithms to reconstruct the ordering of the segments produced from an unknown DNA sequence by restriction enzymes. We consider only the partial restriction digest.
• These algorithms are not used in modern biotechnology, but they illustrate the application of branch-and-bound techniques in mathematical biology, and the problem was a hot research topic in the 1970s and 1980s.

Discovering Restriction Enzymes
• HindII, the first restriction enzyme, was discovered accidentally in 1970 while studying how the bacterium Haemophilus influenzae takes up DNA from the virus
• It recognizes and cuts DNA at the sequences:
– GTGCAC
– GTTAAC

Full Restriction Digest
• Cutting DNA at each restriction site creates multiple restriction fragments:
[Figure: restriction sites marked on a DNA segment and the resulting restriction fragments]
• Is it possible to reconstruct the order of the fragments from the sizes of the fragments {3, 5, 5, 9}?

Full Restriction Digest: Multiple Solutions
• Alternative ordering of restriction fragments:
[Figure: the same fragments arranged in the order 5, 9, 5, 3]

Partial Restriction Digest
• The sample of DNA is exposed to the restriction enzyme for only a limited amount of time to prevent it from being cut at all restriction sites
• This experiment generates the set of all possible restriction fragments between every two (not necessarily consecutive) cuts
• This set of fragment sizes is used to determine the positions of the restriction sites in the DNA sequence

Partial Digest Fundamentals
X: the set of n integers representing the locations of all cuts in the restriction map, including the start and end
n: the total number of cuts
DX: the multiset of integers representing the lengths of each of the nC2 fragments produced from a partial digest. This multiset contains the pairwise differences of all elements of X.
Example: X = {0, 5, 14, 19, 22}, n = 5

One More Partial Digest Example
        0    2    4    7   10
   0         2    4    7   10
   2              2    5    8
   4                   3    6
   7                        3
  10
Representation of DX = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10} as a two-dimensional table, with the elements of X = {0, 2, 4, 7, 10} along both the top and the left side. The element at (i, j) in the table is x_j - x_i for 1 ≤ i < j ≤ n.

Partial Digest Problem: Formulation
Goal: Given all pairwise distances between points on a line, reconstruct the positions of those points
• Input: The multiset of pairwise distances L, containing nC2 = n(n-1)/2 integers
• Output: A set X of n integers such that DX = L
• L is given; find X.

Brute Force Algorithms
• Also known as exhaustive search algorithms; they examine every possible variant to find a solution
• Efficient in rare cases; usually impractical

Partial Digest: Brute Force
1. Find the restriction fragment of maximum length M. M is the length of the DNA sequence.
2. For every possible set X = {0, x_2, …, x_{n-1}, M}, compute the corresponding DX
3. If DX is equal to the experimental partial digest L, then X is the correct restriction map

BruteForcePDP
BruteForcePDP(L, n):
    M <- maximum element in L
    for every set of n - 2 integers 0 < x_2 < … < x_{n-1} < M
        X <- {0, x_2, …, x_{n-1}, M}
        Form DX from X
        if DX = L
            return X
    output "no solution"

Efficiency of AnotherBruteForcePDP
• It is more efficient than BruteForcePDP, but still slow
• If L = {2, 998, 1000} (n = 3, M = 1000), BruteForcePDP will be extremely slow, but AnotherBruteForcePDP will be quite fast
• Fewer sets are examined, but the runtime is still exponential: O(n^(2n-4))

Branch and Bound Algorithm for PDP
1. Begin with X = {0}
2. Remove the largest element in L and place it in X
3. See if the element fits on the right or left side of the restriction map
4. When it fits, find the other lengths it creates and remove those from L
5. Go back to step 2 until L is empty
WRONG ALGORITHM (may have to backtrack because of the choice at step …)

An Example
L = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10}
We know that n = 5, since nC2 = 10. So we begin by setting x_1 = 0 and x_5 = 10, so that X = {0, 10}, and remove 10 from L. The new L is L = {2, 2, 3, 3, 4, 5, 6, 7, 8}.

An Example
Take the next largest element, 8, from L. We have two choices: x_2 = 2 or x_4 = 8. Since the two cases are symmetric, we can assume x_2 = 2. We remove the elements x_5 - x_2 = 8 and x_2 - x_1 = 2 from L. The new sets are X = {0, 2, 10} and L = {2, 3, 3, 4, 5, 6, 7}.

An Example
L = {2, 3, 3, 4, 5, 6, 7}
X = {0, 2, 10}
We have two choices again. We could take 7 from L and make x_4 = 7 or x_3 = 3. If we choose x_3 = 3, then D(x_3, X) = {3, 1, 7}*. Since 1 is not an element of L, this is a wrong choice. So we explore x_4 = 7, for which D(x_4, X) = {7, 5, 3} = {7 - 0, 7 - 2, 10 - 7}. We remove {7, 5, 3} from L and add 7 to X.
*We define D(y, X) as the multiset of distances between a point y and all points in a set X.

An Example
L = {2, 3, 4, 6}
X = {0, 2, 7, 10}
We are left with one choice, x_3 = 4. D(x_3, X) = {4, 2, 3, 6}, which is a subset of L, so we explore this branch. We remove {4, 2, 3, 6} from L and add 4 to X.

An Example
L = { }
X = {0, 2, 4, 7, 10}
L is now empty, so we have a solution, which is X.

An Example
L = {2, 3, 3, 4, 5, 6, 7}
X = {0, 2, 10}
To look for other solutions, we backtrack. This time we explore y = 3. D(y, X) = {3, 1, 7}, which is not a subset of L, so we do not explore this branch.

An Example
L = {2, 2, 3, 3, 4, 5, 6, 7, 8}
X = {0, 10}
We have backtracked to the root. Therefore we have found all the solutions.

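
The solution found in the example can be checked against the problem formulation, and the brute-force search can be tried on the same input. The following is a minimal Python sketch, not the course's own code: the names delta and brute_force_pdp are hypothetical, and, unlike the BruteForcePDP pseudocode (which returns the first matching X), this version collects every matching X and recovers n from the size of L.

```python
from collections import Counter
from itertools import combinations


def delta(points):
    """Return DX, the multiset of pairwise distances of the points, as a Counter."""
    return Counter(b - a for a, b in combinations(sorted(points), 2))


def brute_force_pdp(L):
    """Try every X = (0, x_2, ..., x_{n-1}, M) and keep those whose DX equals L.

    Assumption of this sketch: n is recovered from |L| = n(n-1)/2
    instead of being passed in as a parameter.
    """
    target = Counter(L)
    M = max(L)  # the longest fragment is the full length of the DNA
    n = next((k for k in range(2, len(L) + 2)
              if k * (k - 1) // 2 == len(L)), None)
    if n is None:
        raise ValueError("len(L) is not of the form n(n-1)/2")
    solutions = []
    # Choose the n - 2 interior cut positions strictly between 0 and M.
    for interior in combinations(range(1, M), n - 2):
        X = (0, *interior, M)
        if delta(X) == target:
            solutions.append(X)
    return solutions


L = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
print(delta([0, 2, 4, 7, 10]) == Counter(L))
print(brute_force_pdp(L))
```

For this L the search only has to look at the 84 ways of choosing 3 interior positions from 1..9, but the number of candidates grows quickly with M, which is why the input L = {2, 998, 1000} above is described as slow for BruteForcePDP.
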
Defining D(y, X)
• Before describing PartialDigest, first define D(y, X) as the multiset of all distances between a point y and all the points in the set X = {x_1, x_2, …, x_n}:
D(y, X) = {|y - x_1|, |y - x_2|, …, |y - x_n|}
• Example: D(2, (1, 3, 4, 5)) = (1, 1, 2, 3)

PDP Example
• Given L = (2, 2, 3, 3, 4, 5, 6, 7, 8, 10), n = 5 since |L| = 10 and 5C2 = 10; let w = width.
• w := 10, L = (2, 2, 3, 3, 4, 5, 6, 7, 8), X = (0, 10)
1. PLACE(L, X): L is not empty; y := 8; D(8, (0, 10)) = (8, 2) is a subset of L, so X = (0, 10, 8) and L := L - D(y, X) = (2, 3, 3, 4, 5, 6, 7)
2. PLACE(L, X): L is not empty; y := 7; D(7, (0, 10, 8)) = (7, 3, 1) is not a subset of L. D(w - 7, (0, 10, 8)) = D(3, (0, 10, 8)) = (3, 7, 5) is a subset of L, so X = (0, 10, 8, 3) and L := L - D(w - y, X) = (2, 3, 4, 6)
3. PLACE(L, X): L is not empty; y := 6; D(6, (0, 10, 8, 3)) = (6, 4, 2, 3) is a subset of L, so X = (0, 10, 8, 3, 6) and L is empty
4. PLACE(L, X): L is empty; output X = (0, 10, 8, 3, 6) and return
• On return, remove y = 6 from X and add D(y, X) back to L: X = (0, 10, 8, 3), D(6, (0, 10, 8, 3)) = (6, 4, 2, 3), and L = (2, 3, 4, 6)
• Remove w - y = 10 - 7 = 3 from X = (0, 10, 8, 3) and add D(w - y, X) = D(3, (0, 10, 8)) = (3, 7, 5) back to L, yielding X = (0, 10, 8) and L = (2, 3, 3, 4, 5, 6, 7)
• Remove y = 8 from X, leaving X = (0, 10), and add D(y, X) = D(8, (0, 10)) = (8, 2) back to L. This gives back L = (2, 2, 3, 3, 4, 5, 6, 7, 8) and X = (0, 10), and w is still 10. We can now take the other possible choice, w - y = 2, and get the other solution, X = (0, 2, 4, 7, 10).

Analyzing PartialDigest Algorithm
• Still exponential in the worst case, but very fast on average
• Informally, let T(n) be the time PartialDigest takes to place n cuts
– No branching case (there is just one viable alternative at every step; O(n) is the time to compute the new X and L): T(n) = T(n-1) + O(n), so T(n) = O(n^2), quadratic
– Branching case (there are two choices at every step): T(n) < 2T(n-1) + O(n), which is exponential
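
The PLACE steps traced above can be sketched as a short recursive Python function. This is an illustrative sketch under stated assumptions, not the course's PartialDigest code: the names distances and partial_digest are hypothetical, both symmetric placements (y and w - y) are explored, and each recursive call receives its own copy of L and X, so the explicit "remove y and add D(y, X) back to L" undo step of the trace is replaced by simply returning from the call.

```python
from collections import Counter


def distances(y, X):
    """D(y, X): the multiset of distances between y and every point of X."""
    return Counter(abs(y - x) for x in X)


def partial_digest(L):
    """Return every set X containing 0 whose pairwise distances equal L."""
    remaining = Counter(L)
    width = max(remaining)
    remaining = remaining - Counter([width])  # the width itself is now used up
    solutions = []

    def place(remaining, X):
        if not remaining:  # every distance is accounted for
            solutions.append(sorted(X))
            return
        y = max(remaining)
        # The largest remaining distance must be measured from one of the two ends.
        for candidate in sorted({y, width - y}):
            d = distances(candidate, X)
            # Bound: follow the branch only if D(candidate, X) is a sub-multiset of L.
            if all(remaining[k] >= c for k, c in d.items()):
                place(remaining - d, X + [candidate])

    place(remaining, [0, width])
    return solutions


print(distances(2, [1, 3, 4, 5]) == Counter([1, 1, 2, 3]))
print(partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))
```

Copying the multiset at each call costs some extra work per step compared with updating L in place, but it keeps the backtracking logic out of sight, which makes the branch-and-bound structure easier to read.
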
