Restriction Mapping Algorithms (Bioinformatics, CAP 5510)
• In this presentation, we give algorithms to reconstruct the order of the segments produced when an unknown DNA sequence is cut by restriction enzymes. We consider only the partial restriction digest.
• These algorithms are no longer used in modern biotechnology, but the techniques illustrate the application of branch-and-bound methods in mathematical biology, and the problem was a hot research topic in the 1970s and 1980s.

Discovering Restriction Enzymes
• HindII, the first restriction enzyme, was discovered accidentally in 1970 during a study of how the bacterium Haemophilus influenzae takes up DNA from a virus
• It recognizes and cuts DNA at the sequences:
– GTGCAC
– GTTAAC

Full Restriction Digest
• Cutting DNA at each restriction site creates multiple restriction fragments:
[Figure: a DNA molecule with its restriction sites marked, cut into fragments of lengths 3, 5, 5 and 9]
• Is it possible to reconstruct the order of the fragments from the fragment sizes {3, 5, 5, 9}?

Full Restriction Digest: Multiple Solutions
• Alternative ordering of restriction fragments:
[Figure: two alternative orderings of the same restriction fragments, with the restriction sites marked]
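
To illustrate why fragment sizes alone do not determine a full-digest map, the following minimal sketch enumerates the orderings of the fragments {3, 5, 5, 9}. The function names `distinct_orderings` and `cut_positions` are hypothetical helpers introduced here, and the sketch assumes that an ordering and its mirror image count as the same map.

```python
from itertools import accumulate, permutations


def distinct_orderings(fragments):
    """Return fragment orderings, treating an ordering and its reverse as one map."""
    seen = set()
    for order in permutations(fragments):
        # Pick one representative of each (ordering, reversed ordering) pair.
        canonical = min(order, tuple(reversed(order)))
        seen.add(canonical)
    return sorted(seen)


def cut_positions(order):
    """Cut positions (including both ends) implied by a left-to-right fragment order."""
    return [0] + list(accumulate(order))


orderings = distinct_orderings([3, 5, 5, 9])
for order in orderings:
    print(order, cut_positions(order))
```

Every ordering printed is consistent with the same full-digest data, which is the ambiguity that motivates the partial digest.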

Partial Restriction Digest
• The sample of DNA is exposed to the restriction enzyme for only a limited amount of time, to prevent it from being cut at all restriction sites
• This experiment generates the set of all possible restriction fragments between every two (not necessarily consecutive) cuts
• This set of fragment sizes is used to determine the positions of the restriction sites in the DNA sequence

Partial Digest Fundamentals
X:  the set of n integers representing the locations of all cuts in the restriction map, including the start and end, e.g. X = {0, 5, 14, 19, 22}
n:  the total number of cuts (= 5 in this example)
DX: the multiset of integers representing the lengths of each of the nC2 fragments produced from a partial digest. It is the multiset of all pairwise differences between elements of X.

One More Partial Digest Example
 X |  0   2   4   7  10
---+--------------------
 0 |      2   4   7  10
 2 |          2   5   8
 4 |              3   6
 7 |                  3
10 |
Representation of DX = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10} as a two-dimensional table, with the elements of X = {0, 2, 4, 7, 10} along both the top and the left side. The element at (i, j) in the table is xj - xi for 1 ≤ i < j ≤ n.

Partial Digest Problem: Formulation
Goal: Given all pairwise distances between points on a line, reconstruct the positions of those points
• Input: The multiset of pairwise distances L, containing nC2 = n(n-1)/2 integers
• Output: A set X of n integers such that DX = L

Brute Force Algorithms
• Also known as exhaustive search algorithms; they examine every possible variant to find a solution
• Efficient in rare cases; usually impractical
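
To make exhaustive search concrete for the partial digest problem formulated above, here is a minimal sketch that tries every candidate set of points between 0 and the largest distance and keeps those whose pairwise-distance multiset equals L. The names `delta` and `brute_force_pdp` are hypothetical, and returning all matching sets (rather than stopping at the first) is an assumption made for illustration.

```python
from collections import Counter
from itertools import combinations


def delta(points):
    """Multiset (as a Counter) of all pairwise distances between points."""
    return Counter(abs(b - a) for a, b in combinations(points, 2))


def brute_force_pdp(L):
    """Exhaustively find every X containing 0 and max(L) with delta(X) == L."""
    target = Counter(L)
    M = max(L)
    # Recover n from |L| = n(n-1)/2.
    n = next((k for k in range(2, len(L) + 2)
              if k * (k - 1) // 2 == len(L)), None)
    if n is None:
        raise ValueError("len(L) is not of the form n(n-1)/2")
    solutions = []
    # Try every choice of n - 2 interior points strictly between 0 and M.
    for interior in combinations(range(1, M), n - 2):
        X = (0, *interior, M)
        if delta(X) == target:
            solutions.append(X)
    return solutions


print(brute_force_pdp([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))
```

The number of candidate sets grows with M as well as n, so this is only usable on very small inputs.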

Partial Digest: Brute Force
1. Find the restriction fragment of maximum length M. M is the length of the DNA sequence.
2. For every possible set X = {0, x2, ..., xn-1, M}, compute the corresponding DX.
3. If DX is equal to the experimental partial digest L, then X is the correct restriction map.

BruteForcePDP
BruteForcePDP(L, n):
  M <- maximum element in L
  for every set of n - 2 integers 0 < x2 < ... < xn-1 < M
    X <- {0, x2, ..., xn-1, M}
    form DX from X
    if DX = L
      return X
  output "no solution"

Efficiency of AnotherBruteForcePDP
• AnotherBruteForcePDP differs from BruteForcePDP in that it chooses the n - 2 interior points from the elements of L instead of from all integers between 0 and M
• It is more efficient, but still slow
• If L = {2, 998, 1000} (n = 3, M = 1000), BruteForcePDP will be extremely slow, but AnotherBruteForcePDP will be quite fast
• Fewer sets are examined, but the runtime is still exponential: O(n^(2n-4))

Branch and Bound Algorithm for PDP
1. Begin with X = {0}
2. Remove the largest element in L and place it in X
3. See if the element fits on the right or left side of the restriction map
4. When it fits, find the other lengths it creates and remove those from L
5. Go back to step 2 until L is empty
WRONG ALGORITHM: it may have to backtrack because of a choice made at an earlier step.

An Example
L = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10}, X = {0}
Since nC2 = 10, we know that n = 5. We begin by setting x5 = 10, so that X = {0, 10}, and remove 10 from L. The new L is {2, 2, 3, 3, 4, 5, 6, 7, 8}.

An Example
Take the next largest element, 8, from L. We have two choices: x2 = 2 or x4 = 8. Since the two cases are symmetric, we can assume x2 = 2. We remove x5 - x2 = 8 and x2 - x1 = 2 from L. The new sets are X = {0, 2, 10} and L = {2, 3, 3, 4, 5, 6, 7}.

An Example
L = {2, 3, 3, 4, 5, 6, 7}, X = {0, 2, 10}
We have two choices again: take 7 from L and make x4 = 7, or make x3 = 3. If we choose x3 = 3, then D(x3, X) = {3, 1, 7}*. Since 1 is not an element of L, this is a wrong choice. So we explore x4 = 7, with D(x4, X) = {7 - 0, 7 - 2, 10 - 7} = {7, 5, 3}.
* We define D(y, X) as the multiset of distances between a point y and all points in a set X.

An Example
L = {2, 3, 4, 6}, X = {0, 2, 7, 10}
We are left with one choice, x3 = 4. D(x3, X) = {4, 2, 3, 6}, which is a subset of L, so we explore this branch. We remove {4, 2, 3, 6} from L and add 4 to X.

An Example
L = { }, X = {0, 2, 4, 7, 10}
L is now empty, so we have a solution, which is X.

An Example
L = {2, 3, 3, 4, 5, 6, 7}, X = {0, 2, 10}
Backtracking to look for further solutions, we now explore y = 3. D(y, X) = {3, 1, 7}, which is not a subset of L, so we do not explore this branch.

An Example
L = {2, 2, 3, 3, 4, 5, 6, 7, 8}, X = {0, 10}
We have backtracked to the root. Therefore we have found all the solutions (up to the mirror image that comes from the symmetric choice x4 = 8).

Defining D(y, X)
• Before describing PartialDigest, first define D(y, X) as the multiset of all distances between a point y and all points in the set X = {x1, x2, ..., xn}:
  D(y, X) = {|y - x1|, |y - x2|, ..., |y - xn|}
• Example: D(2, (1, 3, 4, 5)) = (1, 1, 2, 3)

PDP Example
• Given L = (2, 2, 3, 3, 4, 5, 6, 7, 8, 10), n = 5 since |L| = 10 and 5C2 = 10; let w denote the width.
• w := 10, L = (2, 2, 3, 3, 4, 5, 6, 7, 8), X = (0, 10)
• 1. PLACE(L, X): L is not empty; y := 8; D(8, (0, 10)) = (8, 2) is a subset of L
  X = (0, 10, 8) and L := L - D(8, (0, 10)) = (2, 3, 3, 4, 5, 6, 7)
• 2. PLACE(L, X): L is not empty; y := 7; D(7, (0, 10, 8)) = (7, 3, 1) is not a subset of L
  D(w - 7, (0, 10, 8)) = D(3, (0, 10, 8)) = (3, 7, 5) is a subset of L
  X = (0, 10, 8, 3) and L := L - D(3, (0, 10, 8)) = (2, 3, 4, 6)
• 3. PLACE(L, X): L is not empty; y := 6; D(6, (0, 10, 8, 3)) = (6, 4, 2, 3) is a subset of L
  X = (0, 10, 8, 3, 6) and L is empty
• 4. PLACE(L, X): L is empty; output X = (0, 10, 8, 3, 6) and return
Returning from each call undoes its placement:
• Remove y = 6 from X and add D(6, (0, 10, 8, 3)) = (6, 4, 2, 3) back to L, yielding X = (0, 10, 8, 3) and L = (2, 3, 4, 6)
• Remove w - y = 10 - 7 = 3 from X and add D(3, (0, 10, 8)) = (3, 7, 5) back to L, yielding X = (0, 10, 8) and L = (2, 3, 3, 4, 5, 6, 7)
• Remove y = 8 from X and add D(8, (0, 10)) = (8, 2) back to L. This restores L = (2, 2, 3, 3, 4, 5, 6, 7, 8) and X = (0, 10); w is still 10. We can now take the other possible choice, w - y = 2, and obtain another solution, X = (0, 2, 4, 7, 10).

Analyzing PartialDigest Algorithm
• Still exponential in the worst case, but very fast on average
• Informally, let T(n) be the time PartialDigest takes to place n cuts
– No branching case (there is just one viable alternative at every step); O(n) is the time to compute the new X and L:
  T(n) = T(n-1) + O(n), so T(n) = O(n^2): quadratic
– Branching case (there are two choices at every step):
  T(n) < 2T(n-1) + O(n): exponential
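
The PLACE recursion traced in the PDP example can be sketched as follows. This is a minimal illustration rather than a reference implementation: the names `partial_digest`, `place`, and `distances` are hypothetical, L is held in a `Counter` so that multiset removal and restoration are explicit, and solutions are collected in a set instead of being printed.

```python
from collections import Counter


def partial_digest(L):
    """Return every sorted X whose pairwise-distance multiset equals L."""
    remaining = Counter(L)
    width = max(remaining)
    remaining[width] -= 1          # the largest distance fixes X = (0, width)
    X = [0, width]
    solutions = set()

    def distances(y):
        # D(y, X): multiset of distances from y to every point currently in X.
        return Counter(abs(y - x) for x in X)

    def place():
        if sum(remaining.values()) == 0:
            solutions.add(tuple(sorted(X)))
            return
        y = max(v for v, c in remaining.items() if c > 0)
        # Branch: the largest remaining distance is measured from 0 or from width.
        for candidate in {y, width - y}:
            d = distances(candidate)
            # Bound: only descend if D(candidate, X) is a sub-multiset of L.
            if all(remaining[v] >= c for v, c in d.items()):
                remaining.subtract(d)
                X.append(candidate)
                place()
                X.pop()                # backtrack: undo the placement
                remaining.update(d)    # and restore the removed distances

    place()
    return sorted(solutions)


print(partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))
```

On the example multiset this visits both mirror-image maps, (0, 2, 4, 7, 10) and (0, 3, 6, 8, 10), and prunes every branch whose distances are not all present in L.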