Introduction to Bioinformatics Algorithms - Lecture Slides | CSCE 590B, Study notes of Computer Science

Material Type: Notes; Professor: Alekseyev; Class: TOPIC/BIOINFORMTCS ALGOR; Subject: Computer Science & Engineering; University: University of South Carolina - Columbia; Term: Unknown 2009;

Typology: Study notes

Pre 2010

Uploaded on 10/01/2009

Introduction to Bioinformatics Algorithms
Lectures 1-2
Dr. Max Alekseyev
USC, 2009

Organization
• Lecturer: Dr. Max Alekseyev
• Time/Place: MWF 11:15am-12:30pm / SWGN 2A22
• Office hours: after each lecture or by appointment, SWGN 3A48
• Course webpage: http://cse.sc.edu/~maxal/csce590b/
• Textbook: «An Introduction to Bioinformatics Algorithms» by N. Jones and P. Pevzner, http://www.bioalgorithms.info

Bioinformatics Bottlenecks
Biological Problem → Computational Problem (Model) → Algorithm → Practical Results
• Formalization: How accurate?
• Algorithmic solution: Does it exist?
• Execution: Is it efficient?
• Interpretation: Are results meaningful?
• Data: May be "noisy"

Why Bioinformatics Algorithms?
• Usually a biological problem can be transformed into a computational problem in a number of ways that feature different levels of accuracy and complexity.
• Highly accurate models often result in intractable computational problems, while less accurate models may produce meaningless results.
• Goal: to maintain an acceptable level of accuracy while keeping the computational problem effectively solvable.

Plan
• Brief introduction to Algorithms (Chapter 2)
• Brief introduction to Biology (Chapter 3)
• Study various biological problems and their computational counterparts (Chapter 4 and up)

An everyday algorithm
[Figure panels: Good algorithm / Bad algorithm]

A computational problem
• Defined by inputs and outputs, e.g.
    Input: amount of money to change, M
    Output: set of coins summing to M
• Requires precise formulation
• Solved by algorithms
• A prerequisite to an algorithm

Pseudocode: Assignment
Format: a ← b
Effect: Sets the variable a to the value b.
Example:
    b ← 2
    a ← b
Result: The value of a is 2.

Pseudocode: Loops
for loops
Format:
    for i ← a to b
        B
Effect: Sets i to a and executes instructions B. Sets i to a + 1 and executes instructions B again.
Repeats for i = a + 2, a + 3, ..., b − 1, b.
Example:
    SUMINTEGERS(n)
        sum ← 0
        for i ← 1 to n
            sum ← sum + i
        return sum
Result: SUMINTEGERS(n) computes the sum of integers from 1 to n. SUMINTEGERS(10) returns 1 + 2 + ... + 10 = 55.

while loops
Format:
    while A is true
        B
Effect: Checks the condition A. If it is true, then executes instructions B. Checks A again; if it is true, executes B again. Repeats until A is not true.
Example:
    ADDUNTIL(b)
        i ← 1
        total ← i
        while total ≤ b
            i ← i + 1
            total ← total + i
        return i
Result: ADDUNTIL(b) computes the smallest integer i such that 1 + 2 + ... + i is larger than b. For example, ADDUNTIL(25) returns 7, since 1 + 2 + ... + 7 = 28, which is larger than 25, but 1 + 2 + ... + 6 = 21, which is smaller than 25.

Pseudocode: Array access
Format: a_i
Effect: The ith number of array a = (a_1, ..., a_i, ..., a_n). For example, if F = (1, 1, 2, 3, 5, 8, 13), then F_3 = 2 and F_4 = 3.
Example:
    FIBONACCI(n)
        F_1 ← 1
        F_2 ← 1
        for i ← 3 to n
            F_i ← F_(i−1) + F_(i−2)
        return F_n
Result: FIBONACCI(n) computes the nth Fibonacci number. FIBONACCI(8) returns 21.

More specifically
    USCHANGE(M)
        Give the integer part of M/25 quarters to customer.
        Let remainder be the remaining amount due the customer.
        Give the integer part of remainder/10 dimes to customer.
        Let remainder be the remaining amount due the customer.
        Give the integer part of remainder/5 nickels to customer.
        Let remainder be the remaining amount due the customer.
        Give remainder pennies to customer.

Inelegant, but correct
    USCHANGE(M)
        r ← M
        q ← r/25
        r ← r − 25·q
        d ← r/10
        r ← r − 10·d
        n ← r/5
        r ← r − 5·n
        p ← r
        return (q, d, n, p)
Here each division keeps only the integer part of the quotient, as in the wordier version above.

But what about, say, South Africa?

Generalized Problem
Change Problem: Convert some amount of money M into given denominations, using the smallest possible number of coins.
Input: An amount of money M, and an array of d denominations c = (c_1, c_2, ..., c_d), in decreasing order of value (c_1 > c_2 > ... > c_d).
Output: A list of d integers i_1, i_2, ..., i_d such that c_1·i_1 + c_2·i_2 + ... + c_d·i_d = M, and i_1 + i_2 + ... + i_d is as small as possible.

How fast is it?
    BRUTEFORCECHANGE(M, c, d)
        smallestNumberOfCoins ← ∞
        for each (i_1, ..., i_d) from (0, ..., 0) to (M/c_1, ..., M/c_d)
            valueOfCoins ← Σ_(k=1..d) i_k·c_k
            if valueOfCoins = M
                numberOfCoins ← Σ_(k=1..d) i_k
                if numberOfCoins < smallestNumberOfCoins
                    smallestNumberOfCoins ← numberOfCoins
                    bestChange ← (i_1, i_2, ..., i_d)
        return (bestChange)
• # iterations of first index: M/c_1
• # iterations of second index: M/c_2
• ...
Each "check" does 2d + k operations (k is a constant).
Hence, the total number of operations (running time complexity) is:
    (M/c_1) · (M/c_2) · ... · (M/c_d) · (2d + k)
        = 2/(c_1·...·c_d) · d · M^d + k/(c_1·...·c_d) · M^d
        = O(d · M^d)

Asymptotic Complexity
• Finding the exact complexity, f(n) = number of basic operations, of an algorithm is difficult.
• We approximate f(n) by a function g(n) in a way that does not substantially change the magnitude of f(n), i.e., g(n) is sufficiently close to f(n) for large values of the input size n.
• This "approximate" measure of efficiency is called asymptotic complexity.
• Thus the asymptotic complexity measure does not give the exact number of operations of an algorithm, but it shows how that number grows with the size of the input.
• This gives us a measure that will work for different operating systems, compilers and CPUs.

Order notation

Big-Omega Notation
• Similarly, Ω(g(n)) is used to give a lower bound on a positive runtime function f(n), where n is the input size.
Definition: For a function f(n) that is non-negative for all n ≥ 0, we say that f(n) = Ω(g(n)) ("f(n) is big-Omega of g(n)") if there exist n_0 ≥ 0 and a constant c > 0 such that f(n) ≥ c·g(n) for all n ≥ n_0.

Big-Theta Notation
• Similarly, Θ(g(n)) is used to give a tight bound on a positive runtime function f(n), where n is the input size.
Definition: For a function f(n) that is non-negative for all n ≥ 0, we say that f(n) = Θ(g(n)) ("f(n) is big-Theta of g(n)") if f(n) = O(g(n)) and f(n) = Ω(g(n)).

NP-completeness
• There is a class of problems that might require exponential time.
• Any problem in this class is, in some way, equivalent to any other problem.
• It is very unlikely that a polynomial time algorithm exists that can solve any of this class of problems.

The bad news...
• Many useful problems in biology are NP-complete (e.g., Traveling Salesman Problem)
• Heuristic or statistical approaches aren't "correct", but are usually the best choice
• Proving NP-completeness for a problem is involved
• Take-away lesson: consider the possibility that your problem is NP-complete

Good vs. Bad
• Problems:
    Good: model system well; clear; precise
    Bad: allows silly/mean solutions
• Algorithms:
    Good: poly-time, correct
    Bad: exponential, or worse; incorrect
• Implementations:
    Good: as fast as the algorithm
    Bad: dumb coding

Next steps
• Sorting Problem
• Quadratic vs log time
• Towers of Hanoi Problem
• Recursion and recurrences
• Trees

Sorting problem
Sorting Problem: Sort a list of integers.
Input: A list of n distinct integers a = (a_1, a_2, ..., a_n).
Output: Sorted list of integers, that is, a reordering b = (b_1, b_2, ..., b_n) of integers from a such that b_1 < b_2 < ... < b_n.

Intuitive approach
• Find the smallest element. Put it first.
• Find the next smallest element. Put it next.
• Repeat until done.

Asymptotic Complexity
• IndexOfMin ~ O(n)
• SelectionSort:
    • Calls IndexOfMin O(n) times
    • Also performs constant time operations
    • O(n·n), or O(n^2)

A faster way
• There is a faster way of sorting
• MergeSort
• Will be covered in "Divide and Conquer".
• Think about it for a while, see if you can't figure it out.

Towers of Hanoi Problem

Formal Problem
Towers of Hanoi Problem: Output a list of moves that solves the Towers of Hanoi.
Input: An integer n.
Output: A sequence of moves that will solve the n-disk Towers of Hanoi puzzle.

Easy values of n
• n=0; done
• n=1; move from left to right peg; done
• n=2; small to middle, large to right, small to right; done.
• n=3?

[Figure: peg diagrams for a sequence of moves. Captions: Move disk from peg 1 to peg 3; Move disk from peg 1 to peg 2; Move disk from peg 3 to peg 2; Move disk from peg 1 to peg 3; Move disk from peg 2 to peg 1; final caption partly illegible]

But we "assumed"!
• Key observation: we know how to solve it for small values of n.
• So we have HanoiTowers(1,a,b). We can construct HanoiTowers(2,a,b), HT(3,a,b), HT(4,a,b), etc. out of it.

The impossible trick
• "Assume can opener!"
• Assume we have HanoiTowers(k,a,b) that correctly solves the k-disk (general) HT problem for some k
• HanoiTowers(k+1,a,b) is easy to write if it can call HanoiTowers(k,a,b):
    • HanoiTowers(k,a,c)
    • move largest from a to b
    • HanoiTowers(k,c,b)

Complete algorithm
    HANOITOWERS(n, fromPeg, toPeg)
        if n = 1
            output "Move disk from peg fromPeg to peg toPeg"
            return
        unusedPeg ← 6 − fromPeg − toPeg
        HANOITOWERS(n − 1, fromPeg, unusedPeg)
        output "Move disk from peg fromPeg to peg toPeg"
        HANOITOWERS(n − 1, unusedPeg, toPeg)
        return
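
The recursive scheme above can be sketched in Python. This is a minimal illustration, not the course's own code: the function name hanoi_towers, the choice to collect moves in a list instead of printing them, and the explicit n = 0 case (taken from the "n=0; done" remark) are assumptions made here. Pegs are numbered 1, 2, 3, so the unused peg is 6 minus the other two.

```python
def hanoi_towers(n, from_peg, to_peg, moves=None):
    """Return the moves (from_peg, to_peg) that transfer n disks.

    Pegs are numbered 1, 2, 3, so the third peg is 6 - from_peg - to_peg.
    Collecting moves in a list (instead of printing) is an assumption
    made for this sketch.
    """
    if moves is None:
        moves = []
    if n == 0:
        # Nothing to move (assumed base case for an empty tower).
        return moves
    if n == 1:
        moves.append((from_peg, to_peg))
        return moves
    unused_peg = 6 - from_peg - to_peg
    # Move the n - 1 smaller disks out of the way.
    hanoi_towers(n - 1, from_peg, unused_peg, moves)
    # Move the largest disk to its destination.
    moves.append((from_peg, to_peg))
    # Move the n - 1 smaller disks on top of it.
    hanoi_towers(n - 1, unused_peg, to_peg, moves)
    return moves


if __name__ == "__main__":
    for src, dst in hanoi_towers(3, 1, 3):
        print(f"Move disk from peg {src} to peg {dst}")
```

Each call makes two recursive calls on n − 1 disks plus one move, so the number of moves roughly doubles with every added disk.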