Introduction to Bioinformatics Algorithms
Lecture 3: Time Complexity and NP-Completeness
Dr. Max Alekseyev
USC, 2009

Time Complexity
• Time complexity (TC) is a function of the input length L. E.g.:
  • if the input is an integer M, then L is proportional to log(M), i.e., L = Θ(log M)
  • if the input is an array of size m with elements ≤ E, then L = Θ(m·log E)
• TC(L) is the number of operations (steps) performed by an algorithm in the worst case.
• We are not interested in the exact value of TC(L) but rather in its asymptotic behavior (as L grows).
• Polynomial TC is “good”; exponential TC is “bad”.

NP-completeness
• There is a class of problems that might require exponential time (e.g., the Traveling Salesman Problem).
• Any problem in this class is, in some way, equivalent to any other problem in the class.
• It is very unlikely that a polynomial-time algorithm exists for any problem in this class.

The bad news...
• Many useful problems in biology are NP-complete.
• Heuristic or statistical approaches aren’t “correct”, but are usually the best choice.
• Proving NP-completeness for a problem is involved.
• Take-away lesson: consider the possibility that your problem is NP-complete.

Summary
• Computational problems define a mapping from inputs to outputs.
• Algorithms solve computational problems.
• Two fundamental properties of algorithms are “correctness” and “complexity” (“efficiency”).
• Problems also have an “inherent complexity”.

How can one design an algorithm, given a problem?

Towers of Hanoi Problem
Formal Problem
Towers of Hanoi Problem:
Output a list of moves that solves the Towers of Hanoi.
Input: An integer n.
Output: A sequence of moves that will solve the n-disk
Towers of Hanoi puzzle.
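
One way to produce such a move list is the recursive scheme that the following slides build up step by step. Below is a minimal Python sketch of that scheme, under a few assumptions that are not part of the lecture: pegs are named by strings, a move is a (from_peg, to_peg) pair, and the function name hanoi_towers and its explicit spare-peg argument c are illustrative choices.

```python
def hanoi_towers(n, a, b, c):
    """Return the moves that transfer n disks from peg a to peg b.

    Peg c is the spare peg. Each move is a (from_peg, to_peg) pair.
    The names and the move representation are assumptions made for
    this sketch, not notation from the lecture.
    """
    if n == 0:
        return []  # nothing to move
    # Move the n-1 smaller disks out of the way, onto the spare peg.
    moves = hanoi_towers(n - 1, a, c, b)
    # Move the largest disk to its destination.
    moves.append((a, b))
    # Move the n-1 smaller disks from the spare peg onto the largest disk.
    moves.extend(hanoi_towers(n - 1, c, b, a))
    return moves


if __name__ == "__main__":
    for move in hanoi_towers(2, "left", "right", "middle"):
        print(move)
```

For n = 2 this yields left to middle, left to right, middle to right, which matches the hand-worked case "small to middle, large to right, small to right". Returning a list rather than printing inside the recursion makes it easy to count the moves with len().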
Easy values of n
• n=0; done
• n=1; move from left to right peg; done
• n=2; small to middle, large to right, small to right; done.
• n=3?

Recursion!
• To solve n=4, we solved the puzzle for n=3 multiple times.
• Generalize the problem.
• Given n, a, and b, move n disks from peg a to peg b.

Key observation
• Key observation: we know how to solve it for small values of n.
• So we have HanoiTowers(1,a,b). We can construct HanoiTowers(2,a,b), HanoiTowers(3,a,b), HanoiTowers(4,a,b), etc. out of it.

The trick
• Assume a “can opener”:
  • Assume we have HanoiTowers(k,a,b) that correctly solves the k-disk (general) HT problem for some k.
• HanoiTowers(k+1,a,b) is easy to write if it can call HanoiTowers(k,a,b). With c denoting the third peg:
  • HanoiTowers(k,a,c)
  • move the largest disk from a to b
  • HanoiTowers(k,c,b)

HanoiTowers Complexity
• Time complexity measures the number of operations in the worst case. For Hanoi Towers, it is convenient to define an “operation” as a single disk move.
• Let T(n) be the number of disk moves performed by HanoiTowers(n). Then T(n) = 2∙T(n-1) + 1, T(1) = 1.
• From this equation we derive T(n) = 2^n - 1.
Q: What is the asymptotic complexity?

Sorting problem
Sorting Problem:
Sort a list of integers.
Input: A list of n distinct integers a = (a_1, a_2, ..., a_n).
Output: Sorted list of integers, that is, a reordering b =
(b_1, b_2, ..., b_n) of integers from a such that b_1 < b_2 < ... < b_n.
Intuitive approach
• Find the smallest element. Put it first.
• Find the next smallest element. Put it next.
• Repeat until done.

Recursive SelectionSort
RECURSIVESELECTIONSORT(a, first, last)
  if first < last
    index ← INDEXOFMIN(a, first, last)
    Swap a_first with a_index
    a ← RECURSIVESELECTIONSORT(a, first + 1, last)
  return a
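
The pseudocode above can be tried out directly. The following is a minimal Python sketch, assuming 0-based inclusive indices first and last and in-place swapping. The slides name INDEXOFMIN but do not spell it out, so the linear scan in index_of_min is one straightforward guess at what it does.

```python
def index_of_min(a, first, last):
    """Return the index of the smallest element in a[first..last] (inclusive).

    A plain linear scan; assumed here, since the slides only name this helper.
    """
    index = first
    for k in range(first + 1, last + 1):
        if a[k] < a[index]:
            index = k
    return index


def recursive_selection_sort(a, first, last):
    """Sort a[first..last] in place and return the list."""
    if first < last:
        index = index_of_min(a, first, last)
        # Put the smallest remaining element at position first.
        a[first], a[index] = a[index], a[first]
        # Sort the rest of the list the same way.
        recursive_selection_sort(a, first + 1, last)
    return a


if __name__ == "__main__":
    data = [7, 92, 87, 1, 4, 3, 2, 6]
    print(recursive_selection_sort(data, 0, len(data) - 1))
    # prints [1, 2, 3, 4, 6, 7, 87, 92]
```

The recursion goes one level deeper for every element, so in Python this form is only practical for short lists; an iterative loop over first would avoid the recursion depth limit without changing the number of comparisons.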
Asymptotic Complexity
• IndexOfMin ~ O(n)
• SelectionSort:
  • Calls IndexOfMin O(n) times
  • Also performs constant-time operations
  • O(n·n), or O(n^2)

A faster way
• There is a faster way of sorting
  • MergeSort
  • Will be covered in “Divide and Conquer”.
• Think about it for a while, see if you can figure it out.