CS 332: Algorithms
Review for Exam 1
David Luebke, 10/21/16

Administrative
● Reminder: homework 3 due today
● Reminder: Exam 1 Wednesday, Feb 13
  ■ One 8.5x11 crib sheet allowed
    ○ Both sides, mechanical reproduction okay
    ○ You will turn it in with the exam

Proof By Induction
● Claim: S(n) is true for all n >= k
● Basis:
  ■ Show the formula is true when n = k
● Inductive hypothesis:
  ■ Assume the formula is true for an arbitrary n
● Step:
  ■ Show that the formula is then true for n+1

Induction Example: Gaussian Closed Form
● Prove 1 + 2 + 3 + … + n = n(n+1)/2
  ■ Basis:
    ○ If n = 0, then 0 = 0(0+1)/2
  ■ Inductive hypothesis:
    ○ Assume 1 + 2 + 3 + … + n = n(n+1)/2
  ■ Step (show true for n+1):
      1 + 2 + … + n + (n+1) = (1 + 2 + … + n) + (n+1)
                            = n(n+1)/2 + (n+1)
                            = [n(n+1) + 2(n+1)]/2
                            = (n+1)(n+2)/2
                            = (n+1)((n+1) + 1)/2

Induction Example: Geometric Closed Form
● Prove a^0 + a^1 + … + a^n = (a^(n+1) - 1)/(a - 1) for all a ≠ 1
  ■ Basis: show that a^0 = (a^(0+1) - 1)/(a - 1)
      a^0 = 1 = (a^1 - 1)/(a - 1)
  ■ Inductive hypothesis:
    ○ Assume a^0 + a^1 + … + a^n = (a^(n+1) - 1)/(a - 1)
  ■ Step (show true for n+1):
      a^0 + a^1 + … + a^(n+1) = (a^0 + a^1 + … + a^n) + a^(n+1)
                              = (a^(n+1) - 1)/(a - 1) + a^(n+1)
                              = (a^(n+2) - 1)/(a - 1)

Review: Asymptotic Notation
● Upper bound notation:
  ■ f(n) is O(g(n)) if there exist positive constants c and n0 such that f(n) ≤ c·g(n) for all n ≥ n0
  ■ Formally, O(g(n)) = { f(n): there exist positive constants c and n0 such that f(n) ≤ c·g(n) for all n ≥ n0 }
● Big O fact:
  ■ A polynomial of degree k is O(n^k)

Review: Asymptotic Notation
● Asymptotic lower bound:
  ■ f(n) is Ω(g(n)) if there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0
● Asymptotic tight bound:
  ■ f(n) is Θ(g(n)) if there exist positive constants c1, c2, and n0 such that c1·g(n) ≤ f(n) ≤ c2·g(n)
for all n ≥ n0
  ■ f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) AND f(n) = Ω(g(n))

Review: Other Asymptotic Notations
● A function f(n) is o(g(n)) if for any positive constant c there exists an n0 such that f(n) < c·g(n) for all n ≥ n0
● A function f(n) is ω(g(n)) if for any positive constant c there exists an n0 such that c·g(n) < f(n) for all n ≥ n0
● Intuitively,
  ■ o() is like <
  ■ O() is like ≤
  ■ ω() is like >
  ■ Ω() is like ≥
  ■ Θ() is like =

Review: Solving Recurrences
● Substitution method
● Iteration method
● Master method

Review: Solving Recurrences
● The substitution method
  ■ A.k.a. the “making a good guess method”
  ■ Guess the form of the answer, then use induction to find the constants and show that the solution works
  ■ Example: merge sort
    ○ T(n) = 2T(n/2) + cn
    ○ We guess that the answer is O(n lg n)
    ○ Prove it by induction
  ■ Can similarly show T(n) = Ω(n lg n), thus Θ(n lg n)

Review: Solving Recurrences
● The “iteration method”
  ■ Expand the recurrence
  ■ Work some algebra to express it as a summation
  ■ Evaluate the summation
● We showed several examples, including complex ones such as:

      T(n) = c                if n = 1
      T(n) = aT(n/b) + cn     if n > 1

Review: Heaps
● A heap is a “complete” binary tree, usually represented as an array:

      A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

  (The slide draws the same heap as a tree, with 16 at the root.)

Review: Heaps
● To represent a heap as an array:

      Parent(i) { return ⌊i/2⌋; }
      Left(i)   { return 2*i; }
      Right(i)  { return 2*i + 1; }

Review: The Heap Property
● Heaps also satisfy the heap property:
      A[Parent(i)] ≥ A[i] for all nodes i > 1
  ■ In other words, the value of a node is at most the value of its parent
  ■ The largest value is thus stored at the root (A[1])
● Because the heap is a complete binary tree, the height of any node is at most O(lg n)

Review: BuildHeap()

      // given an unsorted array A, make A a heap
      BuildHeap(A) {
        heap_size(A) = length(A);
        for (i = ⌊length[A]/2⌋ downto 1)
          Heapify(A, i);
      }
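The BuildHeap pseudocode above can be made runnable. This is a minimal Python sketch, not the slides' exact code: it adapts the 1-indexed Parent/Left/Right arithmetic to Python's 0-indexed lists (so children of node i are 2i+1 and 2i+2), and supplies the Heapify routine the slides assume.

```python
# Max-heap routines adapted from the slides' pseudocode.
# The slides use 1-indexed arrays; this sketch uses 0-indexed
# Python lists, so Left/Right become 2*i + 1 and 2*i + 2.

def heapify(A, i, heap_size):
    """Float A[i] down until the subtree rooted at i is a max-heap."""
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and A[left] > A[largest]:
        largest = left
    if right < heap_size and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)

def build_heap(A):
    """Make A a max-heap by heapifying every internal node, bottom-up."""
    for i in range(len(A) // 2 - 1, -1, -1):
        heapify(A, i, len(A))

A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_heap(A)
print(A[0])  # largest element now sits at the root: 16
```

Heapifying only the internal nodes (indices below len(A)//2) is what makes BuildHeap O(n) overall, even though a single Heapify call can cost O(lg n).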
Priority Queues
● Heapsort is a nice algorithm, but in practice Quicksort (coming up) usually wins
● But the heap data structure is incredibly useful for implementing priority queues
  ■ A data structure for maintaining a set S of elements, each with an associated value or key
  ■ Supports the operations Insert(), Maximum(), and ExtractMax()
  ■ What might a priority queue be useful for?

Review: Priority Queue Operations
● Insert(S, x) inserts the element x into set S
● Maximum(S) returns the element of S with the maximum key
● ExtractMax(S) removes and returns the element of S with the maximum key

Review: Quicksort
● Another divide-and-conquer algorithm
  ■ The array A[p..r] is partitioned into two non-empty subarrays A[p..q] and A[q+1..r]
    ○ Invariant: all elements in A[p..q] are less than all elements in A[q+1..r]
  ■ The subarrays are recursively sorted by calls to quicksort
  ■ Unlike merge sort, no combining step: the two sorted subarrays already form a sorted array

Review: Quicksort Code

      Quicksort(A, p, r) {
        if (p < r) {
          q = Partition(A, p, r);
          Quicksort(A, p, q);
          Quicksort(A, q+1, r);
        }
      }

Review: Partition
● Clearly, all the action takes place in the Partition() function
  ■ Rearranges the subarray in place
  ■ End result:
    ○ Two subarrays
    ○ All values in the first subarray ≤ all values in the second
  ■ Returns the index of the “pivot” element separating the two subarrays

Review: Analyzing Quicksort
● In the worst case:
      T(1) = Θ(1)
      T(n) = T(n - 1) + Θ(n)
● Works out to T(n) = Θ(n²)

Review: Analyzing Quicksort
● In the best case:
      T(n) = 2T(n/2) + Θ(n)
● What does this work out to?
      T(n) = Θ(n lg n)

Review: Analyzing Quicksort (Average Case)
● Intuitively, the O(n) cost of a bad split (or 2 or 3 bad splits) can be absorbed into the O(n) cost of each good split
● Thus the running time of alternating bad and good splits is still O(n lg n), with slightly higher constants
● We can be more rigorous…

Sorting Summary
● Insertion sort:
  ■ Easy to code
  ■ Fast on small inputs (less than ~50 elements)
  ■ Fast on nearly-sorted inputs
  ■ O(n²) worst case
  ■ O(n²) average (equally-likely inputs) case
  ■ O(n²) reverse-sorted case

Sorting Summary
● Merge sort:
  ■ Divide-and-conquer:
    ○ Split array in half
    ○ Recursively sort subarrays
    ○ Linear-time merge step
  ■ O(n lg n) worst case
  ■ Doesn’t sort in place

Sorting Summary
● Heap sort:
  ■ Uses the very useful heap data structure
    ○ Complete binary tree
    ○ Heap property: parent key ≥ children’s keys
  ■ O(n lg n) worst case
  ■ Sorts in place
  ■ Fair amount of shuffling memory around

Review: Counting Sort
● Counting sort:
  ■ Assumption: input is in the range 1..k
  ■ Basic idea:
    ○ Count the number k of elements ≤ each element i
    ○ Use that number to place i in position k of the sorted array
  ■ No comparisons!
  ■ Runs in time O(n + k)
  ■ Stable sort
  ■ Does not sort in place:
    ○ O(n) array to hold sorted output
    ○ O(k) array for scratch storage

Review: Counting Sort

      CountingSort(A, B, k)
        for i = 1 to k
          C[i] = 0;
        for j = 1 to n
          C[A[j]] += 1;
        for i = 2 to k
          C[i] = C[i] + C[i-1];
        for j = n downto 1
          B[C[A[j]]] = A[j];
          C[A[j]] -= 1;

Review: Radix Sort
● Radix sort:
  ■ Assumption: input has d digits, each ranging from 0 to k
  ■ Basic idea:
    ○ Sort elements by digit, starting with the least significant
    ○ Use a stable sort (like counting sort) for each stage
  ■ Each pass over n numbers with d digits takes time O(n + k), so total time is O(dn + dk)
    ○ When d is constant and k = O(n), this takes O(n) time
  ■ Fast! Stable! Simple!
  ■ Doesn’t sort in place

Review: The Selection Problem
● The selection problem: find the ith smallest element of a set
● Two algorithms:
  ■ A practical randomized algorithm with O(n) expected running time
  ■ A cool algorithm of theoretical interest only with O(n) worst-case running time

Review: Randomized Selection
● Key idea: use Partition() from quicksort
  ■ But we only need to examine one subarray
  ■ This savings shows up in the running time: O(n)
  (Figure: A[p..q] holds elements ≤ A[q], A[q..r] holds elements ≥ A[q])

Review: Randomized Selection

      RandomizedSelect(A, p, r, i)
        if (p == r) then return A[p];
        q = RandomizedPartition(A, p, r);
        k = q - p + 1;
        if (i == k) then return A[q];   // not in book
        if (i < k)
          then return RandomizedSelect(A, p, q-1, i);
          else return RandomizedSelect(A, q+1, r, i-k);

Review: Worst-Case Linear-Time Selection
● The algorithm in words:
  1. Divide the n elements into groups of 5
  2. Find the median of each group (How? How long?)
  3. Use Select() recursively to find the median x of the ⌈n/5⌉ medians
  4. Partition the n elements around x. Let k = rank(x)
  5.
     if (i == k) then return x
     if (i < k) then use Select() recursively to find the ith smallest element in the first partition
     else (i > k) use Select() recursively to find the (i-k)th smallest element in the last partition

Review: Worst-Case Linear-Time Selection
● (Sketch situation on the board)
● How many of the 5-element medians are ≤ x?
  ■ At least half of the medians: ⌈⌈n/5⌉/2⌉ ≥ n/10
● How many elements are ≤ x?
  ■ At least 3(n/10) elements
● For large n, 3(n/10) ≥ n/4 (How large?)
● So at least n/4 elements are ≤ x
● Similarly: at least n/4 elements are ≥ x

Review: Worst-Case Linear-Time Selection
● Thus after partitioning around x, step 5 will call Select() on at most 3n/4 elements
● The recurrence is therefore:

      T(n) ≤ T(n/5) + T(3n/4) + Θ(n)
           ≤ cn/5 + 3cn/4 + Θ(n)       Substitute T(n) = cn
           = 19cn/20 + Θ(n)            Combine fractions
           = cn - (cn/20 - Θ(n))       Express in desired form
           ≤ cn                        What we set out to prove (if c is big enough)