CS161: Algorithm Design and Analysis                              Handout # 5
Stanford University                                 Wednesday, 3 February 2016

Homework #4: Sorting models, hashing
Due Date: Wednesday, 10 February 2016

Problem 1. "Real" Cost of Sorting. [40 points, 8 points per part]

As part of a homework assignment for your architecture class you have to select the "best" sorting algorithm to use on your class's example computer. The relevant parts of the computer's architecture are the cache and the main memory. In particular, the machine's memory can be viewed as a large array partitioned into a number of cache lines, each containing s consecutive locations of the memory array. In other words, the first cache line consists of the first s memory locations, the second cache line consists of the next s memory locations, etc. The cache memory is a small, fast memory that at any point in time contains a constant (Θ(1)) number of these cache lines. The cache lines that are not in the cache are maintained in main memory. Whenever a user program accesses a memory location in a cache line that is in the cache, the access occurs with no main memory activity. Such a memory access is considered to be fast. If the memory location is not in the cache, however, the cache line on which it resides replaces some other cache line currently in the cache. In other words, the cache line that is being replaced is copied out to main memory and the desired cache line is copied into the cache. This event is called a cache miss. Since a cache miss involves a reference to main memory, this type of memory access is considered to be slow.

You decide to select your sorting algorithm based on the running-time analyses of the various sorting algorithms discussed in CS161. However, after a while you realize that the analysis given in CS161 does not account for the fact that the main memory operations required by a cache miss can take orders of magnitude longer than memory operations that only involve the cache. In order to account for the cost of the different memory operations, you decide to analyze the sorting algorithms based on the number of cache misses they may require. First you extend O-notation to two-variable functions (so as to free yourself from having to give exact answers when counting main memory operations). Specifically:

    O(f(x, y)) = { g(x, y) : there exist positive constants x_0, y_0, and c
                   such that |g(x, y)| ≤ c·f(x, y) for all x ≥ x_0 and y ≥ y_0 }.

In answering the following questions, assume that the cache holds Θ(1) cache lines and that the n numbers to be sorted are stored in an array of contiguous memory locations. Your answers should generally be worst-case analyses expressed in terms of n, the number of values to be sorted, and s, the number of values that fit in a cache line. Justify your answers by showing your analyses.
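To make this cost model concrete before attempting parts (a)-(e), here is a minimal simulation sketch, not part of the assignment. It models the cache described above as a constant number of lines, each covering s consecutive array locations. The class name CacheSim, the default of four lines, and the choice of LRU replacement are illustrative assumptions; the handout says only that a miss replaces "some other cache line."

```python
from collections import OrderedDict

class CacheSim:
    """Toy model of the cache above: a constant number of cache lines,
    each covering s consecutive array locations. (Illustrative sketch;
    LRU replacement is an assumption -- the handout leaves the policy open.)"""

    def __init__(self, s, lines=4):
        self.s = s                  # values per cache line
        self.lines = lines          # the Θ(1) lines held in the cache
        self.cache = OrderedDict()  # line number -> None, in LRU order
        self.misses = 0

    def access(self, i):
        """Touch array index i, counting a miss if its line is absent."""
        line = i // self.s          # index i lives on cache line floor(i/s)
        if line in self.cache:
            self.cache.move_to_end(line)        # hit: mark most recently used
        else:
            self.misses += 1                    # miss: fetch line from memory
            if len(self.cache) == self.lines:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[line] = None

# A sequential scan of n contiguous locations touches n/s distinct lines,
# so it incurs roughly n/s misses:
sim = CacheSim(s=8)
for i in range(1024):
    sim.access(i)
print(sim.misses)  # 128, i.e., 1024/8
```

Counting misses this way, over the access patterns of the sorting algorithms below, is exactly the cost measure you are asked to analyze.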
(a) Show that INSERTIONSORT has O(n^2/s) cache misses. Assuming that your code knows s, suggest a modification to INSERTIONSORT that achieves O(n^2/s^2) cache misses.

(b) Show that the number of cache misses generated in merging two sorted arrays of m numbers apiece is O(m/s). How many cache misses are required by MERGESORT when accessing the input array?

(c) How many cache misses are generated when HEAPIFY is called on a node of height h? What is the total cost (in terms of cache misses) of BUILDHEAP? How many cache misses does HEAPSORT generate? (Use the array implementation for heaps given in the book.)

(d) How many cache misses does the PARTITION procedure in the book require for access to the input array? Make an intelligent guess as to the average number of cache misses that QUICKSORT requires. You need only provide an intuitive justification for your guess.

(e) Which sorting algorithm would you recommend to your boss and why?

Problem 2. Hashing to Disk Pages [48 points, 8 points per part]

In an application that requires a very large hash table, it may be impractical to store the hash table in primary memory. In particular, one might choose to store the hash table on disk, with one disk page playing the role of one slot in the hash table. (For a review of disk storage, read pages 484-488 in CLRS.) We presume that a disk page is large enough to hold many records. For example, a typical disk page holds 2^12 bytes of data, and a record may contain only 2^5 bytes. We shall store the records that hash to a single page in a linear order on the page. If a page overflows because it contains too many records, the excess records are stored in an overflow area somewhere else on disk.

A SEARCH consists of hashing to the correct disk page and then linearly searching through the records on that page for one with the query key. If the page is full, then the overflow area must be searched in addition. Since the time to access a disk page is typically at least 10 milliseconds, the cost of the linear search on the page is negligible. Thus, we shall focus on the number of disk accesses as our cost measure. For the SEARCH operation, the cost is 1 if we find the record in its "proper" page, but it may be considerably greater if we must also search the overflow area. Consequently, the focus of this problem is to ensure that the proper pages seldom overflow, while using as little extra space as possible.

We shall assume for the rest of this problem that we are hashing n keys to disk pages, where each disk page holds r records. We would like to know the number m of disk pages such that, with high confidence, we can search for any of the n keys with a single disk access. Moreover, we would like m = O(n/r) so that at most a constant fraction of the storage is wasted. We shall make the assumption of simple uniform hashing (see page 259 in CLRS). Also, inequalities (C.5) and (C.19) and Theorem C.2 in CLRS will be useful for solving this problem.
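Before proving anything, the question of how large m must be can be explored empirically. Below is a minimal Monte Carlo sketch that models simple uniform hashing as independent, uniformly random page choices; the function name search_cost_stats and the parameter values are illustrative. The experiment suggests, but of course does not prove, the kind of bound the problem asks you to establish.

```python
import random
from collections import Counter

def search_cost_stats(n, m, r, trials=100):
    """Fraction of trials in which some page receives more than r of the
    n keys, i.e., in which some SEARCH could need a second disk access
    to the overflow area. (Illustrative sketch, not a solution.)"""
    overflowed = 0
    for _ in range(trials):
        # Simple uniform hashing: each key independently picks a page.
        loads = Counter(random.randrange(m) for _ in range(n))
        if any(count > r for count in loads.values()):
            overflowed += 1
    return overflowed / trials

n, r = 10_000, 128
# With m = 2n/r pages (still O(n/r)), the average load is r/2 and
# overflow is rare; with m = n/r pages, the average load equals r
# and some page almost always overflows.
print(search_cost_stats(n, m=2 * n // r, r=r))  # typically 0.0
print(search_cost_stats(n, m=n // r, r=r))      # typically 1.0
```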