CS161: Algorithm Design and Analysis                              Handout # 5
Stanford University                                 Wednesday, 3 February 2016

Homework #4: Sorting models, hashing
Due Date: Wednesday, 10 February 2016

Problem 1. "Real" Cost of Sorting. [40 points, 8 points per part]

As part of a homework assignment for your architecture class you have to select the "best" sorting algorithm to use on your class's example computer. The relevant parts of the computer's architecture are the cache and the main memory. In particular, the machine's memory can be viewed as a large array partitioned into a number of cache lines, each containing s consecutive locations of the memory array. In other words, the first cache line consists of the first s memory locations, the second cache line consists of the next s memory locations, etc. The cache memory is a small, fast memory that at any point in time contains a constant (Θ(1)) number of these cache lines. The cache lines that are not in the cache are maintained in main memory. Whenever a user program accesses a memory location in a cache line that is in the cache, the access occurs with no main memory activity. Such a memory access is considered to be fast. If the memory location is not in the cache, however, the cache line on which it resides replaces some other cache line currently in the cache. In other words, the cache line that is being replaced is copied out to main memory and the desired cache line is copied into the cache. This event is called a cache miss. Since a cache miss involves a reference to main memory, this type of memory access is considered to be slow.

You decide to select your sorting algorithm based on the running-time analyses of the various sorting algorithms discussed in CS161. However, after a while you realize that the analysis given in CS161 does not account for the fact that the main memory operations required by a cache miss can take orders of magnitude longer than memory operations that only involve the cache. In order to account for the cost of the different memory operations, you decide to analyze the sorting algorithms based on the number of cache misses they may require. First you extend O-notation to two-variable functions (so as to free yourself from having to give exact answers when counting main memory operations). Specifically:

    O(f(x, y)) = { g(x, y) : there exist positive constants x_0, y_0, and c
                   such that |g(x, y)| ≤ c·f(x, y) for all x ≥ x_0 and y ≥ y_0 }.

In answering the following questions, assume that the cache holds Θ(1) cache lines and that the n numbers to be sorted are stored in an array of contiguous memory locations. Your answers should generally be worst-case analyses expressed in terms of n, the number of values to be sorted, and s, the number of values that fit in a cache line. Justify your answers by showing your analyses.
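To make this cost model concrete before attempting parts (a)-(e), here is a minimal simulation sketch, not part of the assignment. It models the cache described above as a constant number of lines, each covering s consecutive array locations. The class name CacheSim, the default of four lines, and the choice of LRU replacement are illustrative assumptions; the handout says only that a miss replaces "some other cache line."

```python
from collections import OrderedDict

class CacheSim:
    """Toy model of the cache above: a constant number of cache lines,
    each covering s consecutive array locations. (Illustrative sketch;
    LRU replacement is an assumption -- the handout leaves the policy open.)"""

    def __init__(self, s, lines=4):
        self.s = s                  # values per cache line
        self.lines = lines          # the Θ(1) lines held in the cache
        self.cache = OrderedDict()  # line number -> None, in LRU order
        self.misses = 0

    def access(self, i):
        """Touch array index i, counting a miss if its line is absent."""
        line = i // self.s          # index i lives on cache line floor(i/s)
        if line in self.cache:
            self.cache.move_to_end(line)        # hit: mark most recently used
        else:
            self.misses += 1                    # miss: fetch line from memory
            if len(self.cache) == self.lines:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[line] = None

# A sequential scan of n contiguous locations touches n/s distinct lines,
# so it incurs roughly n/s misses:
sim = CacheSim(s=8)
for i in range(1024):
    sim.access(i)
print(sim.misses)  # 128, i.e., 1024/8
```

Counting misses this way, over the access patterns of the sorting algorithms below, is exactly the cost measure you are asked to analyze.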
(a) Show that INSERTIONSORT has O(n^2/s) cache misses. Assuming that your code knows s, suggest a modification to INSERTIONSORT that achieves O(n^2/s^2) cache misses.

(b) Show that the number of cache misses generated in merging two sorted arrays of m numbers apiece is O(m/s). How many cache misses are required by MERGESORT when accessing the input array?

(c) How many cache misses are generated when HEAPIFY is called on a node of height h? What is the total cost (in terms of cache misses) of BUILDHEAP? How many cache misses does HEAPSORT generate? (Use the array implementation for heaps given in the book.)

(d) How many cache misses does the PARTITION procedure in the book require for access to the input array? Make an intelligent guess as to the average number of cache misses that QUICKSORT requires. You need only provide an intuitive justification for your guess.

(e) Which sorting algorithm would you recommend to your boss and why?

Problem 2. Hashing to Disk Pages [48 points, 8 points per part]

In an application that requires a very large hash table, it may be impractical to store the hash table in primary memory. In particular, one might choose to store the hash table on disk, with one disk page playing the role of one slot in the hash table. (For a review of disk storage, read pages 484-488 in CLRS.) We presume that a disk page is large enough to hold many records. For example, a typical disk page holds 2^12 bytes of data, and a record may contain only 2^5 bytes. We shall store the records that hash to a single page in a linear order on the page. If a page overflows because it contains too many records, the excess records are stored in an overflow area somewhere else on disk.

A SEARCH consists of hashing to the correct disk page and then linearly searching through the records on that page for one with the query key. If the page is full, then the overflow area must be searched in addition. Since the time to access a disk page is typically at least 10 milliseconds, the cost of the linear search on the page is negligible. Thus, we shall focus on the number of disk accesses as our cost measure. For the SEARCH operation, the cost is 1 if we find the record in its "proper" page, but it may be considerably greater if we must also search the overflow area. Consequently, the focus of this problem is to ensure that the proper pages seldom overflow, while using as little extra space as possible.

We shall assume for the rest of this problem that we are hashing n keys to disk pages, where each disk page holds r records. We would like to know the number m of disk pages such that, with high confidence, we can search for any of the n keys with a single disk access. Moreover, we would like m = O(n/r) so that at most a constant fraction of the storage is wasted. We shall make the assumption of simple uniform hashing (see page 259 in CLRS). Also, inequalities (C.5) and (C.19) and Theorem C.2 in CLRS will be useful for solving this problem.
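Before proving anything, the question of how large m must be can be explored empirically. Below is a minimal Monte Carlo sketch that models simple uniform hashing as independent, uniformly random page choices; the function name search_cost_stats and the parameter values are illustrative. The experiment suggests, but of course does not prove, the kind of bound the problem asks you to establish.

```python
import random
from collections import Counter

def search_cost_stats(n, m, r, trials=100):
    """Fraction of trials in which some page receives more than r of the
    n keys, i.e., in which some SEARCH could need a second disk access
    to the overflow area. (Illustrative sketch, not a solution.)"""
    overflowed = 0
    for _ in range(trials):
        # Simple uniform hashing: each key independently picks a page.
        loads = Counter(random.randrange(m) for _ in range(n))
        if any(count > r for count in loads.values()):
            overflowed += 1
    return overflowed / trials

n, r = 10_000, 128
# With m = 2n/r pages (still O(n/r)), the average load is r/2 and
# overflow is rare; with m = n/r pages, the average load equals r
# and some page almost always overflows.
print(search_cost_stats(n, m=2 * n // r, r=r))  # typically 0.0
print(search_cost_stats(n, m=n // r, r=r))      # typically 1.0
```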