CMSC 424 – Database Design
Lecture 17: Query Processing
Mihai Pop

Admin
• Homework 3 is on the website
• Project part 1 is due
• Midterm answers/results:
  – > 90: A (6)
  – 75–90: B (9)
  – 50–75: C (6)
  – < 50: D (1)
  – 10 improved, 10 got worse, 2 stayed about the same

Multiple Index Selection
GOAL: apply the most restrictive predicate first, and combine multiple predicates to reduce the intermediate results AS EARLY AS POSSIBLE.
• Conjunctive selection using one index on A: select using the index on A, then apply the remaining predicates to the retrieved tuples.
• Conjunctive selection using a composite-key index on (R.A, R.B): build a composite key or key range from the query values and search directly (a range search is possible on the first attribute only).
• Conjunctive selection using two indexes on A and B: search each separately and intersect the tuple identifiers (TIDs).
• Disjunctive selection using two indexes on A and B: search each separately and take the union of the TIDs.

Join Methods: Nested Loop
• tuple-oriented:

    for each tuple t(r) in r do begin
      for each tuple t(s) in s do begin
        join(t(r), t(s)) and append the result to the output
      end
    end

• block-oriented:

    for each block b(r) in r do begin
      for each block b(s) in s do begin
        join(b(r), b(s)) and append the result to the output
      end
    end

• reverse inner loop: similar to the above, but for even-numbered outer blocks the inner relation is scanned in reverse, so the inner block still in the buffer is reused first.

Cost of Block-Oriented Nested Loop
• The cost depends on the number of buffers and the buffer replacement strategy (buffer size is M + 1):
  – Pin 1 block for the outer relation and M for the inner, with LRU replacement:
    cost: b(r) + b(r) * b(s), assuming b(s) > M
  – Pin M blocks for the outer relation and 1 for the inner:
    1: read M blocks from the outer; cost: M blocks
    2: join each block of s with the M resident blocks; cost: b(s)
    3: repeat with the next M blocks of r until all are done; repeated ⌈b(r)/M⌉ times
    cost = (M + b(s)) * b(r)/M = b(r) + [b(r) * b(s)] / M
• Which relation should be the outer?

Hash-Join Algorithm Details
The hash-join of r and s is computed as follows; relation s is called the build input and r is called the probe input.
1. Partition relation s using hash function h. While partitioning, one block of memory is reserved as the output buffer for each partition.
2. Partition r similarly.
3. For each i:
  (a) Load si into memory and build an in-memory hash index on its join attribute. This index uses a different hash function than the earlier one, h.
  (b) Read the tuples in ri from disk one by one. For each tuple tr, locate each matching tuple ts in si using the in-memory hash index and output the concatenation of their attributes.

Example of Cost of Hash-Join
• M = 20 blocks
• bdepositor = 100
• bcustomer = 400
• depositor is the build input. Partition it into 5 partitions, each of size 20 blocks; this can be done in one pass.
• Partition customer into 5 partitions, each of size 80; this is also done in one pass.
• Do the partition joins: for each j, put the 20 blocks of depositor(j) in memory, build the hash index, and probe with the 80 blocks of customer(j).
• Total cost, ignoring the cost of writing partially filled blocks:
  3 * (100 + 400) = 1500 I/Os

Hash-Join Algorithm (Cont.)
• The number of partitions n and the hash function h are chosen so that each si fits in memory.
  – Typically n is chosen as ⌈bs/M⌉ * f, where f is a "fudge factor", typically around 1.2.
  – The probe-relation partitions ri need not fit in memory.
• Recursive partitioning is required if the number of partitions n is greater than the number of memory pages M.
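As a rough illustration of the partition and build/probe phases, here is a minimal in-memory Python sketch (not the slide's disk-based algorithm). It assumes relations are plain lists of (join_key, payload) tuples; Python's built-in dict stands in for the second, in-memory hash function, and the partition count n is an arbitrary illustrative choice:

```python
# Illustrative in-memory sketch of the hash-join phases above.
# Relations are plain lists of (join_key, payload) tuples.

def hash_join(r, s, n=5):
    # Steps 1-2: partition the build input s and the probe input r
    # with the same hash function h (here: hash() mod n).
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in s:
        s_parts[hash(t[0]) % n].append(t)
    for t in r:
        r_parts[hash(t[0]) % n].append(t)

    # Step 3: for each partition i, build an in-memory hash index on
    # s_i (a Python dict stands in for the second hash function), then
    # probe it with the tuples of r_i.
    out = []
    for i in range(n):
        index = {}
        for key, payload in s_parts[i]:
            index.setdefault(key, []).append(payload)
        for key, payload in r_parts[i]:
            for match in index.get(key, []):
                # output the concatenation of the matching tuples' attributes
                out.append((key, payload, match))
    return out
```

The sketch keeps everything in memory; the point of the real algorithm is that only one build partition si must fit in memory at a time.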
  – Instead of partitioning n ways, use M – 1 partitions for s.
  – Further partition the M – 1 partitions using a different hash function.
  – Use the same partitioning method on r.
  – Rarely required in practice: e.g., recursive partitioning is not needed for relations of 1 GB or less with 2 MB of memory and a 4 KB block size.

Other Operations
• Outer joins:
  – Left outer join: easy
  – Right/full outer join: may need some bookkeeping
• Duplicate elimination (hard):
  – Sort at the end and eliminate duplicates, or
  – Hash the output and eliminate duplicates
• Aggregates:
  – Sum, count, min, and max are easily maintained during execution
  – Avg = sum / count
  – Std can be kept via a running sum of squares (ssum): sqrt(ssum/count – avg²)

External Sorting with Sort-Merge
• External vs. internal sorting: the relation/file does not fit in memory.
• Create-runs phase:

    repeat until done
      read M blocks of the relation (or the rest, if <= M remain)
      internal sort using any sort method, e.g. QuickSort
      write the sorted tuples out as a run
    end

• Merge-runs phase:

    read one block from each run;
    repeat
      merge tuples onto the result;
      advance the pointer in the run you appended from last;
      if the block of a run is exhausted, read its next block
    until all blocks of all runs are done

• This assumes that one block from each run can be kept in main memory. If not, the same algorithm must be applied in multiple passes.

External Merge Sort Cost
• Cost analysis:
  – Initial number of runs: ⌈br/M⌉
  – Total number of merge passes required: ⌈logM–1(br/M)⌉
  – Block transfers for initial run creation: br + br = 2br (one read and one write of the relation)
  – For the final pass we do not count the write cost; we ignore the final write cost for all operations, since the output of an operation may be pipelined to the display or to a parent operation without being written to disk. If pipelined, it is counted in the cost of the follow-up operator.
• Thus the total number of block transfers for external sorting is:
  2br * ⌈logM–1(br/M)⌉ + br = br * (2⌈logM–1(br/M)⌉ + 1)
  – If M ≥ br/M, only one merge pass is required: ⌈logM–1(br/M)⌉ = 1, so total cost = 3br.
  – If M > br, the relation fits in memory, the log term evaluates to 0, and total cost = br only (just the initial read).
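The two phases and the cost formula above can be sketched in Python. This is an illustrative sketch under simplifying assumptions: list slices of size M stand in for M-block buffers, heapq.merge stands in for the (M – 1)-way merge, and sort_cost evaluates br * (2⌈logM–1(br/M)⌉ + 1):

```python
from math import ceil, log
import heapq

def external_sort(values, M):
    # Create-runs phase: take M items at a time (standing in for M
    # blocks), sort each chunk internally, and keep it as a sorted run.
    runs = [sorted(values[i:i + M]) for i in range(0, len(values), M)]
    # Merge-runs phase: repeatedly emit the smallest head element across
    # all runs. heapq.merge stands in for the (M - 1)-way merge; a real
    # implementation needs multiple passes if there are too many runs.
    return list(heapq.merge(*runs))

def sort_cost(b_r, M):
    # Total block transfers: b_r * (2 * ceil(log_{M-1}(b_r / M)) + 1),
    # i.e. 2*b_r for run creation plus 2*b_r per merge pass, with the
    # final write not counted.
    runs = ceil(b_r / M)                              # initial number of runs
    passes = ceil(log(runs, M - 1)) if runs > 1 else 0
    return b_r * (2 * passes + 1)
```

For the slide's one-pass case (M ≥ br/M), sort_cost gives 3br: sort_cost(100, 20) is 300; and when the relation fits in memory, sort_cost(10, 20) is just 10.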