Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Database Query Processing: Lecture 17 - Query Processing Techniques - Prof. Mihai Pop, Study notes of Principles of Database Management

Various query processing techniques in database systems, including complex selections, multiple index selection, join methods (nested loop, sort-merge-join, hash-join), and other operations (outer joins, duplicate elimination, aggregates). The lecture also discusses the cost analysis of each method.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-sb5
koofers-user-sb5 🇺🇸

10 documents

1 / 17

Toggle sidebar

Related documents


Partial preview of the text

Download Database Query Processing: Lecture 17 - Query Processing Techniques - Prof. Mihai Pop and more Study notes Principles of Database Management in PDF only on Docsity! CMSC 424 – Database design Lecture 17 Query processing Mihai Pop Admin • Homework 3 is on the website • Project part 1 due • Midterm answers/results > 90 – A (6) 75-90 – B (9) 50-75 – C (6) < 50 – D (1) • 10 improved, 10 got worse, 2 about the same Multiple Index Selection GOAL: apply the most restrictive one and combine multiple of them to reduce the intermediate results AS EARLY AS POSSIBLE • conjunctive selection using one index A: select using A and then apply the remaining of the predicates on the retrieved tuple values • conjunctive selection using a composite key index (R.A,R.B)- then create a composite key or range from the query values and search directly (range search on the first attribute only) • conjunctive selection using two indexes A and B: search each separately and intersect the tuple identifiers (TIDs) • disjunctive selection using two indexes A and B: search each separately and take the union of the TIDs Join Methods: Nested Loop • tuple-oriented: • block-oriented: • reverse inner loop similar to above but for even outer blocks we scan the inner relation in reverse inner blocks outer blocks outer blocks inner blocks for each tuple t(r) in r do begin for each tuple t(s) in s do begin join(t(r),t(s) and append the result to the output end end for each block b(r) in r do begin for each block b(s) in s do begin join(b(r),b(s) and append the result to the output end end Cost of Block-Oriented Nested Loop • cost depends on the number of buffers and the buffer replacement strategy – fasten 1 block from the outer relation, M for the inner and LRU cost: b(r) + b(r)*b(s) assuming that b(s) > M – fasten M blocks from the outer relation, and 1 for the inner 1: read M from the outer cost: M blocks 2: for each block of s join 1 X M blocks cost: b(s) -”- 3: repeat with the next M blocks of r until all done repeated b(r)/M times cost = [ (M + b(s)]* b(r)/M = b(r)+[b(r)*b(s)]/M • which relation should be the outer? Buffer size M+1 Hash-Join Algorithm Details 1. Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition. 2. Partition r similarly. 3. For each i: (a)Load si into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h. (b)Read the tuples in ri from the disk one by one. For each tuple tr locate each matching tuple ts in si using the in- memory hash index. Output the concatenation of their attributes. The hash-join of r and s is computed as follows. Relation s is called the build input and r is called the probe input. Example of Cost of Hash-Join • M= 20 blocks • bdepositor= 100 • bcustomer = 400. • depositor is the build input. Partition it into 5 partitions, each of size 20 blocks. This partitioning can be done in one pass. • partition customer into 5 partitions,each of size 80. This is also done in one pass. • Do the partition joins- for each j put 20 blocks of partition depositor(j) in memory, built the hash index, and do the probes with the 80 blocks of customer(j) • Therefore total cost, ignoring cost of writing partially filled blocks: – 3(100 + 400) = 1500 I/Os customer depositor Hash-Join algorithm (Cont.) • The value n and the hash function h is chosen such that each si should fit in memory. – Typically n is chosen as bs/M * f where f is a “fudge factor”, typically around 1.2 – The probe relation partitions ri need not fit in memory • Recursive partitioning required if number of partitions n is greater than number of pages M of memory. – instead of partitioning n ways, use M – 1 partitions for s – Further partition the M – 1 partitions using a different hash function – Use same partitioning method on r – Rarely required: e.g., recursive partitioning not needed for relations of 1GB or less with memory size of 2MB, with block size of 4KB. Other Operations • Outer Joins – Left outer join easy – Right/Full outer join (may need some bookkeeping) • Duplicate elimination – Hard – Sort at the end and eliminate – Hash output and eliminate • Aggregates – Sum, count, min, max easily kept during execution – Avg = Sum / count – Std = sqrt(ssum/count) External Sorting with Sort-Merge ●external vs internal sorting: relation/file does not fit in memory ●create runs phase: ●merge-runs phase: ● ● this assumes that a block from each run can be kept in main memory. If not, then the same algorithm has to be applied in multiple passes repeat until done read M blocks of the relation (or rest if <=M) internal sort using any sort method, e.g QuickSort(M) write the sorted tuples into a run R data file end read one block from each run; merge tuples on the result; advance the pointer from the run you appended last; if the block of a run is empty, read the next one until all blocks of all runs are done External Merge Sort Cost • Cost analysis: – Initial number of runs: br/M – Total number of merge passes required: logM–1(br/M). – Block transfers for initial run creation is br +br=2br • for final pass, we don’t count write cost we ignore final write cost for all operations since the output of an operation may be pipelined to the display or to a parent operation without being written to disk. If pipelined, it will be counted in the cost of the follow up operator • Thus total number of block transfers for external sorting: 2 br ( logM–1(br / M)) + br = br ( 2 logM–1(br / M) + 1) – If M ≥ br/M (only one pass is required) the expression logM–1(br/M) =1 total cost = 3br – However, if M > br then this expression evaluates to 0 total cost =br ONLY
Docsity logo



Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved