Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Database Management Systems: Storage, Indexing and File Organization - Prof. Kristen R. Le, Study notes of Database Management Systems (DBMS)

A chapter from the database management systems textbook by kristen lefevre. It covers the concepts of data storage on external media, file organization methods, and indexing techniques. The differences between heap files, sorted files, and indexed files, and the advantages and disadvantages of each. It also explains the concepts of primary and secondary indices, clustered and unclustered indices, and the differences between b+ tree and hash-based indexing.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-usr
koofers-user-usr 🇺🇸

5

(1)

10 documents

1 / 7

Toggle sidebar

Related documents


Partial preview of the text

Download Database Management Systems: Storage, Indexing and File Organization - Prof. Kristen R. Le and more Study notes Database Management Systems (DBMS) in PDF only on Docsity! 1 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 1 Storage and Indexing Chapter 8 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 2 Data on External Storage - Review • File contains a (multi)set of records • Each record has a unique RID; sufficient to locate record on disk • Data read and written to disk by buffer manager • Main unit of transfer is a page • File made up of many pages; each page contains multiple records • Last class 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 3 File Organization • Method of arranging file of records on disk • File typically stores a relation • Many alternatives, with different strengths and weaknesses • Heap (random order) files: No particular order defined for records • Sorted Files: Records sorted based on one or more attributes • Indexes: Organize records using trees or hashing 2 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 4 Operations on File - Example • Employees (Name, Age, Salary) • Operations • Scan: Fetch all employees from disk • Equality Search: age = 21 • Range Selection: age >= 18 AND age < 65 • Insert a record • Delete a record • How do we organize the file to accomplish these tasks efficiently? 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 5 Indexes • A data structure that organizes data records on disk to optimize certain operations • Speed up selections on the search key field of the index (denoted k) • Any subset of the fields of a relation can be the search key for an index on the relation. • Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation). 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 6 Indexes • Index contains a collection of data entries • Data entry for search key k denoted k* • Alternative ways to store data entry k*: 1: Data entry k* is an actual record (with search key k) Index == File ! 2: Data entry k* is (k, rid) pair, where rid is refers to a record with search key k Actual data records stored in a different file 3: Data entry k* is (k, rid-list) pair, where rid-list refers to list of records with search key k • Choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a given key value k. • Examples include: B+ trees, hash-based structures 5 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 13 Comparing File Organizations / Indexes - Example • Employees (Name, Age, Salary) • Operations • Scan: Fetch all employees from disk • Equality Search: age = 21 • Range Selection: age >= 18 AND age < 65 • Insert a record • Delete a record 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 14 Analysis of I/O Cost • For simplicity, ignore CPU cost • Important Factors for I/O Cost: • How many pages do we need to read? • Are I/Os sequential or non-sequential? • Textbook contains a slightly different analysis (more detail, but ignores distinction between sequential & non- sequential I/O) 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 15 Heap File • Scan: Read all pages in file • sequential • Equality Search: Read all pages in file • sequential • worst case • Range Selection: Read all pages in file • sequential • Insert: Easy • Delete: Searching + Easy 6 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 16 Clustered File (on Age) • Scan: Slightly more than heap file due to page occupancy • Equality Search: Traverse height of tree • non-sequential • Range Selection: • Traverse height of tree (non-sequential) • Scan leaf records (sequential) • Insert: Traverse height of tree (non-sequential) + write • Delete: Traverse height of tree (non-sequential) + write 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 17 Heap File w/ Unclustered Tree Index (on Age) • Scan: Same as heap file • Equality Search: Traverse height of tree • Non-sequential • Can be more costly if many records satisfy equality • Range Selection: • Traverse height of tree (non-sequential) • Scan leaf records (non-sequential) • Can be very expensive if many records satisfy range expression!! • Insert: Update index + update heap file • Delete: Update index + update heap file 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 18 Heap File w/ Unclustered Hash Index (on Age) • Scan: Same as heap file • Equality Search: Nearly constant • Range Selection: Hash index no use! • Scan heap file • Insert: Update index + update heap file • Delete: Update index + update heap file 7 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 19 Announcements • Optional Exercises (for Review): 8.1, 8.3, 8.11, 9.2, 9.3, 9.5, 9.7, 9.9, 9.12, 9.13, 9.14, 9.16, 9.17 • Sample Midterm posted on class website • Exam Review -- Friday during discussion • Midterm Exam -- Monday, in class
Docsity logo



Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved