Download Database Management Systems: Storage, Indexing and File Organization - Prof. Kristen R. Le and more Study notes Database Management Systems (DBMS) in PDF only on Docsity! 1 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 1 Storage and Indexing Chapter 8 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 2 Data on External Storage - Review • File contains a (multi)set of records • Each record has a unique RID; sufficient to locate record on disk • Data read and written to disk by buffer manager • Main unit of transfer is a page • File made up of many pages; each page contains multiple records • Last class 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 3 File Organization • Method of arranging file of records on disk • File typically stores a relation • Many alternatives, with different strengths and weaknesses • Heap (random order) files: No particular order defined for records • Sorted Files: Records sorted based on one or more attributes • Indexes: Organize records using trees or hashing 2 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 4 Operations on File - Example • Employees (Name, Age, Salary) • Operations • Scan: Fetch all employees from disk • Equality Search: age = 21 • Range Selection: age >= 18 AND age < 65 • Insert a record • Delete a record • How do we organize the file to accomplish these tasks efficiently? 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 5 Indexes • A data structure that organizes data records on disk to optimize certain operations • Speed up selections on the search key field of the index (denoted k) • Any subset of the fields of a relation can be the search key for an index on the relation. • Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation). 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 6 Indexes • Index contains a collection of data entries • Data entry for search key k denoted k* • Alternative ways to store data entry k*: 1: Data entry k* is an actual record (with search key k) Index == File ! 2: Data entry k* is (k, rid) pair, where rid is refers to a record with search key k Actual data records stored in a different file 3: Data entry k* is (k, rid-list) pair, where rid-list refers to list of records with search key k • Choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a given key value k. • Examples include: B+ trees, hash-based structures 5 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 13 Comparing File Organizations / Indexes - Example • Employees (Name, Age, Salary) • Operations • Scan: Fetch all employees from disk • Equality Search: age = 21 • Range Selection: age >= 18 AND age < 65 • Insert a record • Delete a record 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 14 Analysis of I/O Cost • For simplicity, ignore CPU cost • Important Factors for I/O Cost: • How many pages do we need to read? • Are I/Os sequential or non-sequential? • Textbook contains a slightly different analysis (more detail, but ignores distinction between sequential & non- sequential I/O) 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 15 Heap File • Scan: Read all pages in file • sequential • Equality Search: Read all pages in file • sequential • worst case • Range Selection: Read all pages in file • sequential • Insert: Easy • Delete: Searching + Easy 6 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 16 Clustered File (on Age) • Scan: Slightly more than heap file due to page occupancy • Equality Search: Traverse height of tree • non-sequential • Range Selection: • Traverse height of tree (non-sequential) • Scan leaf records (sequential) • Insert: Traverse height of tree (non-sequential) + write • Delete: Traverse height of tree (non-sequential) + write 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 17 Heap File w/ Unclustered Tree Index (on Age) • Scan: Same as heap file • Equality Search: Traverse height of tree • Non-sequential • Can be more costly if many records satisfy equality • Range Selection: • Traverse height of tree (non-sequential) • Scan leaf records (non-sequential) • Can be very expensive if many records satisfy range expression!! • Insert: Update index + update heap file • Delete: Update index + update heap file 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 18 Heap File w/ Unclustered Hash Index (on Age) • Scan: Same as heap file • Equality Search: Nearly constant • Range Selection: Hash index no use! • Scan heap file • Insert: Update index + update heap file • Delete: Update index + update heap file 7 2/10/09 EECS 484: Database Management Systems, Kristen LeFevre 19 Announcements • Optional Exercises (for Review): 8.1, 8.3, 8.11, 9.2, 9.3, 9.5, 9.7, 9.9, 9.12, 9.13, 9.14, 9.16, 9.17 • Sample Midterm posted on class website • Exam Review -- Friday during discussion • Midterm Exam -- Monday, in class