Database Systems: Disk Storage, Files, and Hashing Techniques - Prof. Shamkant Navathe, Study notes of Deductive Database Systems

A chapter from Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition. It covers disk storage devices, files of records, and operations on files, including unordered, ordered, and hashed files. The document also discusses dynamic and extendible hashing techniques and RAID technology.

Typology: Study notes

Pre 2010

Uploaded on 08/05/2009

koofers-user-pbz

Chapter 13
Disk Storage, Basic File Structures, and Hashing
11/10/2003
Copyright © 2004 Pearson Education, Inc.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition

Chapter Outline
- Disk Storage Devices
- Files of Records
- Operations on Files
- Unordered Files
- Ordered Files
- Hashed Files
  - Dynamic and Extendible Hashing Techniques
- RAID Technology

Disk Storage Devices (cont.)
- Preferred secondary storage device for high storage capacity and low cost.
- Data is stored as magnetized areas on magnetic disk surfaces.
- A disk pack contains several magnetic disks connected to a rotating spindle.
- Each disk surface is divided into concentric circular tracks.
- Track capacities typically vary from 4 to 50 Kbytes.

Files of Records
- A file is a sequence of records, where each record is a collection of data values (or data items).
- A file descriptor (or file header) includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on disk.
- Records are stored on disk blocks. The blocking factor BFR for a file is the (average) number of file records stored in a disk block.
- A file can have fixed-length records or variable-length records.

Files of Records (cont.)
- File records can be unspanned (no record can span two blocks) or spanned (a record can be stored in more than one block).
- The physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or indexed.
- In a file of fixed-length records, all records have the same format. Usually, unspanned blocking is used with such files.
- Files of variable-length records require additional information to be stored in each record, such as separator characters and field types. Usually, spanned blocking is used with such files.

Operations on Files
Typical file operations include:
- OPEN: Readies the file for access, and associates a pointer that will refer to a current file record at each point in time.
- FIND: Searches for the first file record that satisfies a certain condition, and makes it the current file record.
- FINDNEXT: Searches for the next file record (from the current record) that satisfies a certain condition, and makes it the current file record.
- READ: Reads the current file record into a program variable.
- INSERT: Inserts a new record into the file, and makes it the current file record.
- DELETE: Removes the current file record from the file, usually by marking the record to indicate that it is no longer valid.
- MODIFY: Changes the values of some fields of the current file record.
- CLOSE: Terminates access to the file.
- REORGANIZE: Reorganizes the file records. For example, the records marked deleted are physically removed from the file, or a new organization of the file records is created.
- READ_ORDERED: Reads the file blocks in order of a specific field of the file.
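
The record-at-a-time operations listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name HeapFile, its method names, and the in-memory list standing in for disk blocks are all hypothetical and are not taken from the textbook.

```python
class HeapFile:
    """Toy in-memory stand-in for a file of records (hypothetical, no real disk I/O)."""

    def __init__(self):
        self.records = []    # each entry is [record_dict, deleted_flag]
        self.current = None  # position of the current file record

    def insert(self, record):
        # INSERT: add the record at the end and make it the current record
        self.records.append([dict(record), False])
        self.current = len(self.records) - 1

    def _scan(self, start, condition):
        # Linear search from position `start`, skipping records marked deleted
        for i in range(start, len(self.records)):
            rec, deleted = self.records[i]
            if not deleted and condition(rec):
                self.current = i
                return True
        return False

    def find(self, condition):
        # FIND: first record satisfying the condition becomes current
        return self._scan(0, condition)

    def find_next(self, condition):
        # FINDNEXT: continue the search after the current record
        start = 0 if self.current is None else self.current + 1
        return self._scan(start, condition)

    def read(self):
        # READ: copy the current record into a program variable
        if self.current is None:
            raise ValueError("no current record")
        return dict(self.records[self.current][0])

    def modify(self, **changes):
        # MODIFY: change some field values of the current record
        self.records[self.current][0].update(changes)

    def delete(self):
        # DELETE: only mark the current record as no longer valid
        self.records[self.current][1] = True

    def reorganize(self):
        # REORGANIZE: physically drop the records marked deleted
        self.records = [entry for entry in self.records if not entry[1]]
        self.current = None


f = HeapFile()
for name, dept in [("Ann", 5), ("Bob", 4), ("Cy", 5)]:
    f.insert({"name": name, "dept": dept})

in_dept_5 = lambda r: r["dept"] == 5
f.find(in_dept_5)
first = f.read()["name"]
f.delete()                   # marks the first match as deleted
f.find_next(in_dept_5)
second = f.read()["name"]
f.reorganize()
remaining = len(f.records)
```

The deletion marker mirrors the slide's point that DELETE usually flags a record and leaves physical removal to REORGANIZE.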

Unordered Files
- Also called a heap file or a pile file.
- New records are inserted at the end of the file.
- To search for a record, a linear search through the file records is necessary. This requires reading and searching half the file blocks on average, and is hence quite expensive.
- Record insertion is quite efficient.
- Reading the records in order of a particular field requires sorting the file records.

Ordered Files
- Also called a sequential file.
- File records are kept sorted by the values of an ordering field.
- Insertion is expensive: records must be inserted in the correct order. It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.
- A binary search can be used to search for a record on its ordering field value. This requires reading and searching about log2(b) blocks on average, where b is the number of file blocks, an improvement over linear search.
- Reading the records in order of the ordering field is quite efficient.

Hashed Files (cont.)
- To reduce overflow records, a hash file is typically kept 70-80% full.
- The hash function h should distribute the records uniformly among the buckets; otherwise, search time will increase because many overflow records will exist.
- Main disadvantages of static external hashing:
  - The fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
  - Ordered access on the hash key is quite inefficient (requires sorting the records).
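
To make the effect of a non-uniform hash function concrete, here is a minimal sketch of static hashing with a fixed number of buckets M. The class StaticHashFile, the choice h(K) = K mod M, the integer keys, and the per-bucket overflow lists are assumptions made for illustration; the slides do not prescribe a particular hash function or overflow scheme.

```python
class StaticHashFile:
    """Toy static hash file: M fixed buckets plus per-bucket overflow lists (hypothetical)."""

    def __init__(self, num_buckets, bucket_capacity):
        self.M = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]

    def h(self, key):
        # Assumed hash function for non-negative integer keys
        return key % self.M

    def insert(self, key):
        b = self.h(key)
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append(key)
        else:
            # Bucket is full: the record becomes an overflow record
            self.overflow[b].append(key)

    def overflow_count(self):
        return sum(len(chain) for chain in self.overflow)

    def load_factor(self):
        # Fraction of the primary bucket space in use (overflow records included)
        stored = sum(len(b) for b in self.buckets) + self.overflow_count()
        return stored / (self.M * self.capacity)


# Keys that spread evenly over the 4 buckets
uniform = StaticHashFile(num_buckets=4, bucket_capacity=2)
for k in [0, 1, 2, 3, 4, 5]:
    uniform.insert(k)

# Keys that all hash to bucket 0 (every key is a multiple of M)
skewed = StaticHashFile(num_buckets=4, bucket_capacity=2)
for k in [0, 4, 8, 12, 16, 20]:
    skewed.insert(k)

uniform_overflow = uniform.overflow_count()  # no bucket exceeds its capacity
skewed_overflow = skewed.overflow_count()    # everything beyond 2 records overflows
uniform_load = uniform.load_factor()
```

Both files hold six records at the same 75% load, but only the skewed key set produces overflow records. Because M is fixed at construction, neither file can adapt if the record count grows or shrinks.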

Hashed Files (cont.)
[Figure 13.10]

Dynamic and Extendible Hashed Files
Dynamic and Extendible Hashing Techniques
- Hashing techniques are adapted to allow the dynamic growth and shrinking of the number of file records.
- These techniques include the following: dynamic hashing, extendible hashing, and linear hashing.
- Both dynamic and extendible hashing use the binary representation of the hash value h(K) in order to access a directory.
- In dynamic hashing, the directory is a binary tree.
- In extendible hashing, the directory is an array of size 2^d, where d is called the global depth.

Dynamic and Extendible Hashing (cont.)
- The directories can be stored on disk, and they expand or shrink dynamically. Directory entries point to the disk blocks that contain the stored records.
- An insertion into a disk block that is full causes the block to split into two blocks, and the records are redistributed between the two blocks. The directory is updated appropriately.
- Dynamic and extendible hashing do not require an overflow area.
- Linear hashing does require an overflow area but does not use a directory. Blocks are split in linear order as the file expands.

Extendible Hashing
[Figure 13.11]

Parallelizing Disk Access Using RAID Technology
- Secondary storage technology must take steps to keep up in performance and reliability with processor technology.
- A major advance in secondary storage technology is represented by the development of RAID, which originally stood for Redundant Arrays of Inexpensive Disks.
- The main goal of RAID is to even out the widely different rates of performance improvement between disks on the one hand and memory and microprocessors on the other.
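
Returning to the extendible hashing scheme described earlier (a directory of size 2^d, with a full block splitting in two), the following is a minimal sketch rather than the textbook's algorithm. Several things are assumptions: the class and method names are hypothetical, h(K) is taken to be K itself for non-negative integer keys, and the directory is indexed by the low-order d bits of h(K), a common simplification, whereas the textbook figure may use the leading bits.

```python
class Bucket:
    """Stand-in for one disk block of records (hypothetical)."""

    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.keys = []


class ExtendibleHashFile:
    def __init__(self, bucket_capacity=2):
        self.global_depth = 0  # d: the directory has 2**d entries
        self.capacity = bucket_capacity
        self.directory = [Bucket(0, bucket_capacity)]

    def _index(self, key):
        # Assumption: h(K) = K, and the low-order d bits select the entry
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        # Note: more than `capacity` identical keys would split forever in this toy
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < bucket.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)  # block is full: split it and retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; entries i and i + 2**d share a bucket
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, self.capacity)
        bit = 1 << (bucket.local_depth - 1)  # the newly examined bit
        # Redistribute the records between the two blocks
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (sibling if k & bit else bucket).keys.append(k)
        # Update the directory entries that must now point to the new block
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling


ehf = ExtendibleHashFile(bucket_capacity=2)
for k in [0, 1, 2, 3, 4, 5]:
    ehf.insert(k)

depth = ehf.global_depth
directory_size = len(ehf.directory)
```

In this run the directory doubles twice and no overflow area is used: a full block always splits instead of chaining.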