Physical Storage Media and Disk Organization: Magnetic Disks and Optimization - Prof. Davi, Study notes of Principles of Database Management

An overview of physical storage media, focusing on magnetic disks and their optimization. Topics include the characteristics of magnetic disks, disk hierarchy, and methods for optimizing disk-block access such as disk-arm scheduling, non-volatile write buffers, file organization, and log-based file systems.

Typology: Study notes

Uploaded on 07/30/2009 by koofers-user-dlw-1


Recap of Feb 20: Database Design Goals, Normalization, Normal Forms
• Goals for designing a database: a schema with
– simple, easy-to-phrase queries
– no redundancies (repetition of information)
– no anomalies
– good performance
• Normalization
– decompose complex relations
– lossy decompositions
– functional dependencies
• Normal forms: 1NF, 2NF, 3NF, BCNF
– BCNF or 3NF: lossless decomposition in both
• BCNF can't always ensure dependency preservation
• 3NF sometimes requires null values or redundant information

Getting Physical: Storage and File Structure (Chapter 11)
• Up until now we have examined database design from a high-level conceptual view, passing over actual implementation and underlying hardware.
– an appropriate focus for database users
– but hardware does influence implementation, and implementation does influence which conceptual designs will be more efficient and useful
• Now we get physical -- we examine physical storage media as background for the later focus on implementing the data models and languages already described

Midterm Study and Homework #2
• Material you are responsible for:
– all material presented in class before the midterm
– textbook chapters 1, 2, 3 (except 3.4 and 3.5), 4, 6, 7-7.7, and 11 (except 11.3 and 11.9)
• The questions from homework assignment 1 (exercises 1.1, 1.2, 1.3, and 2.1-2.6) are all useful study aids, as are the questions from homework assignment #2:
– 3.2, 3.3, 3.5, 3.6 -- 3.9, 3.16
– 4.1, 4.2, 4.4 -- 4.8
– 7.2, 7.4, 7.5, 7.11, 7.12, 7.15, 7.16, 7.21, 7.23
• Homework #2 is due Tuesday, March 11. I'll try to have it back to you, graded, by Thursday, March 13 so you can use it as a study aid for the exam on Tuesday, March 18.
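The lossless-decomposition condition recapped above can be checked mechanically: a binary decomposition of R into R1 and R2 is lossless under a set of FDs exactly when the shared attributes functionally determine all of R1 or all of R2. A minimal sketch (the relation, FDs, and function names here are made-up illustrations, not from the notes):

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of
    (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is determined, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Lossless-join test for the binary decomposition (r1, r2):
    the common attributes must determine one side entirely."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c

# Example: R(A, B, C) with A -> B, decomposed into (A, B) and (A, C).
fds = [(frozenset("A"), frozenset("B"))]
print(is_lossless("AB", "AC", fds))  # True: A determines AB
print(is_lossless("AB", "BC", fds))  # False: B determines neither side
```

The same closure routine is what you would use by hand on the midterm; the code just automates the fixpoint iteration.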
Classification of Physical Storage Media
• Media are classified according to three characteristics:
– speed of access
– cost per unit of data
– reliability
• data loss on power failure or system crash
• physical failure of the storage device
• We can also differentiate storage as either
– volatile storage
– non-volatile storage

Physical Storage Media Overview (11.1)
• Typical media available are:
– cache
– main memory
– flash memory
– magnetic disk
– optical storage (CD or DVD)
– tape storage

Physical Storage Media -- Magnetic Disk
• data is stored on a spinning disk and read/written magnetically
• primary medium for long-term storage of data; typically stores the entire database
• data must be moved from disk to main memory for access, and written back for storage
– much slower access than main memory (about which more later)
• direct access -- possible to read data on disk in any order, unlike magnetic tape
• capacities up to 100 GB
– much larger capacity and cheaper cost/byte than main memory or flash memory
– capacity doubles every two or three years
• survives power failures and system crashes
– disk failure can destroy data, but this is rarer than a system crash

Physical Storage Media -- Optical Storage
• non-volatile; data is read optically from a spinning disk using a laser
• CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular forms
• write-once, read-many (WORM) optical disks are used for archival storage (CD-R and DVD-R)
• multiple-write versions are also available (CD-RW, DVD-RW, and DVD-RAM)
• reads and writes are slower than with magnetic disk
• juke-box systems are available for storing large volumes of data
– large numbers of removable disks
– several drives
– a mechanism for automatic loading/unloading of disks

Physical Storage Media -- Tape Storage
• non-volatile
• used primarily for backup (to recover from disk failure) and for archival data
• sequential access -- much slower than disk
• very high capacity (40-300 GB tapes available)
• tape can be removed from the drive; storage costs are much cheaper than disk, but drives are expensive
• juke-box systems available for storing very large volumes of data
– e.g., remote sensing data: possibly hundreds of terabytes (10^12 bytes) or even a petabyte (10^15 bytes)

Magnetic Disks (cont)
• To read/write a sector
– the disk arm swings to position the head on the right track
– the platter spins continually; data is read/written as the sector passes under the head
• Head-disk assemblies
– multiple disk platters on a single spindle (typically 2 to 4)
– one head per platter, mounted on a common arm
• Cylinder i consists of the ith track of all the platters

Magnetic Disks (cont)
• Earlier-generation disks were susceptible to head crashes
– the disk spins constantly at 60, 120, even 250 revolutions per second
– the head is very close to the surface; if it touches the surface it can scrape the recording medium off, wiping out data and sending the scraped-off medium flying around, causing more head crashes
– newer disks use less friable material and are less subject to head crashes

Magnetic Disks (cont)
• Disk controller -- interfaces between the computer system and the disk drive
– accepts high-level commands to read or write a sector
– initiates actions such as moving the disk arm to the right track and actually reading or writing the data
– computes and attaches checksums to each sector to verify that data is read back correctly
– ensures successful writing by reading back each sector after writing it
– performs remapping of bad sectors
• Multiple disks are connected to a computer system through a controller
– the controller's functionality (checksums, bad-sector remapping) is often carried out by the individual disks, reducing the load on the controller
• Two disk interface standards are ATA (AT Attachment) and SCSI (Small Computer System Interface)

Optimization of Disk-Block Access: Motivation
• Requests for disk I/O are generated both by the file system and by the virtual memory manager
• Each request specifies the address on the disk to be referenced in the form of a block number
– a block is a contiguous sequence of sectors from a single track on one platter
– block sizes range from 512 bytes to several kilobytes (4-16 KB is typical)
– smaller blocks mean more transfers from disk; larger blocks mean more wasted space due to partially filled blocks
– the block is the standard unit of data transfer between disk and main memory
• Since disk access is much slower than main-memory access, methods for optimizing disk-block access are important

Optimization of Disk-Block Access: Methods
• Disk-arm scheduling: requests for several blocks may be sped up by issuing them in the order in which they will pass under the head
– if the blocks are on different cylinders, it is advantageous to ask for them in an order that minimizes disk-arm movement
– elevator algorithm -- move the disk arm in one direction until all requests in that direction are satisfied, then reverse and repeat
– sequential access is 1-2 orders of magnitude faster than random access

Optimization of Disk-Block Access: Methods
• Non-volatile write buffers
– store written data in a RAM buffer rather than writing it to disk immediately
– write out the buffer whenever it becomes full or when no other disk requests are pending
– the buffer must be non-volatile to protect against power failure
• called non-volatile random-access memory (NV-RAM)
• typically implemented with battery-backed-up RAM
– dramatic speedup on writes; with a reasonably sized buffer, write latency essentially disappears
– why can't we do the same for reads? (hints: ESP, clustering)

Storage Access (11.5)
• Basic concepts (some already familiar):
– block-based: a block is a contiguous sequence of sectors from a single track; blocks are the units of both storage allocation and data transfer
– a file is a sequence of records stored in fixed-size blocks (pages) on the disk
– each block (page) has a unique address called a BID
– optimization is done by reducing I/O, seek time, etc.
– database systems seek to minimize the number of block transfers between disk and memory; we can reduce the number of disk accesses by keeping as many blocks as possible in main memory
– buffer: the portion of main memory used to store copies of disk blocks
– buffer manager: the subsystem responsible for allocating buffer space in main memory and handling block transfer between buffer and disk

Buffer Management
• The buffer pool is the part of main memory allocated for temporarily storing disk blocks read from disk and made available to the CPU
• The buffer manager is the subsystem responsible for the allocation and management of the buffer space (transparent to users)
• On a process (user) request for a block (page), the buffer manager:
– checks to see if the page is already in the buffer pool
– if so, passes the address to the process
– if not, loads the page from disk and then passes the address to the process
– loading a page might require clearing (writing out) another page to make space
• Very similar to the way virtual memory managers work, although it can do a lot better (why?)

Buffer Replacement Strategies
• Most operating systems use an LRU replacement scheme. In database environments, MRU is better for some common operations (e.g., join)
– LRU strategy: replace the least recently used block
– MRU strategy: replace the most recently used block
• Sometimes it is useful to fasten or pin blocks to keep them available during an operation and not let the replacement strategy touch them
– a pinned block is thus a block that is not allowed to be written back to disk
• There are situations where it is necessary to write a block back to disk even though the buffer space it occupies is not yet needed; this is called the forced output of a block, and it is useful in recovery situations
• Toss-immediate strategy: free the space occupied by a block as soon as the final tuple of that block has been processed
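The buffer-manager steps above (check the pool, load on a miss, evict an unpinned victim to make space) can be sketched in a few lines. This is a toy illustration, not a real DBMS component: the class and method names are my own, "disk" is faked with a dict, and eviction is plain LRU with pinning.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: LRU replacement, with pinned pages never evicted."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # stand-in for block I/O: block_id -> contents
        self.pool = OrderedDict()   # block_id -> contents, oldest-used first
        self.pinned = set()

    def pin(self, bid):
        self.pinned.add(bid)

    def unpin(self, bid):
        self.pinned.discard(bid)

    def get(self, bid):
        if bid in self.pool:            # hit: refresh its LRU position
            self.pool.move_to_end(bid)
            return self.pool[bid]
        if len(self.pool) >= self.capacity:
            self._evict()
        self.pool[bid] = self.disk[bid]  # miss: load from "disk"
        return self.pool[bid]

    def _evict(self):
        # Evict the least recently used page that is not pinned.
        for victim in self.pool:         # OrderedDict iterates oldest first
            if victim not in self.pinned:
                # A real manager would write the page back here if it is dirty.
                del self.pool[victim]
                return
        raise RuntimeError("all pages pinned; cannot evict")

disk = {i: f"block-{i}" for i in range(10)}
bp = BufferPool(capacity=2, disk=disk)
bp.get(0); bp.get(1)
bp.pin(0)
bp.get(2)                    # evicts block 1: block 0 is oldest but pinned
print(sorted(bp.pool))       # prints [0, 2]
```

Swapping `move_to_end(bid)` for `move_to_end(bid, last=False)` in `get` would flip the policy toward MRU-style eviction, which, as noted above, can beat LRU for operations such as a repeated-scan join.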