Understanding Data Storage and Memory Hierarchy: Disks and Their Access Methods, Slides of Database Management Systems (DBMS)

An overview of data storage in a DBMS. It starts from a strawman implementation that stores relations in ordinary operating-system files, discusses why storing tuples on disk this way is inadequate, and proposes solutions that exploit the characteristics of computer hardware together with clever algorithms. Topics include the memory hierarchy, volatile vs. nonvolatile storage, and disk access methods.

Data Storage: Memory Hierarchy and Disks

Strawman Implementation
• Use the UNIX file system to store relations, e.g.
  – Students(name, id, dept) in the file /usr/db/Students
• One line per tuple, each component stored as a character string, with # as a separator, e.g.
  – a tuple could be: Smith#123#CS
• Store the schema in /usr/db/schema, e.g.:
  – Students#name#STR#id#INT#dept#STR
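To make the strawman concrete, here is a minimal sketch of what such a file-based store looks like in code. The helper names and the exact file handling are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch of the strawman store (illustrative; helper names are assumptions).
# Each relation lives in one text file, one '#'-separated line per tuple.

def append_tuple(relation_path, components):
    """Append one tuple, e.g. ('Smith', '123', 'CS') becomes the line 'Smith#123#CS'."""
    with open(relation_path, "a") as f:
        f.write("#".join(str(c) for c in components) + "\n")

def scan(relation_path):
    """Full scan of the relation: the only access method the strawman scheme offers."""
    with open(relation_path) as f:
        for line in f:
            yield line.rstrip("\n").split("#")

# Example: finding a student by id still means reading the entire relation.
# for name, sid, dept in scan("/usr/db/Students"):
#     if sid == "123":
#         print(name, dept)
```

Even this tiny sketch exposes the weaknesses listed on the next slide: changing one component of one tuple means rewriting the whole file, and every lookup is a full scan.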
What's Wrong?
• The storage of the tuples on disk is inflexible: if a student changes major from EE to ECON, the entire file must be rewritten
• Search is very expensive (read the entire relation)
• Query processing is "brute force": there are faster ways to do joins, etc.
• Data is not buffered between disk and main memory
• No concurrency control
• No reliability in case of a crash

How to Fix These Problems
• Take advantage of the characteristics of computer hardware, with clever algorithms, to do things better
• We will cover
  – data storage (predominantly disks)
  – how to represent data elements
  – indexes
  – query optimization
  – failure recovery
  – concurrency control

Memory Hierarchy
• cache
• main memory
• secondary storage (disk)
• tertiary storage (tapes, CD-ROM)
• Moving down the list, storage gets slower, larger, and cheaper; moving up, it gets faster, smaller, and more expensive

Secondary Storage
• Usually disk
• Divided logically into blocks, the unit of transfer between disk and main memory (a transfer is called a disk I/O)
• Typical size: 100 Gbytes
• Typical speed: 10 millisec (1 millisec = 10^-3 sec)
• At least 100 times larger than main memory
• Much slower than main memory and much, much slower than cache: the processor can execute several million instructions during one disk I/O

Tertiary Storage
• Tape(s)
• CD-ROM(s)
• At least 1000 times slower than secondary storage
• At least 1000 times larger than secondary storage

Volatile vs. Nonvolatile
• Storage is volatile if the data is lost when the power is gone
• Main memory is usually volatile
• Secondary and tertiary storage are usually nonvolatile
• Thus every change made to a database in main memory must be backed up on disk before it can be considered permanent

Disk Controller
• Controls the mechanical actuator that moves the heads in and out (radius, i.e., distance from the spindle)
  – one track from each surface at the same radius forms a cylinder
• Selects a surface
• Selects a sector (senses when that sector is under the corresponding head)
• Transfers bits

Typical Values
• Rotation speed: 5400 rpm
• Number of platters: 5
• Number of tracks per surface: 20,000
• Number of sectors per track: 500
• Number of bytes per sector: thousands

Disk Latency for a Read
• Time between issuing the command to read a block and when the contents of the block appear in main memory:
  – time for the processor and disk controller to process the request, including resolving any contention (negligible)
  – seek time: time to move the heads to the correct radius (0 to 40 millisec)
  – rotational latency: time until the first sector of the block is under the head (5 millisec)
  – transfer time: time until all sectors of the block have passed under the head; depends on rotation speed and block size

Speeding Up Disk Accesses
1. Place blocks that are accessed together on the same cylinder
  – reduces seek time and rotational latency
2. Divide the data among several disks
  – head assemblies can move in parallel
3. Mirror a disk: make copies of it
  – speeds up reads: get the data from the disk whose head is closest to the desired block
  – no effect on writes: must write to all copies
  – also helps with fault tolerance
4. Be clever about the order in which read and write requests are serviced, i.e., a scheduling algorithm in the OS, DBMS, or disk controller
  – Ex: elevator algorithm
5. Prefetch blocks into main memory in anticipation of future use (buffering)

Elevator Algorithm
• Works well when there are many "independent" read and write requests, i.e., requests that need not be done in a particular order and that are randomly distributed over the disk
• The disk head assembly sweeps in and out repeatedly
• When the heads pass a cylinder with pending requests, they stop to do those requests
• When the sweep reaches a point with no pending requests ahead, it changes direction
(A code sketch of this algorithm appears at the end of these notes.)

Coping with Intermittent Failures
• Use redundant bits in each sector
• Store checksums in the redundant bits
• After a read, check whether the checksums are correct; if not, try again
• After a write, either do a read and compare with the value written, or be optimistic and just check the checksum of the read

Checksums
• Suppose we use one extra bit, a parity bit
  – if the number of 1's in the data bits is odd, set the parity bit to 1, otherwise to 0
• This is not foolproof: 101 and 110 both have an even number of 1's, so the checksum would be 0 for both
• Use n parity bits in the checksum:
  – parity bit 1 stores the parity of every n-th bit, starting with the first bit
  – parity bit 2 stores the parity of every n-th bit, starting with the second bit, etc.
  – the probability of missing an error is 1/2^n

Coping with Permanent Read/Write Errors
• Stable storage policy: each "virtual" sector X is represented by two real sectors, XL and XR
• To write a value v to X:
  – repeat { write v to XL; read XL back } until the read's checksum is correct or a maximum number of tries is exceeded
  – do the same thing with XR
  – if XL or XR is discovered to be bad, a substitute sector must be found

Coping with Disk Crashes
• The "mean time to failure" of a disk is the length of time by which 50% of such disks will have had a head crash
• The goal is to have a much longer "mean time to data loss" for your system
• Key idea: use redundancy
• Three such approaches are discussed next…

Mirroring (RAID Level 1)
• Keep another copy of each disk: write to both, read from one
• The only way data can be lost is if the second disk crashes while the first is being repaired
• If the mean time to crash of a single disk is 10 years and it takes 3 hours to repair a disk, then the mean time to data loss is 146,000 years

Parity Blocks (RAID Level 4)
• The drawback of the previous scheme is that you need double the number of disks
• Instead, use one spare disk no matter how many data disks you have
• Block i of the spare disk contains the parity checks for block i of all the data disks
• If the spare disk fails, get a new spare
• If a data disk fails, recompute its data from the other data disks and the spare
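As a sketch of the parity-block idea, the following uses bitwise XOR over equal-sized blocks, which is the standard way such parity is computed; the function names and tiny block sizes are assumptions made for the example.

```python
# Illustrative sketch of RAID-4 style parity blocks (names and block sizes are assumptions).

def parity_block(blocks):
    """Bitwise XOR of block i across all the given disks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

def recover_block(surviving_blocks, parity):
    """Rebuild a lost data block from the surviving data blocks and the parity block."""
    return parity_block(list(surviving_blocks) + [parity])

# Example: three data disks and one parity disk.
d1, d2, d3 = b"\x0f\xf0", b"\x33\x33", b"\xaa\x55"
p = parity_block([d1, d2, d3])
assert recover_block([d1, d3], p) == d2   # disk 2 crashed; its block is recomputed
```

The cost of this scheme shows up on writes: every write to a data block must also update the corresponding block of the parity disk.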
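Finally, here is the code sketch of the elevator algorithm referred to in the disk-scheduling slides above. The slides describe only the sweeping behaviour; the data structures below are assumptions made for the example.

```python
# Illustrative sketch of the elevator (SCAN) ordering of pending cylinder requests.

def elevator_order(pending, head, moving_out=True):
    """Service requests ahead of the head in the current sweep direction,
    then reverse and service the remaining requests on the way back."""
    if moving_out:
        ahead  = sorted(c for c in pending if c >= head)
        behind = sorted((c for c in pending if c < head), reverse=True)
    else:
        ahead  = sorted((c for c in pending if c <= head), reverse=True)
        behind = sorted(c for c in pending if c > head)
    return ahead + behind

# Example: head at cylinder 50, sweeping outward.
print(elevator_order({10, 95, 52, 47, 80}, head=50))   # [52, 80, 95, 47, 10]
```

In a real controller the set of pending requests changes while a sweep is in progress, but the servicing rule is the same: keep moving in one direction until no requests lie ahead, then reverse.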