Database Management Systems Design
Docsity.com

Storage in Modern Computers
• Storage can be classified into two categories
  – Volatile
  – Non-volatile
• Volatile storage is fast, expensive, limited in size, and lost when the computer is turned off
  – CPU cache
  – Main memory
• Non-volatile storage is cheaper, slower, persistent, and has higher capacity
  – Flash drives
  – Magnetic disks
  – Optical disks
  – Tape

Memory Hierarchy
[Figure: storage hierarchy listing cache, main memory, magnetic disk, tape, and optical disk, with arrows labeled "Price & speed increase" and "Reliability increases".]

DBMS and Storage
• The DBMS tries to keep frequently used data in memory
• Go to disk only when you have to
  – Primary vs. secondary storage performance
• If you need to go to tape or optical disk, bring lots of data at once (e.g., a few dozen or a few hundred MB)
  – Secondary vs. tertiary storage performance
• Most DBMSs use tertiary storage for datasets that are big and used by one or a few seldom-run queries
  – Ex: satellite images, old data from backups
• Magnetic disk (or simply disk) is the predominant storage medium in DBMSs

Performance in DBMS
• The performance of a DBMS depends on:
  – CPU usage
  – I/O usage
  – Network usage
• We shall concentrate on I/O
• Disk I/O performance can be defined in terms of:
  – Resource usage time: time spent using the disk
  – Response time: wall-clock time to complete the query
  – Number of I/Os: number of times an I/O operation is performed
• Parallel I/O: bringing data from several disks simultaneously
  – Response time ≠ resource usage time
  – In this case, usually response time << resource usage time

A Few Notes on Disks
• It is too expensive and inefficient to bring just a few bytes from disk
• You usually bring one or more disk blocks
• One block is the minimum unit of disk I/O
  – For a read or write operation (rw operation)
  – 1 I/O = 1 block, for both reads and writes
• One block consists of one or more disk sectors
  – The implementation of the file system layer in the DBMS defines this
• A few magic numbers for block sizes
  – 512 bytes
  – 1 KB
  – 4 KB

Cost of Performing I/O
• To access a single block, the cost is defined as
  – Cost = seek time + rotational delay + transfer time
• Seek time
  – Time for the arm to get to the right track (or cylinder)
  – Mechanical movement cost
• Rotational delay
  – Time before the target block gets underneath the read head
    • On average, ½ of the time for one disk rotation
  – Mechanical movement cost
• Transfer time
  – Time to transfer the block's data (data size / transfer rate)

Example
• Let a disk have:
  – Average seek time of 11 ms
  – Rotational delay of 6 ms
  – Transfer rate of 10 MB/sec
  – Block size of 1 KB
• How much is the cost of one I/O?

\[
\begin{aligned}
\text{Cost} &= \text{seek} + \text{delay} + \frac{\text{DataSize}}{\text{TransferRate}} \\
&= 11\,\text{ms} + 6\,\text{ms} + \frac{1\,\text{KB}}{10\,\text{MB/sec}} \\
&= 17\,\text{ms} + \frac{1024 \times 1000}{10 \times 1024 \times 1024}\,\text{ms} \\
&= 17\,\text{ms} + 0.10\,\text{ms} \\
&= 17.10\,\text{ms}
\end{aligned}
\]

Recall that 1 KB = 1024 bytes.

Example
• How long would it take to read 20,000 blocks with random I/O?
• How long would it take to read 20,000 blocks with ideal sequential I/O?

Notes on Disks
• Yet another nomenclature
  – On-line storage: secondary storage
  – Off-line storage: tertiary storage
• Many disks include an on-board memory buffer
  – The buffer controls the rate at which data are exchanged with main memory
  – The idea is to minimize the time spent waiting for the disk
  – But this has implications for transactions and recovery
    • What if the power goes out while the data is still in the buffer?

Disks as Performance Bottlenecks …
• Microprocessor speed increases 50% per year.
• Disk performance improvements
  – Access time decreases 10% per year
  – Transfer rate increases 20% per year
• A disk crash results in data loss.
• Solution: disk array
  – Have several disks behave as a single, large, and very fast disk.
    • Parallel I/O

Disk Striping – Block Sized
• Disk striping partitions the data in a file into equal-sized segments of one block each, which are distributed over the disk array.
[Figure: diagram labeled Disk Array, Controller, Bus, File, and Disk Blocks.]

Data Allocation
• Data is partitioned into equal-sized segments
  – Striping unit
• Consecutive segments are stored on different disks of the array
• Typically, a round-robin algorithm is used
• If we have n disks, then block i is stored on disk
  – i mod n
• Example: array of 5 disks, and a 1 MB file with a 4 KB striping unit
  – Disk 0 gets blocks 0, 5, 10, 15, 20, …
  – Disk 1 gets blocks 1, 6, 11, 16, 21, …

Benefits of Striping
• With striping we can access data blocks in parallel!
  – Issue a request to the proper disks to get the blocks
• For example, suppose we have a 5-disk array with 4 KB striping units and disk blocks. Let F be a 1 MB file. If we need to access partitions 0, 11, 22, and 23, then we need to ask:
  – Disk 0 for partition 0 at time t0
  – Disk 1 for partition 11 at time t0
  – Disk 2 for partition 22 at time t0
  – Disk 3 for partition 23 at time t0

Time Access Estimates
• Access time = seek time + rotational delay + transfer time
• Disk used independently or in an array: IBM Deskstar 14GPX, 14.4 GB
  – Seek time: 9.1 milliseconds (msec)
  – Rotational delay: 4.15 msec
  – Transfer rate: 13 MB/sec
• How does striping compare with a single disk?
• Scenario: striping unit of 1 disk block (4 KB), access partitions 0, 11, 22, and 23

Single Disk Access Time
• Total time = sum of the times to read each partition
• Time for partition 0:
  9.1 msec + 4.15 msec + 4 KB / (13 MB / 1 sec) * (1 MB / 1024 KB) * (1000 msec / 1 sec)
  = 9.1 msec + 4.15 msec + 0.3 msec = 13.55 msec
• Time for partition 11:
  9.1 msec + 4.15 msec + 4 KB / (13 MB / 1 sec) * (1 MB / 1024 KB) * (1000 msec / 1 sec)
  = 9.1 msec + 4.15 msec + 0.3 msec = 13.55 msec
• Time for partition 22:
  9.1 msec + 4.15 msec + 4 KB / (13 MB / 1 sec) * (1 MB / 1024 KB) * (1000 msec / 1 sec)
  = 9.1 msec + 4.15 msec + 0.3 msec = 13.55 msec
• Time for partition 23:
  9.1 msec + 4.15 msec + 4 KB / (13 MB / 1 sec) * (1 MB / 1024 KB) * (1000 msec / 1 sec)
  = 9.1 msec + 4.15 msec + 0.3 msec = 13.55 msec
• Total time: 4 * 13.55 msec = 54.20 msec

Striping Access Time
• Total time = maximum time to complete any read request.
• Following the same calculation as in the previous slide:
  – Time for partition 0: 13.55 msec
  – Time for partition 11: 13.55 msec
  – Time for partition 22: 13.55 msec
  – Time for partition 23: 13.55 msec
• Total time:
  – max{13.55 msec, 13.55 msec, 13.55 msec, 13.55 msec} = 13.55 msec
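
The single-disk and striped calculations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical (they do not come from the slides), every request is assumed to pay the full seek time and rotational delay, and requests that land on the same disk are assumed to be served one after another, a case the slides do not cover.

```python
from collections import defaultdict

KB_PER_MB = 1024
MS_PER_SEC = 1000


def access_time_ms(seek_ms, rot_delay_ms, transfer_mb_per_sec, size_kb):
    """Cost of one I/O: seek time + rotational delay + transfer time."""
    transfer_ms = size_kb / (transfer_mb_per_sec * KB_PER_MB) * MS_PER_SEC
    return seek_ms + rot_delay_ms + transfer_ms


def disk_for_block(i, n_disks):
    """Round-robin allocation: block i is stored on disk i mod n."""
    return i % n_disks


def single_disk_time_ms(blocks, per_block_ms):
    """One disk serves the requests one after another, so the times add up."""
    return len(blocks) * per_block_ms


def striped_time_ms(blocks, n_disks, per_block_ms):
    """Disks work in parallel, so the busiest disk sets the response time.

    Assumption (not in the slides): requests queued on the same disk
    are serialized.
    """
    if not blocks:
        return 0.0
    requests_per_disk = defaultdict(int)
    for b in blocks:
        requests_per_disk[disk_for_block(b, n_disks)] += 1
    return max(requests_per_disk.values()) * per_block_ms


# Figures from the slides: IBM Deskstar 14GPX, 4 KB blocks, 5-disk array.
per_block = access_time_ms(9.1, 4.15, 13, 4)
partitions = [0, 11, 22, 23]

print([disk_for_block(p, 5) for p in partitions])          # [0, 1, 2, 3]
print(round(per_block, 2))                                  # 13.55
print(round(single_disk_time_ms(partitions, per_block), 2))
print(round(striped_time_ms(partitions, 5, per_block), 2))  # 13.55
```

Because partitions 0, 11, 22, and 23 map to four different disks, the striped response time equals the time of a single block read. If two of the requested partitions shared a disk (for example 0 and 5), this sketch would charge two block reads to that disk.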