CS 6290
I/O and Storage
Milos Prvulovic
Georgia Tech
College of Computing
Storage Systems
• I/O performance (bandwidth, latency)
– Bandwidth improving, but not as fast as CPU
– Latency improving very slowly
– Consequently, by Amdahl's Law, the fraction of time spent on I/O is increasing
• Other factors just as important
– Reliability, Availability, Dependability
• Storage devices are very diverse
– Magnetic disks, tapes, CDs, DVDs, flash
– Different advantages/disadvantages and uses

Trends for Magnetic Disks
• Capacity: doubles in approx. one year
• Average seek time
– 5–12 ms, very slow improvement
• Average rotational latency (1/2 full rotation)
– 5,000 RPM to 10,000 RPM to 15,000 RPM
– Improves slowly; not easy to improve (reliability, noise)
• Data transfer rate
– Improves at a reasonable rate
• New interfaces, more data per track

Optical Disks
• Improvement limited by standards
– CD and DVD capacity fixed over the years
– Technology actually improves, but it takes time for it to make it into new standards
• Physically small, replaceable
– Good for backups and carrying around

Magnetic Tapes
• Very long access latency
– Must rewind the tape to the correct place for each read/write
• Used to be very cheap ($/MB)
– It's just miles of tape!
– But disks have caught up anyway…
• Used for backup (secondary storage)
– Large capacity & replaceable

Buses in a System
[Figure: bus hierarchy in a system. The CPU (with its cache) and main memory connect through the CPU-memory bus; bus adapters bridge it to a PCI bus, which hosts a graphics-output I/O controller, a network I/O controller, and further bus adapters leading to I/O buses with additional I/O controllers. © 2008 Elsevier Science (USA). All rights reserved.]
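Returning to the magnetic-disk numbers above: seek time, rotational latency (half a rotation on average), and transfer time add up to the average access time. A minimal sketch — the RPM values are from the slides, while the seek time, block size, and transfer rate are assumed illustrative figures:

```python
def avg_access_time_ms(rpm, seek_ms, block_kb, transfer_mb_s):
    """Average time to read one block from a magnetic disk, in ms."""
    # Average rotational latency = time for half a full rotation.
    rotation_ms = (60_000 / rpm) / 2
    # Transfer time for the requested block.
    transfer_ms = (block_kb / 1024) / transfer_mb_s * 1000
    return seek_ms + rotation_ms + transfer_ms

# Assumed: 8 ms seek, 4 KB block, 100 MB/s transfer rate.
for rpm in (5_000, 10_000, 15_000):
    t = avg_access_time_ms(rpm, seek_ms=8, block_kb=4, transfer_mb_s=100)
    print(f"{rpm:>6} RPM: {t:.2f} ms")
```

Note how tripling the RPM shaves only a few milliseconds: seek time dominates, which is why rotational latency "improves slowly" matters less than the stubborn seek time.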
Bus Design Decisions
• Split transactions
– Traditionally, the bus stays occupied between request and response on a read
– Now: get bus, send request, free bus (when the response is ready, get bus, send response, free bus)
• Bus mastering
– Which devices can initiate transfers on the bus
– The CPU can always be the master
– But we can also allow other devices to be masters
– With multiple masters, we need arbitration

CPU-Device Interface
• Devices are typically accessible to the CPU through control and data registers
• These registers can be either
– Memory mapped
• Some physical memory addresses actually map to I/O device registers
• Read/write through ordinary loads/stores (LD/ST)
• Most RISC processors support only this kind of I/O mapping
– In a separate I/O address space
• Read/write through special IN/OUT instructions
• Used in x86, but even in x86 PCs some I/O is memory mapped

Failure Example
• A programming mistake is a fault
– An add function that works fine, except when we try 5+3, in which case it returns 7 instead of 8
– It is a latent error until activated
• An activated fault becomes an effective error
– We call our add and it returns 7 for 5+3
• A failure occurs when an error results in a deviation in behavior
– E.g., we schedule a meeting for the 7th instead of the 8th
– An effective error need not result in a failure (if we never use the result of this add, there is no failure)

Reliability and Availability
• A system can be in one of two states
– Service Accomplishment
– Service Interruption
• Reliability
– Measure of continuous service accomplishment
– Typically, Mean Time To Failure (MTTF)
• Availability
– Service accomplishment as a fraction of overall time
– Also uses Mean Time To Repair (MTTR)
• MTTR is the average duration of a service interruption
– Availability = MTTF / (MTTF + MTTR)

Faults Classified by Cause
• Hardware Faults
– Hardware devices fail to perform as designed
• Design Faults
– Faults in software and some faults in HW
– E.g.
the Pentium FDIV bug was a design fault
• Operation Faults
– Operator and user mistakes
• Environmental Faults
– Fire, power failure, sabotage, etc.

Disk Fault Tolerance with RAID
• Redundant Array of Inexpensive Disks
– Several smaller disks play the role of one big disk
• Can improve performance
– Data is spread among multiple disks
– Accesses to different disks go in parallel
• Can improve reliability
– Data can be kept with some redundancy

RAID 0
• Striping used to improve performance
– Data is stored on the disks in the array so that consecutive "stripes" of data are on different disks
– Makes the disks share the load, improving
• Throughput: all disks can work in parallel
• Latency: less queuing delay – a queue for each disk
• No redundancy
– Reliability is actually lower than with a single disk (if any disk in the array fails, we have a problem)

RAID 1
• Disk mirroring
– Disks are paired up and keep identical data
– A write must update the copies on both disks
– A read can read either of the two copies
• Improved performance and reliability
– Can do more reads per unit time
– If one disk fails, its mirror still has the data
• If we have more than 2 disks (e.g.
8 disks)
– "Striped mirrors" (RAID 1+0)
• Pair disks for mirroring, then stripe across the 4 pairs
– "Mirrored stripes" (RAID 0+1)
• Do striping using 4 disks, then mirror that using the other 4

RAID 5
• Distributed block-interleaved parity
– Like RAID 4, but the parity blocks are distributed across all disks
– A read accesses only the data disk where the data is
– A write must update the data block and its parity block
• But now all disks share the parity-update load

RAID 6
• Two different (P and Q) check blocks
– Each protection group has
• N-2 data blocks
• One parity block
• Another check block (not the same as parity)
• Can recover when two disks are lost
– Think of P as the sum and Q as the product of the D blocks
– If two blocks are missing, solve the two equations to get both back
• More space overhead (only N-2 of N blocks are data)
• More write overhead (must update both P and Q)
– P and Q are still distributed, as in RAID 5
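The parity that RAID 4/5 relies on is the bitwise XOR of the data blocks in a protection group: XOR-ing the surviving blocks with the parity block reconstructs any single lost block. A minimal sketch (the block contents and 3+1 group size are assumed for illustration):

```python
from functools import reduce

def parity(blocks):
    """Bitwise XOR parity of equal-length data blocks (bytes objects)."""
    return bytes(reduce(lambda a, b: a ^ b, tup) for tup in zip(*blocks))

# A protection group: 3 data blocks + 1 parity block (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(data)

# Lose any single data block: XOR of the survivors and parity restores it.
lost = data[1]
recovered = parity([data[0], data[2], p])
assert recovered == lost
```

This also explains RAID 5's write overhead: a small write need not read the whole group, since new_parity = old_parity XOR old_data XOR new_data, but it always touches one parity block in addition to the data block.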
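The "P as sum, Q as product" intuition for RAID 6 can be made concrete with a toy example. This sketch is illustrative only: it uses a plain sum for P and a weighted sum for Q (instead of the slide's product) so the two equations stay linear, which mirrors how real implementations give each disk a distinct coefficient in Galois-field arithmetic rather than using ordinary integers. Block values and group size are assumed:

```python
def make_checks(data):
    """Two independent checks: P = sum, Q = weighted sum (coefficient i+1)."""
    p = sum(data)
    q = sum((i + 1) * d for i, d in enumerate(data))
    return p, q

def recover_two(data, a, b, p, q):
    """Recover lost blocks at indices a and b (a < b).

    Only the surviving entries of `data` and the P/Q checks are read.
    """
    # What the two lost blocks must contribute to each check:
    s = p - sum(d for i, d in enumerate(data) if i not in (a, b))
    t = q - sum((i + 1) * d for i, d in enumerate(data) if i not in (a, b))
    # Solve the system: Da + Db = s, (a+1)*Da + (b+1)*Db = t
    db = (t - (a + 1) * s) // (b - a)
    da = s - db
    return da, db

data = [5, 7, 2, 9]          # assumed block values
p, q = make_checks(data)
da, db = recover_two(data, 1, 3, p, q)
assert (da, db) == (7, 9)    # both lost blocks recovered
```

With a single check (P alone) the two unknowns would be underdetermined; the second, independent check Q is exactly what buys recovery from a double failure, at the cost of updating both P and Q on every write.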