Input/Output, Disk Systems (8.1, 8.2, 8.4 ~ 8.7, 8.9)
Dr. Jiang Li, Dept. of Systems & Computer Science, Howard Univ.
Slides adapted from various sources (e.g. VT, RPI, UCSB, etc.)

Introduction
- I/O devices can be characterized by
  - Behaviour: input, output, storage
  - Partner: human or machine
  - Data rate: bytes/sec, transfers/sec
- I/O bus connections

Dependability Measures
- Reliability: mean time to failure (MTTF)
  - A measure of continuous service accomplishment
- Service interruption: mean time to repair (MTTR)
- Mean time between failures: MTBF = MTTF + MTTR
- Availability = MTTF / (MTTF + MTTR)
  - A measure of service accomplishment with respect to the alternation between accomplishment and interruption
- Improving availability
  - Increase MTTF: fault avoidance, fault tolerance, fault forecasting
  - Reduce MTTR: improved tools and processes for diagnosis and repair

Disk Storage
- Nonvolatile, rotating magnetic storage

Magnetic Disks
- A magnetic disk consists of 1-12 platters (metal or glass disks covered with magnetic recording material on both sides), with diameters between 1 and 3.5 inches
- Each platter comprises concentric tracks (5-30K), and each track is divided into sectors (100-500 per track, each about 512 bytes)
- Each sector records a sector ID, data (512 bytes; 4096 bytes proposed), error correcting code (ECC, used to hide defects and recording errors), synchronization fields, and gaps
- A movable arm holds the read/write heads for each disk surface and moves them all in tandem, so one cylinder of data is accessible at a time
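As a quick numeric check of the formulas on the Dependability Measures slide, the sketch below computes MTBF, availability, and the implied downtime per year. The function names and the sample MTTF/MTTR figures are illustrative assumptions, not values from the slides.

```python
HOURS_PER_YEAR = 24 * 365


def mtbf(mttf: float, mttr: float) -> float:
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf + mttr


def availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR), a fraction between 0 and 1."""
    return mttf / (mttf + mttr)


def downtime_hours_per_year(mttf: float, mttr: float) -> float:
    """Expected hours per year spent in the 'interrupted' state."""
    return (1.0 - availability(mttf, mttr)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical drive: MTTF of 1,000,000 hours, 24 hours to repair.
    mttf_hours, mttr_hours = 1_000_000.0, 24.0
    print("MTBF (hours):", mtbf(mttf_hours, mttr_hours))
    print("Availability:", availability(mttf_hours, mttr_hours))
    print("Downtime (hours/year):", downtime_hours_per_year(mttf_hours, mttr_hours))
```

Halving MTTR or doubling MTTF both raise availability, which matches the two improvement strategies listed on the slide.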
Disk Access Time Example
- Average seek time: 6 ms
- Transfer rate: 50 MB/sec
- Controller overhead: 0.2 ms
- What is the average time to read or write a 512-byte sector on a 10000 RPM disk?
- Average disk access time
  = average seek time + average rotational delay + transfer time + controller overhead
  = 6.0 ms + 0.5 rotation / (10000 rotations/min) × (60000 ms/min) + 0.5 KB / (50 MB/sec) + 0.2 ms
  = 6.0 ms + 3.0 ms + 0.01 ms + 0.2 ms
  ≈ 9.2 ms

Disk Performance Issues
- Manufacturers quote average seek time
  - Based on all possible seeks
  - Locality and OS scheduling lead to smaller actual average seek times
- Smart disk controllers allocate physical sectors on disk
  - Present a logical sector interface to the host
- Disk/motherboard interface
  - SCSI, ATA, SATA
- Disk drives include caches
  - Prefetch sectors in anticipation of access
  - Avoid seek and rotational delay

Flash Storage
- Nonvolatile semiconductor storage
  - 100× to 1000× faster than disk
  - Smaller, lower power, more robust
  - But more $/GB (between disk and DRAM)

RAID 1 & 2
- RAID 1: Mirroring
  - N + N disks, replicate data
  - Write data to both the data disk and the mirror disk
  - On disk failure, read from the mirror
- RAID 2: Error correcting code (ECC)
  - N + E disks (e.g., 10 + 4)
  - Split data at bit level across N disks
  - Generate E-bit ECC
  - Too complex, not used in practice

RAID 3: Bit-Interleaved Parity
- N + 1 disks
  - Data striped across N disks at byte level
  - Redundant disk stores parity
  - For example, with 9 disks: bit 0 is on disk 0, bit 1 is on disk 1, …, bit 7 is on disk 7; disk 8 holds the parity of all 8 bits
- Read access
  - Read all disks
- Write access
  - Generate new parity and update all disks
- On failure
  - Use parity to reconstruct missing data
- Not widely used
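The "use parity to reconstruct missing data" step can be sketched as follows, assuming the usual XOR (even) parity, which the slides do not spell out. The helper names `parity` and `reconstruct` and the sample blocks are hypothetical.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """Byte-wise XOR of equal-length blocks (one block per data disk)."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the one missing block from the survivors plus the parity block.

    XOR-ing the parity with every surviving block cancels their
    contribution and leaves the contents of the failed disk.
    """
    return parity(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    # Three hypothetical data disks, two bytes each.
    data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
    p = parity(data)
    # Pretend disk 1 failed and rebuild it from disks 0, 2 and parity.
    rebuilt = reconstruct([data[0], data[2]], p)
    print(rebuilt == data[1])
```

The same XOR identity tolerates exactly one lost disk per stripe, which is why single-parity schemes handle only a single fault.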
RAID 4: Block-Interleaved Parity
- N + 1 disks
  - Data striped across N disks at block level
  - Redundant disk stores parity for a group of blocks
- Read access
  - Read only the disk holding the required block
- Write access
  - Read only the disk containing the modified block and the parity disk
  - Calculate new parity, then update the data disk and the parity disk
- On failure
  - Use parity to reconstruct missing data
- Not widely used

RAID 6: P + Q Redundancy
- N + 2 disks
  - Like RAID 5, but with two sets of parity
  - Greater fault tolerance through more redundancy
- Multiple RAID
  - More advanced systems give similar fault tolerance with better performance

RAID Summary
- RAID can improve performance and availability
- RAID 1-5 can tolerate a single fault: mirroring (RAID 1) has a 100% overhead, while parity (RAID 3, 4, 5) has modest overhead
- Multiple faults can be tolerated by having multiple check functions; each additional check can cost an additional disk (RAID 6)
- RAID 6 and RAID 2 (memory-style ECC) are not commercially employed
- High availability requires hot swapping
- Assumes independent disk failures
  - Too bad if the building burns down!
- See "Hard Disk Performance, Quality and Reliability", http://www.pcguide.com/ref/hdd/perf/index.htm

Interconnecting Components
- Need interconnections between CPU, memory, and I/O controllers
- Bus: shared communication channel
  - Parallel set of wires for data and for synchronization of data transfer
  - Can become a bottleneck
- Performance limited by physical factors
  - Wire length, number of connections
- More recent alternative: high-speed serial connections with switches
  - Like networks
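Returning to the RAID 4 write path described earlier in this part: a small write needs only the old data block and the old parity block, because the new parity can be derived from those two plus the new data. The sketch below assumes XOR parity (not stated on the slides); the function names and sample stripe are hypothetical.

```python
def full_parity(blocks):
    """Parity of a whole stripe: byte-wise XOR across all data blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def small_write_parity(old_data, new_data, old_parity):
    """New parity after overwriting one block, without reading other disks.

    XOR-ing out the old data and XOR-ing in the new data updates the
    parity in place: P_new = P_old ^ D_old ^ D_new.
    """
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\x7f\x80"]  # hypothetical blocks
    old_parity = full_parity(stripe)
    new_block = b"\xab\xcd"
    new_parity = small_write_parity(stripe[1], new_block, old_parity)
    stripe[1] = new_block
    # The shortcut agrees with recomputing parity over the whole stripe.
    print(new_parity == full_parity(stripe))
```

This costs two reads and two writes regardless of N, but every write touches the single parity disk.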
I/O Bus Examples

| | Firewire | USB 2.0 | PCI Express | Serial ATA | Serial Attached SCSI |
|---|---|---|---|---|---|
| Intended use | External | External | Internal | Internal | External |
| Devices per channel | 63 | 127 | 1 | 1 | 4 |
| Data width | 4 | 2 | 2/lane | 4 | 4 |
| Peak bandwidth | 50 MB/s or 100 MB/s | 0.2 MB/s, 1.5 MB/s, or 60 MB/s | 250 MB/s/lane (1×, 2×, 4×, 8×, 16×, 32×) | 300 MB/s | 300 MB/s |
| Hot pluggable | Yes | Yes | Depends | Yes | Yes |
| Max length | 4.5 m | 5 m | 0.5 m | 1 m | 8 m |
| Standard | IEEE 1394 | USB Implementers Forum | PCI-SIG | SATA-IO | INCITS TC T10 |

Typical x86 PC I/O System
[Figure: block diagram of a P4 processor connected over the system bus (800 MHz, 6.4 GB/sec) to the Memory Controller Hub (North Bridge), which in turn connects to main memory (DDR 400, 3.2 GB/sec), graphics output, and the I/O Controller Hub (South Bridge). Attached devices include 1 Gb Ethernet, CD/DVD, tape, and disk, with labelled links for Serial ATA (150 MB/s) and USB 2.0 (60 MB/s); the remaining link labels read 2.1 GB/sec, 266 MB/sec, and 100 MB/s.]

I/O Management
- I/O is mediated by the OS
  - Multiple programs share I/O resources
    - Need protection and scheduling
  - I/O causes asynchronous interrupts
    - Same mechanism as exceptions
  - I/O programming is fiddly
    - OS provides abstractions to programs

Polling
- Periodically check the I/O status register
  - If the device is ready, do the operation
  - If there is an error, take action
- Common in small or low-performance real-time embedded systems
  - Predictable timing
  - Low hardware cost
- In other systems, wastes CPU time

Interrupts
- When a device is ready or an error occurs
  - Controller interrupts the CPU
- An interrupt is like an exception
  - But not synchronized to instruction execution
  - Can invoke the handler between instructions
  - Cause information often identifies the interrupting device
- Priority interrupts
  - Devices needing more urgent attention get higher priority
  - A higher-priority interrupt can interrupt the handler of a lower-priority interrupt
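The polling loop described on the Polling slide can be sketched against a simulated device. Everything here is an assumption for illustration: `SimulatedDevice`, its status codes, and `poll_and_read` are hypothetical and stand in for a real memory-mapped status and data register.

```python
READY, BUSY, ERROR = "ready", "busy", "error"


class SimulatedDevice:
    """Hypothetical device whose status register turns READY after a few polls."""

    def __init__(self, ready_after: int, data: int, fail: bool = False):
        self._ready_after = ready_after
        self._data = data
        self._fail = fail
        self._polls = 0

    def read_status(self) -> str:
        """Stand-in for reading the I/O status register."""
        self._polls += 1
        if self._fail:
            return ERROR
        return READY if self._polls >= self._ready_after else BUSY

    def read_data(self) -> int:
        """Stand-in for reading the I/O data register."""
        return self._data


def poll_and_read(device: SimulatedDevice, max_polls: int = 1000):
    """Check the status register repeatedly; act on READY or ERROR.

    Returns (data, number_of_polls). Every BUSY iteration is CPU time
    spent doing no useful work, which is the cost the slide points out.
    """
    for polls in range(1, max_polls + 1):
        status = device.read_status()
        if status == READY:
            return device.read_data(), polls
        if status == ERROR:
            raise IOError("device reported an error")
    raise TimeoutError("device never became ready")


if __name__ == "__main__":
    print(poll_and_read(SimulatedDevice(ready_after=3, data=0x41)))
```

With interrupts, the loop disappears: the controller notifies the CPU when the status changes, at the cost of less predictable timing.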
I/O Data Transfer
- Polling and interrupt-driven I/O
  - CPU transfers data between memory and I/O data registers
  - Time consuming for high-speed devices
- Direct memory access (DMA)
  - OS provides the starting address in memory
  - I/O controller transfers to/from memory autonomously
  - Controller interrupts on completion or error

Measuring I/O Performance
- I/O performance depends on
  - Hardware: CPU, memory, controllers, buses
  - Software: operating system, database management system, application
  - Workload: request rates and patterns
- I/O system design can trade off response time against throughput
  - Throughput is often measured under a constrained response time

Transaction Processing Benchmarks
- Transactions
  - Small data accesses to a DBMS
  - Interested in I/O rate, not data rate
- Measure throughput
  - Subject to response time limits and failure handling
  - ACID (Atomicity, Consistency, Isolation, Durability)
  - Overall cost per transaction
- Transaction Processing Performance Council (TPC) benchmarks (www.tpc.org)
  - TPC-APP: B2B application server and web services
  - TPC-C: on-line order entry environment
  - TPC-E: on-line transaction processing for a brokerage firm
  - TPC-H: decision support (business-oriented ad-hoc queries)

File System & Web Benchmarks
- SPEC System File System (SFS)
  - Synthetic workload for NFS server, based on monitoring real systems
  - Results
    - Throughput (operations/sec)
    - Response time (average ms/operation)
- SPEC Web Server benchmark
  - Measures simultaneous user sessions, subject to required throughput/session
  - Three workloads: Banking, Ecommerce, and Support
I/O System Design Example
- A CPU sustains 3 billion instructions per second
- The OS executes an average of 100000 instructions per I/O operation
- The user program runs 200000 instructions per I/O operation
- A memory backplane bus sustains a transfer rate of 1 GB/sec
- SCSI Ultra320 controllers have a transfer rate of 320 MB/sec and accommodate up to 7 disks
- Disk drives have a read/write bandwidth of 75 MB/sec and an average seek plus rotational latency of 6 ms
- The workload consists of 64 KB reads (the blocks are sequential on a track), i.e. each I/O transfers 64 KB
- What is the maximum sustainable I/O rate, and how many disks and SCSI controllers are required?

I/O System Design Example (cont'd)
- Max I/O rate of CPU = 3 × 10^9 / (200000 + 100000) = 10000 I/Os per sec
- Max I/O rate of bus = 10^9 / (64 × 10^3) = 15625 I/Os per sec
- The CPU is the bottleneck
- Time per I/O at disk = 6 ms + 64 KB / (75 MB/sec) ≈ 6.9 ms
- Each disk can complete 1000 / 6.9 ≈ 146 I/Os per sec (146 comes from the unrounded 6.85 ms), so we need 10000 / 146 ≈ 69 disks

I/O System Design Example (cont'd)
- Transfer rate required per SCSI controller = (64 KB / 6.9 ms) × 7 ≈ 64.9 MB/sec < 320 MB/sec
- We need ⌈69 / 7⌉ = 10 SCSI controllers
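The design example above can be re-run as a short calculation. This is a minimal sketch that assumes decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes), as the bus-rate step on the slide does; the function and key names are hypothetical. Because it does not round the per-I/O time to 6.9 ms, the controller load comes out slightly above the slide's 64.9 MB/sec, while the disk and controller counts are unchanged.

```python
import math


def io_system_design(
    cpu_ips=3e9,            # instructions per second sustained by the CPU
    os_instr=100_000,       # OS instructions per I/O
    user_instr=200_000,     # user-program instructions per I/O
    bus_bw=1e9,             # backplane bus bandwidth, bytes/sec
    io_size=64e3,           # bytes per I/O (64 KB, decimal)
    disk_bw=75e6,           # disk transfer bandwidth, bytes/sec
    disk_latency=6e-3,      # average seek + rotational latency, seconds
    ctrl_bw=320e6,          # SCSI Ultra320 controller bandwidth, bytes/sec
    disks_per_ctrl=7,       # maximum disks per controller
):
    cpu_rate = cpu_ips / (os_instr + user_instr)    # I/Os per sec the CPU allows
    bus_rate = bus_bw / io_size                     # I/Os per sec the bus allows
    max_rate = min(cpu_rate, bus_rate)              # the slower one is the bottleneck

    time_per_io = disk_latency + io_size / disk_bw  # seconds per I/O at one disk
    disks = math.ceil(max_rate * time_per_io)       # disks needed to sustain max_rate

    # Load on a fully populated controller, with every disk streaming back to back.
    ctrl_load = disks_per_ctrl * io_size / time_per_io
    controllers = math.ceil(disks / disks_per_ctrl)

    return {
        "cpu_rate": cpu_rate,
        "bus_rate": bus_rate,
        "max_rate": max_rate,
        "time_per_io_ms": time_per_io * 1e3,
        "disks": disks,
        "ctrl_load_mb_per_s": ctrl_load / 1e6,
        "ctrl_ok": ctrl_load < ctrl_bw,
        "controllers": controllers,
    }


if __name__ == "__main__":
    for name, value in io_system_design().items():
        print(f"{name}: {value}")
```

Changing a single parameter, for example a faster CPU, shows how the bottleneck shifts from the CPU to the bus once `cpu_rate` exceeds 15625 I/Os per sec.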