Storage Systems - Systems Architecture - Lecture Slides | CMSC 411
Study notes, Computer Science - Prof. Alan Sussman, University of Maryland

Computer Systems Architecture - CMSC 411
Unit 6 – Storage Systems
Alan Sussman
May 2, 2006

Administrivia
• HW #5 due today, #6 due Thursday
• Quiz 2 scheduled for May 9
  – on Units 4 and 4b (ILP)
  – practice quiz posted by tomorrow
• Read Chapter 7, except 7.8, 7.12, 7.13
• Online course evaluation available at https://www.courses.umd.edu/online_evaluation

Last week
• Loop unrolling
  – to minimize pipeline stalls by scheduling multiple loop iterations together, still respecting data dependences
  – limited by # registers, instruction cache size, and data and control dependences (including loop-carried ones)
  – problems include increased code size, slower compilation, and work for the programmer or compiler writer
  – data dependences – GCD test (did you get that?)
• Conditional instructions
  – turn control dependences into data dependences
  – move the dependence to the end of the pipeline
  – help with scheduling for superscalar (moving instructions past branches)
  – problems include using extra processor resources and slowing the clock rate
• Speculative instructions
  – only serious problem is preserving exception behavior
  – example is a speculative load instruction that fails if a subsequent store is to the same memory address (before the load check instruction)
• IA-64 – supports compiler-based ILP
  – predicated instructions
  – deferred exceptions and load speculation

Storage systems
• We already know about four levels of storage:
  – registers
  – cache
  – memory
  – disk
• but we've been a little vague on how these devices are interconnected
• In this unit, we study
  – input/output units such as disks and tapes
  – buses to connect storage devices
  – I/O performance issues
  – design of file systems (won't talk much about this)

Disk and Tape Technologies

(Hard) Disks
• What it is:
  – a collection of 1-20 platters (like 2-sided CDs)
  – between 1 and 8 inches in diameter
    • 2.5 & 3.5 inch most common today
  – rotating on a central spindle
  – with 500-2500 tracks on each surface
  – divided into (maybe) 64 sectors
    • older disks: all tracks have the same number of sectors
    • current disks: outer tracks have more sectors
• larger diameter: best retrieval times
• smaller diameter: cheaper and uses less power

Disks (cont.) – Fig. 7.1
• Used for
  – file storage
  – the slowest level of virtual memory during program execution

Disks (cont.)
• How information is retrieved:
  – Wait for previous requests to be filled (Time = queuing delay)
  – A movable arm is positioned at the correct cylinder (Time = seek time)
  – The system waits for the correct sector to appear under the arm (Time = rotational latency)
  – Then a magnetic head senses
    • the sector number
    • the information recorded in the sector
    • an error correction code
  – and the information is transferred to a buffer (Time = transfer time)
  – The retrieval is handled by a disk controller, which may impose some extra overhead (Time = controller time)
• Because all of this is so expensive, might also read the next sector or two, hoping that the next information needed is located there (prefetch, or read ahead)

Example
• Average disk access time (in ms) = average seek time + average rotational delay + transfer time + controller overhead
• Parameters:
  – average seek time: 5 ms
  – transfer rate: 10 MB/sec
  – rotation speed: 8,000 RPM
  – controller overhead: 0.5 ms
  – sector size: 1024 bytes

Example (cont.)
• average seek time = 5 ms
• average rotational delay = 0.5 rotation / 8,000 RPM = 0.5 / (8,000/60 RPS) = 3.75 ms
• transfer time = 1 KB / (10 MB/sec) ≈ 10^3 bytes / (10^7 bytes/sec) = 10^-4 sec = 0.1 ms
• controller overhead = 0.5 ms
• Total: 5 + 3.75 + 0.1 + 0.5 = 9.35 ms
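To make the arithmetic concrete, here is a small Python sketch of the same calculation (not part of the original slides; the function and parameter names are just illustrative, while the numeric values are the ones on the slide):

```python
# Average disk access time, redoing the arithmetic from the example above.
# All numeric values come from the slide; function and parameter names are
# just illustrative.

def avg_disk_access_time_ms(seek_ms, rpm, transfer_mb_per_s, sector_bytes, controller_ms):
    # On average the desired sector is half a rotation away from the head.
    rotational_delay_ms = 0.5 / (rpm / 60.0) * 1000.0
    # Time to transfer one sector at the sustained transfer rate.
    transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1000.0
    return seek_ms + rotational_delay_ms + transfer_ms + controller_ms

print(avg_disk_access_time_ms(seek_ms=5.0, rpm=8000, transfer_mb_per_s=10,
                              sector_bytes=1024, controller_ms=0.5))  # ~9.35 ms
```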
Technology gap between memory and disk – Fig. 7.5

Failure rate vs. Availability
• Failure rate: concerns whether any of the hardware is broken
• Availability: concerns whether the system is usable, even if some pieces are broken
• Example 1: Your bank can improve the availability of the ATM system by installing two ATM machines, so that one is available even if the other breaks
• Example 2: Your bank can reduce the failure rate of the ATM system by installing a machine that does not break as often
  – This also increases the availability
• Generally, we hope that more complicated hardware improves availability and performance, but it also may increase the failure rate

Example: Disk arrays
• Suppose a machine has an array of 20 disks
  – Case 1: If we distribute the data across the disks (striping), then all 20 disks must be working properly in order to access the data, but throughput can be improved
  – Case 2: If we store 20 copies of the data, one copy per disk, we have good availability: we can access the data even if some disks fail
• But the reliability of the 20 disks is less than the reliability of a single disk: the probability that one of the 20 disks fails is essentially 20 times the probability that a single disk fails

Disk arrays (cont.)
• In Case 2, we store multiple copies on multiple disks, called RAID: redundant arrays of inexpensive disks
• RAID is actually not inexpensive (because of the cost of the controllers, power supplies, and fans), so often the "I" is said to stand for "independent"
  – More than 80% of non-PC disk drive sales are now RAID, a $19B industry
  – Typically store 2 copies, not 20
  – Used when availability is critical, in applications such as:
    • airline reservations
    • medical records
    • stock market

RAID – Fig. 7.17
• There are various levels of RAID, depending on the relative importance of availability, accuracy, and cost

  RAID level                        # faults survived   Example data disks   Check disks   Companies
  0 - Striped                       0                   8                    0             widely used
  1 - Mirrored                      1                   8                    8             EMC, Compaq, IBM
  2 - Memory-style ECC              1                   8                    4
  3 - Bit-interleaved parity        1                   8                    1             Storage Concepts
  4 - Block-interleaved parity      1                   8                    1             Network Appliance
  5 - RAID 4 w/distributed parity   1                   8                    1             widely used
  6 - P+Q redundancy                2                   8                    2

RAID levels 0 & 1
• One copy of data: RAID 0
  – Data striped across a disk array
• Two full copies of data (mirroring): RAID 1
  – If one disk fails, go to the other
  – Can also use this to distribute the load of reads
  – Most expensive RAID option
• RAID 0 and 1 can be combined
  – 1+0 (or 10): mirror pairs of disks, then stripe across the pairs
  – 0+1 (or 01): stripe across one set of half the disks, then mirror writes to both sets

RAID 3
• Bit-interleaved parity: RAID 3
  – One copy of the data, stored among several disks, and one extra disk to hold a parity bit (checksum) for the others
• Example: Suppose we have 4 data disks, and one piece of the data looks like this:
    Disk 1: 0 1 0 1 1 0 0 0
    Disk 2: 0 1 1 1 0 1 1 0
    Disk 3: 0 1 1 1 1 0 0 0
    Disk 4: 0 0 0 1 0 1 0 1
  – Then the parity bits are set by taking the sums mod 2:
    Disk 5: 0 1 0 0 0 0 1 1
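A short Python sketch of the bit-interleaved parity idea, using the slide's bit patterns (the sketch and its helper names are ours, not the slides'): parity is just the XOR (sum mod 2) of the data bits, and if we know which disk failed, XOR-ing the survivors with the parity reconstructs it.

```python
# Bit-interleaved parity (the RAID 3 idea) on the slide's example data.
# The bit strings match the slide; the helper name is illustrative.

def parity(disks):
    # Parity bit in each position = sum of the data bits mod 2 (i.e., XOR).
    return [sum(bits) % 2 for bits in zip(*disks)]

data = [
    [0, 1, 0, 1, 1, 0, 0, 0],   # Disk 1
    [0, 1, 1, 1, 0, 1, 1, 0],   # Disk 2
    [0, 1, 1, 1, 1, 0, 0, 0],   # Disk 3
    [0, 0, 0, 1, 0, 1, 0, 1],   # Disk 4
]

disk5 = parity(data)
print(disk5)                    # [0, 1, 0, 0, 0, 0, 1, 1] -- matches the slide

# If we know Disk 2 failed, XOR-ing the surviving data disks with the
# parity disk reconstructs its contents.
survivors = [data[0], data[2], data[3], disk5]
print(parity(survivors))        # equals the original Disk 2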
RAID 3 (cont.)
• So if the data on one of the disks becomes corrupted, the parity bits on Disk 5 will be wrong, so we can tell there has been a failure
  – and we can fix it if we know which disk failed
• Disadvantage: each data access must read from all 5 disks in order to retrieve the data and check for corruption
  – also can't always tell where the error is (it could even be on the parity disk)

RAID 4
• Block-interleaved parity: RAID 4
  – Same organization of data as RAID 3, but cheaper reads and writes
  – Read: read one sector at a time, and count on the sector's own error detection mechanisms
  – Write: in each write, note which bits are changing; this is enough information to change the parity bits without reading from the other disks

RAID 4 example
• If the original contents are
    Disk 1: 0 1 0 1 1 0 0 0
    Disk 2: 0 1 1 1 0 1 1 0
    Disk 3: 0 1 1 1 1 0 0 0
    Disk 4: 0 0 0 1 0 1 0 1
    Disk 5: 0 1 0 0 0 0 1 1
• and we write Disk 2:
    Disk 2: 0 1 1 1 0 1 1 0 (old)
    Disk 2: 1 0 1 1 0 0 1 1 (new)
• then since bits 0, 1, 5, and 7 changed, we need to flip those parity bits:
    Disk 5: 0 1 0 0 0 0 1 1 (old)
    Disk 5: 1 0 0 0 0 1 1 0 (new)
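The parity update can be written as a pair of XORs, as in this short sketch using the slide's numbers (again illustrative, not from the slides): the changed bits are old data XOR new data, and flipping those positions in the parity is another XOR.

```python
# Block-interleaved parity small-write update (the RAID 4/5 idea),
# using the slide's numbers. The helper name is illustrative.

def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]

old_disk2 = [0, 1, 1, 1, 0, 1, 1, 0]
new_disk2 = [1, 0, 1, 1, 0, 0, 1, 1]
old_parity = [0, 1, 0, 0, 0, 0, 1, 1]   # Disk 5 before the write

# Parity only needs to flip where the data changed:
# new parity = old parity XOR old data XOR new data.
new_parity = xor_bits(old_parity, xor_bits(old_disk2, new_disk2))
print(new_parity)                        # [1, 0, 0, 0, 0, 1, 1, 0] -- matches the slide
```

So a small write only needs to read the old data block and the old parity, not the other data disks.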
RAID 5 – Fig. 7.19
• Disadvantage of RAID 4: the parity disk is a bottleneck, so it is better to interleave the parity information across all of the disks (RAID 5)

RAID 6
• Also called P+Q redundancy
  – allows recovery from a second failure, since parity schemes can only recover from one
  – needs a second extra (check) disk
  – the computation is more complicated than simple parity

RAID summary
• Higher throughput than a single disk
  – in either MB/sec or I/Os per second
• Failure recovery is easy
• Allows taking advantage of the small size and low power requirements of small disks, and still get these advantages
  – RAIDs now dominate large-scale storage systems
• Note: no need to memorize the RAID levels
  – But you need to be able to explain how the example RAID levels work

I/O performance measures

I/O performance measures
• diversity: which I/O devices can connect to the system?
• capacity: how many I/O devices can connect to the system?
• bandwidth: throughput, or how much data can be moved per unit time
• latency: response time, the interval between a request and its completion
• High throughput usually means slow response time!

Throughput vs. latency – Fig. 7.24, Fig. 7.25

Improving performance (cont.)
• Adding another server can decrease response time, if the workload is held constant
  – but keeping the work balanced between servers is difficult
• To design a responsive system, must understand what the "typical" user wants to do with it
• Each transaction consists of three parts:
  – entry time: the time for the user to make the request
  – system response time: the latency
  – think time: the time between the system response and the next entry
• Key observation: a faster system produces a lower think time
  – see Fig. 7.26

Modeling computer performance
• The usual way to model computer performance is with queuing theory (mathematics again)
• Unfortunately, even queuing theory does not provide a very good model, so more complicated mathematics is now being applied (e.g., stochastic differential equations)
• But H&P only consider queuing models
  – and we don't even have time to go into that
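For a taste of what those queuing models look like, here is the standard single-server (M/M/1) calculation applied to a disk with the 9.35 ms average service time from the earlier example. This goes a bit beyond what the slides cover, and the request arrival rate below is an assumed value chosen for illustration.

```python
# A minimal M/M/1 queuing sketch (standard textbook formulas). The 9.35 ms
# service time comes from the earlier disk access time example; the arrival
# rate is an assumed value.

service_time_s = 0.00935            # average disk service time (9.35 ms)
arrival_rate = 80.0                 # assumed: 80 I/O requests per second

utilization = arrival_rate * service_time_s          # fraction of time the disk is busy
queue_time_s = service_time_s * utilization / (1 - utilization)
response_time_s = queue_time_s + service_time_s      # total time in the system

print(f"utilization   = {utilization:.2f}")               # ~0.75
print(f"time in queue = {queue_time_s * 1000:.1f} ms")     # ~28 ms
print(f"response time = {response_time_s * 1000:.1f} ms")  # ~37 ms
```

Even this simple model shows response time growing quickly as utilization approaches 1, which is the throughput versus latency tradeoff behind Figs. 7.24 and 7.25.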
Data Management Issues