Distributed File Systems - Advanced Operating Systems - Lecture Slides

Main points of this lecture are: Distributed File Systems, Consistency and Replication, File Service, File Server Design, Sequence of Bytes, Sequence of Records, File Attributes, Local Copy of File, Issue of Buffering, Symbolic Links


CIS 620 Advanced Operating Systems
Lecture 11 – Distributed File Systems, Consistency and Replication

Distributed File Systems

• File service vs. file server
  – The file service is the specification.
  – A file server is a process running on a machine to implement the file service for (some) files on that machine.
  – A normal distributed system would have one file service but perhaps many file servers.
  – If we have very different kinds of file systems, we might not be able to have a single file service, since some functions may not be available.

• Upload/download vs. remote access
  – Upload/download means the only file services supplied are read file and write file.
    • All modifications are done on a local copy of the file.
    • Conceptually simple at first glance.
    • Whole-file transfers are efficient (assuming you are going to access most of the file) when compared to multiple small accesses.
    • Not an efficient use of bandwidth if you access only a small part of a large file.
    • Requires storage on the client.
    • What about concurrent updates? What if one client reads, "forgets" to write for a long time, and then writes back the "new" version, overwriting newer changes from others?
  – Remote access means direct individual reads and writes to the remote copy of the file.
    • The file stays on the server.
    • Issue of (client) buffering:
      – Good for reducing the number of remote accesses.
      – But what about semantics when a write occurs?
      – Note that metadata is written even for a read, so if you want faithful semantics, every client read must modify metadata on the server, or all requests for metadata (e.g. ls or dir commands) must go to the server.
      – This is the cache consistency question.

• Directories
  – A mapping from names to files/directories.
  – Contain rules for the names of files and (sub)directories.
  – Hierarchy, i.e. a tree.
  – (Hard) links
    • Imagine hard links pointing to directories (Unix does not permit this):
        cd ~
        mkdir B; mkdir C
        mkdir B/D; mkdir B/E
        ln B B/D/oh-my
    • Now you have a loop with honest-looking links.
    • Normally you can't remove a directory (i.e. unlink it from its parent) unless it is empty.
    • But when you can have multiple hard links to a directory, you should permit removing (i.e. unlinking) one even if the directory is not empty.
    • So in the above example you could unlink B from its parent.
    • Now you have garbage (unreachable, i.e. unnamable) directories B, D, and E.
    • For a centralized system you need conventional garbage collection.
    • For a distributed system you need a distributed garbage collector, which is much harder.

• Transparency
  – Location transparency
    • The path name (i.e. the full name of the file) does not say where the file is located.
  – Location independence
    • The path name is independent of the server. Hence you can move a file from server to server without changing its name.
    • Have a namespace of files and then have some (dynamically) assigned to certain servers. This namespace would be the same on all machines in the system.
  – Root transparency (a made-up name)
    • / is the same on all systems.
    • This would ruin some conventions, like /tmp.
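To make the distinction concrete, here is a small illustrative sketch (not from the slides; the Catalog class and the server names are invented) of a location-independent namespace: one system-wide catalog maps each path name to whichever server currently stores the file, so a file can migrate between servers without its name changing.

```python
# Toy sketch of a location-independent namespace.  The catalog is the
# system-wide mapping "path name -> current server"; server names are made up.

class Catalog:
    def __init__(self):
        self._location = {}              # e.g. "/proj/paper.tex" -> "serverA"

    def create(self, path, server):
        self._location[path] = server

    def open(self, path):
        # Clients resolve the same name everywhere; the server is looked up
        # at open time, so the path never encodes a machine name.
        return self._location[path]

    def migrate(self, path, new_server):
        # Move the file between servers without renaming it.
        self._location[path] = new_server


catalog = Catalog()
catalog.create("/proj/paper.tex", "serverA")
assert catalog.open("/proj/paper.tex") == "serverA"
catalog.migrate("/proj/paper.tex", "serverB")
assert catalog.open("/proj/paper.tex") == "serverB"   # same name, new server
```

With only location transparency, the name would still hide the server but could not survive migration; location independence is what lets the catalog entry change underneath an unchanged name.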
• The binary name could contain the server name, so that you could directly reference files on other filesystems/machines.
  – Unix doesn't do this.
• We could have symbolic names contain the server name.
  – Unix doesn't do this either.
  – VMS did something like this: a symbolic name looked like nodename::filename.
• Could have the name lookup yield multiple binary names.
  – Redundant storage of files for availability.
  – Naturally must worry about updates:
    • When are they visible?
    • Concurrent updates?
  – Whenever you hear of a system that keeps multiple copies of something, an immediate question should be "are these immutable?". If the answer is no, the next question is "what are the update semantics?"

• Sharing semantics
  – Unix semantics: a read returns the value stored by the last write.
    • Actually Unix doesn't quite do this.
      – If a write is large (several blocks), it does seeks for each block.
      – During a seek, the process sleeps (in the kernel).
      – Another process can be writing a range of blocks that intersects the blocks of the first write.
      – The result could be (depending on disk scheduling) that the outcome does not have a last write.
    • Perhaps Unix semantics means: a read returns the value stored by the last write, provided one exists.
    • Perhaps Unix semantics means: a write syscall should be thought of as a sequence of write-block syscalls (and similarly for reads); a read-block syscall returns the value of the last write-block syscall for that block.
  – Can have "version numbers":
    • The old version may become inaccessible (at least under the current name).
    • With version numbers, if you use the name without a number you get the highest-numbered version.
    • But really you do have the old (full) name accessible.
      – VMS definitely did this.
    • Note that directories are still mutable.
      – Otherwise no create-file is possible.

• Distributed file system implementation
  – File usage characteristics
    • Measured under Unix at a university.
    • Not obvious that the same results would hold in a different environment.
  – Findings
    1. Most files are small (< 10K).
    2. Reading dominates writing.
    3. Sequential accesses dominate.
    4. Most files have a short lifetime.
    5. Sharing is unusual.
    6. Most processes use few files.
    7. File classes with different properties exist.
  – Some conclusions
    • Finding 1 suggests whole-file transfer may be worthwhile (except for really big files).
    • Findings 2+5 suggest client caching and dealing with multiple writers somehow, even if the latter is slow (since it is infrequent).
    • Finding 4 suggests doing creates on the client.
      – If yes, less communication.
      – If no, more modular, "cleaner".
  – Looking up a/b/c when a, a/b, and a/b/c are on different servers:
    • The natural solution is for server-a to return the name of server-a/b.
    • Then the client contacts server-a/b, gets the name of server-a/b/c, etc.
    • Alternatively, server-a forwards the request to server-a/b, who forwards it to server-a/b/c.
    • The natural method takes 6 communications (3 RPCs).
    • The alternative takes 4 communications but is not RPC.
  – Name caching
    • The translation from a/b/c to the inode (i.e. from symbolic to binary name) is expensive even for centralized systems.
    • It is called namei in Unix and was once measured to be a significant percentage of all kernel activity.
    • Later Unix added "namei caching".
    • It is potentially an even greater time saver for distributed systems, since communication is expensive.
    • Must worry about obsolete entries.
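Obsolete entries are usually handled lazily: cache the result of a successful lookup, use it optimistically, and throw it away when a server rejects it. Below is a minimal sketch under assumptions of my own (StaleHandle, remote_lookup, and read_rpc are hypothetical stand-ins for the real RPCs, not any particular system's interface).

```python
class StaleHandle(Exception):
    """Raised when a server rejects a cached (now obsolete) binary name."""

class NameCache:
    """Client-side cache of symbolic name -> (server, binary name)."""

    def __init__(self, remote_lookup):
        self._remote_lookup = remote_lookup   # expensive hop-by-hop walk, e.g. 3 RPCs for a/b/c
        self._cache = {}

    def resolve(self, path):
        if path not in self._cache:
            self._cache[path] = self._remote_lookup(path)
        return self._cache[path]

    def invalidate(self, path):
        self._cache.pop(path, None)           # drop an obsolete entry


def cached_read(cache, path, offset, count, read_rpc):
    server, handle = cache.resolve(path)
    try:
        return read_rpc(server, handle, offset, count)
    except StaleHandle:
        cache.invalidate(path)                # the entry was obsolete: redo the full lookup once
        server, handle = cache.resolve(path)
        return read_rpc(server, handle, offset, count)
```

The point of the retry is that staleness is detected only when the cached binary name is actually used, which keeps lookups cheap in the common case where nothing has moved.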
• Stateless vs. stateful
  – Should the server keep information between requests from a user, i.e. should the server maintain state?
  – What state?
    • Recall that open returns an integer called a file descriptor that is subsequently used in read/write.
    • With a stateless server, each read/write must be self-contained, i.e. it cannot refer to the file descriptor.
    • Why?

Caching

• There are four places to store a file supplied by a file server (these are not mutually exclusive):
  – Server's disk
    • Always done.
  – Server's main memory
    • Normally done: the standard buffer cache.
    • Clear performance gain.
    • Little if any semantic problem.
  – Client's main memory
    • Considerable performance gain.
    • Considerable semantic considerations.
    • The one we will study.
  – Client's disk
    • Not so common now, with cheaper memory.
• Unit of caching
  – File vs. block.
  – Tradeoff of fewer accesses vs. storage efficiency.
• What eviction algorithm?
  – Exact LRU is feasible because we can afford the time to do it (via linked lists), since the access rate is low.
• Where in the client's memory to put the cache?
  – The user's process
    • The cache will die with the process.
    • No cache reuse among distinct processes.
    • Not done for a normal OS.
    • A big deal in databases: cache management is a well-studied DB problem.
• Delayed write
  – Wait a while (30 seconds is used in some NFS implementations) and then send a bulk write message.
  – This is more efficient than a bunch of small write messages.
  – If the file is deleted quickly, you might never write it.
  – Semantics are now time-dependent (and ugly).
• Write on close
  – This gives session semantics.
  – Fewer messages, since there are more writes than closes.
  – Not beautiful (think of the same file simultaneously open at two clients).
  – Not much worse than normal (uniprocessor) semantics; the difference is that the bad case appears much more likely to be hit.
• Delayed write on close
  – Combines the advantages and disadvantages of delayed write and write on close.
• Doing it "right"
  – Multiprocessor caching (of central memory) is well studied and many solutions are known.
  – Use cache consistency (a.k.a. cache coherence) methods, which are well known.
  – Centralized solutions are possible.
    • But none are cheap.
  – Perhaps NFS is good enough and there is not enough reason to change (NFS predates the cache coherence work).

Replication

• Transparency
  – If we can't tell that files are replicated, we say the system has replication transparency.
• Creation can be completely opaque:
  – i.e. fully manual; users use copy commands.
  – If the directory supports multiple binary names for a single symbolic name:
    • Use this when making copies.
    • Presumably subsequent opens will try the binary names in order (so they are not opaque).
• Creation can use lazy replication:
  – The user creates the original; the system later makes copies.
  – Subsequent opens can be (re)directed at any copy.
• Creation can use group communication:
  – The user directs requests at a group.
  – Hence creation happens on all copies in the group at once.
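A rough sketch of lazy replication as just described, under assumptions of my own (the replica names, the background queue, and the in-memory stores are invented, and failures and later updates are ignored): the create returns as soon as the original exists, a background worker copies the file to the other replicas, and opens are redirected to any replica that already holds it.

```python
import queue
import threading

class LazyReplicator:
    def __init__(self, replicas):
        self.replicas = replicas               # e.g. {"s1": {}, "s2": {}, "s3": {}}
        self._work = queue.Queue()
        threading.Thread(target=self._copier, daemon=True).start()

    def create(self, path, data, primary="s1"):
        self.replicas[primary][path] = data    # the original is created immediately
        self._work.put((path, primary))        # copies are made later

    def _copier(self):
        while True:
            path, primary = self._work.get()
            data = self.replicas[primary][path]
            for name, store in self.replicas.items():
                if name != primary:
                    store[path] = data         # propagate in the background

    def open(self, path):
        # Redirect the open to any replica that already holds the file.
        for name, store in self.replicas.items():
            if path in store:
                return name, store[path]
        raise FileNotFoundError(path)


if __name__ == "__main__":
    r = LazyReplicator({"s1": {}, "s2": {}, "s3": {}})
    r.create("/docs/notes.txt", b"hello")
    print(r.open("/docs/notes.txt"))           # served by whichever replica has it already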
• Update protocols
  – Primary copy
    • All updates are done to the primary copy.
    • This server writes the update to stable storage and then updates all the other (secondary) copies.
    • After a crash, the server looks at stable storage and sees whether there are any updates to complete.
    • Reads are done from any copy.
    • This is good for reads (read any one copy).
    • Writes are not so good: you can't write if the primary copy is unavailable.
  – Voting
    • The idea: each copy carries a version number; a read goes to a read quorum (RQ) of copies and a write goes to a write quorum (WQ) of copies, installing a higher version number.
    • Concurrent updates can still race if they are not serialized. Suppose A and B are two copies:
      – Two updates start: U1 wants to write 1234, U2 wants to write 6789.
      – Both read the version numbers and add 1 (getting 11).
      – U1 writes A and U2 writes B at roughly the same time.
      – Later U1 writes B and U2 writes A.
      – Now both copies are at version 11, but A=6789 and B=1234.
  – Voting with ghosts
    • Often reads dominate writes, so we choose RQ=1 (or at least RQ very small, so WQ very large).
    • This makes it hard to write: e.g. with RQ=1, WQ=n, so you can't update if any machine is down.
    • When one detects that a server is down, a ghost is created.
    • A ghost cannot participate in a read quorum, but it can in a write quorum.
      – The write quorum must have at least one non-ghost.
    • A ghost throws away any value written to it.
    • A ghost always has version 0.
    • When the crashed server reboots, it accesses a read quorum to update its value.

Structured Peer-to-Peer Systems

• Balancing load in a peer-to-peer system by replication.

NFS

• NFS: Sun Microsystems' Network File System.
  – "Industry standard", dominant system.
  – Machines can be (and often are) both clients and servers.
  – The basic idea is that servers export directories and clients mount them.
    • When a server exports a directory, the subtree rooted there is exported.
    • In Unix, exporting is specified in /etc/exports.
    • In Unix, mounting is specified in /etc/fstab.
      – fstab = file system table.
      – In Unix without NFS, what you mount are filesystems.
• Two protocols
  – Mounting
    • The client sends the server a message containing the pathname (on the server) of the directory it wishes to mount.
    • The server returns a handle for the directory.
      – Subsequent read/write calls use the handle.
      – The handle has data giving the disk, the inode #, et al.
      – The handle is not an index into a table of actively exported directories. Why not? Because the table would be state, and NFS is stateless.
    • You can do this mounting at any time; it is often done at client boot time.
    • Automounting
  – File and directory access
    • Most Unix system calls are supported.
    • Open/close are not supported: NFS is stateless.
    • There is lookup, which returns a file handle. But this handle is not an index into a table; instead it contains the data needed.
    • As indicated previously, the stateless nature of NFS makes Unix locking semantics hard to achieve.
• The mount system call (client side)
  – The client makes the mount system call, passing the handle.
  – Now the kernel takes over:
    • It makes a v-node for the remote directory.
    • It asks the client code to construct an r-node.
    • It has the v-node point to the r-node.
• Open system call
  – While parsing the name of the file, the kernel (VFS layer) hits the local directory on which the remote directory is mounted (this part is similar to ordinary mounts of local filesystems).
  – The kernel gets the v-node of the remote directory (just as it would get an i-node if processing local files).
  – The kernel asks the client code to open the file (given the r-node).
  – The client code calls the server code to look up the remaining portion of the filename.
  – The server does this and returns a handle (but does not keep a record of it). Presumably the server, via the VFS and the local OS, does an open, and this data is part of the handle. So the handle gives enough information for the server code to determine the v-node on the server machine.
  – When the client gets a handle for the remote file, it makes an r-node for it. This is returned to the VFS layer, which makes a v-node for the newly opened remote file. The v-node points to the r-node; the latter contains the handle information.
  – The kernel returns a file descriptor, which points to the v-node.
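The open path above is easier to follow with concrete data structures. This is only an illustrative sketch with made-up names, not Sun's code: the handle is self-describing so the server keeps no per-client state, and on the client a v-node points to an r-node that remembers the handle.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    """Self-contained: names the file without any server-side open-file table."""
    filesystem_id: int
    inode_number: int
    generation: int              # distinguishes re-use of the same inode number

@dataclass
class RNode:                     # client-side NFS code: remembers the handle
    handle: FileHandle

@dataclass
class VNode:                     # VFS-layer object that a file descriptor points to
    remote: bool
    rnode: RNode                 # set for remote files (a local i-node otherwise)

def server_lookup(dir_handle, name, namespace):
    # Stateless: everything needed arrives in the request; nothing is recorded.
    inode = namespace[(dir_handle.filesystem_id, dir_handle.inode_number, name)]
    return FileHandle(dir_handle.filesystem_id, inode, generation=1)

def client_open(dir_vnode, name, namespace, fd_table):
    # "Client code calls server code to look up the remaining portion of the name."
    handle = server_lookup(dir_vnode.rnode.handle, name, namespace)   # an RPC in reality
    vnode = VNode(remote=True, rnode=RNode(handle))                   # v-node -> r-node -> handle
    fd_table.append(vnode)
    return len(fd_table) - 1     # the file descriptor returned to the user


# Toy use: the mounted directory has handle (fsid=7, inode=2) and contains "notes".
namespace = {(7, 2, "notes"): 41}
root = VNode(remote=True, rnode=RNode(FileHandle(7, 2, 1)))
fds = []
fd = client_open(root, "notes", namespace, fds)
```

Because the handle, rather than a server-side table entry, identifies the file, the server can crash and reboot without the client noticing anything except a delay.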
• Read/write
  – The VFS finds the v-node from the file descriptor it is given.
  – It realizes the file is remote and asks the client code to do the read/write on the given r-node (pointed to by the v-node); a sketch of this read path appears at the end of the section.
• Lessons learned (from AFS, but they apply with some generality)
  – Workstations, i.e. clients, have cycles to burn.
    • So do as much as possible on the client.
  – Cache whenever possible.
  – Exploit usage properties.
    • There are several classes of files (e.g. temporary).
    • This trades off simplicity for efficiency.
  – Minimize system-wide knowledge and change.
    • Helps scalability.
    • Favors hierarchies.
  – Trust the fewest possible entities.
    • Try not to depend on the "kindness of strangers".
  – Batch work where possible.
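To close the loop on the read/write step, here is a hedged sketch of the read path, reusing the VNode/RNode/FileHandle definitions from the open sketch above; nfs_read_rpc is a made-up stand-in for the real READ remote procedure call, not the actual NFS interface.

```python
def nfs_read_rpc(handle, offset, count):
    """Stand-in for the network call to the server that holds the file."""
    raise NotImplementedError("network transfer omitted in this sketch")

def vfs_read(fd, offset, count, fd_table, local_read):
    vnode = fd_table[fd]                     # VFS: file descriptor -> v-node
    if vnode.remote:
        # Client code does the read on the r-node; the self-describing handle
        # tells the stateless server exactly which file and which bytes.
        return nfs_read_rpc(vnode.rnode.handle, offset, count)
    return local_read(vnode, offset, count)  # ordinary local-file path
```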