Lecture 13: FFS, LFS, RAID
Geoffrey M. Voelker
November 14, 2001
CSE 120 – Lecture 13 – FFS, LFS, RAID

Overview

We've looked at disks and file systems generically. Now we're going to look at some example file and storage systems:
- BSD Unix Fast File System (FFS)
- Log-structured File System (LFS)
- Redundant Array of Inexpensive Disks (RAID)

Fast File System (FFS)

The original Unix file system had a simple, straightforward implementation:
- Easy to implement and understand
- But very poor utilization of disk bandwidth (lots of seeking)

The BSD Unix folks did a redesign in the mid-1980s that they called the Fast File System (FFS):
- Improved disk utilization, decreased response time
- McKusick, Joy, Leffler, and Fabry
- Now the file system against which all other Unix file systems are compared
- A good example of being device-aware for performance

Data and Inode Placement

The original Unix FS had two placement problems:

1. Data blocks allocated randomly in aging file systems
   - Blocks for the same file are allocated sequentially when the FS is new
   - As the FS "ages" and fills, new blocks must be allocated from blocks freed when other files are deleted
   - Problem: deleted files are essentially randomly placed, so blocks for new files become scattered across the disk

2. Inodes allocated far from data blocks
   - All inodes sit at the beginning of the disk, far from the data
   - Traversing file name paths and manipulating files and directories requires going back and forth between inodes and data blocks

Both of these problems generate many long seeks.
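To make the inode-to-data back-and-forth concrete, here is a minimal sketch of the classic Unix block lookup. The layout parameters (12 direct pointers, one single-indirect pointer, 4 KB blocks) and the function name `bmap` are illustrative assumptions, not details from the slides:

```c
#include <assert.h>

#define NDIRECT   12                                  /* direct pointers (assumed) */
#define BLOCKSIZE 4096L                               /* block size in bytes (assumed) */
#define NINDIRECT (BLOCKSIZE / (long)sizeof(unsigned int))

struct inode {
    unsigned int direct[NDIRECT]; /* disk block #s of the file's first NDIRECT blocks */
    unsigned int indirect;        /* disk block # of the single-indirect block */
};

/* Return the disk block number that holds byte `off` of the file.
 * `indirect_blk` is the in-memory contents of the single-indirect block;
 * fetching it from disk is exactly the extra inode<->data trip (and seek)
 * that the slides describe for the original Unix FS layout. */
unsigned int bmap(const struct inode *ip, long off, const unsigned int *indirect_blk)
{
    long lbn = off / BLOCKSIZE;          /* logical block number within the file */
    if (lbn < NDIRECT)
        return ip->direct[lbn];          /* resolved from the inode itself */
    return indirect_blk[lbn - NDIRECT];  /* one extra block read to get here */
}
```

Every lookup starts at the inode, so when inodes live at the start of the disk and data lives elsewhere, each traversal step pays a long seek.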
Log-structured File System (LFS)

Treat the disk as a single log for appending:
- Collect writes in the disk cache, then write out the entire collection in one large disk request
  » Leverages disk bandwidth
  » No seeks (assuming the head is at the end of the log)
- All info written to disk is appended to the log
  » Data blocks, attributes, inodes, directories, etc.

Simple, eh? Alas, only in the abstract.

LFS Challenges

LFS has two challenges it must address for it to be practical:
1. Locating data written to the log
   » FFS places files in a known location; LFS writes data "at the end"
2. Managing free space on the disk
   » The disk is finite, so the log is finite: we cannot always append
   » Need to recover the deleted blocks in old parts of the log

LFS: Locating Data

FFS uses inodes to locate data blocks:
- Inodes are pre-allocated in each cylinder group
- Directories contain the locations of inodes

LFS appends inodes to the end of the log just like data, which makes them hard to find.

Approach: use another level of indirection, inode maps:
- Inode maps map file #s to inode locations
- The locations of the inode map blocks are kept in the checkpoint region
- The checkpoint region has a fixed location on disk
- Inode maps are cached in memory for performance

LFS: Free Space Management

LFS's append-only log quickly runs out of disk space, so it needs to recover deleted blocks.

Approach: fragment the log into segments
- Thread segments on disk
  » Segments can be anywhere
- Reclaim space by cleaning segments
  » Read a segment
  » Copy its live data to the end of the log
  » Now you have a free segment you can reuse

Cleaning is a big problem: it is a costly overhead.

RAID

Redundant Array of Inexpensive Disks (RAID):
- A storage system, not a file system
- Patterson, Katz, and Gibson (Berkeley, '88)

Idea: use many disks in parallel to increase storage bandwidth and improve reliability
- Files are striped across the disks
- Each stripe portion is read/written in parallel
- Bandwidth increases with more disks
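The striping idea can be sketched as a simple address mapping. This is an illustrative sketch of plain block-level striping, not tied to any particular RAID level; the names `stripe_map` and `stripe_loc` are assumptions:

```c
#include <assert.h>

/* Where does logical block `lblk` of a striped file live? */
struct stripe_loc {
    int  disk;   /* which disk in the array */
    long blk;    /* block offset within that disk */
};

struct stripe_loc stripe_map(long lblk, int ndisks)
{
    struct stripe_loc loc;
    loc.disk = (int)(lblk % ndisks);  /* consecutive blocks rotate across disks */
    loc.blk  = lblk / ndisks;         /* stripe (row) index on each disk */
    return loc;
}
```

Because consecutive logical blocks land on different disks, a large sequential read touches all the disks at once, which is where the bandwidth scaling comes from.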
RAID Challenges

Small writes (writes of less than a full stripe):
- Need to read the entire stripe, update it with the small write, then write the entire stripe back out to the disks

Reliability:
- More disks increase the chance of a media failure (lower aggregate MTBF)
- Turn the reliability problem into a feature: use one disk to store parity data
  » The XOR of all the data blocks in the stripe
- Can recover any data block from all the others plus the parity block
- Hence the "redundant" in the name
- Parity introduces overhead, but, hey, disks are "inexpensive"
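The XOR parity scheme fits in a few lines of C. A minimal sketch, with an assumed function name and block layout:

```c
#include <assert.h>
#include <stddef.h>

/* Compute a stripe's parity: out[b] = blks[0][b] ^ blks[1][b] ^ ... ^ blks[n-1][b].
 * Because XOR is its own inverse, running this same function over the
 * surviving blocks plus the parity block regenerates any single lost block. */
void xor_parity(const unsigned char *blks[], int n, size_t blksz,
                unsigned char *out)
{
    for (size_t b = 0; b < blksz; b++) {
        unsigned char p = 0;
        for (int d = 0; d < n; d++)
            p ^= blks[d][b];         /* fold in each block, byte by byte */
        out[b] = p;
    }
}
```

Recovery reuses the same routine: XOR the surviving data blocks with the parity block and the missing block falls out, which is why one parity disk tolerates any single disk failure.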