THE FLORIDA STATE UNIVERSITY
COLLEGE OF ARTS AND SCIENCES

A Behind-the-Scenes Story on Applying Cross-Layer Coordination to Disks and RAIDs

By
JIN QIAN

A Thesis submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Master of Science

Degree Awarded: Fall Semester, 2007

The members of the Committee approve the Thesis of Jin Qian defended on October 23, 2007.

    An-I Andy Wang, Professor Directing Thesis
    Theodore P. Baker, Committee Member
    Xin Yuan, Committee Member

Approved:

    David Whalley, Chair, Department of Computer Science

The Office of Graduate Studies has verified and approved the above named committee members.

TABLE OF CONTENTS

List of Tables
List of Figures
Abstract
1. INTRODUCTION
   1.1 Background
       1.1.1 Disk Access Path
       1.1.2 Software RAIDs
       1.1.3 Cross-layer Coordination
   1.2 Goals and Approaches
2. RECREATING TRACK-ALIGNED EXTENTS
   2.1 Extracting Disk Characteristics
   2.2 Exploiting Track Boundaries
   2.3 Verification of the Performance Benefits
3. TRACK-ALIGNED RAIDS
   3.1 Implementation
   3.2 Verification of Performance Benefits
4. COORDINATED I/O SCHEDULER
   4.1 Design Space
   4.2 Implementation
   4.3 Performance Evaluation
5. LESSONS LEARNED
6. CONCLUSION
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table 2.1: Hardware/software experimental specifications
Table 2.1.1: Different track sizes of Maxtor 10K V drives

LIST OF FIGURES

Figure 1.1: A software RAID-5 with four disks
Figure 2.1.1: Bandwidth comparison
Figure 2.1.2: Serpentine track numbering
Figure 2.1.3: Elapsed time from different offsets
Figure 2.1.4: CDF of disk access times
Figure 2.3.1: Bonnie single disk bandwidth comparisons
Figure 2.3.2: Diff single disk speed comparisons
Figure 2.3.3: Startup latency comparisons on single disk
Figure 3.1: Bonnie bandwidth comparisons of RAIDs
Figure 3.2: Diff elapsed time comparisons of RAIDs
Figure 3.3: Startup latency comparisons of RAIDs
Figure 4.3.1: Startup latency comparisons with coordinated queues
Figure 4.3.2: Startup latency CDF comparisons of different weights
Figure 4.3.3: Disk head location deviation comparisons

[…]

… the track-aligned extents work [Schindler et al. 2002] on track-aligned accesses to a disk and a track-aligned RAID. By clean-room implementations, we meant implementing mechanisms that achieve the effects of those cross-layer optimizations starting from standard operating system distributions. By doing so, we had the opportunity to explore various design issues and alternatives. We also proposed and implemented a way to coordinate disk queues within a RAID.

Applying the cross-layer coordination approach involves surprisingly intricate decisions, more so than previous reports in the research literature led us to expect. Considering the quickly evolving physical characteristics of modern disks and associated hardware, and the increasing complexity of I/O subsystems, these intricacies are likely to grow. Thus, we report our experience to help the research community better understand the low-level decisions required to apply such an approach to modern disks and RAIDs.

1.1 Background

1.1.1 Disk Access Path

In Linux, a user-level application issues read and write system calls to a file system in order to access file data on disk. Because most file content is accessed sequentially, the read-ahead optimization is typically applied to file access requests, so that subsequent file content is fetched from the disk in advance to mask the high mechanical latencies of disks. The file system then locates the data blocks on disk that belong to the file through the i-node data structure. Subsequently, the requests go through an I/O scheduler (commonly referred to as the elevator algorithm), which merges and sorts them according to disk block locations. Some merged requests can come from multiple files. Finally, requests are passed down to a device driver and translated into disk commands.

After receiving a request, a disk first checks whether the requested data block is inside its hardware cache. If not, the firmware and circuits convert the requested logical block location into a physical disk location, identified by a surface number, a track number, and a sector offset within that track. Finally, the disk commands its arm to locate the right track and begins to transfer data as soon as the target sector rotates under the disk head.
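To make the merging and sorting step concrete, the following user-level sketch models what an elevator-style I/O scheduler does on a request queue. It is not the kernel's actual elevator code; struct request here is a simplified stand-in, and the example only sorts pending requests by starting sector and merges contiguous ones before "dispatching" them.

    #include <stdio.h>
    #include <stdlib.h>

    /* Simplified stand-in for a block request: [start, start + len) in sectors. */
    struct request {
        unsigned long start;
        unsigned long len;
    };

    static int by_start(const void *a, const void *b)
    {
        const struct request *ra = a, *rb = b;
        if (ra->start < rb->start) return -1;
        return ra->start > rb->start;
    }

    /* Sort pending requests by sector and merge contiguous ones, in the spirit
     * of what an elevator-style I/O scheduler does before dispatching. */
    static size_t sort_and_merge(struct request *q, size_t n)
    {
        size_t out = 0;
        if (n == 0)
            return 0;
        qsort(q, n, sizeof(*q), by_start);
        for (size_t i = 1; i < n; i++) {
            if (q[i].start == q[out].start + q[out].len)
                q[out].len += q[i].len;   /* contiguous: merge into one request */
            else
                q[++out] = q[i];          /* keep as a separate request */
        }
        return out + 1;
    }

    int main(void)
    {
        struct request q[] = { {100, 8}, {0, 8}, {8, 8}, {200, 16} };
        size_t n = sort_and_merge(q, 4);
        for (size_t i = 0; i < n; i++)
            printf("dispatch: start=%lu len=%lu\n", q[i].start, q[i].len);
        return 0;
    }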
1.1.2 Software RAIDs

A software RAID under Linux sits between the file system layer and the device driver layer. After the file system translates a file request into block requests, each block request is first sent to a multi-device driver (e.g., RAID-5), which is responsible for gathering, remapping, and forwarding requests to the individual disks within the RAID. The multi-device driver also reorders, merges, and splits requests as needed to improve overall performance. The request queue controlled by the I/O scheduler and associated with the RAID device driver can be plugged at times, so that pending requests wait in the queue for additional requests for some time, increasing the opportunities for effective reordering. The queue can also be unplugged when forwarding requests to the underlying per-disk device drivers. Each per-disk software device driver handles vendor-specific details of its hard disk and is associated with its own request queue. Therefore, each device driver independently schedules and optimizes its disk's performance, without coordinating with the other disks.

[Figure 1.1: A software RAID-5 with four disks.]

Figure 1.1 shows a software RAID-5 with four disks. Each request is sent to a RAID-5 multi-device layer, which splits (as needed) and forwards the request(s) to per-disk device drivers. Within the RAID-5, Ap is the parity for A1, A2, and A3; Bp is the parity for B1, B2, and B3; and so on.

1.1.3 Cross-layer Coordination

Since the era of early file systems, storage designers have recognized the power of tailoring file system design to the underlying disk characteristics. FFS [McKusick et al. 1984] exploits the spatial locality of accessing consecutive and nearby blocks on disks to improve performance. LFS [Matthews et al. 1997] exploits both spatial locality for sequential writes and temporal locality for nearby disk-block reads to improve performance. In recent years, cross-layer coordination with low-level storage has begun to attract research attention, and it has demonstrated significant performance improvements. For example, by exposing the track boundaries of disks, file systems and cache prefetching can effectively allocate and access data in a track-aligned manner [Schindler et al. 2002]. The file system layer can also gain semantic knowledge of specific applications (e.g., databases) to optimize disk layout [Sivathanu et al. 2005]. In addition to track alignment, Lumb et al. [2000] exploit the rotational bandwidth that can be extracted during seeks in order to perform low-priority disk requests. Sivathanu et al. [2003] and Sivathanu et al. [2005] made low-level storage aware of the file systems and database applications running above it, so that data placement policies could be optimized according to semantic knowledge of the file system and database data structures. Atropos [Schindler et al. 2004] stripes data across disks in a track-aligned manner and supports two-dimensional data structures via efficient diagonal access to blocks on adjacent tracks. Schlosser et al. [2005] exploit the low seek time for nearby tracks in order to place multidimensional data sets.

Exposing the use of many disks to the file-system level leads to many parallel file system designs. For instance, PVFS [Carnes et al. 2000] modifies the semantics of file …

[…]

… case delays by applying track-aligned accesses to disks to reduce the expected worst-case rotational delay for accessing a stripe. To address the worst-case queuing time among disks in a RAID, we designed and implemented a way to coordinate disk queues, with the aim of having a striped request sent to all disks at approximately the same time. This coordination can also potentially improve the synchrony of disk head locations, ameliorating the worst-case seek time among disks.
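To keep the RAID-5 terminology of Figure 1.1 concrete, the sketch below maps logical stripe units (A1, A2, A3, B1, ...) onto member disks with rotating parity. The four-disk geometry comes from Figure 1.1, but the exact rotation order is an assumption for illustration rather than the layout used by the Linux md driver; the point is simply that a full-stripe request touches every disk, so its completion time is bounded by the slowest disk.

    #include <stdio.h>

    #define NDISKS 4   /* four member disks, as in Figure 1.1 */

    /* Map a logical stripe unit (A1, A2, A3, B1, ...) to a member disk, with the
     * parity unit rotating from stripe to stripe.  Illustrative layout only; the
     * Linux md driver's exact rotation order may differ. */
    static void map_unit(unsigned long unit, int *disk, unsigned long *stripe)
    {
        unsigned long data_per_stripe = NDISKS - 1;
        *stripe = unit / data_per_stripe;
        int parity_disk = NDISKS - 1 - (int)(*stripe % NDISKS);
        int d = (int)(unit % data_per_stripe);
        if (d >= parity_disk)
            d++;              /* skip over this stripe's parity disk */
        *disk = d;
    }

    int main(void)
    {
        for (unsigned long unit = 0; unit < 9; unit++) {
            int disk;
            unsigned long stripe;
            map_unit(unit, &disk, &stripe);
            printf("unit %lu -> stripe %lu, disk %d (parity on disk %lu)\n",
                   unit, stripe, disk, (unsigned long)(NDISKS - 1 - stripe % NDISKS));
        }
        return 0;
    }

Running the sketch places stripe 0's data on disks 0 through 2 with parity on disk 3 (A1, A2, A3, Ap in Figure 1.1), and the parity disk then rotates on subsequent stripes.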
CHAPTER 2

RECREATING TRACK-ALIGNED EXTENTS

The three main tasks in duplicating the track-aligned extents work are (1) finding the track boundaries and the zero-latency-access disk characteristic, (2) making use of such information, and (3) verifying the performance benefits. The hardware and software experimental settings are summarized in Table 2.1.

Table 2.1: Hardware/software experimental specifications

  Hardware/software   Configuration
  Processor           Pentium D 830, 3 GHz, 16-KB L1 cache, 2x1-MB L2 cache
  Memory              128 MB or 2 GB
  RAID controller     Adaptec 4805SAS
  Disks tested        Maxtor Atlas 10K V SCSI, 73 GB, 10K RPM, 8-MB on-disk cache [Maxtor 2004]
                      Seagate Cheetah 15K.4 Ultra320 SCSI, 36 GB, 8-MB on-disk cache [Seagate 2007]
                      Fujitsu MAP3367NC, 10K RPM, 37 GB, 8-MB on-disk cache [Fujitsu 2007]
  Operating system    Linux 2.6.16.9
  File system         Ext2 [Card et al. 1999]

2.1 Extracting Disk Characteristics

Simple request scanning: Since the reported performance benefits of track alignment are high, a user-level program can conceivably observe timing variations to identify track boundaries. A program can incrementally issue reads, each requesting one more sector than the one before, starting from the 0th sector. As the request size grows, the disk bandwidth should first increase and then drop once the request size exceeds the size of the first track (due to track-switching overhead). The process can then repeat, starting from the first sector of the previously found track. The inefficiency of this algorithm can be reduced by applying binary search.

To reduce disturbances introduced by various hardware and software components along the disk data path, we used direct I/O (the O_DIRECT flag) to bypass the Linux page cache, and we accessed the disk as a raw device to bypass the file system. We used modified aacraid driver code to bypass the SCSI controller, and we used sdparm to disable the read cache (RCD=1) and prefetch (DPTL=0) of the disk.

As a sanity check, we also attempted to start all reads from an arbitrarily chosen position, the 256th sector. Additionally, we attempted to start each read from a random sector between 0 and 512, with each succeeding request size increasing by 1 sector (512 bytes). Figure 2.1.1 shows the resulting bandwidth comparison for different read request sizes from different starting sectors on a Maxtor disk.

[Figure 2.1.1: Bandwidth comparison. Axes: request size (sectors, 0 to 7,000) vs. bandwidth (MB/sec); series: reads starting from the 0th sector, the 256th sector, and random sectors.]
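A minimal user-level sketch of the request-scanning idea is shown below. The device path (/dev/sdb) is a placeholder, and the sketch omits the repetition over subsequent tracks, the binary search, and the averaging over multiple trials that a careful measurement would need; O_DIRECT is used to bypass the page cache as described above.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define SECTOR 512

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        /* /dev/sdb is a placeholder for the raw disk under test.  O_DIRECT
         * bypasses the Linux page cache, as described in the text. */
        int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 2048 * SECTOR)) return 1;

        /* Read 1, 2, 3, ... sectors starting at sector 0.  Bandwidth should rise
         * with the request size and then drop once the request spans a track
         * boundary (track-switch overhead), hinting at the first track's size. */
        for (int sectors = 1; sectors <= 2048; sectors++) {
            double t0 = now_sec();
            if (pread(fd, buf, (size_t)sectors * SECTOR, 0) < 0) {
                perror("pread");
                break;
            }
            printf("%d sectors: %.1f MB/s\n", sectors,
                   sectors * SECTOR / (now_sec() - t0) / 1e6);
        }
        free(buf);
        close(fd);
        return 0;
    }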
[…]

Variants of this serpentine numbering scheme [Anderson 2003] are observed in Seagate [2007] and Fujitsu [2007] drives as well. One can conjecture that this numbering scheme is designed to work well with elevator- and scanning-based I/O schedulers. In terms of performance characteristics, one might therefore expect additional timing variations due to the track numbering scheme, beyond those at track boundaries.

Second, the number of sectors contained in each track differs between the top and bottom surfaces, even for the same track number. For example, on a Maxtor drive, the top surface of track 0 may contain 1,144 sectors, while the bottom surface of track 0 may contain 1,092 sectors. One explanation is that certain sectors are spares. By having spares within each track, bad sectors can be remapped without introducing additional seek delays. In the context of track alignment, this finding implies additional bookkeeping for each disk surface.

Third, the track size differs even for the same disk model from the same vendor. In a batch of six Maxtor 10K V drives purchased at the same time, we found four different LBA numbering schemes (Table 2.1.1). The implication is that track extraction cannot be performed once per disk model; it potentially needs to be performed on every disk. Track size differs even within the same zone on the same surface, though rarely and only slightly. We also saw that some tracks begin on their second sector, that is, the LBA numbering skips the first sector of the track. Due to all these irregularities, we can no longer calculate track boundaries from zone information alone; we have to extract every track.

Table 2.1.1: Different track sizes of Maxtor 10K V drives

  Serial number   Surface 0, outermost track   Surface 1, outermost track
  J20 Q3 CZK      1,144 sectors                1,092 sectors
  J20 Q3 C0K      1,092 sectors                1,144 sectors
  J20 Q3 C9K      1,092 sectors                1,144 sectors
  J20 TK 7GK      1,025 sectors                1,196 sectors
  J20 TF S0K      1,060 sectors                1,170 sectors
  J20 TF MKK      1,060 sectors                1,170 sectors

Track boundary verification: To verify the track information extracted via the SCSI diagnostic commands, we wrote a program to measure the elapsed time to access 64 sectors of data at shifting offsets from random track boundaries. The use of 64 sectors eases the visual identification of track boundaries. We measured tracks only from the top surface within the first zone of a Maxtor disk, so we could simplify our experiment by accessing mostly tracks of 1,144 sectors.

[Figure 2.1.3: Elapsed time from different offsets. Axes: offset from track boundaries (sectors, 0 to 2,288) vs. elapsed time (msec).]

Figure 2.1.3 shows the elapsed time to access a random 64 sectors, starting at different offsets from SCSI-command-extracted track boundaries on a Maxtor drive. The track size is 1,144 sectors. Figure 2.1.3 confirms our extracted track boundaries. Each data point represents the time to access a 64-sector request starting from a randomly chosen sector offset relative to a track boundary. The 6-msec range of timing variation reflects the rotational-delay variation of a 10,000-RPM drive. The average elapsed time for accessing 64 sectors across a track boundary is 7.3 msec, compared to 5.7 msec when not crossing a track boundary. Interestingly, the difference of 1.6 msec is much higher than the track switching time of 0.3 to 0.5 msec [Maxtor 2004]. We also verified this extraction method with drives from other vendors. The findings were largely consistent.

Zero-latency feature verification: Since the range of performance gains from track-aligned access depends on whether a disk can access the data within a track out of order, we performed the tests suggested in [Worthington et al. 1995]. Basically, we randomly picked two consecutive sectors, read those sectors in reverse LBA order, and observed the timing characteristics. This test was performed with various caching options on.

[Figure 2.1.4: CDF of disk access times. Axes: access time (msec) vs. percentage of accesses; series: first and second accesses for the Fujitsu, Maxtor, and Seagate drives.]

As shown in Figure 2.1.4, with a Maxtor drive accessing random pairs of consecutive LBAs in reverse order, 50% of the time the second request is served from the cache, indicating the zero-latency capability. (We did not observe correlations between …
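The reverse-order read test just described can be sketched as follows. The device path (/dev/sdb) and the disk-size constant are placeholders, and the loop is simplified; the idea is only that, on a zero-latency disk with its read cache enabled, the second (lower-LBA) read of a consecutive pair is often satisfied from the buffered track and therefore returns much faster.

    #define _GNU_SOURCE
    #define _FILE_OFFSET_BITS 64
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define SECTOR       512
    #define DISK_SECTORS 143000000UL   /* roughly a 73-GB disk; adjust per drive */

    static double ms_since(const struct timespec *t0)
    {
        struct timespec t1;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0->tv_sec) * 1e3 + (t1.tv_nsec - t0->tv_nsec) / 1e6;
    }

    int main(void)
    {
        /* /dev/sdb is a placeholder for the raw disk, opened with O_DIRECT so
         * only the on-disk cache (left enabled for this test) is in the path. */
        int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }
        void *buf;
        if (posix_memalign(&buf, 4096, SECTOR)) return 1;
        srand((unsigned)time(NULL));

        for (int i = 0; i < 1000; i++) {
            /* Pick two consecutive sectors and read them in reverse LBA order.
             * On a zero-latency (out-of-order track access) disk, the second,
             * lower-LBA read is often served from the buffered track. */
            unsigned long lba = (unsigned long)rand() % (DISK_SECTORS - 2);
            struct timespec t0;

            clock_gettime(CLOCK_MONOTONIC, &t0);
            if (pread(fd, buf, SECTOR, (off_t)(lba + 1) * SECTOR) != SECTOR) break;
            double first = ms_since(&t0);

            clock_gettime(CLOCK_MONOTONIC, &t0);
            if (pread(fd, buf, SECTOR, (off_t)lba * SECTOR) != SECTOR) break;
            printf("lba %lu: first %.3f ms, second %.3f ms\n",
                   lba, first, ms_since(&t0));
        }
        free(buf);
        close(fd);
        return 0;
    }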
[…]

There are two places in the kernel that make use of the search function. First, pre-allocation looks for the first block of a track (the block right after a track boundary) and then allocates this track to the requesting file. The end of the track (the next boundary) can be identified by a used block marked by mke2fs, so that pre-allocation ends properly. One implication is that individual file systems need to be modified to benefit from track alignment. Second, when readahead starts a new prefetch window, it drops all prefetching requests that exceed the track boundary.

2.3 Verification of the Performance Benefits

Bonnie: We chose a widely used benchmark called Bonnie [Bray 1996], which is unaware of the underlying track-alignment mechanisms. Bonnie consists of many phases, stressing the performance of character and block I/O amidst sequential and random access patterns. The two phases of interest here are the sequential write and sequential read phases. The sequential write phase creates a 1-GB file, which exceeds our 128-MB memory limit, and the sequential read phase reads it back. We enabled the SCSI cache, disk caching, and prefetch to better reflect normal usage. Each experiment was repeated 10 times and analyzed at a 90% confidence interval.

[Figure 2.3.1: Bonnie single disk bandwidth comparisons. Axes: write and read phases vs. bandwidth (MB/sec); series: track-aligned and normal.]

Figure 2.3.1 compares the bandwidth of conventional and track-aligned accesses to a single disk when running the Bonnie benchmark. It shows the expected 3% slowdown for a single stream of sequential disk accesses, where the skipped blocks that cross track boundaries can no longer contribute to the bandwidth.

[Figure 2.3.2: Diff single disk speed comparisons. Elapsed time (sec) for track-aligned, track-aligned with no on-disk prefetch, normal, and normal with no on-disk prefetch accesses.]

We also ran the diff program (from GNU diffutils 2.8.1) to compare two large 512-MB files via interleaved reads between the two files, using the --speed-large-files option. Without this option, diff will try to read one entire file into memory, then the other, and compare them if memory permits, which would nullify our intent of testing interleaved reads. We have two settings: the normal case and the track-aligned case. Figure 2.3.2 compares the speed of conventional and track-aligned accesses to a single disk, diffing two 512-MB files with 128 MB of RAM. It shows that track-aligned accesses are almost twice as fast as the normal case. In addition, we observed that disk firmware prefetch can defeat the track-aligned prefetches issued by the file system readahead, as the firmware prefetch has no regard for track boundaries. Disabling on-disk prefetch speeds up track-aligned access by another 8%. Therefore, for subsequent experiments, we disabled disk firmware prefetch for track-aligned accesses.

Since track-aligned extents excel at handling concurrent accesses, we conducted an experiment involving concurrent processes issuing multimedia-like traffic streams at around 500 KB/sec. We used 2 GB as our memory size. We wrote a script that increases the number of multimedia streams by one each second and records the startup latency of each new stream. Each emulated multimedia streaming process first randomly selects a disk position and then sequentially accesses the subsequent blocks at the specified streaming rate. We assumed that the acceptable startup latency is around 3 seconds, and the program terminates once the latency reaches 3 seconds.
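One emulated stream can be sketched as below. The target path, chunk size, and pacing constants are illustrative assumptions rather than the actual script used here; the sketch only captures the workload's shape: pick a random position, record the time of the first read as the startup latency, then keep reading sequentially at roughly 500 KB/sec.

    #define _FILE_OFFSET_BITS 64
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define CHUNK (64 * 1024)    /* bytes read per period                 */
    #define RATE  (500 * 1024)   /* target streaming rate, about 500 KB/s */

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    /* Emulate one multimedia-like stream: seek to a random position on a large
     * file or device, report the time of the first read as the startup latency,
     * then keep reading sequentially at roughly RATE bytes per second. */
    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "/dev/sdb";   /* placeholder */
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(CHUNK);
        off_t size = lseek(fd, 0, SEEK_END);   /* assumed much larger than CHUNK */
        srand((unsigned)(time(NULL) ^ getpid()));
        off_t pos = ((off_t)rand() % (size / CHUNK)) * CHUNK;

        double start = now_sec();
        if (pread(fd, buf, CHUNK, pos) <= 0) return 1;
        printf("startup latency: %.3f sec\n", now_sec() - start);
        pos += CHUNK;

        for (;;) {
            double t0 = now_sec();
            if (pread(fd, buf, CHUNK, pos) <= 0)
                break;
            pos += CHUNK;
            double spent = now_sec() - t0, period = (double)CHUNK / RATE;
            if (spent < period)                    /* pace to the target rate */
                usleep((useconds_t)((period - spent) * 1e6));
        }
        free(buf);
        close(fd);
        return 0;
    }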
[Figure 2.3.3: Startup latency comparisons on single disk. Axes: number of request streams (10 to 100+, log scale) vs. startup latency (sec); series: track-aligned, large-readahead, and normal.]

Figure 2.3.3 compares the startup latency of conventional I/O requests, requests with a one-track prefetch window, and track-aligned requests on a single disk, with a varying number of multimedia-like request streams. It shows that the original disk can support up to 130 streams with a startup latency within 3 seconds. A track-sized readahead window can reduce the latency at 130 streams by 30%, while track-aligned access can reduce it by 55%.

[…]

… similar range due to buffered writes. However, for read bandwidth, the track-aligned RAID-5 outperforms the conventional one by 57%. The diff experiment compared two 512-MB files with 128 MB of RAM. Figure 3.2 shows that the track-aligned RAID-5 achieves a 3x speedup compared to the original RAID-5. For the multimedia-like workload with 2 GB of RAM, the track-aligned RAID-5 demonstrates 3.3x better scaling in concurrency than the conventional RAID-5, whereas a RAID-5 with a readahead window comparable to the track-aligned RAID-5 contributes less than half of that scaling improvement. The latency improvement of the track-aligned RAID-5 is particularly impressive considering that the original RAID-5 was expected to have worse latency characteristics than the single-disk case, due to the widening timing variance among disks and the need to wait for the slowest disk on striped requests. Track-aligned accesses reduce the worst-case rotational timing variance and can therefore realize more of the benefits of parallelism.

Figure 3.1 shows bandwidth comparisons among the track-aligned RAID-5, a RAID-5 with a prefetch window of 4 tracks, and the original RAID-5, running Bonnie with a 1-GB working set and 128 MB of RAM. Figure 3.2 shows elapsed-time comparisons of the track-aligned RAID-5, a RAID-5 with a prefetch window of 4 tracks, and the original RAID-5, when running diff to compare two 512-MB files. Figure 3.3 shows startup latency comparisons of the track-aligned RAID-5, a RAID-5 with a prefetch window of 4 tracks, and the original RAID-5, with a varying number of multimedia-like request streams.

[Figure 3.1: Bonnie bandwidth comparisons of RAIDs. Axes: write and read phases vs. bandwidth (MB/sec); series: track-aligned RAID-5, RAID-5 with a large readahead, and RAID-5.]

[Figure 3.2: Diff elapsed time comparisons of RAIDs. Elapsed time (secs) for the track-aligned RAID-5, the RAID-5 with a large readahead, and the original RAID-5.]

[Figure 3.3: Startup latency comparisons of RAIDs. Axes: number of request streams (0 to 200) vs. startup latency (sec); series: track-aligned RAID-5, RAID-5 with a large readahead, and RAID-5.]

[…]

… based on the number of pending requests in per-disk queues. Then, within the per-disk CFQ scheduler, the relative block distance between the current and the previous request (computed in the cfq_choose_req function) is adjusted by the maximum queue length left-shifted by a weight factor. The maximum relative block distance is determined by the size of the hard drive; in our case, it is about 19 million blocks for the 73-GB Maxtor drive. The maximum possible queue length is defined by BLKDEV_MAX_RQ in Linux, which is 128 by default. Therefore, for the queue length to have an influence equal to that of the block distance, the weight factor needs to be set to around 16 when the system is under heavy concurrent load.
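Because the start of this description falls in the omitted pages, the exact adjustment is not fully specified here; the sketch below is one plausible reading of it, in which the block-distance score used to pick the next request is biased by the issuing queue's length, left-shifted by a configurable weight. With a weight around 16, a full queue of BLKDEV_MAX_RQ = 128 requests contributes a term (128 << 16 = 8,388,608) on the same order as the roughly 19-million-block maximum distance of the 73-GB drive, which is where the suggested weight comes from.

    #include <stdio.h>

    #define BLKDEV_MAX_RQ 128UL   /* Linux's default maximum requests per queue */

    /* One plausible reading of the described heuristic: bias the block-distance
     * score used to choose the next request by the queue length, left-shifted by
     * a weight factor, so that queue balance and seek distance both matter. */
    static unsigned long adjusted_distance(unsigned long block_distance,
                                           unsigned long queue_len,
                                           unsigned int weight)
    {
        return block_distance + (queue_len << weight);
    }

    int main(void)
    {
        unsigned long max_distance = 19000000UL;   /* ~19M blocks, 73-GB drive */

        /* With weight 16, a full queue contributes 128 << 16 = 8,388,608, which
         * is on the same order as the maximum block distance above. */
        printf("max block distance:         %lu\n", max_distance);
        printf("full-queue term, weight 16: %lu\n",
               adjusted_distance(0, BLKDEV_MAX_RQ, 16));
        printf("short seek, busy queue:     %lu\n",
               adjusted_distance(10000, 100, 16));
        printf("long seek, idle queue:      %lu\n",
               adjusted_distance(10000000, 2, 16));
        return 0;
    }

Whether the actual adjustment is additive, multiplicative, or applied only beyond a threshold is not recoverable from this preview, so the additive form above is illustrative only.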
4.3 Performance Evaluation

Figure 4.3.1 shows startup latency comparisons of the track-aligned RAID-5, the track-aligned RAID-5 with coordinated queues, the original RAID-5 with coordinated queues, and the original RAID-5, with a varying number of multimedia-like request streams. It summarizes the performance of coordinated queues in relation to RAID-5 and track-aligned accesses. Intriguingly, coordinated queues improved the concurrency scaling by a factor of only 1.2x, while the track-aligned RAID improved scaling by a factor of 3.3x. Also, when the track-aligned RAID-5 was combined with coordinated queues, no significant performance differences were observed.

[Figure 4.3.1: Startup latency comparisons with coordinated queues. Axes: number of request streams (1 to 181) vs. startup latency (sec); series: track-aligned RAID-5, track-aligned RAID-5 with coordinated queues, RAID-5 with coordinated queues, and RAID-5.]

Since the chosen weight for the coordinated queues can affect performance, we conducted a sensitivity analysis by varying the weight from 10 to 30 in the same scaling experiment. Figure 4.3.2 shows startup latency CDF comparisons for the different weights used for coordinated queues, with a varying number of multimedia-like request streams. It shows that the startup latency CDF variation is generally within 10%.

Puzzled by the limited combined benefit of the track-aligned RAID-5 and the coordinated queues, we plotted the moving average of the disk head distance among the five disks for the various schemes. Figure 4.3.3 shows disk head location deviation (10-second moving average) comparisons of the track-aligned RAID-5, the track-aligned RAID-5 with coordinated queues, the original RAID-5 with coordinated queues, and the original RAID-5, with a varying number of multimedia-like request streams. Intriguingly, it shows that the track-aligned RAID-5 actually synchronizes disk heads better than the coordinated queues, for two possible reasons. First, the plugging and unplugging mechanisms used to honor track boundaries interact with the scheduling of striped requests. Second, our implementation of the track-aligned RAID-5 also requests the parity information on reads, reducing the chance of divergence among disk head locations.

[Figure 4.3.2: Startup latency CDF comparisons of different weights (10 to 30). Axes: startup latency (sec) vs. percentage of accesses.]

[Figure 4.3.3: Disk head location deviation comparisons. Axes: time (sec) vs. head distance (million sectors); series: track-aligned RAID-5, track-aligned RAID-5 + coordinated queues, coordinated queues, and RAID-5.]
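The head-distance metric plotted in Figure 4.3.3 can be read as the spread between the farthest-apart head positions, smoothed over a 10-second window. The sketch below computes such a metric from hypothetical once-per-second samples of each disk's last-accessed LBA; the sampling method and the sample values are assumptions for illustration, with the five-disk count taken from the text.

    #include <stdio.h>

    #define NDISKS 5    /* five disks, as in the head-distance measurements */
    #define WINDOW 10   /* 10-second moving average, as in Figure 4.3.3     */

    /* Spread (in sectors) between the farthest-apart head positions. */
    static unsigned long spread(const unsigned long pos[NDISKS])
    {
        unsigned long lo = pos[0], hi = pos[0];
        for (int i = 1; i < NDISKS; i++) {
            if (pos[i] < lo) lo = pos[i];
            if (pos[i] > hi) hi = pos[i];
        }
        return hi - lo;
    }

    int main(void)
    {
        /* Hypothetical once-per-second samples of each disk's last-accessed LBA. */
        unsigned long samples[][NDISKS] = {
            { 1000000, 1200000,  900000, 1100000, 1050000 },
            { 2000000, 5200000, 1900000, 2100000, 2050000 },
            { 3000000, 3200000, 2900000, 3100000, 3050000 },
        };
        int n = sizeof(samples) / sizeof(samples[0]);
        double history[WINDOW] = { 0 };
        double sum = 0;

        for (int t = 0; t < n; t++) {
            double s = (double)spread(samples[t]);
            sum += s - history[t % WINDOW];            /* slide the window */
            history[t % WINDOW] = s;
            int filled = (t + 1 < WINDOW) ? t + 1 : WINDOW;
            printf("t=%d: moving-average head distance = %.0f sectors\n",
                   t, sum / filled);
        }
        return 0;
    }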
[…]

CHAPTER 6

CONCLUSION

Through the clean-room duplication of track-aligned access and its incorporation into RAIDs, and through exploring a proposed method for coordinating per-disk queues in RAIDs, we have validated the performance benefits achievable by applying the cross-layer coordination technique to storage. However, we also had to overcome the diversity of disk behaviors and the size of the design space to take advantage of hardware details. Therefore, for the cross-layer approach to become broadly applicable at these levels, we need to cope with rapid hardware evolution by inventing ways to obtain hardware characteristics efficiently and exploit them automatically. On the other hand, since we observe the recurrent theme of applying similar optimization approaches with only different enforcement policies (e.g., caching, prefetching, remapping), we may be able to derive a converging and evolving software standard for disks. This evolution is analogous to the interplay between continuously evolving graphics chips and graphics standards such as DirectX and OpenGL.

The cross-layer coordination approach also prompts us to either develop a better understanding of the legacy storage data path or simplify the data path enough to make it understandable; otherwise, the benefits of end-point optimization can be reduced by unforeseen interactions, or diffused by the need to explore a vast configuration space.

REFERENCES

[Anderson 2003] Anderson D. You Don't Know Jack about Disks. Storage, 1(4), 2003.

[Binny and Dharmendra 2005] Gill BS, Modha DS. WOW: Wise Ordering for Writes - Combining Spatial and Temporal Locality in Non-Volatile Caches. Proceedings of the 4th USENIX Conference on File and Storage Technologies, 2005.

[Bray 1996] Bray T. Bonnie benchmark. http://www.textuality.com/bonnie/download.html, 1996.

[Card et al. 1999] Card R, Ts'o T, Tweedie S. Design and Implementation of the Second Extended Filesystem. The HyperNews Linux KHG Discussion. http://www.linuxdoc.org (search for ext2 Card Tweedie design), 1999.

[Carnes et al. 2000] Carns PH, Ligon WB III, Ross RB, Thakur R. PVFS: A Parallel File System for Linux Clusters. Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000.

[Fujitsu 2007] MAP3147NC/NP, MAP3735NC/NP, MAP3367NC/NP Disk Drives Product/Maintenance Manual. http://www.fujitsu.com/downloads/COMP/fcpa/hdd/discontinued/map-10k-rpm_prod-manual.pdf, 2007.

[Iyer and Druschel 2001] Iyer S, Druschel P. Anticipatory Scheduling: A Disk Scheduling Framework to Overcome Deceptive Idleness in Synchronous I/O. Proceedings of the 18th ACM Symposium on Operating Systems Principles, 2001.

[Lo et al. 2005] Lo SW, Kuo TW, Lam KY. Multi-disk Scheduling for Time-Constrained Requests in RAID-0 Devices. Journal of Systems and Software, 76(3), pp. 237-250, 2005.

[Lumb et al. 2000] Lumb CR, Schindler J, Ganger GR, Nagle DF, Riedel E. Towards Higher Disk Head Utilization: Extracting Free Bandwidth from Busy Disk Drives. Proceedings of the 2000 Symposium on Operating Systems Design and Implementation, 2000.

[Matthews et al. 1997] Matthews JN, Roselli D, Costello AM, Wang RY, Anderson TE. Improving the Performance of Log-Structured File Systems with Adaptive Methods. Proceedings of the 16th ACM Symposium on Operating Systems Principles, pp. 238-251, October 1997.

[Matthew et al. 2007] Wachs M, Abd-El-Malek M, Thereska E, Ganger GR. Argon: Performance Insulation for Shared Storage Servers. Proceedings of the 5th USENIX Conference on File and Storage Technologies, 2007.

[Maxtor 2004] Atlas 10K V Ultra320 SCSI Hard Drive. http://www.darklab.rutgers.edu/MERCURY/t15/disk.pdf, 2004.

[McKusick et al. 1984] McKusick MK, Joy WN, Leffler SJ, Fabry RS. A Fast File System for UNIX. ACM Transactions on Computer Systems, 2(3), pp. 181-197, 1984.
[Nugent et al. 2003] Nugent J, Arpaci-Dusseau AC, Arpaci-Dusseau RH. Controlling Your PLACE in the File System with Gray-box Techniques. Proceedings of the USENIX Annual Technical Conference, 2003.

[Patterson et al. 1988] Patterson DA, Gibson G, Katz RH. A Case for Redundant Arrays of Inexpensive Disks (RAID). Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 109-116, 1988.

[Saltzer et al. 1981] Saltzer JH, Reed DP, Clark DD. End-to-End Arguments in System Design. Proceedings of the 2nd International Conference on Distributed Computing Systems, 1981.

[Schindler and Ganger 1999] Schindler J, Ganger GR. Automated Disk Drive Characterization. CMU SCS Technical Report CMU-CS-99-176, December 1999.

[Schindler et al. 2002] Schindler J, Griffin JL, Lumb CR, Ganger GR. Track-Aligned Extents: Matching Access Patterns to Disk Drive Characteristics. Proceedings of the 1st USENIX Conference on File and Storage Technologies, 2002.

[Schindler et al. 2004] Schindler J, Schlosser SW, Shao M, Ailamaki A, Ganger GR. Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks. Proceedings of the 3rd USENIX Conference on File and Storage Technologies, 2004.

[Schlosser et al. 2005] Schlosser SW, Schindler J, Papadomanolakis S, Shao M, Ailamaki A, Faloutsos C, Ganger GR. On Multidimensional Data and Modern Disks. Proceedings of the 4th USENIX Conference on File and Storage Technologies, 2005.

[Schmuck and Haskin 2002] Schmuck F, Haskin R. GPFS: A Shared-Disk File System for Large Computing Clusters. Proceedings of the 1st USENIX Conference on File and Storage Technologies, 2002.

[Seagate 2007] Product Manual: Cheetah 15K.4 SCSI. http://www.seagate.com/staticfiles/support/disc/manuals/enterprise/cheetah/15K.4/SCSI/100220456d.pdf, 2007.

[Sivathanu et al. 2003] Sivathanu M, Prabhakaran V, Popovici FI, Denehy TE, Arpaci-Dusseau AC, Arpaci-Dusseau RH. Semantically-Smart Disk Systems. Proceedings of the 2nd USENIX Conference on File and Storage Technologies, 2003.

[Sivathanu et al. 2005] Sivathanu M, Bairavasundaram LN, Arpaci-Dusseau AC, Arpaci-Dusseau RH. Database-Aware Semantically-Smart Storage. Proceedings of the 4th USENIX Conference on File and Storage Technologies, 2005.

[…]