Exploring Virtual Memory: Mapping & Managing Main Memory as a Cache, Study notes of Computer Science

This document delves into the concept of virtual memory, where memory blocks (pages) are mapped from virtual addresses to physical addresses. The objective is to use main memory as a cache for secondary storage, allowing efficient and safe sharing of memory among programs and providing the illusion of unbounded memory. Topics covered include virtual memory, mapping virtual addresses to physical addresses, hardware support for address translation, page faults, and implementing protection with virtual memory.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009 by koofers-user-oye-1
Lec 5 Systems Architecture II

Topics
• Exploiting Memory Hierarchy: Virtual Memory (derived from material in Chapter 7 of the text)
• I/O Devices and Communication Buses (derived from material in Chapter 8 of the text)

All figures from Computer Organization and Design: The Hardware/Software Approach, Second Edition, by David Patterson and John Hennessy, are copyrighted material (Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved). Notes courtesy of Jeremy R. Johnson.

Topic 1: Exploiting Memory Hierarchy: Virtual Memory

Mapping from a Virtual to a Physical Address
• In virtual memory, an address is broken into a virtual page number and a page offset
• The virtual page number is mapped to a physical page number; the page offset is unchanged by translation
• The number of addressable virtual pages may not be the same as the number of physical pages
(Figure: a 32-bit virtual address, with bits 31-12 holding the virtual page number and bits 11-0 the page offset, is translated into a physical address whose physical page number occupies bits 29-12 and whose 12-bit page offset is copied through)

Design Decisions
• Many design decisions for virtual memory are motivated by the high cost of a miss (called a page fault)
  – processing a page fault takes millions of cycles
1. Pages should be large enough to amortize the long access time (exploiting the principle of locality): 4 KB - 64 KB page sizes
2. Reducing the rate of page faults is crucial: use a fully associative scheme
3. Page faults can be handled in software, so clever algorithms can be used to choose replacement pages, reducing the number of misses
4. Using write-through to manage writes in virtual memory will not work (too costly).
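The address split described in the mapping slide can be sketched in a few lines. This assumes 4 KB pages (a 12-bit offset) and uses a toy dict in place of a real page table; the addresses and the mapping are made up for illustration.

```python
# Sketch of virtual-to-physical translation with 4 KB pages (12-bit
# offset). The page offset passes through unchanged; only the virtual
# page number is looked up and replaced by a physical page number.
PAGE_OFFSET_BITS = 12
PAGE_SIZE = 1 << PAGE_OFFSET_BITS  # 4096 bytes

def translate(virtual_addr, page_table):
    vpn = virtual_addr >> PAGE_OFFSET_BITS    # virtual page number
    offset = virtual_addr & (PAGE_SIZE - 1)   # unchanged by translation
    ppn = page_table[vpn]                     # a missing entry would be a "page fault"
    return (ppn << PAGE_OFFSET_BITS) | offset

page_table = {0x12345: 0x00042}               # hypothetical single mapping
print(hex(translate(0x12345ABC, page_table))) # -> 0x42abc
```

Note that a fully associative placement falls out naturally here: any virtual page number can map to any physical page number.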
A write-back strategy is used instead.

Locating a Physical Page
• Use a fully associative scheme to reduce page faults and to allow more flexible replacement policies
• Since a fully associative search is too costly, a table lookup is used to find the physical location of a page
• The lookup table is called a page table and is indexed by the virtual page number
• Each program has its own page table
• The size of the page table is determined by the number of bits in the virtual address
• Since page tables are large and there can be many of them, the entire table is not kept in memory (use dynamic tables, hash functions, multiple levels, and virtual memory itself)

Write Strategy and Page Replacement
• The difference in access times between cache and main memory is tens of cycles, so a write-through strategy can be used there (with the aid of a write buffer to hide the latency of the write)
• Since writes to disk take millions of cycles, this approach is impractical for virtual memory
• Instead, use a write-back policy
  – perform individual writes to the page in memory and copy the page back to disk when it is replaced
  – copying an entire page is more efficient than the sum of the individual writes
• A write-back operation, though more efficient, is still costly
  – only write back when the contents of the page have changed, as determined by the setting of the dirty bit

Making Address Translation Fast
• With virtual memory, accessing memory requires two memory references (one to determine the physical address and one to access the contents of the desired location)
• The key to improving access performance is the principle of locality
  – once a page is accessed, it is likely to be accessed again in the near future
  – so address translation need only be performed once, saving the result in a buffer (a special cache) called the translation-lookaside buffer (TLB)
• Since the TLB may replace access to the page
table, reference and dirty bits may be required in the TLB; write-back is used, since the miss rate should be very small
• A TLB miss does not necessarily imply a page fault

Translation-Lookaside Buffer (typical parameters)
– TLB size: 32 - 4096 entries
– Block size: 1 - 2 page table entries
– Hit time: 0.5 - 1 cycles
– Miss penalty: 10 - 30 cycles
– Miss rate: 0.01% - 1%
(Figure: each TLB entry holds a valid bit, a tag drawn from the virtual page number, and a physical page address; the full page table maps every virtual page number to a physical page in memory or to a disk address)

Processing a Read/Write in the DECStation 3100 TLB and Cache
(Figure: flowchart) From the virtual address, access the TLB. A TLB miss raises a TLB miss exception; a TLB hit yields the physical address. For a read, try to read the data from the cache: a cache hit delivers the data to the CPU, a cache miss stalls. For a write, check the write access bit: if it is off, raise a write protection exception; otherwise write the data into the cache, update the tag, and put the data and the address into the write buffer.

Interaction with the OS
• Maintain the memory hierarchy (data cannot be in the cache unless it is in memory)
  – flush cache entries from a page replaced to disk
  – update the page table and TLB so that an attempt to access data on the replaced page will generate a page fault
• Protection (prevent one program from writing to another program's portion of memory)
  – Each program has its own virtual address space; organize the page tables to map these to distinct sets of physical pages
  – Make sure only the OS can modify page tables (done by putting the page tables in the address space of the OS)
  – Provide two modes of execution (user and supervisor), with special instructions to modify the TLB, the page table register, and the user/supervisor mode bit
  – On a context switch, flush the TLB entries (or use a PID to distinguish them)
• Sharing (allow programs to share memory, e.g.
editor code)
  – Have the OS point a virtual page at a shared physical page
  – Use the write access bit to restrict sharing to reads

Handling Page Faults
• Handling a page fault requires using the exception mechanism to interrupt the active process, transferring control to the OS, and later resuming execution of the interrupted process
  – The exception must be asserted before the instruction completes (so that the state remains as it was before the instruction); otherwise the instruction cannot be properly restarted (making instructions restartable is difficult)
  – The instruction address is placed in the EPC and the cause of the exception in the cause register
  – The faulting virtual address is determined from the EPC or from the instruction the EPC points to (depending on whether an instruction fetch or a data access caused the fault)
  – Steps:
    • Save the entire state of the process (including all registers)
    • Look up the page table entry and find the location of the referenced page on disk
    • Choose a replacement page (if dirty, it must be written back first)
    • Start a read to bring the referenced page from disk into memory
  – Since the last step takes millions of cycles, the OS usually starts another process while waiting for the read to complete

Model for Cache Misses
• Compulsory misses (cold-start)
  – cache misses caused by the first access to a block that has never been in the cache
• Capacity misses
  – cache misses caused when the cache cannot contain all the blocks needed during execution of a program
• Conflict misses (collision)
  – cache misses that occur in a set associative or direct mapped cache when multiple blocks compete for the same set; these misses are eliminated by a fully associative cache

Topic 2: I/O Devices and Communication Buses

Introduction
• Objective: To understand the basic principles of different I/O devices and to develop protocols for connecting I/O devices to processors and memory.
To analyze and compare the performance of I/O devices and communication protocols.
• Topics
  – Design issues and importance of I/O
  – I/O devices: keyboard and monitor, mouse, magnetic disk, network
  – Buses: synchronous vs. asynchronous, handshaking protocol, bus arbitration

Magnetic Disk
• Rotating disk with a magnetic surface
  – 3600 to 10,000 RPM
  – $0.10 per MB
• A hard disk is organized into platters
• Each surface is made up of tracks (1000 - 5000 per surface)
• Tracks are divided into sectors (64 - 200 per track, 512 bytes per sector)
(Figure: platters, with tracks on each surface and sectors within each track)

Disk Performance
• Average disk access time =
  avg. seek time + avg. rotational delay + transfer time + controller overhead
• Avg. seek time (time to move the head to the desired track)
  – the measured average may be only about 25% of the manufacturer-reported figure, due to locality of disk references
• Avg. rotational delay
  – half a rotation at the disk's rotation rate
• Transfer time
  – depends on rotation speed, sector size, and track density
  – caching is used to improve the transfer rate
• Example: What is the average time to read a 512-byte sector from a typical disk rotating at 5400 RPM?
  – Average seek time = 12 ms
  – Transfer rate = 5 MB/sec
  – Controller overhead = 2 ms
• Working it out:
  – Average seek time: 12 ms (advertised, averaged over all possible seeks), or a measured 3 ms (typically 25% of advertised)
  – Average rotational delay: 0.5 rotation / (5400 RPM / 60 sec/min) = 0.0056 sec = 5.6 ms
  – Transfer time: 0.5 KB / (5 MB/sec) = 0.0001 sec = 0.1 ms
  – Controller time: 2 ms
  – Average disk access time: 12 + 5.6 + 0.1 + 2 ms = 19.7 ms (advertised seek), or 3 + 5.6 + 0.1 + 2 ms = 10.7 ms (measured seek)

Bus Input and Output
(Figure: a bus connects the processor, memory, and disks over shared control lines and data lines; successive panels show the steps of input and output operations)
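The disk access-time arithmetic in the worked example above can be checked with a short script. The figures (12 ms or 3 ms seek, 5400 RPM, 512-byte sector, 5 MB/sec transfer rate, 2 ms controller overhead) are the ones given in the slides.

```python
# Average disk access time = seek + rotational delay + transfer + controller.
def disk_access_ms(seek_ms, rpm, sector_bytes, rate_mb_s, controller_ms):
    rotational_ms = 0.5 / (rpm / 60) * 1000            # half a rotation, in ms
    transfer_ms = sector_bytes / (rate_mb_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms + controller_ms

# Advertised 12 ms seek vs. measured 3 ms seek, as in the example
print(round(disk_access_ms(12, 5400, 512, 5, 2), 1))   # -> 19.7
print(round(disk_access_ms(3, 5400, 512, 5, 2), 1))    # -> 10.7
```

The dominant terms are seek time and rotational delay, which is why the transfer time of a single sector (0.1 ms) barely registers.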
(Input operation: a) write request, b) memory transfer. Output operation: a) read request, b) memory access, c) memory transfer.)

Synchronous vs. Asynchronous Buses
• Synchronous
  – use a clock and a synchronous protocol
  – fast and small
  – but every device must operate at the same rate, and clock skew requires the bus to be short
• Asynchronous
  – no clock; use handshaking instead
  – can accommodate a wide variety of devices
  – can be lengthened

Handshaking Protocol
• ReadReq
  – indicates a read request for memory; the address is put on the data lines at the same time
• DataRdy
  – indicates that data is now ready on the data lines; the data is placed on the data lines at the same time (set by either memory or the device, depending on whether it is an output or input operation)
• Ack
  – acknowledges the ReadReq or DataRdy signal
• ReadReq and DataRdy are asserted until the other party has seen the control lines and the data lines have been read; this is indicated by asserting the Ack signal

Performance Comparison
• Synchronous bus
  – 50 ns clock, 32 data bits, 200 ns memory access
  – Time: send address (50 ns) + read memory (200 ns) + send data (50 ns) = 300 ns total
  – Bandwidth: 4 bytes / 300 ns = 13.3 MB/sec
• Asynchronous bus
  – 40 ns per handshake, 32 data bits, 200 ns memory access
  – Time: step 1 (40 ns) + max(steps 2, 3, 4, memory read) (200 ns) + steps 5, 6, 7 (120 ns) = 360 ns total
  – Bandwidth: 4 bytes / 360 ns = 11.1 MB/sec

Improving Bus Performance
• Data bus width: increasing the width of the data bus lets transfers of multiple words take fewer bus cycles
• Separate vs. multiplexed address and data lines: separate address and data lines improve the performance of writes, since the address and data can be sent at the same time.
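The two bandwidth figures in the performance comparison above follow directly from the total transfer times; a quick check, using the slides' numbers and taking MB = 10^6 bytes:

```python
# Bus bandwidth for the synchronous vs. asynchronous examples above.
def bandwidth_mb_s(bytes_moved, total_ns):
    return bytes_moved / (total_ns * 1e-9) / 1e6   # bytes/sec -> MB/sec

sync_ns = 50 + 200 + 50        # send address + read memory + send data
async_ns = 40 + 200 + 3 * 40   # step 1 + overlapped steps 2-4 + steps 5-7
print(round(bandwidth_mb_s(4, sync_ns), 1))   # -> 13.3
print(round(bandwidth_mb_s(4, async_ns), 1))  # -> 11.1
```

The asynchronous bus loses only about 17% of the bandwidth here because the 200 ns memory access overlaps steps 2-4 of the handshake.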
• Block transfers: allowing the bus to transfer multiple words in back-to-back bus cycles, without sending an address or releasing the bus, reduces the time to transfer a large block

Performance Example
• Memory supports block access of 4 - 16 32-bit words
• 64-bit synchronous bus clocked at 200 MHz (5 ns clock), with each 64-bit transfer taking 1 cycle and 1 cycle to send an address
• Two idle cycles between bus operations
• 200 ns memory access time for the first 4 words and 20 ns for each additional set of 4 words
• Find the sustained bandwidth and the latency to read 256 words using 4-word blocks and using 16-word blocks

Bus Arbitration (detail)
• Daisy chain arbitration (e.g. VME)
  – grant lines run from the highest-priority device to the lowest
  – a high-priority device that wants access intercepts the bus grant signal
  – simple, but cannot assure fairness and may limit bus speed
• Centralized, parallel arbitration (e.g. PCI)
  – devices independently request the bus through multiple request lines
  – a centralized arbiter chooses which device will act as master
  – the central arbiter is required and may become a bottleneck
• Distributed arbitration by self-selection (e.g. NuBus in the Mac II)
  – devices independently request the bus through multiple request lines
  – devices identify themselves to the bus and broadcast their priority
  – each device determines independently whether it is the highest-priority requestor
  – drawback: requires more lines for request signals
• Distributed arbitration by collision (e.g. Ethernet)
  – devices independently request the bus, which can result in a collision
  – a scheme is used to select which of the colliding parties becomes master

Single Bus Master (Processor)
(Figure: with the processor as the single bus master, successive panels show memory-disk, processor-memory, and processor-disk transfers over the bus and its bus request lines)
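The 256-word performance example above can be worked through in code. One assumption is mine and should be flagged: for blocks larger than 4 words, I take the 20 ns access of each later 4-word group to overlap the previous group's two transfer cycles, so each later group costs max(20 ns, 10 ns) = 4 cycles; treat the resulting numbers as a sketch under that reading of the setup.

```python
# Latency and sustained bandwidth for reading 256 words over a 200 MHz,
# 64-bit bus: 5 ns cycle, 1-cycle address, 1 cycle per 64-bit transfer,
# 2 idle cycles between operations, 200 ns access for the first 4 words,
# 20 ns for each additional 4-word group.
CYCLE_NS = 5

def read_256_words(block_words):
    txns = 256 // block_words
    groups = block_words // 4                 # 4-word (two-transfer) groups
    first_access = 200 // CYCLE_NS            # 40 cycles for the first group
    if groups == 1:
        per_txn = 1 + first_access + 2 + 2    # address + access + transfer + idle
    else:
        # later 20 ns accesses overlap the 2-cycle (10 ns) transfers
        per_txn = 1 + first_access + (groups - 1) * 4 + 2 + 2
    total_ns = txns * per_txn * CYCLE_NS
    bw_mb_s = (256 * 4) / (total_ns * 1e-9) / 1e6
    return total_ns, bw_mb_s

print(read_256_words(4))    # 4-word blocks: 64 transactions of 45 cycles
print(read_256_words(16))   # 16-word blocks: 16 transactions of 57 cycles
```

Under these assumptions the 16-word blocks cut the latency from 14,400 ns to 4,560 ns, roughly tripling the sustained bandwidth, which is the point of the block-transfer optimization.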
Daisy Chain Arbitration
(Figure: a bus arbiter sends a grant line through Device 1 (highest priority) and Device 2 down to Device n (lowest priority); release and request lines run from the devices back to the arbiter)
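The daisy-chain scheme in the figure can be sketched as follows: the grant propagates from highest to lowest priority, and the first requesting device it reaches intercepts it. The function and device indexing are illustrative, not from the text.

```python
# Daisy-chain arbitration sketch: the grant signal travels down the
# chain from highest to lowest priority; the first device that is
# requesting the bus intercepts the grant and becomes bus master.
def daisy_chain_grant(requests):
    """requests[i] is True if device i (0 = highest priority) wants the bus."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device          # grant intercepted here
    return None                    # no requestor: grant falls off the chain

print(daisy_chain_grant([False, True, True]))   # -> 1 (device 1 outranks device 2)
print(daisy_chain_grant([False, False, False])) # -> None
```

The sketch also makes the fairness drawback concrete: a high-priority device that requests continuously will intercept every grant, starving the devices further down the chain.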