Memory Hierarchy
CSCI-0310
Pascal Van Hentenryck
Brown University
CSCI-0310 Lecture 26
Memory Hierarchy
• Memory hierarchy
• Caches
• Buses

Memory Hierarchy
We’ll use a hierarchy of memory modules to maintain the illusion
[Figure: CPU, expensive memory, memory, cheap memory (disk), ordered from fastest and smallest to slowest and biggest]

Memory Hierarchy
Today’s lecture
• Two levels: the RAM + the cache
Basic Principles
• Store all the data in the lower level (RAM)
• Store some of the “good stuff” in the higher level (Cache)
• The hit rate is the proportion of memory references that are in the higher level
• The miss penalty is the time to replace a block in the upper level with the corresponding block from the lower level
• If we can make the hit rate high, most memory accesses will be fast

Cache
A small, fast memory where we keep relevant data is called a cache
• Simplest kind is a direct-mapped cache
• Each memory location is mapped to a cache location according to its low-order bits
[Figure: an 8-line cache indexed 000 to 111; main memory addresses 000001, 001001, 010001, 011001 all map to cache line 001]

Step by Step
Given an address to read from:
• use the low-order bits of the address to index into the cache
• if that cache line is invalid, we miss
• if the high-order bits of the address match the tag in the selected cache line, we hit
• if they don’t match, we miss
If we hit (fast!)
• return the data from the selected cache line
If we miss (slow...)
• set the valid bit in this cache line
• set the tag bits to be the high-order bits of the address
• copy the data at this address from main memory into this cache line
• return the data

Cache
[Figure: cache table with columns Index, V, Tag, Data for indices 000 to 111, shown before and after accessing addresses 000001 and 001010]

Cache
[Figure: the same cache table for addresses 000001 and 001001, which share index 001 but have different tags; the access is a MISS!]

Spatial Locality
In the previous example, each cache line held a word of data.
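The step-by-step lookup above, with one word per cache line, can be sketched in a few lines of Python. This is only an illustrative model, not code from the lecture: the class and field names (`DirectMappedCache`, `CacheLine`), the 3-bit index (matching the 8-line figure), and the made-up memory contents are all assumptions. It also counts hits and misses so that the hit rate defined earlier can be computed.

```python
from dataclasses import dataclass

INDEX_BITS = 3  # assumption: 8 cache lines, as in the 000..111 figure


@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    data: int = 0


class DirectMappedCache:
    """Hypothetical one-word-per-line direct-mapped cache (read-only)."""

    def __init__(self, memory, index_bits=INDEX_BITS):
        self.memory = memory
        self.index_bits = index_bits
        self.lines = [CacheLine() for _ in range(1 << index_bits)]
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address & ((1 << self.index_bits) - 1)  # low-order bits
        tag = address >> self.index_bits                # high-order bits
        line = self.lines[index]
        if line.valid and line.tag == tag:
            self.hits += 1                              # hit: fast path
            return line.data
        # Miss: invalid line or tag mismatch, so refill from main memory.
        self.misses += 1
        line.valid = True
        line.tag = tag
        line.data = self.memory[address]
        return line.data

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Made-up memory contents: 64 words, each holding address * 10.
memory = {addr: addr * 10 for addr in range(64)}
cache = DirectMappedCache(memory)

# 000001 and 001001 share index 001 but differ in tag, so they evict each other.
trace = [0b000001, 0b000001, 0b001001, 0b000001]
values = [cache.read(a) for a in trace]
print(values, cache.hits, cache.misses, cache.hit_rate())
```

In this trace only the second access hits; the other three miss because the two addresses compete for the same cache line.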
What if we store more data per cache line?
• we exploit spatial locality for reads
• we have to do write misses differently: when we write one word of block x where block y used to be, we have to read in the rest of block x
How big should blocks be?

More Caches
Direct-Mapped: every low-level block can be put in exactly one high-level location
Fully Associative: any low-level block can be put in any high-level location
Set Associative: every low-level block can be put in one of a small number of high-level locations
Most caches are
• direct-mapped or set-associative

Set-Associative Caches
4-way set associative cache
Every low-level location is mapped directly to a set of high-level locations; within the high-level set, the mapping is associative.
[Figure: main memory blocks mapping to Set 1, Set 2, ...]

Bus Design
Structure
• 50 to 100 lines
Data lines
• Data to be read or to be written
Address lines
• Where to write or read the data
Control lines
• What to do with the lines
• MemoryRead, MemoryWrite, IOreadBus
• Bus request, bus granted
Dedicated versus Multiplexed lines
• Same lines used for data and addresses
• Dedicated lines

Bus Design
Bus Arbitration
• Different devices may want to use the bus at the same time
• I/O may write some memory while CPU may want to read/write some memory as well
• Need a way to coordinate them
[Figure: CPU, Mem, and I/O attached to the bus]

Central Arbitration
Main Problem of Buses
• Contention
• Everybody wants to use the bus
[Figure: Arbiter connected to CPU and I/O]

Data Transfer
[Figure: address and data line activity for a read, a write, a read-modify-write, and a block write]

PCI Bus
Structure (Intel, patents are public domain)
• 32- or 64-bit bus
• multiplexed data and address lines
• centralized arbitration
• interface control lines
• ...
Command (by the master at address time)
• I/O read
• I/O write
• RAM read / RAM multiple read
• RAM write / RAM multiple write
• ....

FutureBus+
Structure (IEEE Standard)
• address and data lines (many)
Arbitration
• decentralized or centralized
Data transfer
• many ...

Competition Numbers
Structure
• Most significant: 8 bits of priority
• A round robin bit
• Least significant: geographic location
Fairness
• the RR bit is set when tenure is given to a module at the same priority level but with a higher geographical address
• the RR bit is reset when tenure is given to a module at the same priority level with a lower geographical address
• This ensures round robin under heavy load

Hierarchical Organization
[Figure: CPU, CACHE, Mem, Bus, Exp. Int, Screen, Netw., Disk]
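
The competition-number fairness rule from the FutureBus+ slides can be tried out with a small simulation. This is a rough sketch under several assumptions that the slides do not state: the geographic field is taken to be 5 bits wide (`GEO_BITS`), the highest competition number wins, every module requests the bus in every round (heavy load), and the winner clears its own RR bit. The names `Module` and `arbitrate` are hypothetical.

```python
from dataclasses import dataclass

GEO_BITS = 5  # assumption: width of the geographic field (not given in the slides)


@dataclass
class Module:
    geo: int        # geographic address (least significant field)
    priority: int   # 8-bit priority (most significant field)
    rr: int = 0     # round-robin bit

    def competition_number(self):
        # Layout from the slides: priority | RR bit | geographic location.
        return (self.priority << (GEO_BITS + 1)) | (self.rr << GEO_BITS) | self.geo


def arbitrate(modules):
    """Grant tenure to the highest competition number, then update RR bits."""
    winner = max(modules, key=Module.competition_number)
    for m in modules:
        if m.priority != winner.priority:
            continue  # the fairness rule only applies within one priority level
        if winner.geo > m.geo:
            m.rr = 1  # tenure went to a higher geographical address: set
        else:
            # Tenure went to a lower geographical address: reset.
            # Assumption: the winner itself also clears its bit.
            m.rr = 0
    return winner


# Three modules at the same priority, all requesting in every round.
modules = [Module(geo=1, priority=4), Module(geo=2, priority=4), Module(geo=3, priority=4)]
order = [arbitrate(modules).geo for _ in range(6)]
print(order)
```

Under these assumptions tenure rotates through the three modules instead of always going to the highest geographical address, which is the round-robin behaviour the slide describes.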