Memory Hierarchy Design in Computer Architecture - Prof. Edgar Gabriel, Study notes of Computer Architecture and Organization

An overview of the memory hierarchy design in computer architecture. It covers the concept of the memory hierarchy, the principle of locality, and the terminology used in this context. The document also discusses cache measures, simplest cache designs, and the disadvantages of set associative caches. It concludes with four essential questions about memory hierarchy.

Typology: Study notes

Uploaded on 08/18/2009

COSC 6385 Computer Architecture - Memory Hierarchy Design (I)

Edgar Gabriel
Fall 2006

Slides are based on a lecture by David Culler, University of California, Berkeley
http://www.eecs.berkeley.edu/~culler/courses/cs252-s05

Recap: Who Cares About the Memory Hierarchy?

[Figure: processor-DRAM memory gap (latency). Performance vs. time on a log scale, 1980-2000: CPU performance ("Moore's Law") grows at about 60%/yr (2x every 1.5 years) while DRAM grows at about 9%/yr (2x every 10 years), so the processor-memory performance gap grows roughly 50% per year.]

Memory Hierarchy: Terminology

• Hit: the data appears in some block in the upper level (example: Block X)
  – Hit Rate: the fraction of memory accesses found in the upper level
  – Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
• Miss: the data needs to be retrieved from a block in the lower level (Block Y)
  – Miss Rate = 1 - (Hit Rate)
  – Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on the 21264!)
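The hit/miss terminology above can be turned into a small calculation: the miss rate follows from the hit rate, and hit time, miss rate, and miss penalty combine into the average memory-access time. A minimal sketch; the example numbers are assumed for illustration, not taken from the slides:

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average memory-access time = hit time + miss rate * miss penalty."""
    miss_rate = 1.0 - hit_rate          # Miss Rate = 1 - (Hit Rate)
    return hit_time + miss_rate * miss_penalty

# Assumed example: 1-cycle hit time, 95% hit rate, 100-cycle miss penalty.
print(round(average_access_time(1.0, 0.95, 100.0), 3))  # 6.0 cycles
```

Even a modest miss rate dominates the average once the miss penalty reaches hundreds of cycles, which is why Hit Time << Miss Penalty matters so much.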
[Figure: the upper-level memory (holding Blk X) sits between the processor and the lower-level memory (holding Blk Y).]

Cache Measures

• Hit rate: fraction of accesses found in that level
  – Usually so high that one talks about the miss rate instead
  – Miss rate fallacy: as MIPS is to CPU performance, miss rate is to average memory access time
• Average memory-access time = Hit time + Miss rate x Miss penalty (ns or clocks)
• Miss penalty: time to replace a block from the lower level, including the time to place it in the CPU
  – access time: time to reach the lower level = f(latency to lower level)
  – transfer time: time to transfer the block = f(bandwidth between upper & lower levels)

Simplest Cache: Direct Mapped

[Figure: a 16-byte memory (addresses 0-F) mapped onto a 4-byte direct-mapped cache (cache indices 0-3).]

• Location 0 can be occupied by data from:
  – Memory location 0, 4, 8, ... etc.
  – In general: any memory location whose 2 LSBs of the address are 0s
  – Address<1:0> => cache index
• Which one should we place in the cache?
• How can we tell which one is in the cache?

Disadvantage of Set Associative Cache

• N-way set associative cache vs. direct mapped cache:
  – N comparators vs. 1
  – Extra MUX delay for the data
  – Data comes AFTER Hit/Miss
• In a direct mapped cache, the cache block is available BEFORE Hit/Miss:
  – Possible to assume a hit and continue; recover later on a miss.

[Figure: 2-way set associative lookup — the cache index selects a valid/tag/data line in each way, one comparator per way checks the address tag against the stored tag, the per-way hit signals are ORed into Hit, and a MUX (Sel0/Sel1) picks the matching way's cache block.]

4 Questions for Memory Hierarchy

• Q1: Where can a block be placed in the upper level? (Block placement)
• Q2: How is a block found if it is in the upper level? (Block identification)
• Q3: Which block should be replaced on a miss? (Block replacement)
• Q4: What happens on a write? (Write strategy)

Q1: Where can a block be placed in the upper level?

• Block 12 placed in an 8-block cache:
  – Fully associative, direct mapped, or 2-way set associative
  – Set associative mapping = block number modulo number of sets
• Fully associative: block 12 can go into any of the 8 blocks
• Direct mapped: block 12 can go only into block (12 mod 8) = 4
• 2-way set associative: block 12 can go anywhere in set (12 mod 4) = 0

[Figure: an 8-block cache (blocks 0-7) shown in the three organizations, backed by a 32-block memory (blocks 0-31).]

Q4: What happens on a write?

• Write through — the information is written both to the block in the cache and to the block in the lower-level memory.
• Write back — the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  – Is the block clean or dirty?
• Pros and cons of each?
  – WT: read misses cannot result in writes
  – WB: no repeated writes to the same location
• WT is always combined with write buffers so that the processor does not wait for the lower-level memory

Write Buffer for Write Through

[Figure: Processor writes into the Cache and into a Write Buffer; the Write Buffer drains to DRAM.]

• A write buffer is needed between the cache and memory
  – Processor: writes data into the cache and the write buffer
  – Memory controller: writes the contents of the buffer to memory
• The write buffer is just a FIFO:
  – Typical number of entries: 4
  – Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
• Memory system designer's nightmare:
  – Store frequency (w.r.t. time) -> 1 / DRAM write cycle
  – Write buffer saturation

Impact of Memory Hierarchy on Algorithms

• Today CPU time is a function of (ops, cache misses) vs. just f(ops): what does this mean for compilers, data structures, and algorithms?
• "The Influence of Caches on the Performance of Sorting" by A. LaMarca and R.E. Ladner. Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997, 370-379.
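The block-placement (Q1) and write-policy (Q4) slides above can be sketched together in code. A minimal, assumed illustration (one-word blocks, write-allocate on a write miss): a direct-mapped, write-back cache where placement is index = address mod number of blocks, and a dirty bit defers the memory update until eviction.

```python
class DirectMappedWriteBack:
    """Tiny direct-mapped write-back cache over one-word blocks (a sketch)."""

    def __init__(self, num_blocks, memory):
        self.num_blocks = num_blocks
        self.memory = memory                 # the "lower level"
        self.lines = [None] * num_blocks     # each line: [tag, data, dirty]
        self.writebacks = 0                  # count of lower-level writes

    def _split(self, addr):
        # Q1 (placement): index = addr mod num_blocks; Q2 (identification): tag.
        return addr % self.num_blocks, addr // self.num_blocks

    def _fill(self, index, tag):
        victim = self.lines[index]
        if victim is not None and victim[2]:           # dirty victim:
            victim_addr = victim[0] * self.num_blocks + index
            self.memory[victim_addr] = victim[1]       # write it back now
            self.writebacks += 1
        self.lines[index] = [tag, self.memory[tag * self.num_blocks + index], False]

    def read(self, addr):
        index, tag = self._split(addr)
        line = self.lines[index]
        if line is None or line[0] != tag:             # miss: fetch the block
            self._fill(index, tag)
        return self.lines[index][1]

    def write(self, addr, value):
        index, tag = self._split(addr)
        line = self.lines[index]
        if line is None or line[0] != tag:             # write-allocate (assumed)
            self._fill(index, tag)
        self.lines[index][1] = value
        self.lines[index][2] = True                    # dirty: memory is stale

memory = {a: 0 for a in range(32)}
cache = DirectMappedWriteBack(8, memory)
cache.write(12, 7)                   # block 12 lands in line 12 mod 8 = 4
cache.write(12, 8)                   # repeated write: no lower-level traffic
cache.read(20)                       # 20 mod 8 = 4 evicts dirty block 12
print(memory[12], cache.writebacks)  # 8 1
```

A write-through cache would instead have updated memory on every store (two lower-level writes here rather than one), which is exactly the "no repeated writes to the same location" advantage the slide lists for write back.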
• Quicksort: the fastest comparison-based sorting algorithm when all keys fit in memory
• Radix sort: also called a "linear time" sort because, for keys of fixed length and fixed radix, a constant number of passes over the data suffices, independent of the number of keys
• Measured on an Alphastation 250 with 32-byte blocks and a direct-mapped 2 MB L2 cache, 8-byte keys, from 4,000 to 4,000,000 keys

Quicksort vs. Radix as the number of keys varies: cache misses

[Figure: cache misses per key vs. set size in keys (1,000 to 10,000,000) for quicksort and radix sort; radix sort's misses per key grow with the set size while quicksort's stay low. What is the proper approach to fast algorithms?]

A Modern Memory Hierarchy

• By taking advantage of the principle of locality:
  – Present the user with as much memory as is available in the cheapest technology.
  – Provide access at the speed offered by the fastest technology.

[Figure: the hierarchy from the processor (registers, datapath, control) through on-chip cache, second-level cache (SRAM), main memory (DRAM), secondary storage (disk), and tertiary storage (disk/tape). Speeds range from about 1 ns (registers) through 10s-100s of ns (caches, DRAM) to 10s of ms (disk) and 10s of seconds (tape); sizes range from hundreds of bytes through KB, MB, GB, and TB.]

What is virtual memory?

• Virtual memory => treat main memory as a cache for the disk
• Terminology: blocks in this cache are called "Pages"
  – Typical size of a page: 1K — 8K
• The page table maps virtual page numbers to physical frames
  – "PTE" = Page Table Entry

[Figure: a virtual address (virtual page number + offset) is translated into a physical address (physical page number + the same offset) via the page table, which is located in physical memory and indexed from the page table base register; each entry holds a valid bit, access rights, and a physical address.]

Translation Look-Aside Buffers

Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.

TLBs are usually small, typically no more than 128 - 256 entries even on high-end machines.
This permits a fully associative lookup on these machines. Most mid-range machines use small n-way set associative organizations.

[Figure: translation with a TLB — the CPU sends a virtual address to the TLB; on a TLB hit the physical address goes straight to the cache, on a TLB miss the translation is performed first; a cache hit returns data, while a cache miss goes to main memory. The annotations 1/2 t, t, and 20 t indicate the relative access times of the steps.]

Summary: Caches

• The principle of locality:
  – Programs access a relatively small portion of the address space at any instant of time.
    • Temporal locality: locality in time
    • Spatial locality: locality in space
• Three major categories of cache misses:
  – Compulsory misses: sad facts of life. Example: cold-start misses.
  – Capacity misses: increase the cache size
  – Conflict misses: increase the cache size and/or associativity. Nightmare scenario: ping-pong effect!
• Write policy:
  – Write through: needs a write buffer. Nightmare: write-buffer saturation
  – Write back: control can be complex

Summary: The Cache Design Space

• Several interacting dimensions
  – cache size
  – block size
  – associativity
  – replacement policy
  – write-through vs. write-back
  – write allocation
• The optimal choice is a compromise
  – depends on access characteristics
    • workload
    • use (I-cache, D-cache, TLB)
  – depends on technology / cost
• Simplicity often wins

[Figure: the cache design space sketched as a trade-off surface over cache size, block size, and associativity (axes labeled Less/More and Bad/Good, with competing factors A and B).]
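The virtual-memory and TLB slides above can be sketched as a short translation routine. Everything concrete here is assumed for illustration: a 4 KB page (within the 1K-8K range the slides give), a 4-entry TLB with FIFO-style eviction, and made-up page-table contents.

```python
PAGE_SIZE = 4096                  # assumed page size (slides: typically 1K-8K)
TLB_ENTRIES = 4                   # TLBs are small caches of page-table entries

page_table = {0: 7, 1: 3, 2: 9}   # virtual page number -> physical frame (made up)
tlb = {}                          # dict stands in for an associative lookup

def translate(va):
    """Translate a virtual address to a physical address via the TLB/page table."""
    vpn, offset = divmod(va, PAGE_SIZE)     # split: virtual page number + offset
    if vpn in tlb:                          # TLB hit: skip the page-table access
        frame = tlb[vpn]
    else:                                   # TLB miss: walk the page table
        frame = page_table[vpn]             # a real system would fault if absent
        if len(tlb) >= TLB_ENTRIES:
            tlb.pop(next(iter(tlb)))        # evict the oldest cached entry
        tlb[vpn] = frame                    # cache the PTE for next time
    return frame * PAGE_SIZE + offset       # same offset within the frame

print(hex(translate(0x1234)))   # vpn 1 -> frame 3, so 0x1234 -> 0x3234
```

The first access to a page pays for the page-table walk; repeated accesses to the same page hit in the TLB, which is the whole point of caching translations.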