The Memory Hierarchy - Lecture Slides | CS 201
Class: Computer Systems Programming; Subject: Computer Science; University: Portland State University.


CS 201: The Memory Hierarchy
Gerson Robboy, Portland State University

– 2 – 15-213, F'02
Memory hierarchy overview (traditional)
• CPU registers
• main memory (RAM)
• secondary memory (disk)
• Why? What is different between these entities?
• What role does caching play?
• How many different kinds of secondary memory are there these days?
• Can we find out how fast these secondary memories are?
• Note: this material is mostly an architectural overview – when does it impact programming?

– 5 – 15-213, F'02
Typical Bus Structure Connecting CPU and Memory
A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices.
[Diagram: CPU chip (register file, ALU, bus interface) – system bus – I/O bridge – memory bus – main memory]

– 6 – 15-213, F'02
Disk Geometry
Disks consist of platters, each with two surfaces. Each surface consists of concentric rings called tracks. Each track consists of sectors separated by gaps.
[Diagram: spindle, surface, tracks, track k, sectors, gaps]

– 7 – 15-213, F'02
Disk Geometry (Multiple-Platter View)
Aligned tracks form a cylinder.
[Diagram: platters 0-2, surfaces 0-5, cylinder k, spindle]

– 10 – 15-213, F'02
Disk Operation (Single-Platter View)
The disk surface spins at a fixed rotational rate. By moving radially, the arm can position the read/write head over any track. The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.

– 11 – 15-213, F'02
Disk Operation (Multi-Platter View)
Read/write heads move in unison from cylinder to cylinder.

– 12 – 15-213, F'02
Disk Access Time
Average time to access some target sector is approximated by:
  Taccess = Tavg seek + Tavg rotation + Tavg transfer
Seek time (Tavg seek)
• Time to position the heads over the cylinder containing the target sector.
• Typical Tavg seek = 9 ms
Rotational latency (Tavg rotation)
• Time waiting for the first bit of the target sector to pass under the r/w head.
• Tavg rotation = 1/2 × 1/RPM × 60 sec/1 min
Transfer time (Tavg transfer)
• Time to read the bits in the target sector.
• Tavg transfer = 1/RPM × 1/(avg # sectors/track) × 60 sec/1 min
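The access-time formula above can be evaluated directly. The C sketch below plugs in the 9 ms typical seek time from the slide; the 7200 RPM rotational rate and 400 sectors per track are assumed example values, not figures taken from the slides.

/*
 * Rough disk access time estimate using
 *   Taccess = Tavg_seek + Tavg_rotation + Tavg_transfer
 * Only the 9 ms seek time comes from the slide; the RPM and
 * sectors-per-track numbers are assumed for illustration.
 */
#include <stdio.h>

int main(void) {
    double t_seek_ms = 9.0;            /* typical average seek time (from slide) */
    double rpm = 7200.0;               /* assumed rotational rate */
    double sectors_per_track = 400.0;  /* assumed average sectors per track */

    /* Tavg rotation = 1/2 * (1/RPM) * 60 sec/min, converted to ms */
    double t_rotation_ms = 0.5 * (1.0 / rpm) * 60.0 * 1000.0;

    /* Tavg transfer = (1/RPM) * (1 / sectors per track) * 60 sec/min, in ms */
    double t_transfer_ms = (1.0 / rpm) * (1.0 / sectors_per_track) * 60.0 * 1000.0;

    double t_access_ms = t_seek_ms + t_rotation_ms + t_transfer_ms;

    printf("seek %.2f ms + rotation %.2f ms + transfer %.4f ms = %.2f ms\n",
           t_seek_ms, t_rotation_ms, t_transfer_ms, t_access_ms);
    return 0;
}

With these numbers the total comes to roughly 13 ms, dominated by seek and rotation. Both are limited by mechanical motion, which is one reason disk access times have improved so much more slowly than disk capacity.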
– 15 – 15-213, F'02
I/O Bus
[Diagram: CPU chip (register file, ALU, bus interface) – system bus – I/O bridge – memory bus – main memory; the I/O bus connects a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

– 16 – 15-213, F'02
Storage Trends
Why can't the access time of a disk be reduced a lot more?

DRAM
metric             1980     1985     1990     1995     2000     2000:1980
$/MB               8,000    880      100      30       1        8,000
access (ns)        375      200      100      70       60       6
typical size (MB)  0.064    0.256    4        16       64       1,000

SRAM
metric             1980     1985     1990     1995     2000     2000:1980
$/MB               19,200   2,900    320      256      100      190
access (ns)        300      150      35       15       2        100

Disk
metric             1980     1985     1990     1995     2000     2000:1980
$/MB               500      100      8        0.30     0.05     10,000
access (ms)        87       75       28       10       8        11
typical size (MB)  1        10       160      1,000    9,000    9,000

– 17 – 15-213, F'02
CPU Clock Rates
                   1980     1985     1990     1995     2000     2003     2003:1980
CPU                8080     286      386      Pent     P-III    P-IV
MHz                1        6        20       150      750      2000     2000
ns/cycle           1,000    166      50       6        1.6      0.5      2000

Summary: in 25 years –
• DRAM memory has gotten 10,000 times bigger and cheaper, and only around 6 times faster
• Disks have gotten 10,000 times bigger and cheaper, and around 10 times faster
• CPUs have gotten 2,000 to 3,000 times faster
• Disks and memory are orders of magnitude slower in comparison

– 20 – 15-213, F'02
Memory Hierarchies
Some fundamental and enduring properties of hardware and software:
• Fast storage technologies cost more per byte and have less capacity.
• The gap between CPU and main memory speed is widening.
• Well-written programs tend to exhibit good locality.
These fundamental properties complement each other. They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

– 21 – 15-213, F'02
An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices sit at the top; larger, slower, and cheaper (per byte) storage devices sit toward the bottom.
L0: registers – CPU registers hold words retrieved from the L1 cache.
L1: on-chip L1 cache (SRAM) – L1 cache holds cache lines retrieved from the L2 cache.
L2: off-chip L2 cache (SRAM) – L2 cache holds cache lines retrieved from main memory.
L3: main memory (DRAM) – Main memory holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks) – Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers)

– 22 – 15-213, F'02
Caches
Cache: a smaller, faster storage device that contains a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
• For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Why do memory hierarchies work?
• Programs tend to access the data at level k more often than they access the data at level k+1.
• Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
• Goal: a large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
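The locality property that makes this work ("well-written programs tend to exhibit good locality") is easy to see in code. A minimal C sketch: both functions below sum the same 2-D array, but the row-major loop touches memory with stride 1 and reuses each cache line fully, while the column-major loop jumps an entire row between accesses.

#include <stdio.h>

#define N 2048

static int a[N][N];   /* 16 MB array, far larger than any cache level */

/* Row-major traversal: stride-1 accesses, good spatial locality,
 * so most references hit in the cache. */
long sum_rows(void) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: stride-N accesses, poor spatial locality,
 * so far more references miss and fall through to slower levels. */
long sum_cols(void) {
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    printf("%ld %ld\n", sum_rows(), sum_cols());
    return 0;
}

Both functions compute the same result, but on a typical hierarchy the row-major version runs several times faster, because consecutive int elements share a cache line and most of its references hit in L1.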
– 25 – 15-213, F'02
More recent info for Pentium-4
• L1 data cache: 8K bytes, on the CPU; 2-cycle latency
• L1 instruction cache replaced with an "execution trace cache", optimized for fetching and decoding instructions
• L2 cache (unified? data only?): 256 KB - 1 MB, also on the CPU; 8-way set associative, 64-byte cache line size; 18-cycle latency, even though it's on the CPU
• 2 MB L3 cache off chip
• Memory has around a 92-cycle latency

– 26 – 15-213, F'02
The memory cache
Stores values from main memory recently accessed by the processor. Controlled by the hardware – completely invisible to software, except for the performance.

– 27 – 15-213, F'02
Cache Performance Metrics
Miss Rate
• Fraction of memory references not found in the cache (misses / references)
• Typical numbers: 3-10% for L1; can be quite small (< 1%) for L2, depending on size, etc.
Hit Time
• Time to deliver a line in the cache to the processor (includes time to determine whether the line is in the cache)
• Typical numbers: 2 clock cycles for L1; ~20 clock cycles for L2
Miss Penalty
• Additional time required because of a miss
• Typically ~100 cycles for main memory
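These metrics combine into the standard average-memory-access-time estimate, hit time + miss rate × miss penalty (a formula not stated on the slide, but standard). The short C sketch below uses the slide's typical numbers for a single L1 cache in front of main memory, with an assumed 5% miss rate, which falls in the 3-10% range quoted above.

#include <stdio.h>

int main(void) {
    double hit_time = 2.0;       /* L1 hit time in cycles (from slide) */
    double miss_rate = 0.05;     /* assumed 5% L1 miss rate, within the 3-10% range */
    double miss_penalty = 100.0; /* ~100 cycles to main memory (from slide) */

    /* Average memory access time = hit time + miss rate * miss penalty */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("average access time = %.1f cycles\n", amat);  /* 2 + 0.05*100 = 7 */
    return 0;
}

Even with a 95% hit rate, the average cost per reference is about 7 cycles rather than 2, which is why reducing the miss rate, not the hit time, usually matters most when tuning for the memory hierarchy.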