Computer Systems: A Programmer's Perspective - Memory Hierarchies and Caching (Study Notes)

Lecture slides accompanying the book 'Computer Systems: A Programmer's Perspective' by Bryant and O'Hallaron. They cover memory hierarchies and caching, including volatile and nonvolatile memories, storage technologies and trends, locality of reference, cache memories, and examples of caching in the memory hierarchy, along with the roles of memory controllers and DRAMs.

The Memory Hierarchy
15-213: Introduction to Computer Systems, 11th Lecture, Oct. 6, 2015
Instructors: Randal E. Bryant and David R. O'Hallaron
(Slides from Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, Third Edition)

Today
- Storage technologies and trends
- Locality of reference
- Caching in the memory hierarchy

Nonvolatile Memories
- DRAM and SRAM are volatile memories
  - Lose information if powered off.
- Nonvolatile memories retain value even if powered off
  - Read-only memory (ROM): programmed during production
  - Programmable ROM (PROM): can be programmed once
  - Erasable PROM (EPROM): can be bulk erased (UV, X-ray)
  - Electrically erasable PROM (EEPROM): electronic erase capability
  - Flash memory: EEPROMs with partial (block-level) erase capability
    - Wears out after about 100,000 erasings
- Uses for nonvolatile memories
  - Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, ...)
  - Solid state disks (replace rotating disks in thumb drives, smart phones, MP3 players, tablets, laptops, ...)
  - Disk caches

Traditional Bus Structure Connecting CPU and Memory
- A bus is a collection of parallel wires that carry address, data, and control signals.
- Buses are typically shared by multiple devices.
[Figure: CPU chip (register file, ALU, bus interface) connected by the system bus to an I/O bridge, which connects over the memory bus to main memory.]

Memory Read Transaction (1)
Load operation: movq A, %rax
- CPU places address A on the memory bus.

Memory Write Transaction (1)
Store operation: movq %rax, A
- CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive.

Memory Write Transaction (2)
- CPU places data word y on the bus.

Memory Write Transaction (3)
- Main memory reads data word y from the bus and stores it at address A.
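As a rough software-level tie-in to these bus transactions, here is a minimal C sketch; the names x, load_word, and store_word are ours, not from the slides. Compiling with gcc -O1 -S shows movq load and store instructions of the kind the slides trace across the bus.

/* load_store.c - illustrative sketch; all names are ours. */
long x;                    /* a word in main memory at some address A */

long load_word(void)
{
    return x;              /* roughly: movq x(%rip), %rax -- a memory read
                              transaction: the CPU puts the address on the
                              bus and memory returns the word */
}

void store_word(long y)    /* y arrives in register %rdi */
{
    x = y;                 /* roughly: movq %rdi, x(%rip) -- a memory write
                              transaction: the CPU puts the address, then
                              the data word, on the bus */
}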
Disk Geometry (Multiple-Platter View)
- Aligned tracks form a cylinder.
[Figure: platters 0-2 (surfaces 0-5) on a spindle; cylinder k cuts across all six surfaces.]

Disk Capacity
- Capacity: maximum number of bits that can be stored.
  - Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.
- Capacity is determined by these technology factors:
  - Recording density (bits/in): number of bits that can be squeezed into a 1-inch segment of a track.
  - Track density (tracks/in): number of tracks that can be squeezed into a 1-inch radial segment.
  - Areal density (bits/in^2): product of recording density and track density.

Recording Zones
- Modern disks partition tracks into disjoint subsets called recording zones
  - Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
  - Each zone has a different number of sectors/track; outer zones have more sectors/track than inner zones.
  - So we use the average number of sectors/track when computing capacity.

Disk Operation (Multi-Platter View)
[Figure: one read/write head per surface; the heads move in unison from cylinder to cylinder while the spindle rotates the platters.]

Disk Structure - Top View of Single Platter
[Figure: the surface is organized into tracks; tracks are divided into sectors.]

Disk Access
[Figure: head in position above a track.]

Disk Access - Read
[Figure: after reading the blue sector.]

Disk Access - Read
[Figure: after the blue read, the red request is scheduled next.]

Disk Access - Seek
[Figure: after the blue read, the head seeks to red's track.]

Disk Access - Service Time Components
[Figure: after the blue read: seek for red, then rotational latency, then data transfer. Total service time = seek + rotational latency + data transfer.]

Disk Access Time
- Average time to access some target sector is approximated by:
  - Taccess = Tavg seek + Tavg rotation + Tavg transfer
- Seek time (Tavg seek)
  - Time to position heads over the cylinder containing the target sector.
  - Typical Tavg seek is 3-9 ms.
- Rotational latency (Tavg rotation)
  - Time waiting for the first bit of the target sector to pass under the read/write head.
  - Tavg rotation = 1/2 x (1/RPM) x 60 secs/1 min
  - Typical rotational rate is 7,200 RPM.
- Transfer time (Tavg transfer)
  - Time to read the bits in the target sector.
  - Tavg transfer = (1/RPM) x 1/(avg # sectors/track) x 60 secs/1 min

Disk Access Time Example
- Given:
  - Rotational rate = 7,200 RPM
  - Average seek time = 9 ms
  - Avg # sectors/track = 400
- Derived:
  - Tavg rotation = 1/2 x (60 secs / 7,200 RPM) x 1000 ms/sec = 4 ms
  - Tavg transfer = (60 secs / 7,200 RPM) x (1/400) x 1000 ms/sec = 0.02 ms
  - Taccess = 9 ms + 4 ms + 0.02 ms
- Important points:
  - Access time is dominated by seek time and rotational latency.
  - The first bit in a sector is the most expensive; the rest are free.
  - SRAM access time is about 4 ns/doubleword, DRAM about 60 ns.
    - Disk is about 40,000 times slower than SRAM and 2,500 times slower than DRAM.
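A small C sketch of the access-time formula, reproducing the worked example above; the function and parameter names are ours.

/* disk_time.c - reproduces the worked example above; names are ours. */
#include <stdio.h>

/* Taccess = Tseek + Trotation + Ttransfer, all in milliseconds. */
double t_access_ms(double rpm, double t_seek_ms, double avg_sectors_per_track)
{
    double ms_per_rev = 60.0 / rpm * 1000.0;                 /* one revolution */
    double t_rotation = 0.5 * ms_per_rev;                    /* half a revolution on average */
    double t_transfer = ms_per_rev / avg_sectors_per_track;  /* one sector's arc */
    return t_seek_ms + t_rotation + t_transfer;
}

int main(void)
{
    /* 7,200 RPM, 9 ms average seek, 400 sectors/track: prints about
       13.19 ms = 9 + 4.17 + 0.02 (the slide rounds rotation to 4 ms). */
    printf("Taccess = %.2f ms\n", t_access_ms(7200.0, 9.0, 400.0));
    return 0;
}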
Reading a Disk Sector (1)
- CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.
[Figure: CPU chip (register file, ALU, bus interface) and main memory on the memory bus; the I/O bus connects a USB controller (mouse, keyboard), a graphics adapter (monitor), and a disk controller (disk).]

Reading a Disk Sector (2)
- Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.

Reading a Disk Sector (3)
- When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special "interrupt" pin on the CPU).
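A sketch of the CPU side of step (1). The register layout, names, and command encoding below are invented for illustration; real disk controllers define their own register maps.

/* start_read.c - hypothetical sketch; nothing here is a real controller's API. */
#include <stdint.h>

/* Hypothetical memory-mapped ports of a disk controller. */
struct disk_ports {
    volatile uint64_t cmd;   /* command register */
    volatile uint64_t lba;   /* logical block number */
    volatile uint64_t dst;   /* destination memory address for the DMA */
};

enum { DISK_CMD_READ = 1 };  /* hypothetical command encoding */

void start_disk_read(struct disk_ports *disk,
                     uint64_t logical_block, uint64_t dest_addr)
{
    /* Step (1): write command, logical block number, and destination
       address to the controller's ports. */
    disk->lba = logical_block;
    disk->dst = dest_addr;
    disk->cmd = DISK_CMD_READ;   /* writing the command starts the read */
    /* Steps (2) and (3) proceed without the CPU: the controller DMAs the
       sector into main memory, then raises an interrupt. */
}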
SSD Tradeoffs vs. Rotating Disks
- Advantages
  - No moving parts -> faster, less power, more rugged
- Disadvantages
  - Have the potential to wear out
    - Mitigated by "wear leveling logic" in the flash translation layer
    - E.g., the Intel SSD 730 guarantees 128 petabytes (128 x 10^15 bytes) of writes before it wears out
  - In 2015, about 30 times more expensive per byte
- Applications
  - MP3 players, smart phones, laptops
  - Beginning to appear in desktops and servers

The CPU-Memory Gap
The gap between DRAM, disk, and CPU speeds.
[Figure: log-scale plot of time (ns) versus year, 1985-2015, with curves for disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time.]

Locality to the Rescue!
The key to bridging this CPU-memory gap is a fundamental property of computer programs known as locality.

Locality Example

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

- Data references
  - Reference array elements in succession (stride-1 reference pattern): spatial locality
  - Reference variable sum each iteration: temporal locality
- Instruction references
  - Reference instructions in sequence: spatial locality
  - Cycle through loop repeatedly: temporal locality

Qualitative Estimates of Locality
- Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
- Question: Does this function have good locality with respect to array a?

int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

Locality Example
- Question: Does this function have good locality with respect to array a?

int sum_array_cols(int a[M][N])
{
    int i, j, sum = 0;
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
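To make the contrast measurable, here is a small benchmark harness of our own (the array size and timing method are arbitrary choices, not from the slides). C stores a[M][N] row by row, so the stride-1 row-major traversal enjoys spatial locality and typically runs several times faster than the stride-N column-major one.

/* locality_bench.c - our benchmark sketch; times the two traversals above. */
#include <stdio.h>
#include <time.h>

#define M 4096
#define N 4096

static int a[M][N];   /* zero-initialized global array, 64 MB */

static int sum_rows(void)
{
    int i, j, sum = 0;
    for (i = 0; i < M; i++)        /* stride-1: good spatial locality */
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

static int sum_cols(void)
{
    int i, j, sum = 0;
    for (j = 0; j < N; j++)        /* stride-N: poor spatial locality */
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}

static double seconds(int (*f)(void))
{
    clock_t start = clock();
    volatile int sink = f();       /* keep the call from being optimized away */
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    printf("row-major:    %.3f s\n", seconds(sum_rows));
    printf("column-major: %.3f s\n", seconds(sum_cols));
    return 0;
}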
Today
- Storage technologies and trends
- Locality of reference
- Caching in the memory hierarchy

Example Memory Hierarchy
- L0: Registers. CPU registers hold words retrieved from the L1 cache.
- L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
- L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from the L3 cache.
- L3: L3 cache (SRAM). L3 cache holds cache lines retrieved from main memory.
- L4: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
- L5: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote servers.
- L6: Remote secondary storage (e.g., Web servers).
Toward L0: smaller, faster, and costlier (per byte) storage devices. Toward L6: larger, slower, and cheaper (per byte) storage devices.

Caches
- Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
- Fundamental idea of a memory hierarchy:
  - For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
- Why do memory hierarchies work?
  - Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
  - Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
- Big idea: The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

General Cache Concepts: Miss
[Figure: a 4-block cache holding blocks 8, 9, 14, 3 in front of a 16-block memory (blocks 0-15).]
- Data in block b is needed: request 12.
- Block b is not in the cache: miss!
- Block b is fetched from memory.
- Block b is stored in the cache.
  - Placement policy: determines where b goes.
  - Replacement policy: determines which block gets evicted (the victim).

General Caching Concepts: Types of Cache Misses
- Cold (compulsory) miss
  - Cold misses occur because the cache is empty.
- Conflict miss
  - Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
    - E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
  - Conflict misses occur when the level-k cache is large enough, but multiple data objects all map to the same level-k block.
    - E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
- Capacity miss
  - Occurs when the set of active cache blocks (the working set) is larger than the cache.
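A minimal simulation of the (i mod 4) placement policy just described; the variable names are ours. It shows the reference string 0, 8, 0, 8, ... missing on every access even though three of the four slots stay empty.

/* conflict_miss.c - our sketch of a 4-slot cache with (i mod 4) placement. */
#include <stdio.h>

#define NSLOTS 4

int main(void)
{
    int cache[NSLOTS] = { -1, -1, -1, -1 };    /* -1 means the slot is empty */
    int refs[] = { 0, 8, 0, 8, 0, 8 };         /* 0 and 8 both map to slot 0 */
    int n = sizeof refs / sizeof refs[0];

    for (int i = 0; i < n; i++) {
        int b = refs[i];
        int slot = b % NSLOTS;                 /* placement policy: b mod 4 */
        if (cache[slot] == b) {
            printf("block %d -> slot %d: hit\n", b, slot);
        } else {
            if (cache[slot] < 0)
                printf("block %d -> slot %d: cold miss\n", b, slot);
            else
                printf("block %d -> slot %d: conflict miss, evicts block %d\n",
                       b, slot, cache[slot]);
            cache[slot] = b;                   /* fetch block b into the slot */
        }
    }
    return 0;
}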
Examples of Caching in the Memory Hierarchy

Cache Type            What is Cached?       Where is it Cached?   Latency (cycles)  Managed By
Registers             4-8 byte words        CPU core              0                 Compiler
TLB                   Address translations  On-chip TLB           0                 Hardware MMU
L1 cache              64-byte blocks        On-chip L1            4                 Hardware
L2 cache              64-byte blocks        On-chip L2            10                Hardware
Virtual memory        4-KB pages            Main memory           100               Hardware + OS
Buffer cache          Parts of files        Main memory           100               OS
Disk cache            Disk sectors          Disk controller       100,000           Disk firmware
Network buffer cache  Parts of files        Local disk            10,000,000        NFS client
Browser cache         Web pages             Local disk            10,000,000        Web browser
Web cache             Web pages             Remote server disks   1,000,000,000     Web proxy server

Conventional DRAM Organization
- d x w DRAM: dw total bits organized as d supercells of size w bits
[Figure: a 16 x 8 DRAM chip as a 4 x 4 array of 8-bit supercells with an internal row buffer; the memory controller (to/from the CPU) sends a 2-bit addr and exchanges 8-bit data with the chip; supercell (2,1) is highlighted.]

Reading DRAM Supercell (2,1)
- Step 1(a): Row access strobe (RAS) selects row 2.
- Step 1(b): Row 2 is copied from the DRAM array to the internal row buffer.
- Step 2(a): Column access strobe (CAS) selects column 1.
- Step 2(b): Supercell (2,1) is copied from the row buffer to the data lines, and eventually back to the CPU.
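A toy model of the two-step RAS/CAS read above for the 16 x 8 chip; the function names are ours, and the sketch ignores timing entirely.

/* dram_read.c - our model of the RAS/CAS read of supercell (2,1). */
#include <stdint.h>
#include <stdio.h>

#define ROWS 4
#define COLS 4

static uint8_t dram[ROWS][COLS];   /* 16 supercells of 8 bits each */
static uint8_t row_buffer[COLS];   /* internal row buffer */

/* Step 1: row access strobe copies a whole row into the row buffer. */
static void ras(int row)
{
    for (int c = 0; c < COLS; c++)
        row_buffer[c] = dram[row][c];
}

/* Step 2: column access strobe selects one supercell from the buffer. */
static uint8_t cas(int col)
{
    return row_buffer[col];
}

int main(void)
{
    dram[2][1] = 0xAB;             /* contents of supercell (2,1) */
    ras(2);                        /* RAS = 2: row 2 -> row buffer */
    uint8_t v = cas(1);            /* CAS = 1: supercell (2,1) to data lines */
    printf("supercell (2,1) = 0x%02X\n", v);
    return 0;
}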
Storage Trends

SRAM
Metric             1985     1990   1995  2000  2005   2010   2015    2015:1985
$/MB               2,900    320    256   100   75     60     320     116
access (ns)        150      35     15    3     2      1.5    200     115

DRAM
Metric             1985     1990   1995  2000  2005   2010   2015    2015:1985
$/MB               880      100    30    1     0.1    0.06   0.02    44,000
access (ns)        200      100    70    60    50     40     20      10
typical size (MB)  0.256    4      16    64    2,000  8,000  16,000  62,500

Disk
Metric             1985     1990   1995  2000  2005   2010   2015    2015:1985
$/GB               100,000  8,000  300   10    5      0.3    0.03    3,333,333
access (ms)        75       28     10    8     5      3      3       25
typical size (GB)  0.01     0.16   1     20    160    1,500  3,000   300,000

CPU Clock Rates

                           1985   1990   1995     2003   2005    2010         2015         2015:1985
CPU                        80286  80386  Pentium  P-4    Core 2  Core i7 (n)  Core i7 (h)
Clock rate (MHz)           6      20     150      3,300  2,000   2,500        3,000        500
Cycle time (ns)            166    50     6        0.30   0.50    0.4          0.33         500
Cores                      1      1      1        1      2       4            4            4
Effective cycle time (ns)  166    50     6        0.30   0.25    0.10         0.08         2,075

(n) Nehalem processor; (h) Haswell processor
Inflection point in computer history when designers hit the "Power Wall" (between 2003 and 2005, when clock rates stopped rising and core counts began to grow).


