CS161: Design and Architecture of Computer Systems
November 16, 2007
2004 Morgan Kaufmann Publishers, COMPUTER SCIENCE & ENGINEERING

Administrative Matters
• Midterm #2
– Monday, 11/19
– Covers Chapters 5 and 6
– Covers Homeworks 4 and 5
– 15% of your grade
• Office Hours
– Sunday, 4-5 PM
• Call me at 951-827-2030 if there is a problem with the door
– Monday
• Edward: 9-10
• Harry: 1-2

Comparing Performance
• Machine 1
– Single cycle; 200 ps memory, 100 ps ALU, 50 ps register access
– Clock cycle = 200 + 50 + 100 + 200 + 50 = 600 ps, CPI = 1
– Time per instruction = 600 ps
• Machine 2
– Multicycle; 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU; 5-cycle loads, 4-cycle stores, 4-cycle ALU ops, 3-cycle branches, and 3-cycle jumps
– CPI = 0.25*5 + 0.1*4 + 0.52*4 + 0.11*3 + 0.02*3 = 4.12
– Clock = longest stage = 200 ps, time per instruction = 824 ps
• Machine 3
– Pipelined; half of loads take 2 cycles, 25% of branches are mispredicted and hence take 2 cycles, jumps always take 2 cycles
– CPI = 0.25*1.5 + 0.1*1 + 0.52*1 + 0.11*1.25 + 0.02*2 ≈ 1.17
– Clock = longest stage = 200 ps, time per instruction ≈ 234 ps

What if we are allowed a 50 ps clock?
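The CPI and time-per-instruction arithmetic for the three machines above can be checked with a short script (the variable names and dictionary layout are mine, not from the slides):

```python
# Instruction mix from the slide: fraction of each instruction class.
mix = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}

# Machine 2 (multicycle): cycles per instruction class.
cycles_m2 = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}
cpi_m2 = sum(mix[k] * cycles_m2[k] for k in mix)   # ≈ 4.12

# Machine 3 (pipelined): average cycles per class after stalls.
# Half of loads take 2 cycles -> 1.5 average; 25% of branches
# mispredict and take 2 cycles -> 1.25 average; jumps always take 2.
cycles_m3 = {"load": 1.5, "store": 1, "alu": 1, "branch": 1.25, "jump": 2}
cpi_m3 = sum(mix[k] * cycles_m3[k] for k in mix)   # ≈ 1.1725

clock_ps = 200  # clock period = longest stage
print(cpi_m2, cpi_m2 * clock_ps)   # Machine 2: CPI and ps per instruction
print(cpi_m3, cpi_m3 * clock_ps)   # Machine 3: CPI and ps per instruction
```

Note the slide rounds Machine 3 to CPI = 1.17 and 234 ps; the exact values are 1.1725 and 234.5 ps.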
• Machine 1
– Single cycle; 200 ps memory, 100 ps ALU, 50 ps register access
– Clock cycle = 200 + 50 + 100 + 200 + 50 = 600 ps, CPI = 1
– Time per instruction = 600 ps
• Machine 2
– Multicycle; 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU
– Load takes 4 + 1 + 2 + 4 + 1 = 12 cycles
– Store takes 4 + 1 + 2 + 4 = 11 cycles
– ALU takes 4 + 1 + 2 + 1 = 8 cycles
– Branch takes 4 + 1 + 2 = 7 cycles
– Jump takes 4 + 1 + 2 = 7 cycles
– CPI = 0.25*12 + 0.1*11 + 0.52*8 + 0.11*7 + 0.02*7 = 9.17
– Clock = 50 ps, time per instruction = 458.5 ps

Chapter Seven
Large and Fast: Exploiting Memory Hierarchy

Memory Hierarchy
• There are also power and area considerations
• Our initial focus: two levels (upper, lower)
– Block: the minimum unit of data (usually several words), a.k.a. a line
– Hit: the requested data is found in the upper level
– Miss: the requested data is not found in the upper level

Memory Hierarchy (continued)
• Hit rate (to a level), a.k.a. hit ratio
– Percentage of accesses that are found in that level
– HIGHLY application dependent
• Miss rate (to a level), a.k.a. miss ratio
– Percentage of accesses that are not found in that level
– Equals 1 - hit rate
• Hit time (to a level)
– Time it takes for a single memory access to that level
• Miss penalty (to a level)
– Time required to fetch a block from the lower level
– Total time when there is a miss in the top level and a hit in the lower level
– You should think of it as "miss time"

Cache
• Two issues:
– How do we know if a data item is in the cache?
– If it is, how do we find it?
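The hit time, miss rate, and miss penalty defined above combine into a single figure of merit, the average memory access time (AMAT). The formula is standard but not stated on the slides, and the numbers below are illustrative only:

```python
# Average memory access time: every access pays the hit time, and the
# fraction of accesses that miss additionally pays the miss penalty
# ("miss time" in the slide's terminology, measured beyond the hit time).
hit_time_ps = 200         # time for an access that hits the upper level
miss_rate = 0.05          # fraction of misses (= 1 - hit rate)
miss_penalty_ps = 4000    # time to fetch a block from the lower level

amat_ps = hit_time_ps + miss_rate * miss_penalty_ps
print(amat_ps)  # average time per memory access, in ps
```

With these example numbers, a 5% miss rate doubles the effective access time from 200 ps to 400 ps, which is why even small miss-rate improvements matter.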
• Our first example: a "direct-mapped" cache
– Each block has exactly one possible location in the cache
– Many blocks in the lower level share the same location in the cache

Direct Mapped Cache
• The cache location is
– the block address modulo the number of blocks in the cache
– e.g., with 8 cache lines, take the lower 3 bits of the block address

Direct Mapped Cache
• How do we know it is in the cache?
– The upper bits of the block address (e.g., 2 bits) become the tag, which is compared on each lookup
– A valid bit is added to each cache line for initialization

Behavior of a Direct Mapped Cache
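The index/tag split described above can be sketched in a few lines of Python (the cache size and example address are illustrative; the function name is mine):

```python
# Direct-mapped lookup arithmetic: with 8 cache lines, the index is the
# low 3 bits of the block address and the remaining upper bits are the tag.
NUM_LINES = 8                              # must be a power of two here
INDEX_BITS = NUM_LINES.bit_length() - 1    # log2(8) = 3

def split_block_address(block_addr):
    index = block_addr % NUM_LINES   # equivalently block_addr & 0b111
    tag = block_addr >> INDEX_BITS   # everything above the index bits
    return index, tag

# Example: 5-bit block address 0b10110 -> index 0b110 = 6, tag 0b10 = 2.
print(split_block_address(0b10110))  # (6, 2)
```

On a lookup, the hardware reads line 6, and the access is a hit only if that line's valid bit is set and its stored tag equals 2.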