Introduction
• Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design
• Text for CS/EE 6810: Hennessy and Patterson’s Computer Architecture: A Quantitative Approach, 4th Edition
• Topics
    - Measuring performance/cost/power
    - Instruction-level parallelism, dynamic and static
    - Memory hierarchy
    - Multiprocessors
    - Storage systems and networks

Organizational Issues
• Office hours, MEB 3414, by appointment
• TA: Kenneth Williams; TA office hrs: TBA
• Special accommodations, add/drop policies (see class webpage)
• Class web-page and class mailing list at http://www.eng.utah.edu/~cs6810
• Grades:
    - Two midterms, 25% each
    - Homework assignments, 50%, you may skip one
    - No tolerance for cheating

Where Are We Headed?
• Modern trends:
    - Clock speed improvements are slowing
        - power constraints
        - already doing less work per stage
    - Difficult to further optimize a single core for performance
    - Multi-cores: each new processor generation will accommodate more cores

Processor Technology Trends
• Shrinking of transistor sizes: 250nm (1997) → 130nm (2002) → 65nm (2007) → 22nm
• Transistor density increases by 35% per year and die size increases by 10-20% per year… more cores!
• Transistor speed improves linearly with size (complex equation involving voltages, resistances, capacitances)… clock speed improvements!
• Wire delays do not scale down at the same rate as logic delays… the Pentium 4 has pipeline stages for wire delays

Technology Trends
• DRAM density increases by 40-60% per year, latency has reduced by 33% in 10 years (the memory wall!), bandwidth improves twice as fast as latency decreases
• Disk density improves by 100% every year, latency improvement similar to DRAM
• Networks: primary focus on bandwidth; 10Mb → 100Mb in 10 years; 100Mb → 1Gb in 5 years

Summarizing Performance
• Consider 25 programs from a benchmark set. How do we capture the behavior of all 25 programs with a single number?

           P1    P2    P3
    Sys-A  10     8    25
    Sys-B  12     9    20
    Sys-C   8     8    30

• Total (average) execution time
• Total (average) weighted execution time
• Average of normalized execution times
• Geometric mean of normalized execution times

AM Example
• We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second
• The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y and the execution times for each program are 0.8, 1.1, 0.5, and 2 seconds
• With AM of normalized execution times, (0.8 + 1.1 + 0.5 + 2) / 4 = 1.1, so we can conclude that Y is 1.1 times slower than X; perhaps not for all workloads, but definitely for one specific workload (where all programs run on the reference machine for an equal number of cycles)
• With GM, you may find inconsistencies

GM Example

          Computer-A   Computer-B   Computer-C
    P1    1 sec        10 secs      20 secs
    P2    1000 secs    100 secs     20 secs

Conclusion with GMs: (i) A = B (ii) C is ~1.6 times faster
(GM of A = sqrt(1 x 1000) ≈ 31.6, GM of B = sqrt(10 x 100) ≈ 31.6, GM of C = sqrt(20 x 20) = 20)
• For (i) to be true, P1 must occur 100 times for every occurrence of P2 (total time is then 1100 secs on both A and B)
• With the above assumption, (ii) is no longer true (the same workload takes 2020 secs on C)
Hence, GM can lead to inconsistencies

CPU Performance Equation
• CPU time = clock cycle time x cycles per instruction x number of instructions
• Influencing factors for each:
    - clock cycle time: technology and organization
    - CPI: organization and instruction set design
    - instruction count: instruction set design and compiler
• CPI (cycles per instruction) or IPC (instructions per cycle) cannot be accurately estimated analytically

Measuring System CPI
• Assume that an architectural innovation only affects CPI
• For 3 programs, base CPIs: 1.2, 1.8, 2.5; CPIs for proposed model: 1.4, 1.9, 2.3
• What is the best way to summarize performance with a single number? AM, HM, or GM of CPIs?

Example
• AM of CPI for base case = (1.2 cyc/instr + 1.8 cyc/instr + 2.5 cyc/instr) / 3 ≈ 1.83 cyc/instr
  The sum, 5.5 cycles, is the execution time if each program ran for one instruction; therefore, AM of CPI defines a workload where every program runs for an equal number of instructions
• HM of CPI = 1 / AM of IPC; defines a workload where every program runs for an equal number of cycles
• GM of CPI: warm fuzzy number, not necessarily representing any workload
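
The claims about AM and HM of CPI can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names and the per-program counts of 1,000,000 instructions or cycles are illustrative assumptions. It computes the three means for the base and proposed CPIs, then builds the two workloads explicitly (equal instructions per program, equal cycles per program) to show that their aggregate CPIs match the AM and the HM respectively.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Per-program CPIs taken from the "Measuring System CPI" slide.
BASE_CPI = [1.2, 1.8, 2.5]
PROPOSED_CPI = [1.4, 1.9, 2.3]


def summarize_cpi(cpis):
    """Return the arithmetic, harmonic, and geometric mean of the CPIs."""
    return {
        "AM": fmean(cpis),
        "HM": harmonic_mean(cpis),
        "GM": geometric_mean(cpis),
    }


def cpi_equal_instructions(cpis, instrs_per_program=1_000_000):
    """Aggregate CPI of a workload where every program executes the same
    number of instructions (the count itself is an arbitrary assumption)."""
    total_cycles = sum(cpi * instrs_per_program for cpi in cpis)
    total_instrs = instrs_per_program * len(cpis)
    return total_cycles / total_instrs


def cpi_equal_cycles(cpis, cycles_per_program=1_000_000):
    """Aggregate CPI of a workload where every program runs for the same
    number of cycles (the count itself is an arbitrary assumption)."""
    total_cycles = cycles_per_program * len(cpis)
    total_instrs = sum(cycles_per_program / cpi for cpi in cpis)
    return total_cycles / total_instrs


if __name__ == "__main__":
    for name, cpis in (("base", BASE_CPI), ("proposed", PROPOSED_CPI)):
        means = summarize_cpi(cpis)
        print(name, {k: round(v, 3) for k, v in means.items()})
        # Equal-instruction workload reproduces the AM of CPI.
        print("  equal #instrs:", round(cpi_equal_instructions(cpis), 3))
        # Equal-cycle workload reproduces the HM of CPI.
        print("  equal #cycles:", round(cpi_equal_cycles(cpis), 3))
```

The aggregate CPIs do not depend on the chosen instruction or cycle count, only on the fact that it is the same for every program. No comparable workload is constructed for the GM, in line with the slide's remark that it need not represent any workload.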