CIS501 Introduction to Computer Architecture
Prof. Milo Martin
Unit 1: Technology, Cost, Performance, Power, and Reliability

This Unit
• What a computer is and what computer architecture is
• Forces that shape computer architecture
  • Applications (covered last time)
  • Semiconductor technology
• Evaluation metrics: parameters and technology basis
  • Cost
  • Performance
  • Power
  • Reliability

Readings
• H+P: Chapter 1
• Paper: G. Moore, "Cramming More Components onto Integrated Circuits"
• Reminders
  • Pre-quiz
  • Paper review: groups of 3–4, send via e-mail to cis501+review@cis.upenn.edu
  • Don't worry (much) about the power question, as we might not get to it today

What is Computer Architecture? (review)
• Design of interfaces and implementations…
• …under a constantly changing set of external forces…
  • Applications: change from above (discussed last time)
  • Technology: changes transistor characteristics from below
  • Inertia: resists changing all levels of the system at once
• …to satisfy different constraints
  • CIS 501 is mostly about performance
  • Cost
  • Power
  • Reliability
• An iterative process driven by empirical evaluation
• The art/science of tradeoffs

Abstraction and Layering
• Abstraction: the only way of dealing with complex systems
• Divide the world into objects, each with an…
  • Interface: knobs, behaviors, knobs → behaviors
  • Implementation: "black box" (ignorance + apathy)
• Only specialists deal with implementations; the rest of us deal with interfaces
  • Example: a car — only mechanics know how the implementation works
• Layering: the abstraction discipline makes life even simpler
  • Removes the need to even know the interfaces of most objects
  • Divide the objects in a system into layers
  • Layer X objects are implemented in terms of the interfaces of layer X−1 objects
  • No need to even know the interfaces of layer X−2 objects (but it sometimes helps)
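To make the interface-vs-implementation split concrete, here is a minimal Python sketch (the class and function names are invented for illustration, not from the slides): the top-layer function is written purely against the interface of the layer below it and never touches anything two layers down.

```python
from abc import ABC, abstractmethod

# Layer X-1 interface: knobs -> behaviors; the implementation is a black box.
class Storage(ABC):
    @abstractmethod
    def read(self, addr: int) -> int: ...

    @abstractmethod
    def write(self, addr: int, value: int) -> None: ...

# One possible layer X-1 implementation; callers never see the dict inside.
class SimpleMemory(Storage):
    def __init__(self) -> None:
        self._cells: dict[int, int] = {}   # hidden detail ("layer X-2 territory")

    def read(self, addr: int) -> int:
        return self._cells.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self._cells[addr] = value

# Layer X code: depends only on the Storage interface.
def swap(mem: Storage, a: int, b: int) -> None:
    tmp = mem.read(a)
    mem.write(a, mem.read(b))
    mem.write(b, tmp)

mem = SimpleMemory()
mem.write(0, 1); mem.write(4, 2)
swap(mem, 0, 4)
print(mem.read(0), mem.read(4))    # -> 2 1
```

Swapping SimpleMemory for, say, a cached or disk-backed version would not change swap at all — exactly the "ignorance + apathy" the slide describes.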
Abstraction, Layering, and Computers
• Computers are complex systems, built in layers
  • Applications
  • O/S, compiler
  • Firmware, device drivers
  • Processor, memory, raw I/O devices
  • Digital circuits, digital/analog converters
  • Gates
  • Transistors
• 99% of users don't know the implementation of the hardware layers
• 90% of users don't know the implementation of any layer
• That's OK — the world still works just fine
• But unfortunately, the layers sometimes break down
  • Someone needs to understand what's "under the hood"

CIS501: A Picture
• Computer architecture: definition of the ISA to facilitate implementation of software layers
• CIS 501 is mostly about computer micro-architecture
  • Design CPU, memory, I/O to implement the ISA
[Figure: hardware/software stack — Application, OS, Compiler, Firmware (software) above the Instruction Set Architecture (ISA); CPU, I/O, Memory, Digital Circuits, Gates & Transistors (hardware) below it]

Semiconductor Technology Background
• The transistor: invention of the century
• Fabrication

Manufacturing Process
• Grow SiO2
• Grow photo-resist
• Burn "wire-level-1" mask
• Dissolve unburned photo-resist (and the underlying SiO2)
• Grow copper "wires"
• Dissolve remaining photo-resist
• Continue with the next wire layer…
• Typical number of wire layers: 3–6

Defects
• Defects can arise from
  • Under-/over-doping
  • Over-/under-dissolved insulator
  • Mask mis-alignment
  • Particle contaminants
• Try to minimize defects
  • Process margins
  • Design rules (minimal transistor size, separation)
• Or, tolerate defects
  • Redundant or "spare" memory cells
[Figure: example dies — defective, defective, slow]

Empirical Evaluation
• Metrics
  • Cost
  • Performance
  • Power
  • Reliability
• Often more important in combination than individually
  • Performance/cost (MIPS/$)
  • Performance/power (MIPS/W)
• Basis for design decisions and purchasing decisions

Cost
• Metric: $
• In the grand scheme, the CPU accounts for only a fraction of a system's cost
  • Some of that is profit (Intel's, Dell's)
  • We are concerned with Intel's cost (which transfers to you)

               Desktop      Laptop       PDA        Phone
  $            $100–$300    $150–$350    $50–$100   $10–$20
  % of total   10–30%       10–20%       20–30%     20–30%
  Other costs: memory, display, power supply/battery, disk, packaging

• Unit cost: cost to manufacture individual chips
• Startup cost: cost to design the chip, build the fab line, do marketing

Unit Cost: Integrated Circuit (IC)
• Chips are built in multi-step chemical processes on wafers
  • Cost per wafer is roughly constant: f(wafer size, number of steps)
• Chip (die) cost is proportional to area
  • Larger chips mean fewer of them per wafer
  • Larger chips mean fewer working ones — why? uniform defect density
  • Chip cost ∝ (chip area)^α, α = 2–3
• Wafer yield: % of the wafer that is chips
• Die yield: % of chips that work
• Yield is increasingly non-binary: fast vs. slow chips
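The super-linear cost-vs-area claim can be sketched with the standard textbook (H+P) cost model. This is only a sketch under assumptions: the dies-per-wafer and die-yield formulas below are the usual textbook ones, the $1500 wafer cost is an illustrative value, and the deck's own tables (next slide) appear to use slightly different accounting, so exact numbers differ.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Gross dies per wafer: wafer area / die area, minus an edge-loss term."""
    r = wafer_diameter_mm / 2
    gross = math.pi * r**2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

def die_yield(defects_per_cm2: float, die_area_mm2: float, alpha: float) -> float:
    """Fraction of dies that work, assuming a uniform defect density."""
    return (1 + defects_per_cm2 * (die_area_mm2 / 100) / alpha) ** -alpha

# Cost per *good* die on a 10-inch (254 mm) wafer: defect density 2/cm^2,
# alpha = 2, wafer yield 90%, assumed $1500 per wafer.
for area in (100, 200, 400):
    good = dies_per_wafer(254, area) * 0.9 * die_yield(2.0, area, 2.0)
    print(f"{area} mm^2 -> {good:5.0f} good dies, ${1500 / good:7.2f} each")
```

Quadrupling the die area here raises the per-good-die cost from about $15 to about $425, roughly a 29× increase — i.e., cost grows like area^2.4, squarely in the α = 2–3 range the slide quotes.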
Yield/Cost Examples
• Parameters: wafer yield = 90%, α = 2, defect density = 2/cm²

  Die size (mm²)   100      144      196      256      324      400
  Die yield        23%      19%      16%      12%      11%      10%
  6" wafer         139(31)  90(16)   62(9)    44(5)    32(3)    23(2)
  8" wafer         256(59)  177(32)  124(19)  90(11)   68(7)    52(5)
  10" wafer        431(96)  290(53)  206(32)  153(20)  116(13)  90(9)
  (dies per wafer, with working dies in parentheses)

  Chip           Wafer cost  Defect (/cm²)  Area (mm²)  Dies  Yield  Die cost  Package (pins)  Test  Total
  Intel 486DX2   $1200       1.0            81          181   54%    $12       $11 (168)       $12   $35
  IBM PPC601     $1700       1.3            196         66    27%    $95       $3 (304)        $21   $119
  DEC Alpha      $1500       1.2            234         53    19%    $149      $30 (431)       $23   $202
  Intel Pentium  $1500       1.5            296         40    9%     $417      $19 (273)       $37   $473

Startup Costs
• Startup costs must be amortized over the chips sold
  • Research and development: ~$100M per chip (500 person-years @ $200K each)
  • Fabrication facilities: ~$2B per new line — clean rooms (bunny suits), lithography, testing equipment
• If you sell 10M chips, startup adds ~$200 to the cost of each
• Companies (e.g., Intel) don't make money on new chips
  • They make money on proliferations (shrinks and frequency bumps): no startup cost for these

Moore's Effect on Cost
• Scaling has opposite effects on unit and startup costs
+ Reduces unit integrated-circuit cost
  • Either lower cost for the same functionality…
  • …or the same cost for more functionality
– Increases startup cost
  • More expensive fabrication equipment
  • Takes longer to design, verify, and test chips

Performance
• Two definitions
  • Latency (execution time): time to finish a fixed task
  • Throughput (bandwidth): number of tasks in a fixed time
• Very different: throughput can exploit parallelism, latency cannot (baking-bread analogy)
• Often contradictory; choose the definition that matches your goals (most frequently throughput)
• Example: move people from A to B, 10 miles
  • Car: capacity = 5, speed = 60 miles/hour
  • Bus: capacity = 60, speed = 20 miles/hour
  • Latency: car = 10 min, bus = 30 min
  • Throughput: car = 15 PPH (counting the return trip), bus = 60 PPH

Performance Improvement
• Processor A is X times faster than processor B if
  • Latency(P,A) = Latency(P,B) / X
  • Throughput(P,A) = Throughput(P,B) × X
• Processor A is X% faster than processor B if
  • Latency(P,A) = Latency(P,B) / (1 + X/100)
  • Throughput(P,A) = Throughput(P,B) × (1 + X/100)
• Car/bus example
  • Latency? The car is 3 times (200%) faster than the bus
  • Throughput? The bus is 4 times (300%) faster than the car
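The car/bus numbers are easy to check mechanically. A short script using only the figures from the example (10 miles, throughput counting the return trip):

```python
def latency_min(distance_mi: float, speed_mph: float) -> float:
    """One-way trip time, in minutes."""
    return distance_mi / speed_mph * 60

def throughput_pph(capacity: int, distance_mi: float, speed_mph: float) -> float:
    """People delivered per hour, counting the empty return trip."""
    round_trip_hr = 2 * distance_mi / speed_mph
    return capacity / round_trip_hr

car_lat, bus_lat = latency_min(10, 60), latency_min(10, 20)               # 10.0, 30.0 min
car_tp,  bus_tp  = throughput_pph(5, 10, 60), throughput_pph(60, 10, 20) # 15.0, 60.0 PPH

print(f"car is {bus_lat / car_lat:.0f}x faster in latency")     # 3x, i.e., 200% faster
print(f"bus is {bus_tp / car_tp:.0f}x faster in throughput")    # 4x, i.e., 300% faster
```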
What Is 'P' in Latency(P,A)?
• A program
  • Latency(A) alone makes no sense: a processor executes some program — but which one?
• The actual target workload?
  + Accurate
  – Not portable/repeatable, overly specific, hard to pinpoint problems
• Some representative benchmark program(s)?
  + Portable/repeatable, pretty accurate
  – Hard to pinpoint problems, may not be exactly what you run
• Some small kernel benchmarks (micro-benchmarks)?
  + Portable/repeatable, easy to run, easy to pinpoint problems
  – Not representative of the complex behaviors of real programs

SPEC Benchmarks
• SPEC (Standard Performance Evaluation Corporation): http://www.spec.org/
  • A consortium of companies that collects, standardizes, and distributes benchmark programs
  • Posts SPECmark results for different processors: one number that represents performance for the entire suite
  • Benchmark suites for CPU, Java, I/O, Web, Mail, etc.
  • Updated every few years, so companies don't target the benchmarks
• SPEC CPU 2000
  • 12 "integer": gzip, gcc, perl, crafty (chess), vortex (DB), etc.
  • 14 "floating point": mesa (OpenGL), equake, facerec, etc.
  • Written in C and Fortran (a few in C++)

Another CPI Example
• Assume a processor with the following instruction frequencies and costs
  • Integer ALU: 50%, 1 cycle
  • Load: 20%, 5 cycles
  • Store: 10%, 1 cycle
  • Branch: 20%, 2 cycles
• Which change would improve performance more?
  • A. Branch prediction to reduce branch cost to 1 cycle?
  • B. A bigger data cache to reduce load cost to 3 cycles?
• Compute CPI
  • Base = 0.5×1 + 0.2×5 + 0.1×1 + 0.2×2 = 2
  • A = 0.5×1 + 0.2×5 + 0.1×1 + 0.2×1 = 1.8
  • B = 0.5×1 + 0.2×3 + 0.1×1 + 0.2×2 = 1.6 (winner)
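The comparison above is a one-liner once the instruction mix is in a table. This reproduces the slide's arithmetic:

```python
# Instruction mix as {class: (frequency, cycles)}.
base = {"alu": (0.5, 1), "load": (0.2, 5), "store": (0.1, 1), "branch": (0.2, 2)}

def cpi(mix: dict) -> float:
    """Average cycles per instruction: sum of frequency * cost over the mix."""
    return sum(freq * cycles for freq, cycles in mix.values())

option_a = dict(base, branch=(0.2, 1))   # branch prediction: branches now 1 cycle
option_b = dict(base, load=(0.2, 3))     # bigger data cache: loads now 3 cycles

print(f"base: {cpi(base):.2f}, A: {cpi(option_a):.2f}, B: {cpi(option_b):.2f}")
# base: 2.00, A: 1.80, B: 1.60 -> the bigger cache wins: loads cost more overall
```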
Increasing Clock Frequency: Pipelining
• The CPU is a pipeline: compute stages separated by latches
  • http://…/~amir/cse371/lecture_slides/pipeline.pdf
• Clock period: the maximum delay of any stage
  • Number of gate levels in the stage
  • Delay of individual gates (these days, wire delay matters more)
[Figure: simple pipelined datapath — PC, instruction memory (+4 next-PC), register file (s1, s2, d), data memory]

Increasing Clock Frequency: Pipelining (continued)
• Reduce pipeline-stage delay
  • Reduce logic levels and wire lengths (better design)
  • Complementary to technology efforts (described later)
• Increase the number of pipeline stages (multi-stage operations)
  – Often causes CPI to increase
  – At some point, actually causes performance to decrease
  • The "optimal" pipeline depth is program- and technology-specific
• Remember this example
  • PentiumIII: 12-stage pipeline, 800 MHz — faster than
  • Pentium4: 22-stage pipeline, 1 GHz
  • The next Intel design: more like the PentiumIII
• Much more about this later

CPI and Clock Frequency
• System components are "clocked" independently
  • E.g., increasing the processor clock frequency doesn't improve memory performance
• Example
  • Processor A: CPI_CPU = 1, CPI_MEM = 1, clock = 500 MHz
  • What is the speedup if we double the clock frequency?
  • Base: CPI = 2 → IPC = 0.5 → MIPS = 250
  • New: CPI = 3 → IPC = 0.33 → MIPS = 333 (clock ×2 → CPI_MEM ×2)
  • Speedup = 333/250 = 1.33, far less than 2
• What about an infinite clock frequency?
  • Only a 2× speedup (an example of Amdahl's Law)
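The clock-doubling example can be checked the same way. The key is that memory latency is fixed in absolute time, so doubling the CPU clock doubles the memory CPI as measured in the new, faster cycles:

```python
def mips(cpi_cpu: float, cpi_mem: float, clock_mhz: float) -> float:
    """Millions of instructions per second for a simple two-component CPI."""
    return clock_mhz / (cpi_cpu + cpi_mem)

base = mips(1, 1, 500)       # CPI = 2, IPC = 0.5 -> 250 MIPS
fast = mips(1, 2, 1000)      # clock x2 -> CPI_MEM x2 -> CPI = 3 -> ~333 MIPS
print(base, fast, fast / base)   # 250.0  333.3...  1.33 -- far short of 2x

# Limit: with an infinite clock the CPU component vanishes, but the memory
# component (1 original 500 MHz cycle per insn) remains -> at most 2x (Amdahl).
print(mips(1, 1, 500) * 2)       # 500.0 MIPS ceiling
```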
Measuring CPI
• How are CPI and execution time actually measured?
  • Execution time: time (Unix): wall clock + CPU + system
  • CPI = CPU time / (clock frequency × dynamic insn count)
  • How is dynamic instruction count measured?
• More useful is a CPI breakdown (CPI_CPU, CPI_MEM, etc.)
  • So we know what the performance problems are and what to fix
• CPI breakdowns
  • Hardware event counters: calculate CPI using counter frequencies/event costs
  • Cycle-level micro-architecture simulation (e.g., SimpleScalar)
    + Measure exactly what you want
    + Measure the impact of potential fixes
    • Must model the micro-architecture faithfully
    • The method of choice for many micro-architects (and you)

Improving CPI
• CIS501 is more about improving CPI than about improving frequency
• Historically, clock accounts for 70%+ of performance improvement, achieved via deeper pipelines
• That will (have to) change
  • Deep pipelining is not power-efficient
  • Physical speed limits are approaching: 1 GHz in 1999, 2 GHz in 2001, 3 GHz in 2002, 4 GHz? almost 2006
• Techniques we will look at
  • Caching, speculation, multiple issue, out-of-order issue
  • Vectors, multiprocessing, more…
• Moore helps because CPI reduction requires transistors
  • The definition of parallelism is "more transistors"
  • But the best example is caches

Moore's Effect on Performance
• Moore's Curve: the common interpretation of Moore's Law
  • "CPU performance doubles every 18 months"
  • A self-fulfilling prophecy
• 2× every 18 months is ~1% per week
  • Q: Would you add a feature that improved performance 20% if it took 8 months to design and test?
• Processors under Moore's Curve (that arrive too late) fail spectacularly
  • E.g., Intel's Itanium, Sun's Millennium
[Figure: performance vs. year, 1982–1994 — RISC and Intel x86 processors tracking roughly 35%/yr improvement]

Performance Rules of Thumb
• Make the common case fast
  • Sometimes called "Amdahl's Law"
  • Corollary: don't optimize the 1% to the detriment of the other 99%
• Build a balanced system
  • Don't over-engineer capabilities that cannot be utilized
• Design for actual, not peak, performance
  • For actual performance X, machine capability must be > X

Transistor Speed, Power, and Reliability
• Transistor characteristics and scaling impact:
  • Switching speed
  • Power
  • Reliability
• The "undergrad" gate-delay model for architecture: each NOT, NAND, NOR, AND, OR gate has a delay of "1"
• Reality is not so simple

Transistors and Wires
[Figure: IBM SOI technology cross-section © IBM; from slides © Krste Asanović, MIT]
[Figure: IBM CMOS7, 6 layers of copper wiring © IBM; from slides © Krste Asanović, MIT]

Simple RC Delay Model
[Figure: inverter chain with 1→0 and 0→1 transitions]
• Switching time is an RC circuit (charge or discharge)
  • R, resistance: slows the rate of current flow; depends on material, length, cross-section area
  • C, capacitance: electrical charge storage; depends on material, area, distance
• Voltage affects speed, too

Moore's Effect on RC Delay
• Scaling helps reduce wire and gate delays in some ways, hurts in others
+ Wires become shorter (Length↓ → Resistance↓)
+ Wire "surface areas" become smaller (Capacitance↓)
+ Transistors become shorter (Resistance↓)
+ Transistors become narrower (Capacitance↓, Resistance↑)
– Gate insulator thickness becomes smaller (Capacitance↑)
– Distance between wires becomes smaller (Capacitance↑)

Improving RC Delay
• Exploit the good effects of scaling
• Fabrication technology improvements
  + Use copper instead of aluminum for wires (ρ↓ → Resistance↓)
  + Use lower-dielectric insulators (ε↓ → Capacitance↓)
  + Increase voltage
• Design implications
  + Use bigger cross-section wires (Area↑ → Resistance↓)
    • Typically means taller wires, otherwise fewer of them
    – Increases "surface area" and capacitance (Capacitance↑)
  + Use wider transistors (Area↑ → Resistance↓)
    – Increases capacitance (not for you, but for upstream transistors)
    – Use selectively

Another Constraint: Power and Energy
• Power (Watt, or Joule/second): a short-term (peak, max) concern
  • Mostly a dissipation (heat) concern
  • Power density (Watt/cm²): an important related metric
  – Thermal cycle: power dissipation↑ → power density↑ → temperature↑ → resistance↑ → power dissipation↑ …
  • Cost (and form factor): packaging, heat sink, fan, etc.
• Energy (Joule): a long-term concern
  • Mostly a consumption concern
  • The primary issue is battery life (cost and weight of the battery, too)
  • Low power implies low energy, but not the other way around
• 10 years ago, nobody cared

Sources of Energy Consumption
[Figure: CMOS inverter showing capacitor-charging, short-circuit, subthreshold-leakage, and diode-leakage current paths; from slides © Krste Asanović, MIT]
• Dynamic power
  • Capacitor charging (85–90% of active power): energy is ½CV² per transition
  • Short-circuit current (10–15% of active power): flows when both the p and n transistors turn on during a signal transition
• Static power
  • Subthreshold leakage (dominates when inactive): transistors don't turn off completely
  • Diode leakage (negligible): parasitic source and drain diodes leak to the substrate
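The ½CV² term is what gives voltage scaling its leverage. A minimal sketch with made-up values for switched capacitance and activity factor — only the ½CV²-per-transition relationship comes from the slide:

```python
def dynamic_power(cap_f: float, vdd: float, freq_hz: float, activity: float) -> float:
    """Average switching power: (transitions/cycle) * (1/2 C V^2 per transition) * f."""
    return activity * 0.5 * cap_f * vdd**2 * freq_hz

# Illustrative numbers: 1 nF of total switched capacitance, 10% activity.
p_fast = dynamic_power(1e-9, 1.3, 2e9, 0.1)   # ~0.17 W at 1.3 V, 2 GHz
p_dvs  = dynamic_power(1e-9, 1.0, 1e9, 0.1)   # scale V 1.3 -> 1.0 V and f 2 -> 1 GHz
print(f"{p_fast:.3f} W -> {p_dvs:.3f} W ({p_dvs / p_fast:.2f}x)")
# Quadratic in V, linear in f: ~0.30x the power for 0.5x the frequency.
```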
Moore's Effect on Power
• Scaling has largely good effects on local power
  + Shorter wires/smaller transistors (Length↓ → Capacitance↓)
  + Shorter transistor length (Resistance↓, Capacitance↓)
  – Global effects largely undone by increased transistor counts
• Scaling has a largely negative effect on power density
  + Transistor/wire power decreases linearly
  – Transistor/wire density increases quadratically
  – Power density increases linearly
  • Thermal cycle…
• Controlled somewhat by reduced VDD (5 → 3.3 → 1.6 → 1.3 → 1.1)
  • Reduced VDD sacrifices some switching speed

Reducing Power
• Reduce supply voltage (VDD)
  + Reduces dynamic power quadratically and static power linearly
  • But poses a tough choice regarding the threshold voltage VT
    – Constant VT slows circuit speed → clock frequency↓ → performance↓
    – Reduced VT increases static power exponentially
• Reduce clock frequency (f)
  + Reduces dynamic power linearly
  – Doesn't reduce static power
  – Reduces performance linearly
  • Generally doesn't make sense without also reducing VDD…
  • Except that frequency can be adjusted cycle-to-cycle and locally
• More on this later

Dynamic Voltage Scaling (DVS)
• The OS reduces voltage/frequency when peak performance is not needed

               Mobile PentiumIII "SpeedStep"   TM5400 "LongRun"             Intel X-Scale (StrongARM2)
  Frequency    300–1000 MHz (50 MHz steps)     200–700 MHz (33 MHz steps)   50–800 MHz (50 MHz steps)
  Voltage      0.9–1.7 V (0.1 V steps)         1.1–1.6 V (continuous)       0.7–1.65 V (continuous)
  High-speed   3400 MIPS @ 34 W                1600 MIPS @ 2 W              800 MIPS @ 0.9 W
  Low-power    1100 MIPS @ 4.5 W               300 MIPS @ 0.25 W            62 MIPS @ 0.01 W

± The X-Scale is power-efficient (6200 MIPS/W), but not IA32-compatible

Reducing Power: Processor Modes
• Modern electrical components have low-power modes
  • Note: there is no low-power disk mode; magnetic storage is non-volatile
• "Standby" mode
  • Turn off the internal clock
  • Leave the external signal controller and pins on
  • Restart the clock on an interrupt
  ± Cuts dynamic power linearly; doesn't affect static power
  • Laptops go into this mode between keystrokes
• "Sleep" mode
  • Flush caches; the OS may also flush DRAM to disk
  • Turn off the processor power plane
  – Needs a "hard" restart
  + Cuts dynamic and static power
  • Laptops go into this mode after ~10 idle minutes

Reliability
• Mean Time Between Failures (MTBF)
  • How long before you have to reboot or buy a new one
  • Not very quantitative yet; people are just starting to think about this
• CPU reliability is small in the grand scheme
  • Software is the most unreliable component in a system
    • Much more difficult to specify and test
    • Much more of it
  • The most unreliable hardware component… the disk
    • Subject to mechanical wear

Moore's Bad Effect on Reliability
• CMOS devices: CPU and memory
  • Historically almost perfectly reliable
  • Moore has made them less reliable over time
• Two sources of electrical faults
  • Energetic particle strikes (from the sun): randomly charge nodes, cause bits to flip; transient
  • Electro-migration: changes in electrical interfaces/properties; temperature-driven, gradual, permanent
• Large, high-energy transistors are immune to these effects
  – Scaling brings node energy closer to particle energy
  – Scaling increases power density, which increases temperature
• Memory (DRAM) was hit first: denser, smaller devices than SRAM

Moore's Good Effect on Reliability
• The key to providing reliability is redundancy
  • The same scaling that makes devices less reliable…
  • …also increases device density, enabling redundancy
• Classic example: error-correcting codes (ECC) for DRAM
  • ECC is also starting to appear for caches
  • More reliability techniques later
• Today's big open questions
  • Can we protect logic?
  • Can architectural techniques help hardware reliability?
  • Can architectural techniques help with software reliability?
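As a concrete taste of redundancy buying reliability back, here is a toy Hamming(7,4) single-error-correcting code in Python. (Real ECC DRAM uses a wider SEC-DED code over 64-bit words; this 4-bit version only illustrates the mechanism.)

```python
def encode(d1: int, d2: int, d3: int, d4: int) -> list[int]:
    """Hamming(7,4): 4 data bits -> 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    p1 = d1 ^ d2 ^ d4            # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4            # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def correct(c: list[int]) -> list[int]:
    """Recompute parity; the syndrome is the 1-based position of a flipped bit."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:                 # nonzero -> flip the faulty bit back
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]   # recover d1..d4

word = encode(1, 0, 1, 1)
word[4] ^= 1                          # a "particle strike" flips one stored bit
assert correct(word) == [1, 0, 1, 1]  # the flip is detected and repaired
```

Three check bits protect four data bits here; at DRAM scale the overhead drops to about 12% (8 check bits per 64 data bits) — affordable precisely because scaling supplies the extra transistors.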
Summary: A Global Look at Moore
• Device scaling (Moore's Law)
  + Increases performance
    • Reduces transistor/wire delay
    • Gives us more transistors with which to reduce CPI
  + Reduces local power consumption
    – Which is quickly undone by increased integration
  – Aggravates power-density and temperature problems
  – Aggravates the reliability problem
    + But gives us the transistors to solve it via redundancy
  + Reduces unit cost
    – But increases startup cost
• Will we fall off Moore's Cliff? (for real, this time?)
• What's next: nanotubes, quantum dots, optical, spintronics, DNA?