CS 61C: Great Ideas in Computer Architecture
Course Summary & Review
Instructor: Justin Hsia
8/06/2012 Summer 2012 ‐‐ Lecture #28

Agenda
• Course Summary
• Administrivia
• What’s Next?
• Acknowledgements

Number Representation
• Anything can be represented as a number!
  – With n digits in base B, can represent B^n things
• IEC (vs. SI) prefixes (2^10 ≈ 10^3)
• Signed and unsigned integers
  – Addition, subtraction, overflow, sign extension
  – Two’s complement (better than 1’s and sign&mag)
• Floating point (sign, biased exp, significand)
  – Inf, NaN, 0, denorms
  – Precision and truncation

Higher‐Level Language (HLL)
• We studied C because it exposes more of the hardware (particularly memory)
  – Compiled language is machine‐dependent
• Arrays and strings
  – Don’t run off the end or forget null terminator
• Pointers hold addresses, used to pass by ref
  – Pointer arithmetic
  – Array vs. pointer syntax
• Structs are padded collections of variables

Assembly Language
• Close to the level that a machine understands
  – ISA in human‐readable format
  – TAL vs. MAL (pseudo‐instructions)
• RISC vs. CISC and effects
• MIPS Instruction Formats: R, I, J
  – Meaning and limitations of the fields
  – Relative (branch) vs. absolute (jump) addressing
  – Register conventions (saved/volatile; caller/callee)
• Assembler: instr translation, sym/rel tables

Logic Circuit Description
• Build Synchronous Digital Systems out of combinational and sequential logic
• Equivalence between Circuit Diagrams, Truth Tables, and Boolean Expressions
  – Can convert between all representations
• Boolean algebra allows for circuit simplification (Karnaugh maps, too)
• FSMs built with registers and CL
• In reality, everything is wires and transistors
  – Voltage‐controlled switches (1: high, 0: low)

Great Idea #2: Moore’s Law
• Predicts: Transistor count per chip doubles every 2 years
• Gordon Moore, Intel Cofounder, B.S. Cal 1950
[Figure: # of transistors on an integrated circuit (IC) vs. Year]

Technology Trends
• Dynamic power = C × V^2 × f
  – Capacitance, voltage, switching frequency
• In WSC: Power Usage Effectiveness (PUE) = Total building power / IT equipment power
• Technology growth is slowing, processors have hit a power wall
  – Everywhere: transistor density, CPU speed, disk and memory capacity
  – Performance improvements now coming from parallelism and multicore processors

Memory
• Programmer treats as one long array
  – You know that this is just an illusion (VM)!
• Memory is byte‐addressed
  – Most data (including instructions) in words and word‐aligned, so all word addresses are multiples of 4 (end in 0b00)
• Multicore systems use shared memory
  – Synchronization/cache coherence necessary

Memory Management
• Program’s address space contains four regions:
  – Stack: local variables, grows downward
  – Heap: space requested for pointers via malloc(); resizes dynamically, grows upward
  – Static Data: global and static variables, does not grow or shrink
  – Code: loaded when program starts, does not change size
[Figure: address space from ~ 0hex (bottom) to ~ FFFF FFFFhex (top): code, static data, heap, stack]

Typical Memory Hierarchy
• Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology at the speed offered by the fastest technology

  Level                               Speed (cycles)   Size (bytes)
  RegFile                             ½’s              100’s
  Instr Cache / Data Cache            1’s              10K’s
  Second Level Cache (SRAM)           10’s             M’s
  Main Memory (DRAM)                  100’s            G’s
  Secondary Memory (Disk or Flash)    1,000,000’s      T’s

  Cost/bit: highest (RegFile) to lowest (Secondary Memory)
  On‐Chip Components: Control, Datapath, RegFile, Instr Cache, Data Cache, I TLB, D TLB

Caching Details (2/2)
• Cache parameters affect performance
  – Block size, cache size, set associativity
  – Write‐back/write‐through policies
  – Write allocate/no‐write allocate policies
  – Block replacement policy (Least Recently Used)
• Source of cache misses: The 3 C’s
  – Compulsory, capacity, conflict
• Multilevel caches reduce miss penalty

Virtual Memory Details (1/3)
• Give main memory effective size of disk without major penalty to performance
  – Move data in contiguous pages from disk to main memory
  – Assumption is that memory is small compared to both disk and virtual address space (or many processes)
• Also provide protection for multiple processes
  – Requires a lot of work by operating system

Virtual Memory Details (2/3)
• Paging requires address translation
  – Can run programs larger than main memory
  – Hides variable machine configurations (RAM/HDD)
  – Solves fragmentation problem
• Address mappings stored in page tables in memory
  – Additional memory access mitigated with TLB, which is a cache for page table
  – Management bits: Valid, Dirty, Ref, Access Rights

Great Idea #4: Parallelism
Leverage Parallelism & Achieve High Performance
• Parallel Requests: assigned to computer, e.g. search “Katz”
• Parallel Threads: assigned to core, e.g. lookup, ads
• Parallel Instructions: > 1 instruction @ one time, e.g. 5 pipelined instructions
• Parallel Data: > 1 data item @ one time, e.g. add of 4 pairs of words
• Hardware descriptions: all gates functioning in parallel at same time
[Figure: software/hardware levels, from Warehouse Scale Computer and Smart Phone, to Computer (Core, Memory, Input/Output), to Core (Cache Memory, Instruction Unit(s), Functional Unit(s) computing A0+B0, A1+B1, A2+B2, A3+B3), to Logic Gates]

Types of Parallelism (1/4)
• Request‐Level Parallelism (RLP)
  – Handling many requests per second (e.g. web search)
• Data‐Level Parallelism (DLP)
  – Operate on many pieces of data at once
  – SIMD: at the level of single instructions
  – MapReduce: at the level of programs (split into map and reduce)

Types of Parallelism (2/4)
• Thread‐Level Parallelism (TLP)
  – Have many processors, run either different programs or different parts of same program at same time
  – If same program, need to deal with shared memory (cache coherence and synchronization primitives to prevent data races)
  – Splitting up work properly is difficult!
• Shared vs. private variables in OpenMP
• Often requires re‐designing your algorithm

Great Idea #5: Performance Measurement and Improvement
• Allows direct comparisons of architectures and quantification of improvements
• It is all about time to finish (latency)
  – Includes both setup and execution
• Match application and hardware to exploit:
  – Locality
  – Parallelism
  – Special hardware features, like specialized instructions (e.g. matrix manipulation)

Performance Measurements
• Execution time (latency) and work per time (throughput)
  – CPU Time = Instructions × CPI × Clock Cycle Time
• Memory Access:
  – AMAT, CPI_stall use hit time, miss rate, miss penalty
  – Definitions recursive back to last level in hierarchy
• Amdahl’s Law
  – Speedup = 1 / [ (1‐F) + F/S ]
  – Why we almost never get max possible speedup

Performance Programming
• Key challenge: Craft parallel programs that scale well (weak/strong scaling)
  – Scheduling, load balancing, time for synchronization, overhead for communication
• Some techniques:
  – Register/Cache Blocking
  – Data Parallelism & Loop Unrolling
  – Multithreading

Redundant Arrays of Inexpensive Disks
• Possible to simulate behavior of single larger disk with an array of smaller disks
  – Cheaper, higher bandwidth, more resistant to failure
• RAID 0 – No redundancy
• RAID 1 – Mirroring for redundancy
• RAID 2 – Bit‐level striping
• RAID 3 – Parity disks
• RAID 4 – Block‐level striping with parity disk
• RAID 5 – Striped parity

Error Detection & Correction
• Even parity using XOR
• Hamming Distance
  – Distance 2 can detect 1‐bit error
  – Distance 3 can correct 1‐bit error
  – Distance 4 can correct 1‐bit error and detect 2‐bit error
• Hamming ECC
  – Introduce extra parity bits (one per group)
  – Sum of the parity‐bit positions of the groups with errors indicates the corrupted bit
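
The parity and Hamming ideas from the Error Detection & Correction slide can be sketched in C. This is a minimal sketch, not the course's reference code: it assumes the conventional Hamming(7,4) layout (parity bits at positions 1, 2 and 4; data bits at positions 3, 5, 6 and 7), which the slides do not spell out, and every function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Even parity of a byte: XOR of all its bits (0 when the count of 1s is even). */
unsigned parity8(uint8_t x) {
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}

/* Hamming distance: number of bit positions in which a and b differ. */
unsigned hamming_distance(uint8_t a, uint8_t b) {
    uint8_t diff = a ^ b;
    unsigned count = 0;
    while (diff) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}

/* Bit at 1-based position pos (1..7) of a codeword; position p lives in bit p-1. */
unsigned get_bit(uint8_t cw, unsigned pos) {
    assert(pos >= 1 && pos <= 7);
    return (cw >> (pos - 1)) & 1u;
}

/* XOR of every position whose index has bit p set (the group owned by parity bit p). */
unsigned group_parity(uint8_t cw, unsigned p) {
    unsigned par = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if (pos & p)
            par ^= get_bit(cw, pos);
    return par;
}

/* Encode the low 4 bits of data into a 7-bit codeword (assumed layout, see above). */
uint8_t hamming74_encode(uint8_t data) {
    static const unsigned data_pos[4] = {3, 5, 6, 7};
    uint8_t cw = 0;
    for (unsigned i = 0; i < 4; i++)
        cw |= (uint8_t)(((data >> i) & 1u) << (data_pos[i] - 1));
    /* Each parity bit is still 0 here, so the group XOR is the parity of its data bits. */
    for (unsigned p = 1; p <= 4; p <<= 1)
        cw |= (uint8_t)(group_parity(cw, p) << (p - 1));
    return cw;
}

/* Sum of the parity-bit positions whose group check fails; 0 means no error seen. */
unsigned hamming74_syndrome(uint8_t cw) {
    unsigned syndrome = 0;
    for (unsigned p = 1; p <= 4; p <<= 1)
        if (group_parity(cw, p))
            syndrome += p;
    return syndrome;
}

/* Flip the bit named by the syndrome; only valid for at most one flipped bit. */
uint8_t hamming74_correct(uint8_t cw) {
    unsigned s = hamming74_syndrome(cw);
    return s ? (uint8_t)(cw ^ (1u << (s - 1))) : cw;
}
```

For example, encoding the data bits 0xB under this layout gives the codeword 0x55. Flipping position 6 turns it into 0x75, the checks for parity bits 2 and 4 fail, and the syndrome 2 + 4 = 6 points back at the flipped bit. Any two codewords of this code differ in at least 3 positions, which is why a single-bit error can be corrected.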
Agenda
• Course Summary
• Administrivia
• What’s Next?
• Acknowledgements

Project 2: sgemm‐small.c
[Histogram: Average Speed on 36 x 36 Matrices; x‐axis: Gflop/s, y‐axis: Frequency; Mean: 11.1, Std Dev: 2.2]

Project 2: sgemm‐openmp.c
[Histogram: Average Speed on Large Matrices, m=[1000,10000] by n=[32,100]; x‐axis: Gflop/s, y‐axis: Frequency; Mean: 51.3, Std Dev: 17.5]

Project 2 Fastest Submissions
• sgemm‐small (small):
  1) 12.4 Gflop/s Harkiran Bolaria, Andrew Cai
  2) 11.0 Gflop/s Yun Jae Cho, Duc Nguyen
  3) 10.4 Gflop/s Shawn Park, Tananun Songdechakraiwut
• sgemm‐small (36×36):
  1) 16.4 Gflop/s Luis De Pombo, Steven Roger
  2) 16.4 Gflop/s Bryan Cote, Myron Chen
  3) 16.1 Gflop/s Chris Buonocore, Ali Jishi

What’s Next?
• Take classes from great teachers! (teacher > class)
  – Distinguished Teaching Award (very hard to get)
  – HKN Course evaluations (≥ 6 is very good)
  – Upcoming instructors for classes: (CS / EE)
• Classes related to CS 61C
  – CS169 Software Engineering (for SaaS, Fox/Patterson Fall 12)
  – CS194‐15 Engineering Parallel Software
  – CS164 Programming Languages and Compilers
  – CS162 Operating Systems and Systems Programming
  – CS152 Computer Architecture and Engineering (Sp13)
  – CS150 Components and Design Techniques for Digital Systems

Opportunities in Teaching
• Interest in joining the CS staff?
  – Applies for CS 10, 61A, 61B, 61C
  – Usual path: Lab Assistant → Reader → TA
  – Also: Self‐Paced Center Tutor
• Requirements:
  – Interest in teaching
  – Stricter grade requirements based on where you want to jump in
• Applying:
  – Application form (for TA, Reader, or Lab Assistant)
  – Doesn’t hurt to e‐mail professor as well

Opportunities at Cal
• Why are we a top university in the WORLD?
  – Research, research, research!
  – Classes are just the tip of the iceberg
  – Whether you want to go to grad school or industry, you need someone to vouch for you
  – Won’t know if you like it or not until you try
• Find out what you like, do lots of web research (read published papers), hit OH of professor, show enthusiasm & initiative