Carnegie Mellon

Lecture 16: Pointer Analysis
• Basics
• Design Options
• Pointer Analysis Algorithms
• Pointer Analysis Using BDDs
• Probabilistic Pointer Analysis

Phillip B. Gibbons
15-745: Pointer Analysis
[ALSU 12.4, 12.6-12.7]

Pros and Cons of Pointers
• Many procedural languages have pointers
  – e.g., C or C++: int *p = &x;
• Pointers are powerful and convenient
  – can build arbitrary data structures
• Pointers can also hinder compiler optimization
  – hard to know where pointers are pointing
  – must be conservative in their presence
• Has inspired much research
  – analyses to decide where pointers are pointing
  – many options and trade-offs
  – open problem: a scalable accurate analysis

Pointer Analysis Basics: Aliases
• Two variables are aliases if:
  – they reference the same memory location
• More useful:
  – prove variables reference different locations

    int x, y;
    int *p = &x;
    int *q = &y;
    int *r = p;
    int **s = &q;

Alias sets: {x, *p, *r}, {y, *q, **s}, {q, *s}
p and q point to different locations

The Pointer Alias Analysis Problem
• Decide for every pair of pointers at every program point:
  – do they point to the same memory location?
• A difficult problem
  – shown to be undecidable by Landi, 1992
• Correctness:
  – report all pairs of pointers which do/may alias
• Ambiguous:
  – two pointers which may or may not alias
• Accuracy/Precision:
  – how few pairs of pointers are reported while remaining correct
  – i.e., reduce ambiguity to improve accuracy

Many Uses of Pointer Analysis
• Basic compiler optimizations
  – register allocation, CSE, dead code elimination, live variables, instruction scheduling, loop invariant code motion, redundant load/store elimination
• Parallelization
  – instruction-level parallelism
  – thread-level parallelism
• Behavioral synthesis
  – automatically converting C code into gates
• Error detection and program understanding
  – memory leaks, wild pointers, security holes

Challenges for Pointer Analysis
• Complexity: huge in space and time
  – compare every pointer with every other pointer
  – at every program point
  – potentially considering all program paths to that point
• Scalability vs. accuracy trade-off
  – different analyses motivated for different purposes
  – many useful algorithms (adds to confusion)
• Coding corner cases
  – pointer arithmetic (*p++), casting, function pointers, long-jumps
• Whole program?
  – most algorithms require the entire program
  – library code? optimizing at link-time only?
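
The "must be conservative" point and the redundant load elimination use case can be made concrete with a small C sketch. The function name sum_twice is hypothetical and not from the lecture; the example only illustrates why a "may alias" answer blocks an optimization that a "definitely not" answer would allow.

```c
#include <assert.h>

/* Hypothetical example (not from the lecture): the second load of *p is
 * redundant only if the store through q cannot touch p's location. */
int sum_twice(int *p, int *q) {
    int first = *p;   /* load *p */
    *q = 0;           /* may or may not write the location p points to */
    int second = *p;  /* must be reloaded unless p and q provably differ */
    return first + second;
}
```

Called with two distinct variables, the reload is wasted work; called with the same address twice, the reload is what keeps the result correct. An analysis that proves p and q point to different locations at this call site lets the compiler reuse the first load.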

Pointer Analysis: Design Options
• Representation
• Heap modeling
• Aggregate modeling
• Flow sensitivity
• Context sensitivity

Representation
• Track pointer aliases
  – <*a, b>, <*a, e>, <b, e>, <**a, c>, <**a, d>, …
  – More precise, less efficient
• Track points-to information
  – <a, b>, <b, c>, <b, d>, <e, c>, <e, d>
  – Less precise, more efficient

    a = &b;
    b = &c;
    b = &d;
    e = b;

[Figure: alias graph and points-to graph for the code above]

Andersen’s Algorithm
• Flow-insensitive, context-insensitive, iterative
• Representation:
  – one points-to graph for entire program
  – each node represents exactly one location
• For each statement, build the points-to graph:

    Statement   Rule
    y = &x      y points-to x
    y = x       if x points-to w, then y points-to w
    *y = x      if y points-to z and x points-to w, then z points-to w
    y = *x      if x points-to z and z points-to w, then y points-to w

• Iterate until graph no longer changes
• Worst case complexity: O(n^3), where n = program size

Andersen Example

    T *p, *q, *r;

    int main() {
    S1: p = alloc(T);
        f();
        g(&p);
    S4: p = alloc(T);
    S5: … = *p;
    }

    void f() {
    S6: q = alloc(T);
        g(&q);
    S8: r = alloc(T);
    }

    void g(T **fp) {
        T local;
        if (…)
    S9:     p = &local;
    }

p_S5 = {heapS1, heapS4, local}

Steensgaard’s Algorithm
• Flow-insensitive, context-insensitive
• Representation:
  – a compact points-to graph for entire program
    • each node can represent multiple locations
    • but can only point to one other node
      – i.e., every node has a fan-out of 1 or 0
    • union-find data structure implements fan-out
      – “unioning” while finding eliminates need to iterate
• Worst case complexity: O(n)
• Precision: less precise than Andersen’s

Steensgaard Example
(same program as in the Andersen example)

p_S5 = {heapS1, heapS4, heapS6, local}

Example with Flow Sensitivity
(same program as in the Andersen example)

p_S5 = {heapS4}
p_S9 = {local, heapS1}

Pointer Analysis Using BDDs
References:
• “Cloning-based context-sensitive pointer alias analysis using binary decision diagrams”, Whaley and Lam, PLDI 2004
• “Symbolic pointer analysis revisited”, Zhu and Calman, PLDI 2004
• “Points-to analysis using BDDs”, Berndl et al., PLDI 2003

Binary Decision Diagram (BDD)
[Figure panels: Truth Table, Binary Decision Tree, BDD]

BDD-Based Pointer Analysis
• Use a BDD to represent transfer functions
  – encode procedure as a function of its calling context
  – compact and efficient representation
• Perform context-sensitive, inter-procedural analysis
  – similar to dataflow analysis
  – but across the procedure call graph
• Gives accurate results
  – and scales up to large programs

Probabilistic Pointer Analysis
References:
• “A Probabilistic Pointer Analysis for Speculative Optimizations”, DaSilva and Steffan, ASPLOS 2006
• “Compiler support for speculative multithreading architecture with probabilistic points-to analysis”, Shen et al., PPoPP 2003
• “Speculative Alias Analysis for Executable Code”, Fernandez and Espasa, PACT 2002
• “A General Compiler Framework for Speculative Optimizations Using Data Speculative Code Motion”, Dai et al., CGO 2005
• “Speculative register promotion using Advanced Load Address Table (ALAT)”, Lin et al., CGO 2003

Pointer Analysis: Yes, No, & Maybe
• Do pointers a and b point to the same location?
  – Repeat for every pair of pointers at every program point
• How can we optimize the “maybe” cases?

[Figure: pointer analysis labels the pair “*a = ~” and “~ = *b” as Definitely, Definitely Not, or Maybe, and feeds the optimizer]

Let’s Speculate
• Implement a potentially unsafe optimization
  – Verify and Recover if necessary

Before (a is probably loop invariant):

    int *a, x;
    …
    while (…) {
        x = *a;
        …
    }

After:

    int *a, x, tmp;
    …
    tmp = *a;
    while (…) {
        x = tmp;
        …
    }
    <verify, recover?>

Data Speculative Optimizations
• EPIC instruction sets
  – Support for speculative load/store instructions (e.g., Itanium)
• Speculative compiler optimizations
  – Dead store elimination, redundancy elimination, copy propagation, strength reduction, register promotion
• Thread-level speculation (TLS)
  – Hardware and compiler support for speculative parallel threads
• Transactional programming
  – Hardware and software support for speculative parallel transactions

Heavy reliance on detailed profile feedback
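
The "verify, recover" step left open on the Let's Speculate slide can be sketched in plain C. This is only an illustration under stated assumptions: the function names sum_baseline and sum_speculative are hypothetical, the loop body is invented, and the verification is done in software with an address comparison, whereas the slides point to hardware support such as Itanium speculative loads.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: *a is reloaded every iteration because the store through b
 * may alias it. */
long sum_baseline(const int *a, int *b, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += *a;
        *b = (int)i;
    }
    return total;
}

/* Speculative version (hypothetical software scheme): hoist the load,
 * verify after each store, and recover by reloading when needed. */
long sum_speculative(const int *a, int *b, size_t n) {
    long total = 0;
    int tmp = *a;            /* speculatively hoisted load */
    for (size_t i = 0; i < n; i++) {
        total += tmp;
        *b = (int)i;
        if (b == a)          /* verify: did the store hit a's location? */
            tmp = *a;        /* recover: refresh the cached value */
    }
    return total;
}
```

Both functions return the same result whether or not a and b alias. The speculative one pays only a cheap compare in the common non-aliasing case, which is the trade the "maybe" cases are meant to exploit.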