CMSC 631 – Program Analysis and Understanding
Fall 2006

About this Class
• Topic: Analyzing and understanding software
• Three main focus areas:
  ◦ Formal systems and notations
    - Vocabulary for talking about programs
  ◦ Static analysis
    - Automatic reasoning about source code
  ◦ Programming language features
    - Affect programs and how we reason about them

Personnel
• Michael Hicks
  ◦ Office: 4131 AVW
  ◦ E-mail: mwh at cs.umd.edu
  ◦ Office hours: M 3:30-4:30pm, W 10am-11am
    - Or by appointment
• James Rose
  ◦ E-mail: rosejr at cs.umd.edu
  ◦ Programming project grading

Prerequisite
• CMSC 430 or equivalent compiler class
  ◦ Ideas we will use in this class:
    - Parse trees/abstract syntax trees
    - BNF notation for grammars
    - Type checking (usually little coverage in a compilers class)
    - Data flow analysis (coverage varies in a compilers class)
    - Tools like yacc and lex may be useful for your project
  ◦ We won’t use most of the other material
    - So even without taking a compilers class, you may be OK
    - Talk to me if you’re not sure

Textbooks
• No required textbooks
• Two recommended texts
  ◦ Pierce, Types and Programming Languages
  ◦ Huth and Ryan, Logic in Computer Science
• Neither covers everything in the course
• On reserve in CS library

Forum
• Web forum on CS dept server
  ◦ https://forum.cs.umd.edu/forumdisplay.php?f=40
• Can use the forum to communicate with others in the class and ask questions of general interest

Expectations: Homework
• Two kinds of assignments:
  ◦ Programming assignments (20% of grade)
    - Every two weeks
    - Implement the ideas we see in lecture
  ◦ Written assignments (10% of grade)
    - Every week
    - Short problem sets
• This is how you will learn things
  ◦ Much more effective than listening to a lecture

Late Policy on Assignments
• Programming Assignments: Due at midnight
  ◦ We use Marmoset for submissions
    - http://submit.cs.umd.edu
• Written assignments: Due at start of class
  ◦ No late submissions
• Contact me about extenuating circumstances
  ◦ E.g., religious holidays
  ◦ Inform me as soon as possible

Expectations: Participation
• Will need to read some papers for class
  ◦ More later on in the semester
  ◦ Should come prepared to contribute to discussion
• (Possible) student presentations later in the semester
  ◦ Read 1-2 papers on a topic
  ◦ Present a lecture in class about the material
• 10% of grade on class participation

Expectations: Project
• Class goal: Teach you how to do research
  ◦ So you have to do research as part of the class
• Substantial research project (35% of grade)
  ◦ Any topic vaguely related to the class
    - Will post some suggestions for projects later on
    - May also be able to share the project with another class
  ◦ Completed in groups of size 2 (possibly 1 or 3)
• This will consume the second half of the semester
  ◦ Will ease up on homework and reading

Expectations: Project (cont’d)
• Deliverables
  ◦ Project proposal (one page) + talk with me
  ◦ Project write-up
    - A conference-style paper (5-15 pages, as appropriate)
  ◦ Implementation, if any
  ◦ In-class presentation
    - 10-20 minutes, depending on # of projects

Expectations: Exam
• Final exam (25% of grade)
  ◦ Based on written and programming assignments
  ◦ Take-home
    - Or in-class if you’d prefer; we can vote

Program Semantics
• To be able to analyze programs, we have to know what they mean
  ◦ Semantics comes from the Greek semaino, or “to mean”
• Three styles of formal semantics
  ◦ Operational semantics
    - Like an interpreter
  ◦ Denotational semantics
    - Like a compiler
  ◦ Axiomatic semantics
    - Semantics is based on what you can prove about programs

Operational Semantics
• Evaluation is described operationally, as the execution of some abstract machine
  ◦ Program states are reduced according to some transition relation $\to$.
    An example is our lambda calculus rule:
  ◦ $(\lambda x.e_1)\ e_2 \to e_1[e_2\backslash x]$
• There are different styles of abstract machine
  ◦ Small-step (as above), big-step (natural semantics), SECD machine, …
• The meaning of a program is its fully reduced form (a.k.a. a value)

Denotational Semantics
• The meaning of a program is defined as a mathematical object, like a function or number
  ◦ Rather than a sequence of machine states
• The semantics is given in terms of an interpretation function $[\![\cdot]\!]$
• Things get interesting when trying to define denotations for recursive constructs

Denotational Semantics example
• $b ::= \mathit{true} \mid \mathit{false} \mid b \lor b \mid b \land b$
• $e ::= 0 \mid 1 \mid \dots \mid e + e \mid e * e$
• $s ::= e \mid \texttt{if}\ b\ \texttt{then}\ s\ \texttt{else}\ s$
  ◦ $[\![\mathit{true}]\!] = \mathit{true}$
  ◦ $[\![b_1 \lor b_2]\!] = [\![b_1]\!] \text{ or } [\![b_2]\!]$
  ◦ $[\![\texttt{if}\ b\ \texttt{then}\ s_1\ \texttt{else}\ s_2]\!]$ is $[\![s_1]\!]$ if $[\![b]\!]$ holds, and $[\![s_2]\!]$ if $[\![b]\!]$ does not hold
  ◦ How would we handle a while loop?

Axiomatic Semantics
• With the aforementioned semantics, we define the behavior of programs, and then reason about programs in terms of this behavior
  ◦ Are two programs equivalent? Does a program terminate? Does a program implement a particular specification?
• Axiomatic semantics instead directly assigns meaning in terms of what one can prove
  ◦ Hoare, Dijkstra, Gries, others

Example: Hoare Triples
• $\{P\}\ S\ \{Q\}$
  ◦ If statement $S$ is executed in a state satisfying precondition $P$, then $S$ will terminate, and $Q$ will hold of the resulting state
  ◦ Partial correctness: ignore termination ($Q$ must hold only if $S$ terminates)
• Weakest precondition for assignment
  ◦ Axiom: $\{Q[e\backslash x]\}\ x := e\ \{Q\}$
  ◦ Example: $\{y > 3\}\ x := y\ \{x > 3\}$

Type Systems
• Machine represents all values as bit patterns
  ◦ Is 00110110111100101100111010101000
    - A signed integer? Unsigned integer? Floating-point number? Address of an integer? Address of a function? etc.
• Type systems allow us to distinguish these
  ◦ To choose the operation (which + to use), e.g., in FORTRAN
  ◦ To avoid programming mistakes
    - E.g., don’t treat an integer as a function address

Simply-typed λ-calculus
• $e ::= x \mid n \mid \lambda x{:}\tau.\,e \mid e\ e$
• $\tau ::= \mathit{int} \mid \tau \to \tau$
• $A \vdash e : \tau$: in type environment $A$, expression $e$ has type $\tau$

  $$\frac{}{A \vdash n : \mathit{int}} \qquad \frac{x \in \mathrm{dom}(A)}{A \vdash x : A(x)}$$
  $$\frac{A \vdash e_1 : \tau \to \tau' \qquad A \vdash e_2 : \tau}{A \vdash e_1\ e_2 : \tau'} \qquad \frac{A[\tau\backslash x] \vdash e : \tau'}{A \vdash \lambda x{:}\tau.\,e : \tau \to \tau'}$$

Subtyping
• Liskov:
  ◦ If for each object $o_1$ of type $S$ there is an object $o_2$ of type $T$ such that for all programs $P$ defined in terms of $T$, the behavior of $P$ is unchanged when $o_1$ is substituted for $o_2$, then $S$ is a subtype of $T$.
• Informal statement
  ◦ If anyone expecting a T can be given an S instead, then S is a subtype of T.

Other Technologies and Topics
• Control-flow analysis
• CFL reachability and polymorphism
• Constraint-based analysis
• Alias and pointer analysis
• Region-based memory management
• Garbage collection
• Model checking
• More …

Applications: Parsing
• Syntactic bug pattern checkers
  ◦ ASTLog
  ◦ PREfast
    - Buffer overflows! (sizeof() of wrong type in copy operations)
  ◦ FindBugs
    - wait() not inside of a loop
    - Pointer to internal array returned (unsafe)
    - Dereference of null pointer

Applications: Abstract Interpretation
• Everything!
• But in particular, Polyspace
  ◦ Looks for race conditions, out-of-bounds array accesses, null pointer dereferences, non-initialized data access, etc.
  ◦ Also includes an arithmetic equation solver
• Stacktool
  ◦ Abstractly interprets machine code to check for possible stack overflow in embedded systems

Applications: Dataflow analysis
• Optimizing compilers
  ◦ I.e., any good compiler
• ESP: Path-sensitive program checker
  ◦ Example: can check for correct file I/O properties, such as that files are opened for reading before being read
• LCLint: Memory error checker (plus more)
• Meta-level compilation: Checks lots of stuff
• ...

Applications: Symbolic Evaluation
• PREfix
  ◦ Finds null pointer dereferences, array-out-of-bounds errors, etc.
  ◦ Used regularly at Microsoft
• Also ESP

Applications: Model Checking
• SLAM and BLAST
  ◦ Focus on device drivers: lock/unlock protocol errors, and other errors in the sequencing of operations
• Use alias analysis, predicate abstraction, and more
• (We will not cover model checking extensively in this class; check out CMSC 630 in the Spring!)

Applications: Axiomatic Semantics
• Extended Static Checker
  ◦ Can perform deep reasoning about programs
  ◦ Array out-of-bounds
  ◦ Null pointer errors
  ◦ Failure to satisfy internal invariants
• Based on theorem proving

Applications: Type Systems
• Type qualifiers
  ◦ Format-string vulnerabilities, deadlocks, file I/O protocol errors, kernel security holes
• Vault and Cyclone
  ◦ Memory allocation and deallocation errors, library protocol errors, misuse of locks

Conclusion
• PL has a great mix of theory and practice
  ◦ Very deep theory
  ◦ But lots of practical applications
• Recent exciting new developments
  ◦ Focus on program correctness instead of speed
  ◦ Forget about full correctness, though
  ◦ Scalability to large programs is essential
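The Operational Semantics slide describes evaluation as repeatedly applying a transition relation, with the lambda calculus rule $(\lambda x.e_1)\ e_2 \to e_1[e_2\backslash x]$ as its example. What follows is a minimal Python sketch of such a small-step interpreter, not the course's own code. It assumes call-by-value evaluation and closed programs, so the naive substitution never captures a variable. The names `Var`, `Lam`, `App`, `subst`, `step`, and `evaluate` are hypothetical. `step` plays the role of the transition relation, and `evaluate` iterates it until no rule applies.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Untyped lambda-calculus terms: e ::= x | \x.e | e e
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def subst(e: Term, v: Term, x: str) -> Term:
    r"""Compute e[v\x]. Assumes v is closed, so no variable capture can occur."""
    if isinstance(e, Var):
        return v if e.name == x else e
    if isinstance(e, Lam):
        if e.param == x:          # x is shadowed by this binder: stop here
            return e
        return Lam(e.param, subst(e.body, v, x))
    return App(subst(e.fn, v, x), subst(e.arg, v, x))

def is_value(e: Term) -> bool:
    # Under call-by-value, lambdas are the only values
    return isinstance(e, Lam)

def step(e: Term) -> Optional[Term]:
    """One small step of call-by-value reduction, or None if no rule applies."""
    if isinstance(e, App):
        if not is_value(e.fn):                 # reduce the function position first
            fn2 = step(e.fn)
            return None if fn2 is None else App(fn2, e.arg)
        if not is_value(e.arg):                # then the argument
            arg2 = step(e.arg)
            return None if arg2 is None else App(e.fn, arg2)
        # Beta rule: (\x.e1) v2 -> e1[v2\x]
        return subst(e.fn.body, e.arg, e.fn.param)
    return None

def evaluate(e: Term) -> Term:
    """Apply step until it no longer applies (a value, or a stuck term)."""
    while (nxt := step(e)) is not None:
        e = nxt
    return e
```

For example, `evaluate(App(Lam("x", Var("x")), Lam("y", Var("y"))))` reduces in one beta step to `Lam("y", Var("y"))`. Supporting open terms or reduction under a λ would require capture-avoiding substitution, which this sketch leaves out.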
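The Denotational Semantics example defines the interpretation function $[\![\cdot]\!]$ by recursion on the grammars for $b$, $e$, and $s$. Below is one possible Python rendering, offered as an illustrative sketch. It assumes booleans denote Python `bool` values and that expressions and statements denote `int` values. The class and function names (`BConst`, `Or`, `And`, `Num`, `Add`, `Mul`, `If`, `denote_b`, `denote_e`, `denote_s`) are hypothetical. Each function is compositional: the meaning of a compound phrase is computed only from the meanings of its parts.

```python
from dataclasses import dataclass
from typing import Union

# b ::= true | false | b or b | b and b
@dataclass(frozen=True)
class BConst:
    value: bool

@dataclass(frozen=True)
class Or:
    left: "BExp"
    right: "BExp"

@dataclass(frozen=True)
class And:
    left: "BExp"
    right: "BExp"

BExp = Union[BConst, Or, And]

# e ::= 0 | 1 | ... | e + e | e * e
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Add:
    left: "AExp"
    right: "AExp"

@dataclass(frozen=True)
class Mul:
    left: "AExp"
    right: "AExp"

AExp = Union[Num, Add, Mul]

# s ::= e | if b then s else s
@dataclass(frozen=True)
class If:
    cond: BExp
    then: "Stmt"
    orelse: "Stmt"

Stmt = Union[AExp, If]

def denote_b(b: BExp) -> bool:
    """[| b |]: the truth value a boolean expression denotes."""
    if isinstance(b, BConst):
        return b.value
    if isinstance(b, Or):
        return denote_b(b.left) or denote_b(b.right)
    return denote_b(b.left) and denote_b(b.right)   # And

def denote_e(e: AExp) -> int:
    """[| e |]: the integer an arithmetic expression denotes."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return denote_e(e.left) + denote_e(e.right)
    return denote_e(e.left) * denote_e(e.right)     # Mul

def denote_s(s: Stmt) -> int:
    """[| s |]: [| s1 |] if [| b |] holds, [| s2 |] otherwise."""
    if isinstance(s, If):
        return denote_s(s.then) if denote_b(s.cond) else denote_s(s.orelse)
    return denote_e(s)
```

The sketch deliberately stops at `if`. A while loop, the slide's open question, is exactly the kind of recursive construct for which defining a denotation gets interesting.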
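The assignment axiom $\{Q[e\backslash x]\}\ x := e\ \{Q\}$ from the Hoare Triples slide computes a precondition purely by syntactic substitution. The Python sketch below illustrates this for a hypothetical, deliberately tiny assertion language: only `>` comparisons over variables, integer literals, and `+`. All names (`Var`, `Num`, `Plus`, `Gt`, `subst`, `wp_assign`, `eval_exp`, `holds`) are assumptions made for illustration. `holds` evaluates an assertion in a concrete state, which lets the result be sanity-checked.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Expressions: variables, integer literals, and addition (an assumed subset)
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

Exp = Union[Var, Num, Plus]

# Assertions: only "left > right" in this sketch
@dataclass(frozen=True)
class Gt:
    left: Exp
    right: Exp

def subst(e: Exp, x: str, repl: Exp) -> Exp:
    """Replace every occurrence of variable x in e with repl."""
    if isinstance(e, Var):
        return repl if e.name == x else e
    if isinstance(e, Plus):
        return Plus(subst(e.left, x, repl), subst(e.right, x, repl))
    return e  # integer literal: nothing to substitute

def wp_assign(x: str, e: Exp, post: Gt) -> Gt:
    """Precondition given by the assignment axiom: post with e substituted for x."""
    return Gt(subst(post.left, x, e), subst(post.right, x, e))

def eval_exp(e: Exp, state: Dict[str, int]) -> int:
    if isinstance(e, Var):
        return state[e.name]
    if isinstance(e, Num):
        return e.value
    return eval_exp(e.left, state) + eval_exp(e.right, state)

def holds(a: Gt, state: Dict[str, int]) -> bool:
    """Does the assertion a hold in the given state?"""
    return eval_exp(a.left, state) > eval_exp(a.right, state)
```

For instance, `wp_assign("x", Var("y"), Gt(Var("x"), Num(3)))` yields `Gt(Var("y"), Num(3))`, matching the slide's example $\{y > 3\}\ x := y\ \{x > 3\}$.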
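The four typing rules on the Simply-typed λ-calculus slide translate directly into a recursive type checker, with one case per rule. The Python sketch below is illustrative only. It assumes type environments are dictionaries from variable names to types, models $A[\tau\backslash x]$ as the dictionary extended with the new binding, and compares types by structural equality. Every name in it (`IntT`, `Arrow`, `Var`, `Num`, `Lam`, `App`, `IllTyped`, `type_of`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Types: tau ::= int | tau -> tau
@dataclass(frozen=True)
class IntT:
    pass

@dataclass(frozen=True)
class Arrow:
    arg: "Type"
    result: "Type"

Type = Union[IntT, Arrow]

# Terms: e ::= x | n | \x:tau.e | e e
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Lam:
    param: str
    param_type: Type
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Num, Lam, App]

class IllTyped(Exception):
    """Raised when no typing rule applies."""

def type_of(env: Dict[str, Type], e: Term) -> Type:
    """Return tau such that env |- e : tau, or raise IllTyped."""
    if isinstance(e, Num):                  # A |- n : int
        return IntT()
    if isinstance(e, Var):                  # x in dom(A)  gives  A |- x : A(x)
        if e.name not in env:
            raise IllTyped(f"unbound variable {e.name}")
        return env[e.name]
    if isinstance(e, Lam):                  # A[tau\x] |- e : tau'  gives  A |- \x:tau.e : tau -> tau'
        body_t = type_of({**env, e.param: e.param_type}, e.body)
        return Arrow(e.param_type, body_t)
    # Application: A |- e1 : tau -> tau'  and  A |- e2 : tau  give  A |- e1 e2 : tau'
    fn_t = type_of(env, e.fn)
    arg_t = type_of(env, e.arg)
    if not isinstance(fn_t, Arrow):
        raise IllTyped("applying a non-function")
    if fn_t.arg != arg_t:
        raise IllTyped("argument type mismatch")
    return fn_t.result
```

Because each rule's conclusion matches exactly one syntactic form, the checker never has to choose between rules or backtrack. It simply recurses on subterms.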