Context-Sensitive Analysis
CS430

Roadmap (Where are we?)

Last lecture
• LR(1) parsing
  → Building ACTION / GOTO tables
  → Shift / reduce and reduce / reduce conflicts
  → SLR(1), LALR(1) parsers

This lecture
• Context-sensitive analysis
  → Motivation
  → Attribute grammars
      Attributes
      Evaluation order
  → Ad hoc syntax-directed translation

Context-Sensitive Analysis: Beyond Syntax

There is a level of correctness that is deeper than grammar.
To generate code, we need to understand its meaning!

    fie(a,b,c,d)
      int a, b, c, d;
    { … }

    fee() {
      int f[3], g[0], h, i, j, k;
      char *p;
      fie(h,i,"ab",j, k);
      k = f * i + j;
      h = g[17];
      printf("<%s,%s>.\n", p,q);
      p = 10;
    }

What is wrong with this program? (let me count the ways …)
• declared g[0], used g[17]
• wrong number of args to fie()
• “ab” is not an int
• wrong dimension on use of f
• undeclared variable q
• 10 is not a character string

All these errors are “deeper than syntax”.

Beyond Syntax

To generate code, the compiler needs to answer many questions
• Is “x” a scalar, an array, or a function? Is “x” declared?
• Are there names that are not declared? Declared but not used?
• Which declaration of “x” does each use reference?
• Is the expression “x * y + z” type-consistent?
• In “a[i,j,k]”, does a have three dimensions?
• Where can “z” be stored? (register, local, global, heap, static)
• In “f ← 15”, how should 15 be represented?
• How many arguments does “fie()” take? What about “printf()”?
• Does “*p” reference the result of a “malloc()”?
• Do “p” & “q” refer to the same memory location?
• Is “x” defined before it is used?
These are beyond a CFG.

Beyond Syntax

These questions are part of context-sensitive analysis
• Answers depend on “values”, not parts of speech
• Questions & answers involve non-local information
• Answers may involve computation

How can we answer these questions?
• Use formal methods
  → Context-sensitive grammars?
  → Attribute grammars? (attributed grammars?)
• Use ad-hoc techniques
  → Symbol tables
  → Ad-hoc code (action routines)

In scanning & parsing, formalism won; different story here.

Beyond Syntax

Telling the story
• The attribute grammar formalism is important
  → Succinctly makes many points clear
  → Sets the stage for actual, ad-hoc practice
• The problems with attribute grammars motivate practice
  → Non-local computation
  → Need for centralized information

We will cover attribute grammars, then move on to ad-hoc ideas.

Attribute Grammars

What is an attribute grammar?
• A context-free grammar augmented with a set of rules
• Each symbol in the derivation has a set of values, or attributes
• The rules specify how to compute a value for each attribute

Example grammar

    Number → Sign List
    Sign   → + | –
    List   → List Bit | Bit
    Bit    → 0 | 1

This grammar describes signed binary numbers.
We would like to augment it with rules that compute the decimal value of each valid input string.

Examples

We will use these two throughout the lecture.

For “–1”

    Number ⇒ Sign List
           ⇒ – List
           ⇒ – Bit
           ⇒ – 1

    [Figure: parse tree for “–1”]

For “–101”

    Number ⇒ Sign List
           ⇒ Sign List Bit
           ⇒ Sign List 1
           ⇒ Sign List Bit 1
           ⇒ Sign List 0 1
           ⇒ Sign Bit 0 1
           ⇒ Sign 1 0 1
           ⇒ – 101

    [Figure: parse tree for “–101”]

Attribute Grammars

Add rules to compute the decimal value of a signed binary number.

Productions and Attribution Rules

Number → Sign List
            List.pos ← 0
            if Sign.neg
                then Number.val ← – List.val
                else Number.val ← List.val
Sign → +
            Sign.neg ← false
     | –
            Sign.neg ← true
List0 → List1 Bit
            List1.pos ← List0.pos + 1
            Bit.pos ← List0.pos
            List0.val ← List1.val + Bit.val
      | Bit
            Bit.pos ← List.pos
            List.val ← Bit.val
Bit → 0
            Bit.val ← 0
    | 1
            Bit.val ← 2^Bit.pos

Symbol    Attributes
Number    val
Sign      neg
List      pos, val
Bit       pos, val

Attribute Grammars

Productions and Attribution Rules

List0 → List1 Bit
            List1.pos ← List0.pos + 1
            Bit.pos ← List0.pos
            List0.val ← List1.val + Bit.val

[Figure: attribute instances pos and val on LIST0, LIST1, and BIT]

• Semantic rules define partial dependency graph
• Value flow top down or across: inherited attributes
• Value flow bottom-up: synthesized attributes

Attribute Grammars

Note:
• Semantic rules associated with production A → α have to specify the values for all
  - synthesized attributes for A (root)
  - inherited attributes for grammar symbols in α (children)
  ⇒ rules must specify local value flow!
• Terminals can be associated with values returned by the scanner. These input values are associated with a synthesized attribute.
• Starting symbol cannot have inherited attributes.

[Figure: attribute instances pos and val on LIST0, LIST1, and BIT]

Attribute Grammars

• Question: What rules specify values for , and ?

[Figure: attribute instances pos and val on LIST0, LIST1, and BIT]

Circularity

We can only evaluate acyclic instances
• We can prove that some grammars can only generate instances with acyclic dependence graphs
• Largest such class is “strongly non-circular” grammars (SNC)
• SNC grammars can be tested in polynomial time

Many evaluation methods discover circularity dynamically
⇒ Bad property for a compiler to have

SNC grammars were first defined by Kennedy & Warren.

An Extended Example

Grammar for a basic block (§ 4.3.3)

    Block0 → Block1 Assign
           | Assign
    Assign → Ident = Expr ;
    Expr0  → Expr1 + Term
           | Expr1 – Term
           | Term
    Term0  → Term1 * Factor
           | Term1 / Factor
           | Factor
    Factor → ( Expr )
           | Number
           | Identifier

Let’s estimate cycle counts
• Each operation has a COST
• Add them, bottom up
• Assume a load per value
• Assume no reuse

Simple problem for an AG.
Hey, this looks useful!
An Extended Example (continued)

Block0 → Block1 Assign
            Block0.cost ← Block1.cost + Assign.cost
       | Assign
            Block0.cost ← Assign.cost
Assign → Ident = Expr ;
            Assign.cost ← COST(store) + Expr.cost
Expr0 → Expr1 + Term
            Expr0.cost ← Expr1.cost + COST(add) + Term.cost
      | Expr1 – Term
            Expr0.cost ← Expr1.cost + COST(sub) + Term.cost
      | Term
            Expr0.cost ← Term.cost
Term0 → Term1 * Factor
            Term0.cost ← Term1.cost + COST(mult) + Factor.cost
      | Term1 / Factor
            Term0.cost ← Term1.cost + COST(div) + Factor.cost
      | Factor
            Term0.cost ← Factor.cost
Factor → ( Expr )
            Factor.cost ← Expr.cost
       | Number
            Factor.cost ← COST(loadI)
       | Identifier
            Factor.cost ← COST(load)

These are all synthesized attributes!
Values flow from rhs to lhs in productions.

An Extended Example (continued)

Properties of the example grammar
• All attributes are synthesized ⇒ S-attributed grammar
• Rules can be evaluated bottom-up in a single pass
  → Good fit to bottom-up, shift/reduce parser
• Easily understood solution
• Seems to fit the problem well

What about an improvement?
• Values are loaded only once per block (not at each use)
• Need to track which values have been already loaded

Things will get more complicated.

A Better Execution Model

Adding load tracking
• Need sets Before and After for each production
• Must be initialized, updated, and passed around the tree

Factor → ( Expr )
            Factor.cost ← Expr.cost ;
            Expr.Before ← Factor.Before ;
            Factor.After ← Expr.After
       | Number
            Factor.cost ← COST(loadi) ;
            Factor.After ← Factor.Before
       | Identifier
            if (Identifier.name ∉ Factor.Before) then
                Factor.cost ← COST(load) ;
                Factor.After ← Factor.Before ∪ Identifier.name
            else
                Factor.cost ← 0
                Factor.After ← Factor.Before

This looks more complex!
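To make the Before/After bookkeeping concrete, here is a minimal Python sketch of the load-tracking rules, evaluated as a pure function over a small expression tree. It is an illustration under stated assumptions, not the course's implementation: the tuple-based tree encoding, the cycle counts in COST, and the names evaluate, assign_cost, and block_cost are all hypothetical. The left-to-right threading of Before/After through binary operators and across assignments is also assumed here; the Factor rules above do not state it.

```python
# Hypothetical cycle counts; the slides leave COST(...) abstract.
COST = {"load": 3, "loadi": 1, "store": 3, "add": 1, "sub": 1, "mult": 2, "div": 4}
BINOPS = {"+": "add", "-": "sub", "*": "mult", "/": "div"}


def evaluate(node, before):
    """Return (cost, after) for an expression node.

    `before` plays the role of the inherited attribute Before; the returned
    pair holds the synthesized attributes cost and After.
    """
    kind = node[0]
    if kind == "num":                       # Factor -> Number
        return COST["loadi"], before
    if kind == "id":                        # Factor -> Identifier
        name = node[1]
        if name in before:                  # already loaded: no extra cost
            return 0, before
        return COST["load"], before | {name}
    if kind == "paren":                     # Factor -> ( Expr ), pure copy rules
        return evaluate(node[1], before)
    # Binary operator (assumed threading): left operand first, then right.
    op, left, right = node
    left_cost, left_after = evaluate(left, before)
    right_cost, right_after = evaluate(right, left_after)
    return left_cost + COST[BINOPS[op]] + right_cost, right_after


def assign_cost(assign, before):
    """Assign -> Ident = Expr ; passes Before into Expr and After back out."""
    _target, expr = assign
    expr_cost, after = evaluate(expr, before)
    return COST["store"] + expr_cost, after


def block_cost(assigns):
    """Thread the loaded set through the assignments, starting from empty."""
    total, loaded = 0, frozenset()
    for assign in assigns:
        cost, loaded = assign_cost(assign, loaded)
        total += cost
    return total, loaded
```

With these made-up costs, evaluating `a + a * 2` from an empty Before set charges a single load for `a`, for a total of 7. Every call passes a set in and returns a set out, which is the copying the attribute grammar version has to spell out production by production.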
A Better Execution Model

• Load tracking adds complexity
• But most of it is in the “copy rules”
• Every production needs rules to copy Before & After

A sample production

Expr0 → Expr1 + Term
            Expr0.cost ← Expr1.cost + COST(add) + Term.cost ;
            Expr1.Before ← Expr0.Before ;
            Term.Before ← Expr1.After ;
            Expr0.After ← Term.After

These copy rules multiply rapidly.
Each creates an instance of the set.
Lots of work, lots of space, lots of rules to write.

The Moral of the Story

• Non-local computation needed lots of supporting rules
• “Complex” local computation is relatively easy

The Problems
• Copy rules increase cognitive overhead
• Copy rules increase space requirements
  → Need copies of attributes
• Result is an attributed tree
  → Must build the parse tree
  → Either search tree for answers or copy them to the root

Addressing the Problem

What would a good programmer do?
• Introduce a central repository for facts
• Table of names
  → Field in table for loaded/not loaded state
• Avoids all the copy rules, allocation & storage headaches
• All inter-assignment attribute flow is through table
  → Clean, efficient implementation
  → Good techniques for implementing the table (hashing, § B.4)
  → When it's done, information is in the table!
  → Cures most of the problems
• Unfortunately, this design violates the functional paradigm
  → Do we care?
The Realist’s Alternative

Ad-hoc syntax-directed translation
• Associate pieces of code with each production
• At each reduction, the corresponding code is executed
• Allowing arbitrary code provides complete flexibility
  → Includes ability to do tasteless & bad things

To make this work
• Need names for attributes of each symbol on lhs & rhs
  → Typically, one attribute passed through parser + arbitrary code (structures, globals, statics, …)
  → Yacc introduced $$, $1, $2, … $n, left to right
• Need an evaluation scheme
  → Fits nicely into LR(1) parsing algorithm

Reworking the Example (with load tracking)

Block0 → Block1 Assign
       | Assign
Assign → Ident = Expr ;
            cost ← cost + COST(store);
Expr0 → Expr1 + Term
            cost ← cost + COST(add);
      | Expr1 – Term
            cost ← cost + COST(sub);
      | Term
Term0 → Term1 * Factor
            cost ← cost + COST(mult);
      | Term1 / Factor
            cost ← cost + COST(div);
      | Factor
Factor → ( Expr )
       | Number
            cost ← cost + COST(loadi);
       | Identifier
            { i ← hash(Identifier);
              if (Table[i].loaded = false) then {
                  cost ← cost + COST(load);
                  Table[i].loaded ← true;
              }
            }

This looks cleaner & simpler than the AG solution!

One missing detail: initializing “cost” (we ignore “Table[ ]” for now).

Reworking the Example (with load tracking)

Start → Init Block
Init → ε
            cost ← 0;
Block0 → Block1 Assign
       | Assign
Assign → Ident = Expr ;
            cost ← cost + COST(store);

… and so on as in the previous version of the example …

• Before parser can reach Block, it must reduce Init
• Reduction by Init sets cost to zero

This is an example of splitting a production to create a reduction in the middle, for the sole purpose of hanging an action routine there (marker production)!
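The action routines and the Init marker above can be imitated in a few lines of Python. This is a rough sketch under assumptions, not a parser: the reductions are written out by hand in the order a bottom-up parser would perform them, the cycle counts in COST are made up, a Python dict stands in for hash() plus Table[ ], and the names ACTIONS, REDUCTIONS, charge, and run are hypothetical.

```python
# Hypothetical cycle counts; the slides leave COST(...) abstract.
COST = {"load": 3, "loadi": 1, "store": 3, "add": 1, "sub": 1, "mult": 2, "div": 4}


def on_init(state, _lexeme):
    """Init -> ε : the marker production's only job is cost <- 0."""
    state["cost"] = 0


def charge(op):
    """Build an action that adds COST(op) to the running total."""
    def action(state, _lexeme):
        state["cost"] += COST[op]
    return action


def on_identifier(state, name):
    """Factor -> Identifier : charge a load only on the first use of a name."""
    entry = state["table"].setdefault(name, {"loaded": False})
    if not entry["loaded"]:
        state["cost"] += COST["load"]
        entry["loaded"] = True


# Productions without an action (for example Expr -> Term) are simply absent.
ACTIONS = {
    "Init -> ε": on_init,
    "Assign -> Ident = Expr ;": charge("store"),
    "Expr -> Expr + Term": charge("add"),
    "Expr -> Expr - Term": charge("sub"),
    "Term -> Term * Factor": charge("mult"),
    "Term -> Term / Factor": charge("div"),
    "Factor -> Number": charge("loadi"),
    "Factor -> Identifier": on_identifier,
}


def run(reductions):
    """Apply the action for each reduction, in the order given."""
    state = {"cost": None, "table": {}}   # shared, mutable state
    for production, lexeme in reductions:
        action = ACTIONS.get(production)
        if action is not None:
            action(state, lexeme)
    return state["cost"]


# Reductions for "x = b + b ;", listed by hand for illustration.
REDUCTIONS = [
    ("Init -> ε", None),
    ("Factor -> Identifier", "b"),
    ("Term -> Factor", None),
    ("Expr -> Term", None),
    ("Factor -> Identifier", "b"),
    ("Term -> Factor", None),
    ("Expr -> Expr + Term", None),
    ("Assign -> Ident = Expr ;", None),
    ("Block -> Assign", None),
]
```

With these costs, `run(REDUCTIONS)` charges one load, one add, and one store. Dropping the first entry leaves `cost` uninitialized, so the first action that touches it fails, which is the gap the Init marker closes.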
Reworking the Example (with load tracking)

Block0 → Block1 Assign
            $$ ← $1 + $2 ;
       | Assign
            $$ ← $1 ;
Assign → Ident = Expr ;
            $$ ← COST(store) + $3 ;
Expr0 → Expr1 + Term
            $$ ← $1 + COST(add) + $3 ;
      | Expr1 – Term
            $$ ← $1 + COST(sub) + $3 ;
      | Term
            $$ ← $1 ;
Term0 → Term1 * Factor
            $$ ← $1 + COST(mult) + $3 ;
      | Term1 / Factor
            $$ ← $1 + COST(div) + $3 ;
      | Factor
            $$ ← $1 ;
Factor → ( Expr )
            $$ ← $2 ;
       | Number
            $$ ← COST(loadi) ;
       | Identifier
            { i ← hash(Identifier);
              if (Table[i].loaded = false) then {
                  $$ ← COST(load);
                  Table[i].loaded ← true;
              }
              else $$ ← 0
            }

This version passes the values through attributes.
It avoids the need for initializing “cost”.
However, Table[ ] still needs to be initialized.

Example: Building an Abstract Syntax Tree

• Assume constructors for each node
• Assume stack holds pointers to nodes
• Assume yacc syntax

Goal → Expr
            $$ = $1;
Expr → Expr + Term
            $$ = MakeAddNode($1,$3);
     | Expr – Term
            $$ = MakeSubNode($1,$3);
     | Term
            $$ = $1;
Term → Term * Factor
            $$ = MakeMulNode($1,$3);
     | Term / Factor
            $$ = MakeDivNode($1,$3);
     | Factor
            $$ = $1;
Factor → ( Expr )
            $$ = $2;
       | number
            $$ = MakeNumNode(token);
       | id
            $$ = MakeIdNode(token);

Reality

Most parsers are based on this ad-hoc style of context-sensitive analysis.

Advantages
• Addresses the shortcomings of the AG paradigm
• Efficient, flexible

Disadvantages
• Must write the code with little assistance
• Programmer deals directly with the details

Most parser generators support a yacc-like notation.

Typical Uses (Semantic Analysis)

• Building a symbol table
  → Enter declaration information as processed
  → At end of declaration syntax, do some post processing
  → Use table to check errors as parsing progresses
• Simple error checking/type checking
  → Define before use → lookup on reference
  → Dimension, type, ... → check as encountered
  → Type conformability of expression → bottom-up walk
  → Procedure interfaces are harder
      Build a representation for parameter list & types
      Check actual vs.
      formal parameter list
      Positional or keyword associations

(assumes table is global)

Is This Really “Ad-hoc”?

Relationship between practice and attribute grammars

Similarities
• Both rules & actions associated with productions
• Application order determined by tools
• (Somewhat) abstract names for symbols

Differences
• Actions applied as a unit; not true for AG rules
• Anything goes in ad-hoc actions; AG rules are (purely) functional
• AG rules are higher level than ad-hoc actions

Making Ad-hoc Syntax Directed Translation Work

How do we fit this into an LR(1) parser?
• Need a place to store the attributes
  → Stash them in the stack, along with state and symbol
  → Push three items each time, pop 3 × |β| symbols
• Need a naming scheme to access them
  → $n translates into stack location: top - 3 × (|β| - n)
• Need to sequence rule applications
  → On every reduce action, perform the action rule

What about a rule that must work in mid-production?
• Can transform the grammar
  → Split it into two parts at the point where rule must go and apply the rule on reduction to the appropriate part
  → Introduce marker productions M → ε with appropriate action
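The stack arithmetic above can be checked with a small Python sketch. The layout is an assumption made for illustration: each symbol pushes its state, its symbol, and its attribute in that order, so `top` indexes the attribute of the rightmost symbol. The names slot and reduce, the state numbers, and passing the goto state in directly (a real parser would consult its GOTO table) are likewise hypothetical.

```python
def slot(top, rhs_len, n):
    """Stack index of $n for a handle of rhs_len symbols: top - 3 x (|β| - n)."""
    return top - 3 * (rhs_len - n)


def reduce(stack, lhs, rhs_len, action, goto_state):
    """Run `action` on $1..$n, pop 3 x |β| items, then push the triple for lhs."""
    top = len(stack) - 1
    args = [stack[slot(top, rhs_len, n)] for n in range(1, rhs_len + 1)]
    result = action(*args)                    # the action computes $$
    del stack[len(stack) - 3 * rhs_len:]      # pop the whole handle
    stack.extend([goto_state, lhs, result])   # state, symbol, attribute
    return result


# Handle "Expr + Term" with attribute values 5 and 7; state numbers are made up.
stack = [1, "Expr", 5, 2, "+", None, 3, "Term", 7]
total = reduce(stack, "Expr", 3, lambda left, _op, right: left + right, 4)
```

After the call, the nine items of the handle have been replaced by one triple for Expr carrying the sum. A marker production M → ε is the case `rhs_len = 0`: nothing is popped, the action takes no arguments, and a triple for M is pushed.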