CS421 Topic 15: Introduction to Grammars [1]
Sameer Sundresh
sundresh@uiuc.edu
University of Illinois at Urbana-Champaign
June 19, 2007

[1] Based on slides by Mattox Beckman, as updated by Vikram Adve, Gul Agha, Elsa Gunter, and Mark Hills

Outline
- Objectives and Review
- Context-Free Grammars
- Properties of Grammars

Reminder: The Solution

Characters → Lexer → Tokens → Parser → Tree

The conversion from strings to trees is accomplished in two steps.
- First, convert the stream of characters into a stream of tokens.
  - This is called lexing or scanning.
  - It turns characters into words and categorizes them.
  - We did this in the last few lectures!
- Second, convert the stream of tokens into an abstract syntax tree.
  - This is called parsing.
  - It turns words into sentences.

Context-Free Grammars

Def: A context-free grammar (CFG) is a 4-tuple G = (N, Σ, P, S), where:
1. N is a finite, nonempty set of symbols (non-terminals)
2. Σ is a finite set of symbols (terminals)
   N ∩ Σ = ∅
   V ≡ N ∪ Σ (vocabulary)
3. P is a finite subset of N × V* (production rules)
4. S ∈ N (goal symbol or start symbol)

Sometimes written as G = (V, Σ, P, S), with N = V − Σ.

Example Grammar: Arithmetic Expressions

G = (N, Σ, P, S) where:
  N = {E, T, F}
  Σ = {(, ), +, *, id}
  P = { E → T
        E → E + T
        T → F
        T → T * F
        F → id
        F → ( E ) }
  S = E

Note: P ⊆ N × V*, where V = N ∪ Σ = {E, T, F, (, ), +, *, id}
Note: (A, α) ∈ P is usually written A → α, A ::= α, or A : α

Parse Trees of a Grammar

A parse tree for a grammar G is any tree in which:
- The root is labeled with S.
- Each leaf is labeled with a token a (a ∈ Σ) or ε (the empty string).
- Each interior node is labeled by a non-terminal.
- If an interior node is labeled A and has children labeled X1, ..., Xn, then A → X1 ... Xn is a production of G.
- If A → ε is a production in G, then a node labeled A may have a single child labeled ε.

The string formed by the leaf labels (left to right) is the yield of the parse tree.

Parse Trees (continued)

- An intermediate parse tree is the same as a parse tree except the leaves can be non-terminals.

Notes:
- Every α ∈ L(G) is the yield of some parse tree. Why?
  - Consider a derivation, α0 ⇒ α1 ⇒ ... ⇒ αn, where αn ∈ L(G). For each αi, we can construct an intermediate parse tree. The last one will be the parse tree for the sentence αn.
- A parse tree ignores the order in which symbols are replaced to derive a string.
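
The definitions above can be made concrete with a small OCaml sketch. It is only an illustration: every type and function name here (symbol, tree, well_formed, is_parse_tree, yield_of, ...) is an assumption, not part of the course code. It encodes the arithmetic-expression grammar as data, checks the parse-tree conditions, and computes the yield. The example grammar has no ε productions, so ε leaves are left out.

```ocaml
(* N and Sigma for the arithmetic-expression grammar. *)
type nonterminal = E | T | F
type terminal = LParen | RParen | Plus | Star | Id

(* V = N union Sigma. *)
type symbol = N of nonterminal | Tm of terminal

(* A production A -> alpha is a pair in N x V*. *)
type production = nonterminal * symbol list

let start = E

let productions : production list = [
  (E, [N T]);
  (E, [N E; Tm Plus; N T]);
  (T, [N F]);
  (T, [N T; Tm Star; N F]);
  (F, [Tm Id]);
  (F, [Tm LParen; N E; Tm RParen]);
]

(* Leaves carry terminals; interior nodes carry non-terminals. *)
type tree = Leaf of terminal | Node of nonterminal * tree list

let label = function
  | Leaf a -> Tm a
  | Node (a, _) -> N a

(* Every interior node A with children X1 ... Xn must match
   some production A -> X1 ... Xn. *)
let rec well_formed = function
  | Leaf _ -> true
  | Node (a, children) ->
      List.mem (a, List.map label children) productions
      && List.for_all well_formed children

(* A parse tree additionally has the start symbol at its root. *)
let is_parse_tree t =
  match t with
  | Node (a, _) -> a = start && well_formed t
  | Leaf _ -> false

(* The yield: leaf labels read left to right. *)
let rec yield_of = function
  | Leaf a -> [a]
  | Node (_, children) -> List.concat (List.map yield_of children)

(* A parse tree whose yield is id * id. *)
let id_times_id =
  Node (E, [Node (T, [Node (T, [Node (F, [Leaf Id])]);
                      Leaf Star;
                      Node (F, [Leaf Id])])])
```

Here is_parse_tree id_times_id holds and yield_of id_times_id is [Id; Star; Id]. The tree records which productions were used, but not the order in which they were applied.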

Derivations and Parse Trees

Example: the rightmost derivation and the parse tree for id * id

E ⇒ T ⇒ T * F ⇒ T * id ⇒ F * id ⇒ id * id

[Figure: intermediate parse trees for successive steps of the derivation, ending with the parse tree for id * id.]

Order of Evaluation of Parse Tree

Note: These are conventions, not theorems.
- Code for a non-terminal is evaluated as a single “block”
  - I.e., we cannot partially execute it, then execute something else, then evaluate the rest
  - A different parse tree would be needed to achieve that
  - E.g. 1: Non-terminal T enforces precedence of * over +
  - E.g. 2: E → E + T enforces left-associativity; E → T + E enforces right-associativity
- The parse tree does not specify the order of execution of code blocks
  - It must be enforced by the code generated for the parent block. Obey:
    - An operator (e.g., +) cannot be evaluated before its operands
    - Associativity rules

Common Sources of Ambiguity

There are two common forms of ambiguity:
- The “dangling else” form:
    E → if E then E else E
    E → if E then E
    E → whatever
  Example: if a then if x then y else z ... to which if does the else belong?
- The “double-ended recursion” form:
    E → E + E
    E → E * E
  Example: “3 + 4 * 5” ... is it “(3 + 4) * 5” or “3 + (4 * 5)”?
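
Both forms of ambiguity can be seen by writing the competing parse trees down as OCaml values. The sketch below is illustrative only: the type and constructor names are assumptions, and numbers and variable names stand in for “whatever”. In each case the two trees differ, yet they print back to exactly the same sentence.

```ocaml
(* Double-ended recursion: E -> E + E | E * E | number *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Print a tree back as its sentence (no parentheses are added,
   because the grammar has none). *)
let rec show_expr = function
  | Num n -> string_of_int n
  | Add (a, b) -> show_expr a ^ " + " ^ show_expr b
  | Mul (a, b) -> show_expr a ^ " * " ^ show_expr b

let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

(* Two different trees for the sentence 3 + 4 * 5. *)
let sum_first = Mul (Add (Num 3, Num 4), Num 5)      (* (3 + 4) * 5 *)
let product_first = Add (Num 3, Mul (Num 4, Num 5))  (* 3 + (4 * 5) *)

(* Dangling else: E -> if E then E else E | if E then E | id *)
type stmt =
  | Var of string
  | If of stmt * stmt              (* if E then E *)
  | IfElse of stmt * stmt * stmt   (* if E then E else E *)

let rec show_stmt = function
  | Var x -> x
  | If (c, t) -> "if " ^ show_stmt c ^ " then " ^ show_stmt t
  | IfElse (c, t, e) ->
      "if " ^ show_stmt c ^ " then " ^ show_stmt t
      ^ " else " ^ show_stmt e

(* Two different trees for: if a then if x then y else z *)
let inner_else = If (Var "a", IfElse (Var "x", Var "y", Var "z"))
let outer_else = IfElse (Var "a", If (Var "x", Var "y"), Var "z")
```

show_expr gives “3 + 4 * 5” for both sum_first and product_first, but eval gives 35 for the first and 23 for the second, so the choice of tree changes the meaning. Likewise, show_stmt gives the same text for inner_else and outer_else, although the else is attached to a different if in each.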

The Dangling-Else Ambiguity

Draw two separate parse trees for the “dangling else” example:

  if a then if x then y else z

  E → if E then E else E
  E → if E then E
  E → id

Note: id is the common token for the variable names a, x, y, z.

Fixing Ambiguity

- Ambiguity can often be eliminated by thinking more carefully about what you are trying to express with your grammar.
  - Double-ended recursion usually reveals a lack of precedence and associativity information.
  - A “dangling else” usually matches with the nearest if. This can be encoded in the grammar. See §4.3 of the Dragon Book for details.
  - Language fixes can eliminate this problem: for instance, keywords or symbols that identify the start and end of control blocks (e.g., if-then-else-fi).

Fixing Ambiguity (continued)

- The “double-ended recursion” form usually reveals a lack of precedence and associativity information. A technique called stratification often fixes this.
  - Left-recursive means “associates to the left”; similarly for right-recursive.
  - Higher-precedence rules occur lower in the grammar.

  E → E + T
  E → T
  T → T * F
  T → F
  F → ( E )
  F → integer

Properties of Grammars

It is important to be able to say what properties a grammar has. Informally:

Epsilon Productions: A production of the form “E → ε”, where ε represents the empty string.
Right Linear Grammar: A grammar in which all the productions have the form “E → x E” or “E → x”.
Left-Recursive Grammar: A grammar that can generate “E → E + X” (for example). “Right-recursive grammars” are defined similarly.
Ambiguous Grammar: A grammar in which more than one parse tree is possible for a specific sentence.

Left-Recursive Grammars

- A grammar is recursive if a symbol being produced (the one on the left-hand side) reappears on the right-hand side after one or more steps.
  Example: “E → if E then E else E”
- A grammar is left-recursive if the symbol being produced appears as the first symbol on the right-hand side (in one or more steps).
  Example: “E → E + F”
- Example with indirect left recursion (two steps):
    A → B x
    B → A y

Representing Parse Trees in OCaml

Our end goal in parsing is to build an OCaml data structure representing the parsed form of a program. Although we will focus more on this later, our strategy will be to
- create an OCaml datatype for each syntactic category in the language
  - this datatype will most likely be mutually recursive, to represent the inherent recursive structure of most language definitions
- generate an OCaml term, using these mutually recursive types, representing the parsed form of the program
  - containment in a type constructor shows that the contained items are children of the containing item in the AST
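
As a rough sketch of this strategy, the stratified grammar E → E + T | T, T → T * F | F, F → ( E ) | integer can be given one datatype per syntactic category. The constructor and function names below are illustrative assumptions, not the course's actual definitions.

```ocaml
(* One mutually recursive datatype per syntactic category. *)
type expr =
  | Plus of expr * term       (* E -> E + T *)
  | ETerm of term             (* E -> T *)
and term =
  | Times of term * factor    (* T -> T * F *)
  | TFactor of factor         (* T -> F *)
and factor =
  | Paren of expr             (* F -> ( E ) *)
  | Int of int                (* F -> integer *)

(* Evaluate bottom-up, so operands are evaluated before
   the operator that combines them. *)
let rec eval_expr = function
  | Plus (e, t) -> eval_expr e + eval_term t
  | ETerm t -> eval_term t
and eval_term = function
  | Times (t, f) -> eval_term t * eval_factor f
  | TFactor f -> eval_factor f
and eval_factor = function
  | Paren e -> eval_expr e
  | Int n -> n

(* 3 + 4 * 5: the product is a term, so it sits below the sum. *)
let three_plus_four_times_five =
  Plus (ETerm (TFactor (Int 3)),
        Times (TFactor (Int 4), Int 5))

(* (3 + 4) * 5: grouping the sum first requires explicit parentheses. *)
let parenthesized =
  ETerm (Times (TFactor (Paren (Plus (ETerm (TFactor (Int 3)),
                                      TFactor (Int 4)))),
                Int 5))
```

Because the right operand of Plus must be a term and the right operand of Times must be a factor, the types themselves rule out the competing tree for 3 + 4 * 5: a sum can appear under a product only inside Paren. Containment in a constructor mirrors the parent-child relation in the tree.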