CMSC 330: Organization of Programming Languages
Context-Free Grammars

Review
• Why should we study CFGs?
• What are the four parts of a CFG?
• How do we tell if a string is accepted by a CFG?
• What’s a parse tree?

Review
A sentential form is a string of terminals and non-terminals produced from the start symbol
Inductively:
– The start symbol is a sentential form
– If αAδ is a sentential form for a grammar, where α, δ ∈ (N|Σ)*, and A → γ is a production, then αγδ is a sentential form for the grammar
• In this case, we say that αAδ derives αγδ in one step, which is written as αAδ ⇒ αγδ

Leftmost and Rightmost Derivation
• Example: S → a | SbS    String: aba
Leftmost derivation: S ⇒ SbS ⇒ abS ⇒ aba
– At every step, apply a production to the leftmost non-terminal
Rightmost derivation: S ⇒ SbS ⇒ Sba ⇒ aba
– At every step, apply a production to the rightmost non-terminal
• Both derivations happen to have the same parse tree
• A parse tree has a unique leftmost and a unique rightmost derivation
• Not every string has a unique parse tree
• Parse trees don’t show the order in which productions are applied

More on Leftmost/Rightmost Derivations
• Is the following derivation leftmost or rightmost?
S ⇒ aS ⇒ aT ⇒ aU ⇒ acU ⇒ ac
– There’s at most one non-terminal in each sentential form, so there’s no choice between left or right non-terminals to expand
• How about the following derivation?
– S ⇒ SbS ⇒ SbSbS ⇒ SbabS ⇒ ababS ⇒ ababa

Tips for Designing Grammars
1. Use recursive productions to generate an arbitrary number of symbols
A → xA | ε    Zero or more x’s
A → yA | y    One or more y’s
2. Use separate non-terminals to generate disjoint parts of a language, and then combine in a production
G = S → AB
    A → aA | ε
    B → bB | ε
L(G) = a*b*

Tips for Designing Grammars (cont’d)
3.
To generate languages with matching, balanced, or related numbers of symbols, write productions which generate strings from the middle
{a^n b^n | n ≥ 0} (not a regular language!)
S → aSb | ε
Example: S ⇒ aSb ⇒ aaSbb ⇒ aabb
{a^n b^(2n) | n ≥ 0}
S → aSbb | ε

Tips for Designing Grammars (cont’d)
{a^n b^m | m ≥ 2n, n ≥ 0}
S → aSbb | B | ε
B → bB | b
The following grammar also works:
S → aSbb | B
B → bB | ε
How about the following?
S → aSbb | bS | ε

Tips for Designing Grammars (cont’d)
{a^n b^m a^(n+m) | n ≥ 0, m ≥ 0}
Rewrite as a^n b^m a^m a^n, which now has matching superscripts (two pairs)
Would this grammar work?
S → aSa | B
B → bBa | ba
– Doesn’t allow m = 0
Corrected:
S → aSa | B
B → bBa | ε
The outer a^n ... a^n are generated first, then the inner b^m a^m

Tips for Designing Grammars (cont’d)
4. For a language that’s the union of other languages, use separate nonterminals for each part of the union and then combine
{a^n (b^m | c^m) | m > n ≥ 0}
Can be rewritten as
{a^n b^m | m > n ≥ 0} ∪ {a^n c^m | m > n ≥ 0}

Example: a-b-c
E ⇒ E-E ⇒ a-E ⇒ a-E-E ⇒ a-b-E ⇒ a-b-c
– Corresponds to a-(b-c)
E ⇒ E-E ⇒ E-E-E ⇒ a-E-E ⇒ a-b-E ⇒ a-b-c
– Corresponds to (a-b)-c

The Issue: Associativity
• Ambiguity is bad here because if the compiler needs to generate code for this expression, it doesn’t know what the programmer intended
• So what do we mean when we write a-b-c?
– In mathematics, this only has one possible meaning
– It’s (a-b)-c, since subtraction is left-associative
– a-(b-c) would be the meaning if subtraction were right-associative

Another Example: If-Then-Else
<stmt> ::= <assignment> | <if-stmt> | ...
<if-stmt> ::= if (<expr>) <stmt>
            | if (<expr>) <stmt> else <stmt>
– (Here <>’s are used to denote nonterminals and ::= for productions)
• Consider the following program fragment:
if (x > y)
if (x < z)
a = 1;
else a = 2;
– Note: Ignore newlines

Parse Tree #1
• Else belongs to inner if

Parse Tree #2
• Else belongs to outer if

Fixing the Expression Grammar
• Idea: Require that the right operand of all of the operators is not a bare expression
E → E+T | E-T | E*T | T
T → a | b | c | (E)
• Now there’s only one parse tree for a-b-c
– Exercise: Give a derivation for the string a-(b-c)
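
The claim that the fixed grammar leaves only one parse tree for a-b-c can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: it enumerates every parse tree of a token string by trying all ways to split the string among the symbols of each production. The names `parse_trees`, `bracket`, `AMBIGUOUS`, and `FIXED` are hypothetical, and the ambiguous grammar is an assumption that contains only the productions attested by the a-b-c derivations (E → E-E and the operands a, b, c). The enumerator also assumes a grammar with no ε-productions and no cycles of unit productions.

```python
from functools import lru_cache

# Assumed ambiguous grammar: only E -> E-E and the operands a, b, c are
# attested by the a-b-c derivations in the notes.
AMBIGUOUS = {"E": [("E", "-", "E"), ("a",), ("b",), ("c",)]}

# Fixed grammar from the notes: E -> E+T | E-T | E*T | T,  T -> a | b | c | (E)
FIXED = {
    "E": [("E", "+", "T"), ("E", "-", "T"), ("E", "*", "T"), ("T",)],
    "T": [("a",), ("b",), ("c",), ("(", "E", ")")],
}


def parse_trees(grammar, start, tokens):
    """Return every parse tree of `tokens` rooted at `start`.

    A tree is either a terminal string or a tuple (nonterminal, child, ...).
    Assumes no epsilon productions and no cycles of unit productions, so
    every recursive call works on a shorter span or a different symbol.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # All trees for tokens[i:j] rooted at `symbol`.
        if symbol not in grammar:  # terminal: must match exactly one token
            return (symbol,) if j - i == 1 and tokens[i] == symbol else ()
        return tuple(
            (symbol,) + children
            for rhs in grammar[symbol]
            for children in split(rhs, i, j)
        )

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        # All ways to divide tokens[i:j] among the symbols of `rhs`, in order.
        if not rhs:
            return ((),) if i == j else ()
        if j - i < len(rhs):  # each symbol needs at least one token
            return ()
        head, rest = rhs[0], rhs[1:]
        ends = [j] if not rest else range(i + 1, j - len(rest) + 1)
        return tuple(
            (tree,) + others
            for k in ends
            for tree in derive(head, i, k)
            for others in split(rest, k, j)
        )

    return derive(start, 0, len(tokens))


def bracket(tree):
    """Render a tree as text, parenthesizing each binary operation."""
    if isinstance(tree, str):
        return tree
    kids = [bracket(child) for child in tree[1:]]
    text = "".join(kids)
    # A three-child node is an operator application unless it is "( E )".
    return f"({text})" if len(kids) == 3 and kids[0] != "(" else text


for name, grammar in (("ambiguous", AMBIGUOUS), ("fixed", FIXED)):
    trees = parse_trees(grammar, "E", "a-b-c")
    print(name, len(trees), sorted(bracket(t) for t in trees))
```

Under these assumptions the ambiguous grammar yields two trees for a-b-c, one for each grouping, while the fixed grammar yields only the left-associative one. Because `T` cannot expand to a bare `E-T`, the right operand of `-` can only be a single operand or a parenthesized expression, which is what forces the grouping (a-b)-c.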