Bottom Up Parsing
ECS 142, Lectures 7-8, Prof. Su

Reminders
• PA2 due today by midnight
• WA1 due today
– In class
– By 5pm in the 142 homework box
• PA3 has been assigned (due Oct 31)

Bottom-Up Parsing
• Bottom-up parsing is more general than top-down parsing
– And just as efficient
– Builds on ideas in top-down parsing
– Preferred method in practice
• Also called LR parsing
– L means that tokens are read left to right
– R means that it constructs a rightmost derivation

An Introductory Example
• LR parsers don't need left-factored grammars and can also handle left-recursive grammars
• Consider the following grammar:
E → E + ( E ) | int
– Why is this not LL(1)?
• Consider the string: int + ( int ) + ( int )

The Idea
• LR parsing reduces a string to the start symbol by inverting productions:
str ← input string of terminals
repeat
– Identify β in str such that A → β is a production (i.e., str = α β γ)
– Replace β by A in str (i.e., str becomes α A γ)
until str = S

A Bottom-up Parse in Detail (1-5)
[Figure: the parse tree for int + (int) + (int), built bottom-up one reduction at a time]
• The string is reduced step by step:
int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
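The reduction loop above can be sketched directly in Python. This is a toy illustration only: the greedy leftmost-handle search below is my own simplification and happens to work for this grammar, whereas a real LR parser chooses which β to reduce (and when) using a DFA, as the later slides explain.

```python
# Toy illustration of "reduce to the start symbol by inverting productions"
# for the grammar E -> int | E + ( E ). The greedy handle search below is
# an assumption of this sketch, not the LR algorithm itself.

PRODUCTIONS = [
    ("E", ["int"]),
    ("E", ["E", "+", "(", "E", ")"]),
]

def reduce_once(symbols):
    """Find some beta with A -> beta in symbols and replace it by A."""
    for lhs, rhs in PRODUCTIONS:
        n = len(rhs)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == rhs:
                return symbols[:i] + [lhs] + symbols[i + n:]
    return None  # no handle found: stuck

def reduce_to_start(symbols, start="E"):
    """Repeat reductions until only the start symbol remains; return the trace."""
    trace = [symbols]
    while symbols != [start]:
        symbols = reduce_once(symbols)
        if symbols is None:
            raise ValueError("cannot reduce to the start symbol")
        trace.append(symbols)
    return trace

trace = reduce_to_start("int + ( int ) + ( int )".split())
```

The resulting trace performs the same five reductions as the slides, though in a slightly different order, since this sketch always reduces the leftmost match of the first matching production.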
A Bottom-up Parse in Detail (6)
[Figure: the completed parse tree for int + (int) + (int)]
• The final reduction: E + (E) becomes E
• Read in reverse, the sequence of reductions is a rightmost derivation

Important Fact #1
• Important Fact #1 about bottom-up parsing: an LR parser traces a rightmost derivation in reverse

Shift-Reduce Example
• The marker I separates the stack (left) from the unread input (right):
I int + (int) + (int)$    shift
int I + (int) + (int)$    reduce E → int
E I + (int) + (int)$      shift 3 times
E + (int I ) + (int)$     reduce E → int
E + (E I ) + (int)$       shift
E + (E) I + (int)$        reduce E → E + (E)
E I + (int)$              shift 3 times
E + (int I )$             reduce E → int
E + (E I )$               shift
E + (E) I $               reduce E → E + (E)
E I $                     accept

The Stack
• The left string can be implemented as a stack
– The top of the stack is the I
• Shift pushes a terminal onto the stack
• Reduce pops 0 or more symbols off of the stack (the production rhs) and pushes a non-terminal onto the stack (the production lhs)
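The shift-reduce trace can be replayed mechanically on an explicit stack. This is a sketch: the action list below is transcribed by hand from the trace, since deciding these actions automatically is exactly the problem the following slides solve with a DFA.

```python
# Replay the shift-reduce trace for int + (int) + (int) on an explicit
# stack. The parser's "left string" is the stack; everything to the
# right of the marker I is unread input.

GRAMMAR = {"E": [["int"], ["E", "+", "(", "E", ")"]]}

tokens = "int + ( int ) + ( int ) $".split()

# Actions transcribed from the trace above:
#   "s"              = shift one token onto the stack
#   ("r", lhs, n)    = reduce: pop n symbols (the rhs), push lhs
actions = [
    "s", ("r", "E", 1),            # int           => E
    "s", "s", "s", ("r", "E", 1),  # E + ( int     => E + ( E
    "s", ("r", "E", 5),            # E + ( E )     => E
    "s", "s", "s", ("r", "E", 1),  # E + ( int     => E + ( E
    "s", ("r", "E", 5),            # E + ( E )     => E
]

stack, pos = [], 0
for act in actions:
    if act == "s":
        stack.append(tokens[pos])
        pos += 1
    else:
        _, lhs, n = act
        # Sanity check: the popped symbols really are a production rhs.
        assert stack[-n:] in GRAMMAR[lhs], stack[-n:]
        del stack[-n:]
        stack.append(lhs)
```

After the last action the stack holds exactly the start symbol E and the next token is $, which is the accept configuration from the trace.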
Key Issue: When to Shift or Reduce?
• Decide based on the left string (the stack)
• Idea: use a finite automaton (DFA) to decide when to shift or reduce
– The DFA input is the stack
– The language consists of terminals and non-terminals
• We run the DFA on the stack and examine the resulting state X and the token tok after I
– If X has a transition labeled tok, then shift
– If X is labeled with "A → β on tok", then reduce

LR(1) Parsing: An Example
[Figure: the parsing DFA with states 0-11, transitions on int, +, (, ), E, and reduce annotations such as "E → int on $, +", "E → E + (E) on ), +", and "accept on $"]
I int + (int) + (int)$    shift
int I + (int) + (int)$    E → int
E I + (int) + (int)$      shift (x3)
E + (int I ) + (int)$     E → int
E + (E I ) + (int)$       shift
E + (E) I + (int)$        E → E+(E)
E I + (int)$              shift (x3)
E + (int I )$             E → int
E + (E I )$               shift
E + (E) I $               E → E+(E)
E I $                     accept

Representing the DFA
• Parsers represent the DFA as a 2D table
– Recall table-driven lexical analysis
• Rows correspond to DFA states
• Columns correspond to terminals and non-terminals
• Typically the columns are split into:
– Those for terminals: the action table
– Those for non-terminals: the goto table

Representing the DFA: Example
• The table for a fragment of our DFA (sN = shift to state N, rP = reduce by production P, gN = goto state N):

State | int | +          | (  | )          | $          | E
  3   |     |            | s4 |            |            |
  4   | s5  |            |    |            |            | g6
  5   |     | r E→int    |    | r E→int    |            |
  6   |     | s8         |    | s7         |            |
  7   |     | r E→E+(E)  |    |            | r E→E+(E)  |

The LR Parsing Algorithm
• After a shift or reduce action we could rerun the DFA on the entire stack
– This is wasteful, since most of the work is repeated
• Instead, remember for each stack element the state it brings the DFA to
• The LR parser maintains a stack
⟨ sym1, state1 ⟩ . . . ⟨ symn, staten ⟩
where statek is the final state of the DFA on sym1 … symk
The LR Parsing Algorithm
Let I = w$ be the initial input
Let j = 0
Let DFA state 0 be the start state
Let stack = ⟨ dummy, 0 ⟩
repeat
case action[top_state(stack), I[j]] of
shift k: push ⟨ I[j++], k ⟩
reduce X → α: pop |α| pairs, push ⟨ X, goto[top_state(stack), X] ⟩
accept: halt normally
error: halt and report error

LR Parsing Notes
• Can be used to parse more grammars than LL
• Most programming language grammars are LR
• Can be described as a simple table
• There are tools for building the table
• How is the table constructed?

Outline
• Review of bottom-up parsing
• Computing the parsing DFA
• Using parser generators

Bottom-up Parsing (Review)
• A bottom-up parser rewrites the input string to the start symbol
• The state of the parser is described as α I γ
– α is a stack of terminals and non-terminals
– γ is the string of terminals not yet examined
• Initially: I x1 x2 . . . xn

The Shift and Reduce Actions (Review)
• Recall the CFG: E → int | E + (E)
• A bottom-up parser uses two kinds of actions:
• Shift pushes a terminal from the input onto the stack
E + ( I int ) ⇒ E + ( int I )
• Reduce pops 0 or more symbols off of the stack (the production rhs) and pushes a non-terminal onto the stack (the production lhs)
E + ( E + ( E ) I ) ⇒ E + ( E I )

Key Issue: When to Shift or Reduce? (Review)
• Idea: use a finite automaton (DFA) to decide when to shift or reduce
– The input is the stack
– The language consists of terminals and non-terminals
• We run the DFA on the stack and examine the resulting state X and the token tok after I
– If X has a transition labeled tok, then shift
– If X is labeled with "A → β on tok", then reduce
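The LR parsing algorithm above can be made runnable for the example grammar E → int | E + (E). This is a sketch: the action and goto entries were filled in by hand to follow the lecture's DFA, so treat the exact state numbering and table contents as an assumption of mine rather than something given on the slides.

```python
# Table-driven LR parser for E -> int | E + (E).
# Entries: ("s", k) = shift and go to state k,
#          ("r", lhs, n) = reduce a production with rhs length n,
#          "acc" = accept.

action = {
    (0, "int"): ("s", 1),
    (1, "+"): ("r", "E", 1), (1, "$"): ("r", "E", 1),
    (2, "+"): ("s", 3), (2, "$"): "acc",
    (3, "("): ("s", 4),
    (4, "int"): ("s", 5),
    (5, "+"): ("r", "E", 1), (5, ")"): ("r", "E", 1),
    (6, "+"): ("s", 8), (6, ")"): ("s", 7),
    (7, "+"): ("r", "E", 5), (7, "$"): ("r", "E", 5),
    (8, "("): ("s", 9),
    (9, "int"): ("s", 5),
    (10, "+"): ("s", 8), (10, ")"): ("s", 11),
    (11, "+"): ("r", "E", 5), (11, ")"): ("r", "E", 5),
}
goto = {(0, "E"): 2, (4, "E"): 6, (9, "E"): 10}

def lr_parse(tokens):
    """Run the LR algorithm from the slide; True iff the input is accepted."""
    stack = [("<dummy>", 0)]          # pairs <symbol, state>
    j = 0
    while True:
        entry = action.get((stack[-1][1], tokens[j]))
        if entry is None:
            return False              # error: halt and report error
        if entry == "acc":
            return True               # accept: halt normally
        if entry[0] == "s":           # shift k
            stack.append((tokens[j], entry[1]))
            j += 1
        else:                         # reduce X -> alpha: pop |alpha| pairs
            _, lhs, n = entry
            del stack[-n:]
            stack.append((lhs, goto[(stack[-1][1], lhs)]))
```

For example, `lr_parse("int + ( int ) + ( int ) $".split())` performs exactly the shift/reduce sequence shown in the example trace.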
LR(1) Parsing: An Example (Review)
[Figure and shift-reduce trace: the same 12-state DFA and the same trace for int + (int) + (int)$ as in the earlier example]

End of review

Shift/Reduce Conflicts
• If a DFA state contains both [X → α•aβ, b] and [Y → γ•, a]
• Then on input "a" we could either
– Shift into state [X → αa•β, b], or
– Reduce with Y → γ
• This is called a shift-reduce conflict

Shift/Reduce Conflicts
• Typically due to ambiguities in the grammar
• Classic example: the dangling else
S → if E then S | if E then S else S | OTHER
• Will have a DFA state containing
[S → if E then S•, else]
[S → if E then S• else S, x]
• If else follows, then we can either shift or reduce
• The default (bison, CUP, etc.) is to shift
– The default behavior is what is needed in this case

More Shift/Reduce Conflicts
• Consider the ambiguous grammar
E → E + E | E * E | int
• We will have states containing (⇒E denotes the DFA transition on E)
[E → E * • E, +]        [E → E * E•, +]
[E → • E + E, +]   ⇒E  [E → E • + E, +]
• Again we have a shift/reduce conflict on input +
– We need to reduce (* binds more tightly than +)
– Recall the solution: declare the precedence of * and +

More Shift/Reduce Conflicts
• In bison, declare precedence and associativity:
%left +
%left *
• Precedence of a rule = that of its last terminal
– See the bison manual for ways to override this default
• Context-dependent precedence (Section 5.4, p. 70)
• Resolve a shift/reduce conflict with a shift if:
– no precedence is declared for either the rule or the terminal
– the input terminal has higher precedence than the rule
– the precedences are the same and the terminal is right-associative
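The resolution rules above can be sketched as a small function. This is a simplification: the `decls`/`prec` encoding is my own (each declaration line gets a level, later lines bind tighter, as in bison), and real bison also handles %nonassoc and per-rule %prec overrides, which this sketch omits.

```python
# Sketch of bison-style shift/reduce conflict resolution via precedence
# declarations. Encoding assumption: decls lists declarations in order,
# and later declarations have higher precedence.

decls = [("left", ["+"]), ("left", ["*"])]   # %left +  then  %left *

prec = {}   # terminal -> (level, associativity)
for level, (assoc, terminals) in enumerate(decls, start=1):
    for t in terminals:
        prec[t] = (level, assoc)

def resolve(rule_last_terminal, lookahead):
    """Return 'shift' or 'reduce' for a shift/reduce conflict.

    The precedence of a rule is that of its last terminal (the bison
    default); the conflict is against the lookahead terminal."""
    if rule_last_terminal not in prec or lookahead not in prec:
        return "shift"                 # no precedence declared: default shift
    rule_level, _ = prec[rule_last_terminal]
    tok_level, tok_assoc = prec[lookahead]
    if tok_level > rule_level:
        return "shift"                 # terminal binds tighter than the rule
    if tok_level < rule_level:
        return "reduce"                # rule binds tighter than the terminal
    return "reduce" if tok_assoc == "left" else "shift"
```

With these declarations, the conflict between rule E → E * E and lookahead + resolves to reduce (* outranks +), and the conflict between E → E + E and lookahead + also resolves to reduce (equal precedence, left-associative), matching the following slides.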
Using Precedence to Solve S/R Conflicts
• Back to our example:
[E → E * • E, +]        [E → E * E•, +]
[E → • E + E, +]   ⇒E  [E → E • + E, +]
• We choose reduce because the precedence of the rule E → E * E is higher than that of the terminal +

Using Precedence to Solve S/R Conflicts
• Same grammar as before: E → E + E | E * E | int
• We will also have the states
[E → E + • E, +]        [E → E + E•, +]
[E → • E + E, +]   ⇒E  [E → E • + E, +]
• Now we also have a shift/reduce conflict on input +
– We choose reduce because E → E + E and + have the same precedence and + is left-associative

Using Precedence to Solve S/R Conflicts
• Back to our dangling else example
[S → if E then S•, else]
[S → if E then S• else S, x]
• We can eliminate the conflict by declaring else with higher precedence than then
– Or just rely on the default shift action
• But this starts to look like "hacking the parser"
• Best to avoid overuse of precedence declarations, or you'll end up with unexpected parse trees

Reduce/Reduce Conflicts
• If a DFA state contains both [X → α•, a] and [Y → β•, a]
– Then on input "a" we don't know which production to reduce
• This is called a reduce/reduce conflict

Reduce/Reduce Conflicts
• Usually due to gross ambiguity in the grammar
• Example: a sequence of identifiers
S → ε | id | id S
• There are two parse trees for the string id:
S → id
S → id S → id
• How does this confuse the parser?

More on Reduce/Reduce Conflicts
• Consider the states (⇒id denotes the DFA transition on id)
[S' → • S, $]              [S → id •, $]
[S → •, $]                 [S → id • S, $]
[S → • id, $]       ⇒id   [S → •, $]
[S → • id S, $]            [S → • id, $]
                           [S → • id S, $]
• Reduce/reduce conflict on input $:
S' → S → id
S' → S → id S → id
• Better to rewrite the grammar: S → ε | id S
Using Parser Generators
• Parser generators construct the parsing DFA given a CFG
– They use precedence declarations and default conventions to resolve conflicts
– The parsing algorithm is the same for all grammars (and is provided as a library function)
• But most parser generators do not construct the DFA as described before
– Because the LR(1) parsing DFA has thousands of states even for a simple language

LR(1) Parsing Tables are Big
• But many states are similar, e.g.
state 1: [E → int•, $/+]   i.e., E → int on $, +
state 5: [E → int•, )/+]   i.e., E → int on ), +
• Idea: merge the DFA states whose items differ only in the lookahead tokens
– We say that such states have the same core
• We obtain
state 1': [E → int•, $/+/)]   i.e., E → int on $, +, )

The Core of a Set of LR Items
• Definition: the core of a set of LR items is the set of first components
– Without the lookahead terminals
• Example: the core of {[X → α•β, b], [Y → γ•δ, d]} is {X → α•β, Y → γ•δ}

LALR States
• Consider for example the LR(1) states
{[X → α•, a], [Y → β•, c]}
{[X → α•, b], [Y → β•, d]}
• They have the same core and can be merged
• The merged state contains:
{[X → α•, a/b], [Y → β•, c/d]}
• These are called LALR(1) states
– LALR stands for LookAhead LR
– There are typically 10 times fewer LALR(1) states than LR(1) states

A LALR(1) DFA
• Repeat until all states have distinct cores:
– Choose two distinct states with the same core
– Merge the states by creating a new one with the union of all the items
– Point edges from predecessors to the new state
– The new state points to all the previous successors
[Figure: a small DFA with states A-F before and after merging]

Conversion of LR(1) to LALR(1): Example
[Figure: the 12-state LR(1) DFA from before collapses to an LALR(1) DFA with merged states 1/5, 3/8, 4/9, 6/10, and 7/11; e.g., the merged state for 1 and 5 is labeled "E → int on $, +, )"]
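The core computation and the merge step can be sketched directly. The encoding of an LR(1) item as a (production-with-dot, lookahead) pair is my own; the states used in the example are states 1 and 5 of the lecture's DFA.

```python
# Sketch of LALR state construction: states whose item sets have the
# same core are merged by unioning their items (i.e., their lookaheads).

def core(state):
    """The core of a set of LR(1) items: the first components only,
    without the lookahead terminals."""
    return frozenset(item for item, _lookahead in state)

def merge_by_core(states):
    """Merge all states that share a core; returns the LALR(1) states."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return list(merged.values())

# States 1 and 5 of the example LR(1) DFA: same core {E -> int .},
# different lookaheads ($/+ versus )/+).
s1 = {("E -> int .", "$"), ("E -> int .", "+")}
s5 = {("E -> int .", ")"), ("E -> int .", "+")}
lalr = merge_by_core([s1, s5])
```

The two states collapse into the single LALR(1) state [E → int•, $/+/)], exactly the state 1' from the slide.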
The LALR Parser Can Have Conflicts
• Consider for example the LR(1) states
{[X → α•, a], [Y → β•, b]}
{[X → α•, b], [Y → β•, a]}
• The merged LALR(1) state
{[X → α•, a/b], [Y → β•, a/b]}
has a new reduce/reduce conflict
• In practice such cases are rare
• However, merging introduces no new shift/reduce conflicts. Why?

LALR vs. LR Parsing
• LALR languages are not natural
– They are an efficiency hack on LR languages
• Any reasonable programming language has an LALR(1) grammar
• LALR(1) has become a standard for programming languages and for parser generators
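The new reduce/reduce conflict can be checked mechanically. This is a sketch with toy items matching the X/Y states above; the (production-with-dot, lookahead) encoding and the helper names are my own.

```python
# Two LR(1) states with the same core but "crossed" lookaheads: each is
# conflict-free on its own, but their LALR(1) merge has a reduce/reduce
# conflict on both lookaheads.

def rr_conflicts(state):
    """Lookaheads on which two distinct completed items both ask to reduce."""
    chosen = {}       # lookahead -> completed item seen so far
    conflicts = set()
    for item, la in state:
        if item.endswith("."):          # dot at the end: a reduce item
            if la in chosen and chosen[la] != item:
                conflicts.add(la)
            chosen[la] = item
    return conflicts

s = {("X -> a .", "a"), ("Y -> b .", "b")}
t = {("X -> a .", "b"), ("Y -> b .", "a")}
assert not rr_conflicts(s) and not rr_conflicts(t)   # each state alone is fine

conflicts = rr_conflicts(s | t)   # the merged LALR(1) state
```

The merged state conflicts on both "a" and "b". Note that merging cannot create a new shift/reduce conflict: whether a state shifts on a token depends only on its core, which is unchanged by merging.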