LR Parsing: Handling Left-Recursive Grammars and Shift-Reduce Conflicts

An introduction to LR parsing, a bottom-up parsing technique used to analyze the structure of input strings according to a given grammar. It covers the basics of LR parsing, including how LR parsers handle left-recursive grammars and the notion of shift-reduce conflicts, as well as the notation and terminology used in LR parsing, such as shift, reduce, and the DFA table. Several examples illustrate the concepts presented.


Bottom-Up Parsing
ECS 142, Lectures 7-8, Prof. Su

Reminders
• PA2 due today by midnight
• WA1 due today
  – In class
  – By 5pm in 142 homework box
• PA3 has been assigned (due Oct 31)

Bottom-Up Parsing
• Bottom-up parsing is more general than top-down parsing
  – And just as efficient
  – Builds on ideas in top-down parsing
  – Preferred method in practice
• Also called LR parsing
  – L means that tokens are read left to right
  – R means that it constructs a rightmost derivation!

An Introductory Example
• LR parsers don't need left-factored grammars and can also handle left-recursive grammars
• Consider the following grammar:
  E → E + ( E ) | int
  – Why is this not LL(1)?
• Consider the string: int + ( int ) + ( int )

The Idea
• LR parsing reduces a string to the start symbol by inverting productions:
  str ← input string of terminals
  repeat
    – Identify β in str such that A → β is a production (i.e., str = α β γ)
    – Replace β by A in str (i.e., str becomes α A γ)
  until str = S

A Bottom-up Parse in Detail
• The string int + (int) + (int) is reduced step by step:
  int + (int) + (int)
  E + (int) + (int)
  E + (E) + (int)
  E + (int)
  E + (E)
  E
• This sequence of reductions traces a rightmost derivation in reverse

Important Fact #1
• Important Fact #1 about bottom-up parsing: An LR parser traces a rightmost derivation in reverse

Shift-Reduce Example
• The marker I separates the already-processed left part (the stack) from the remaining input:
  I int + (int) + (int)$       shift
  int I + (int) + (int)$       red. E → int
  E I + (int) + (int)$         shift 3 times
  E + (int I ) + (int)$        red. E → int
  E + (E I ) + (int)$          shift
  E + (E) I + (int)$           red. E → E + (E)
  E I + (int)$                 shift 3 times
  E + (int I )$                red. E → int
  E + (E I )$                  shift
  E + (E) I $                  red. E → E + (E)
  E I $                        accept
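For this particular grammar the reduce decisions in the trace above can be mimicked by a greedy rule: reduce int to E as soon as it is shifted, and reduce E + ( E ) to E as soon as it appears on top of the stack. The Python sketch below is not from the lecture, and the greedy rule only happens to be safe for this grammar; the general decision procedure is the DFA introduced next.

```python
# Greedy shift-reduce sketch for the grammar  E -> E + ( E ) | int.
# Reducing whenever a right-hand side sits on top of the stack happens to be
# correct for this grammar; in general the shift/reduce decision needs a DFA.

def shift_reduce(tokens):
    stack = []
    tokens = tokens + ["$"]                  # end-of-input marker
    i = 0
    while True:
        # Reduce as long as a production right-hand side is on top of the stack.
        while True:
            if stack[-1:] == ["int"]:
                stack[-1:] = ["E"]                        # E -> int
            elif stack[-5:] == ["E", "+", "(", "E", ")"]:
                stack[-5:] = ["E"]                        # E -> E + ( E )
            else:
                break
            print(" ".join(stack), "I", " ".join(tokens[i:]))
        if tokens[i] == "$":
            return stack == ["E"]                         # accept iff only E remains
        stack.append(tokens[i])                           # shift the next token
        i += 1
        print(" ".join(stack), "I", " ".join(tokens[i:]))

print(shift_reduce(["int", "+", "(", "int", ")", "+", "(", "int", ")"]))
```

Running it prints the same sequence of configurations as the trace above, followed by True.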
The Stack
• Left string can be implemented as a stack
  – Top of the stack is the I
• Shift pushes a terminal on the stack
• Reduce pops 0 or more symbols off of the stack (production rhs) and pushes a non-terminal on the stack (production lhs)

Key Issue: When to Shift or Reduce?
• Decide based on the left string (the stack)
• Idea: use a finite automaton (DFA) to decide when to shift or reduce
  – The DFA input is the stack
  – The language consists of terminals and non-terminals
• We run the DFA on the stack and we examine the resulting state X and the token tok after I
  – If X has a transition labeled tok then shift
  – If X is labeled with "A → β on tok" then reduce

LR(1) Parsing. An Example
• The parsing DFA for the example grammar has states 0-11 (reduce annotations in brackets):
  0 --int--> 1    [E → int on $, +]
  0 --E--> 2      [accept on $]
  2 --+--> 3
  3 --(--> 4
  4 --int--> 5    [E → int on ), +]
  4 --E--> 6
  6 --)--> 7      [E → E + (E) on $, +]
  6 --+--> 8
  8 --(--> 9
  9 --int--> 5
  9 --E--> 10
  10 --)--> 11    [E → E + (E) on ), +]
  10 --+--> 8
• The parse of int + (int) + (int)$:
  I int + (int) + (int)$       shift
  int I + (int) + (int)$       E → int
  E I + (int) + (int)$         shift (x3)
  E + (int I ) + (int)$        E → int
  E + (E I ) + (int)$          shift
  E + (E) I + (int)$           E → E+(E)
  E I + (int)$                 shift (x3)
  E + (int I )$                E → int
  E + (E I )$                  shift
  E + (E) I $                  E → E+(E)
  E I $                        accept

Representing the DFA
• Parsers represent the DFA as a 2D table
  – Recall table-driven lexical analysis
• Lines correspond to DFA states
• Columns correspond to terminals and non-terminals
• Typically columns are split into:
  – Those for terminals: action table
  – Those for non-terminals: goto table

Representing the DFA. Example
• The table for a fragment of our DFA (states 3-7; sN = shift and go to state N, rP = reduce by production P, gN = goto state N):
        int    +            (     )            $            E
  3                         s4
  4     s5                                                  g6
  5            rE→int             rE→int
  6            s8                 s7
  7            rE→E+(E)                        rE→E+(E)

The LR Parsing Algorithm
• After a shift or reduce action we rerun the DFA on the entire stack
  – This is wasteful, since most of the work is repeated
• Remember for each stack element to which state it brings the DFA
• LR parser maintains a stack 〈sym1, state1〉 . . . 〈symn, staten〉
  – statek is the final state of the DFA on sym1 … symk

The LR Parsing Algorithm
  Let I = w$ be initial input
  Let j = 0
  Let DFA state 0 be the start state
  Let stack = 〈dummy, 0〉
  repeat
    case action[top_state(stack), I[j]] of
      shift k: push 〈I[j++], k〉
      reduce X → α: pop |α| pairs, push 〈X, Goto[top_state(stack), X]〉
      accept: halt normally
      error: halt and report error

LR Parsing Notes
• Can be used to parse more grammars than LL
• Most programming language grammars are LR
• Can be described as a simple table
• There are tools for building the table
• How is the table constructed?
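To make the algorithm above concrete, the sketch below implements the table-driven loop in Python, with the action and goto tables transcribed from the example DFA. The encoding of the tables and the function name lr_parse are my own illustration, not the lecture's data structures.

```python
# Table-driven LR parsing loop, following the pseudocode above.
# The ACTION/GOTO tables encode the example DFA for  E -> E + ( E ) | int;
# state numbers follow the DFA listed above. Illustrative sketch only.

PRODUCTIONS = {
    "E->int":   ("E", 1),   # left-hand side, length of right-hand side
    "E->E+(E)": ("E", 5),
}

ACTION = {  # (state, terminal) -> ("shift", k) | ("reduce", prod) | ("accept",)
    (0, "int"): ("shift", 1),
    (1, "$"): ("reduce", "E->int"),    (1, "+"): ("reduce", "E->int"),
    (2, "$"): ("accept",),             (2, "+"): ("shift", 3),
    (3, "("): ("shift", 4),
    (4, "int"): ("shift", 5),
    (5, ")"): ("reduce", "E->int"),    (5, "+"): ("reduce", "E->int"),
    (6, ")"): ("shift", 7),            (6, "+"): ("shift", 8),
    (7, "$"): ("reduce", "E->E+(E)"),  (7, "+"): ("reduce", "E->E+(E)"),
    (8, "("): ("shift", 9),
    (9, "int"): ("shift", 5),
    (10, ")"): ("shift", 11),          (10, "+"): ("shift", 8),
    (11, ")"): ("reduce", "E->E+(E)"), (11, "+"): ("reduce", "E->E+(E)"),
}

GOTO = {(0, "E"): 2, (4, "E"): 6, (9, "E"): 10}

def lr_parse(tokens):
    tokens = tokens + ["$"]
    stack = [("dummy", 0)]                    # pairs <symbol, state>
    j = 0
    while True:
        state = stack[-1][1]
        act = ACTION.get((state, tokens[j]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[j]!r} in state {state}")
        if act[0] == "shift":
            stack.append((tokens[j], act[1]))         # push <token, new state>
            j += 1
        elif act[0] == "reduce":
            lhs, length = PRODUCTIONS[act[1]]
            del stack[-length:]                       # pop |rhs| pairs
            stack.append((lhs, GOTO[(stack[-1][1], lhs)]))
        else:                                         # accept
            return True

print(lr_parse(["int", "+", "(", "int", ")", "+", "(", "int", ")"]))
```

The loop mirrors the pseudocode: only the top state and the next input token are consulted, so the DFA is never rerun on the whole stack.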
Outline
• Review of bottom-up parsing
• Computing the parsing DFA
• Using parser generators

Bottom-up Parsing (Review)
• A bottom-up parser rewrites the input string to the start symbol
• The state of the parser is described as α I γ
  – α is a stack of terminals and non-terminals
  – γ is the string of terminals not yet examined
• Initially: I x1 x2 . . . xn

The Shift and Reduce Actions (Review)
• Recall the CFG: E → int | E + (E)
• A bottom-up parser uses two kinds of actions:
• Shift pushes a terminal from input on the stack
  E + ( I int )  ⇒  E + ( int I )
• Reduce pops 0 or more symbols off of the stack (production rhs) and pushes a non-terminal on the stack (production lhs)
  E + ( E + ( E ) I )  ⇒  E + ( E I )

Key Issue: When to Shift or Reduce?
• Idea: use a finite automaton (DFA) to decide when to shift or reduce
  – The input is the stack
  – The language consists of terminals and non-terminals
• We run the DFA on the stack and we examine the resulting state X and the token tok after I
  – If X has a transition labeled tok then shift
  – If X is labeled with "A → β on tok" then reduce

LR(1) Parsing. An Example
• The parsing DFA and the parse of int + (int) + (int)$ are the same as in the example above

End of review

Shift/Reduce Conflicts
• If a DFA state contains both [X → α•aβ, b] and [Y → γ•, a]
• Then on input "a" we could either
  – Shift into state [X → αa•β, b], or
  – Reduce with Y → γ
• This is called a shift-reduce conflict

Shift/Reduce Conflicts
• Typically due to ambiguities in the grammar
• Classic example: the dangling else
  S → if E then S | if E then S else S | OTHER
• Will have DFA state containing
  [S → if E then S•, else]
  [S → if E then S• else S, x]
• If else follows then we can shift or reduce
• Default (bison, CUP, etc.) is to shift
  – Default behavior is as needed in this case

More Shift/Reduce Conflicts
• Consider the ambiguous grammar
  E → E + E | E * E | int
• We will have the states containing
  { [E → E * • E, +], [E → • E + E, +], … }  ⇒E  { [E → E * E•, +], [E → E • + E, +], … }
• Again we have a shift/reduce on input +
  – We need to reduce (* binds more tightly than +)
  – Recall solution: declare the precedence of * and +

More Shift/Reduce Conflicts
• In bison declare precedence and associativity:
  %left +
  %left *
• Precedence of a rule = that of its last terminal
  – See bison manual for ways to override this default
  – Context-dependent precedence (Section 5.4, p. 70)
• Resolve shift/reduce conflict with a shift if:
  – no precedence declared for either rule or terminal
  – input terminal has higher precedence than the rule
  – the precedences are the same and right associative

Using Precedence to Solve S/R Conflicts
• Back to our example:
  { [E → E * • E, +], [E → • E + E, +], … }  ⇒E  { [E → E * E•, +], [E → E • + E, +], … }
• Will choose reduce because precedence of rule E → E * E is higher than of terminal +

Using Precedence to Solve S/R Conflicts
• Same grammar as before
  E → E + E | E * E | int
• We will also have the states
  { [E → E + • E, +], [E → • E + E, +], … }  ⇒E  { [E → E + E•, +], [E → E • + E, +], … }
• Now we also have a shift/reduce on input +
  – We choose reduce because E → E + E and + have the same precedence and + is left-associative
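The shift rules and the implied reduce cases above can be written out as a small decision procedure. The Python sketch below is my own illustration (the tables mirror %left + followed by %left *, so * binds tighter; the names PRECEDENCE, ASSOCIATIVITY, and resolve are not bison's internals):

```python
# Deciding a shift/reduce conflict with precedence and associativity,
# following the rules listed above. Tables and names are illustrative.

PRECEDENCE = {"+": 1, "*": 2}              # higher number = binds tighter
ASSOCIATIVITY = {"+": "left", "*": "left"}

def rule_precedence(rule_rhs):
    """Precedence of a rule = that of its last terminal (None if it has none)."""
    terminals = [s for s in rule_rhs if s in PRECEDENCE]
    return PRECEDENCE[terminals[-1]] if terminals else None

def resolve(rule_rhs, lookahead):
    """Return 'shift' or 'reduce' for a shift/reduce conflict."""
    rp = rule_precedence(rule_rhs)
    tp = PRECEDENCE.get(lookahead)
    if rp is None or tp is None:
        return "shift"                     # no precedence declared: default to shift
    if tp > rp:
        return "shift"                     # lookahead binds tighter than the rule
    if tp < rp:
        return "reduce"                    # rule binds tighter than the lookahead
    # equal precedence: left-associative reduces, right-associative shifts
    return "reduce" if ASSOCIATIVITY[lookahead] == "left" else "shift"

print(resolve(["E", "*", "E"], "+"))       # reduce: E -> E * E outranks +
print(resolve(["E", "+", "E"], "+"))       # reduce: same precedence, + is left-associative
```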
Using Precedence to Solve S/R Conflicts
• Back to our dangling else example
  [S → if E then S•, else]
  [S → if E then S• else S, x]
• Can eliminate conflict by declaring else with higher precedence than then
  – Or just rely on the default shift action
• But this starts to look like "hacking the parser"
• Best to avoid overuse of precedence declarations or you'll end up with unexpected parse trees

Reduce/Reduce Conflicts
• If a DFA state contains both [X → α•, a] and [Y → β•, a]
  – Then on input "a" we don't know which production to reduce
• This is called a reduce/reduce conflict

Reduce/Reduce Conflicts
• Usually due to gross ambiguity in the grammar
• Example: a sequence of identifiers
  S → ε | id | id S
• There are two parse trees for the string id:
  S → id
  S → id S → id
• How does this confuse the parser?

More on Reduce/Reduce Conflicts
• Consider the state
  { [S' → • S, $], [S → •, $], [S → • id, $], [S → • id S, $] }
  and, on input id, its successor
  { [S → id •, $], [S → id • S, $], [S → •, $], [S → • id, $], [S → • id S, $] }
• Reduce/reduce conflict on input $:
  S' → S → id
  S' → S → id S → id
• Better to rewrite the grammar: S → ε | id S

Using Parser Generators
• Parser generators construct the parsing DFA given a CFG
  – Use precedence declarations and default conventions to resolve conflicts
  – The parser algorithm is the same for all grammars (and is provided as a library function)
• But most parser generators do not construct the DFA as described before
  – Because the LR(1) parsing DFA has 1000s of states even for a simple language

LR(1) Parsing Tables are Big
• But many states are similar, e.g. states 1 and 5 of the example DFA:
  1: [E → int•, $/+]   (E → int on $, +)
  5: [E → int•, )/+]   (E → int on ), +)
• Idea: merge the DFA states whose items differ only in the lookahead tokens
  – We say that such states have the same core
• We obtain the merged state
  1': [E → int•, $/+/)]   (E → int on $, +, ))

The Core of a Set of LR Items
• Definition: The core of a set of LR items is the set of first components
  – Without the lookahead terminals
• Example: the core of { [X → α•β, b], [Y → γ•δ, d] } is { X → α•β, Y → γ•δ }

LALR States
• Consider for example the LR(1) states
  { [X → α•, a], [Y → β•, c] }
  { [X → α•, b], [Y → β•, d] }
• They have the same core and can be merged
• And the merged state contains:
  { [X → α•, a/b], [Y → β•, c/d] }
• These are called LALR(1) states
  – Stands for LookAhead LR
  – Typically 10 times fewer LALR(1) states than LR(1)

A LALR(1) DFA
• Repeat until all states have distinct core
  – Choose two distinct states with same core
  – Merge the states by creating a new one with the union of all the items
  – Point edges from predecessors to new state
  – New state points to all the previous successors
• [Figure: states B and E of a six-state DFA are merged into a single state BE; edges into B and E are redirected to BE, which keeps their successors]

Conversion LR(1) to LALR(1). Example
• [Figure: merging states with the same core collapses the 12-state LR(1) DFA for the example grammar to 7 LALR(1) states: 0, {1,5}, 2, {3,8}, {4,9}, {6,10}, {7,11}; the merged reduce annotations become E → int on $, +, ) and E → E + (E) on $, +, )]

The LALR Parser Can Have Conflicts
• Consider for example the LR(1) states
  { [X → α•, a], [Y → β•, b] }
  { [X → α•, b], [Y → β•, a] }
• And the merged LALR(1) state
  { [X → α•, a/b], [Y → β•, a/b] }
• Has a new reduce-reduce conflict
• In practice such cases are rare
• However, no new shift/reduce conflicts. Why?
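A small sketch of the core-and-merge step described above, in Python. The item representation (left-hand side, right-hand side, dot position, lookahead) and the function names are my own choices for illustration, not the lecture's:

```python
# Merging LR(1) states with the same core into LALR(1) states.
# An item is (lhs, rhs, dot, lookahead); the core drops the lookahead.

from collections import defaultdict

def core(state):
    """The core of a set of LR(1) items: the items without their lookaheads."""
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _la) in state)

def merge_states(states):
    """Group LR(1) states by core; union the lookaheads of matching items."""
    groups = defaultdict(list)
    for state in states:
        groups[core(state)].append(state)
    merged = []
    for group in groups.values():
        lookaheads = defaultdict(set)
        for state in group:
            for (lhs, rhs, dot, la) in state:
                lookaheads[(lhs, rhs, dot)].add(la)
        merged.append({(lhs, rhs, dot, frozenset(las))
                       for (lhs, rhs, dot), las in lookaheads.items()})
    return merged

# States 1 and 5 of the example: [E -> int., $/+] and [E -> int., )/+]
state1 = {("E", ("int",), 1, "$"), ("E", ("int",), 1, "+")}
state5 = {("E", ("int",), 1, ")"), ("E", ("int",), 1, "+")}
print(merge_states([state1, state5]))   # one merged state: E -> int. on $, +, )
```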
LALR vs. LR Parsing
• LALR languages are not natural
  – They are an efficiency hack on LR languages
• Any reasonable programming language has an LALR(1) grammar
• LALR(1) has become a standard for programming languages and for parser generators