SLR(1) Parsing: An Improvement on LR(0) for Parsing Grammars, Assignments of Computer Science

SLR(1) parsing is an extension of LR(0) parsing that allows a larger class of grammars to be parsed. In SLR(1), the parser reduces only if the next input token is a member of the follow set of the non-terminal being reduced. This handout covers the concept of SLR(1) parsing, its improvements over LR(0), and how to construct LR(1) parsing tables. It also includes an example of parsing the expression id = id using SLR(1) and LR(1) parsing tables.

Typology: Assignments

Pre 2010

Uploaded on 08/09/2009

koofers-user-lz9

CS143 Handout 12
Autumn 2007
October 15, 2007

SLR-SR Parsing

Handout written by Maggie Johnson and revised by Julie Zelenski.

LR(0) Isn't Good Enough

LR(0) is the simplest technique in the LR family. Although that makes it the easiest to learn, these parsers are too weak to be of practical use for anything but a very limited set of grammars. The examples given at the end of the LR(0) handout show how even small additions to an LR(0) grammar can introduce conflicts that make it no longer LR(0). The fundamental limitation of LR(0) is the zero, meaning no lookahead tokens are used. It is a stifling constraint to have to make decisions using only what has already been read, without even glancing at what comes next in the input. If we could peek at the next token and use that as part of the decision-making, we would find that it allows a much larger class of grammars to be parsed.

SLR(1)

We will first consider SLR(1), where the S stands for simple. SLR(1) parsers use the same LR(0) configurating sets and have the same table structure and parser operation, so everything you've already learned about LR(0) applies here. The difference comes in assigning table actions, where we are going to use one token of lookahead to help arbitrate among the conflicts. If we think back to the kind of conflicts we encountered in LR(0) parsing, it was the reduce actions that caused us grief. A state in an LR(0) parser can have at most one reduce action and cannot have both shift and reduce instructions. Since a reduce is indicated for any completed item, this dictates that each completed item must be in a state by itself. But let's revisit the assumption that if the item is complete, the parser must choose to reduce. Is that always appropriate? If we peeked at the next upcoming token, it might tell us something that invalidates that reduction.
If the sequence on top of the stack could be reduced to the non-terminal A, what tokens do we expect to find as the next input? What tokens would tell us that the reduction is not appropriate? Perhaps Follow(A) could be useful here! The simple improvement that SLR(1) makes on the basic LR(0) parser is to reduce only if the next input token is a member of the follow set of the non-terminal being reduced. When filling in the table, we don't assume a reduce on all inputs as we did in LR(0); we selectively choose the reduction only when the next input symbol is a member of the follow set.

To be more precise, here is the algorithm for SLR(1) table construction (note that all steps are the same as for LR(0) table construction except for 2a):

1. Construct F = {I0, I1, ... In}, the collection of LR(0) configurating sets for G'.
2. State i is determined from Ii. The parsing actions for the state are determined as follows:
   a) If A –> u• is in Ii then set Action[i,a] to reduce A –> u for all a in Follow(A) (A is not S').
   b) If S' –> S• is in Ii then set Action[i,$] to accept.
   c) If A –> u•av is in Ii and successor(Ii, a) = Ij, then set Action[i,a] to shift j (a must be a terminal).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: If successor(Ii, A) = Ij, then Goto[i, A] = j.
4. All entries not defined by rules 2 and 3 are errors.
5. The initial state is the one constructed from the configurating set containing S' –> •S.

In the SLR(1) parser, it is allowable for there to be both shift and reduce items in the same state, as well as multiple reduce items. The SLR(1) parser will be able to determine which action to take as long as, within each state, the follow sets of the non-terminals that could be reduced are disjoint from one another and from the terminals that could be shifted.

Let's consider those changes at the end of the LR(0) handout to the simplified expression grammar that would have made it no longer LR(0).
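Before working through those grammars by hand, it may help to see rule 2a in code. The following Python is only a minimal sketch, not the handout's own implementation. It assumes a grammar with no empty productions, takes as input the expression grammar extended with array access (T –> id[E]), and uses hypothetical helper names (`first_sets`, `follow_sets`, `slr_actions`). It computes the follow sets and then fills one table row for the set containing T –> id• and T –> id•[E].

```python
from collections import defaultdict

# Expression grammar with array access; E' is the augmented start symbol.
GRAMMAR = [
    ("E'", ["E"]),
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["(", "E", ")"]),
    ("T", ["id"]),
    ("T", ["id", "[", "E", "]"]),
]
START = "E'"
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets(grammar):
    """First sets, assuming no production has an empty right-hand side."""
    first = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(grammar, start):
    """Follow sets, under the same no-empty-productions assumption."""
    first = first_sets(grammar)
    follow = defaultdict(set)
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]  # sym ends the production
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_actions(items, follow):
    """Fill one table row from items given as (lhs, rhs, dot position).

    Rule 2c: shift on the terminal after the dot.
    Rule 2a: reduce a completed item A -> u only on tokens in Follow(A).
    A cell holding more than one action would be a conflict.
    """
    row = defaultdict(set)
    for lhs, rhs, dot in items:
        if dot < len(rhs):
            if rhs[dot] not in NONTERMINALS:
                row[rhs[dot]].add("shift")
        elif lhs != START:
            for token in follow[lhs]:
                row[token].add(f"reduce {lhs} -> {' '.join(rhs)}")
    return dict(row)


FOLLOW = follow_sets(GRAMMAR, START)
# The set reached on id: T -> id.  and  T -> id.[E]
ROW = slr_actions([("T", ["id"], 1), ("T", ["id", "[", "E", "]"], 1)], FOLLOW)
```

In this sketch every cell of `ROW` holds a single action: the reduce lands only on the tokens of Follow(T), and `[` is left free for the shift.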
Here is the version with the addition of array access:

E' –> E
E –> E + T | T
T –> (E) | id | id[E]

Here are the first two LR(0) configurating sets entered if id is the first token of the input:

Initial set:
  E' –> •E
  E –> •E + T
  E –> •T
  T –> •(E)
  T –> •id
  T –> •id[E]

Set reached on id:
  T –> id•
  T –> id•[E]

In an LR(0) parser, the second set (the one reached on id) has a shift-reduce conflict. However, an SLR(1) parser will compute Follow(T) = { + ) ] $ } and only enter the reduce action on those tokens. The input [ will shift and there is no conflict. Thus this grammar is SLR(1) even though it is not LR(0).

Similarly, the simplified expression grammar with the assignment addition:

E' –> E
E –> E + T | T | V = E
T –> (E) | id
V –> id

[...] already seen a * or an =. Just using the entire follow set is not discriminating enough as the guide for when to reduce. The follow set contains symbols that can follow R in any position within a valid sentence, but it does not precisely indicate which symbols follow R at this particular point in a derivation. So we will augment our states to include information about what portion of the follow set is appropriate given the path we have taken to that state.

We can be in state 2 for one of two reasons: we are trying to build from S –> L = R or from S –> R –> L. If the upcoming symbol is =, then that rules out the second choice and we must be building the first, which tells us to shift. The reduction should only be applied if the next input symbol is $. Even though = is in Follow(R) because of the other contexts in which an R can appear, in this particular situation it is not appropriate, because when deriving a sentence S –> R –> L, = cannot follow R.

Constructing LR(1) parsing tables

LR or canonical LR parsing incorporates the required extra information into the state by redefining configurations to include a terminal symbol as an added component.
LR(1) configurations have the general form:

A –> X1...Xi • Xi+1...Xj , a

This means we have states corresponding to X1...Xi on the stack and we are looking to put states corresponding to Xi+1...Xj on the stack and then reduce, but only if the token following Xj is the terminal a. The symbol a is called the lookahead of the configuration. The lookahead only comes into play with LR(1) configurations with a dot at the right end:

A –> X1...Xj •, a

This means we have states corresponding to X1...Xj on the stack but we may only reduce when the next symbol is a. The symbol a is either a terminal or $ (the end of input marker). With SLR(1) parsing, we would reduce if the next token was any of those in Follow(A). With LR(1) parsing, we reduce only if the next token is exactly a. We may have more than one symbol in the lookahead for the configuration; as a convenience, we list those symbols separated by a forward slash. Thus, the configuration A –> u•, a/b/c says that it is valid to reduce u to A only if the next token is equal to a, b, or c. The configuration lookahead will always be a subset of Follow(A).

Recall the definition of a viable prefix from the previous handout. Viable prefixes are those prefixes of right sentential forms that can appear on the stack of a shift-reduce parser. Formally, we say that a configuration [A –> u•v , a] is valid for a viable prefix α if there is a rightmost derivation S =>* βAw =>* βuvw where α = βu and either a is the first symbol of w, or w is ε and a is $. For example:

S –> ZZ
Z –> xZ | y

There is a rightmost derivation S =>* xxZxy => xxxZxy. We see that configuration [Z –> x•Z, x] is valid for viable prefix α = xxx by letting β = xx, A = Z, w = xy, u = x and v = Z. Another example is from the rightmost derivation S =>* ZxZ => ZxxZ, making [Z –> x•Z, $] valid for viable prefix Zxx (here β = Zx and w is ε).

Often we have a number of LR(1) configurations that differ only in their lookahead components.
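The slash notation is just a grouping of configurations that share everything but the lookahead. As a small illustration (a sketch under assumptions, not part of the handout), the Python below represents a configuration as a hypothetical tuple `(lhs, rhs, dot, lookahead)`, groups configurations with the same core, and answers the question "may we reduce on this token?" for the A –> u•, a/b/c example. The dot is printed as an ASCII period.

```python
from collections import defaultdict


def merge_lookaheads(configs):
    """Group configurations that differ only in their lookahead."""
    merged = defaultdict(set)
    for lhs, rhs, dot, lookahead in configs:
        merged[(lhs, tuple(rhs), dot)].add(lookahead)
    return dict(merged)


def render(core, lookaheads):
    """Write a merged configuration with slash-separated lookaheads."""
    lhs, rhs, dot = core
    body = " ".join(rhs[:dot]) + "." + " ".join(rhs[dot:])
    return f"{lhs} -> {body}, {'/'.join(sorted(lookaheads))}"


def may_reduce(core, lookaheads, next_token):
    """LR(1) reduces only on a completed item whose lookahead matches."""
    _, rhs, dot = core
    return dot == len(rhs) and next_token in lookaheads


# Three configurations with the same core A -> u. and lookaheads a, b, c.
CONFIGS = [("A", ["u"], 1, "a"), ("A", ["u"], 1, "b"), ("A", ["u"], 1, "c")]
MERGED = merge_lookaheads(CONFIGS)
CORE = ("A", ("u",), 1)
```

Here `render(CORE, MERGED[CORE])` yields `A -> u., a/b/c`, and `may_reduce` accepts a, b, or c but rejects any other token, even one that happens to be in Follow(A).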
The addition of a lookahead component to LR(1) configurations allows us to make parsing decisions beyond the capability of SLR(1) parsers. There is, however, a big price to be paid. There will be more distinct configurations and thus many more possible configurating sets. This increases the size of the goto and action tables considerably. In the past when memory was smaller, it was difficult to find storage-efficient ways of representing these tables, but now this is not as much of an issue. Still, it's a big job building LR tables for any substantial grammar by hand.

The method for constructing the configurating sets of LR(1) configurations is essentially the same as for SLR, but there are some changes in the closure and successor operations because we must respect the configuration lookahead. To compute the closure of an LR(1) configurating set I:

Repeat the following until no more configurations can be added to state I:
- For each configuration [A –> u•Bv , a] in I, for each production B –> w in G', and for each terminal b in First(va) such that [B –> •w , b] is not in I: add [B –> •w , b] to I.

What does this mean? We have a configuration with the dot before the non-terminal B. In LR(0), we computed the closure by adding all B productions with no indication of what was expected to follow them. In LR(1), we are a little more precise: we add each B production but insist that each carry a lookahead drawn from First(va), since this is what follows B in this production. Remember that we can compute first sets not just for a single non-terminal, but also for a sequence of terminals and non-terminals. First(va) includes the first set of the first symbol of v and then, if that symbol is nullable, the first set of the following symbol, and so on. If the entire sequence v is nullable, we add the lookahead a already required by this configuration.
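The closure rule can be sketched in Python as follows. This is an illustrative sketch rather than the handout's code: configurations are hypothetical tuples `(lhs, body, dot, lookahead)`, the helper names are invented, and the input is the S –> ZZ, Z –> xZ | y grammar from the viable-prefix example, augmented with S' –> S.

```python
# Augmented grammar: non-terminal -> list of right-hand sides (tuples).
GRAMMAR = {
    "S'": [("S",)],
    "S": [("Z", "Z")],
    "Z": [("x", "Z"), ("y",)],
}


def compute_first(grammar):
    """First sets and the set of nullable non-terminals."""
    first = {nt: set() for nt in grammar}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            for body in bodies:
                before = (len(first[lhs]), lhs in nullable)
                all_nullable = True
                for sym in body:
                    if sym in grammar:
                        first[lhs] |= first[sym]
                        if sym not in nullable:
                            all_nullable = False
                            break
                    else:  # a terminal ends the scan
                        first[lhs].add(sym)
                        all_nullable = False
                        break
                if all_nullable:
                    nullable.add(lhs)
                if before != (len(first[lhs]), lhs in nullable):
                    changed = True
    return first, nullable


FIRST, NULLABLE = compute_first(GRAMMAR)


def first_of(sequence):
    """First(va): scan v, falling through nullable symbols to lookahead a."""
    result = set()
    for sym in sequence:
        if sym not in GRAMMAR:  # terminal or $
            result.add(sym)
            return result
        result |= FIRST[sym]
        if sym not in NULLABLE:
            return result
    return result


def closure(configs):
    """Add [B -> .w, b] for every [A -> u.Bv, a] and every b in First(va)."""
    result = set(configs)
    work = list(configs)
    while work:
        lhs, body, dot, lookahead = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue  # nothing to expand: dot at end or before a terminal
        b_symbol = body[dot]
        for b in first_of(body[dot + 1:] + (lookahead,)):
            for w in GRAMMAR[b_symbol]:
                new = (b_symbol, w, 0, b)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return result


# Closure of the initial configuration [S' -> .S, $].
I0 = closure({("S'", ("S",), 0, "$")})
```

For this grammar `I0` contains S –> •ZZ with lookahead $ and both Z productions with lookaheads x and y, because First(Z$) = {x, y}.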
The successor function for the configurating set I and symbol X is computed as follows: let J be the set of configurations [A –> uX•v , a] such that [A –> u•Xv , a] is in I. Then successor(I,X) is the closure of configurating set J. We take each production in a configurating set, move the dot over a symbol, and close on the resulting production. This is basically the same successor function as defined for LR(0), but we have to propagate the lookahead when computing the transitions.

We construct the complete family of all configurating sets F just as we did before. F is initialized to the set with the closure of [S' –> •S, $]. For each configurating set I and each grammar symbol X such that successor(I,X) is not empty and not in F, add successor(I,X) to F until no other configurating set can be added to F.

Let's consider an example. The augmented grammar below recognizes the regular language a*ba*b (this example is from pp. 231-236 of Aho/Sethi/Ullman).

0) S' –> S
1) S –> XX
2) X –> aX
3) X –> b

Here is the family of LR configuration sets:

I0: S' –> •S, $
    S –> •XX, $
    X –> •aX, a/b
    X –> •b, a/b
I1: S' –> S•, $
I2: S –> X•X, $
    X –> •aX, $
    X –> •b, $
I3: X –> a•X, a/b
    X –> •aX, a/b
    X –> •b, a/b
I4: X –> b•, a/b
I5: S –> XX•, $
I6: X –> a•X, $
    X –> •aX, $
    X –> •b, $
I7: X –> b•, $
I8: X –> aX•, a/b
I9: X –> aX•, $

The above grammar would only have seven SLR states, but has ten in canonical LR. We end up with additional states because we have split states that have different lookaheads. For example, states 3 and 6 are the same except for lookahead; state 3 corresponds to the context where we are in the middle of parsing the first X, and state 6 to the second X. Similarly, states 4 and 7 are completing the first and second X respectively. In SLR, those states are not distinguished, and if we were attempting to parse a single b by itself, we would allow that to be reduced to X, even though this will not lead to a valid sentence.
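The state counts quoted above can be checked mechanically. The Python below is a minimal sketch of the successor function and the construction of F for this grammar. It is not the handout's code: the names are hypothetical, configurations are tuples `(lhs, body, dot, lookahead)`, and the first sets are written in by hand on the assumption (true for this grammar) that no symbol is nullable.

```python
# Augmented grammar for a*ba*b: non-terminal -> right-hand sides.
GRAMMAR = {"S'": [("S",)], "S": [("X", "X")], "X": [("a", "X"), ("b",)]}
# Hand-computed first sets; valid because nothing here is nullable.
FIRST = {"S": {"a", "b"}, "X": {"a", "b"}}


def first_of(sequence):
    """First of a non-empty sequence, assuming no nullable symbols."""
    head = sequence[0]
    return FIRST[head] if head in GRAMMAR else {head}


def closure(configs):
    """LR(1) closure: expand [A -> u.Bv, a] with each b in First(va)."""
    result, work = set(configs), list(configs)
    while work:
        _, body, dot, lookahead = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for b in first_of(body[dot + 1:] + (lookahead,)):
                for w in GRAMMAR[body[dot]]:
                    new = (body[dot], w, 0, b)
                    if new not in result:
                        result.add(new)
                        work.append(new)
    return frozenset(result)


def successor(state, symbol):
    """Move the dot over `symbol`, keep the lookahead, then close."""
    moved = {
        (lhs, body, dot + 1, lookahead)
        for lhs, body, dot, lookahead in state
        if dot < len(body) and body[dot] == symbol
    }
    return closure(moved) if moved else frozenset()


def build_family():
    """Collect every configurating set reachable from the initial one."""
    start = closure({("S'", ("S",), 0, "$")})
    family, work = [start], [start]
    while work:
        state = work.pop(0)
        symbols = {body[dot] for _, body, dot, _ in state if dot < len(body)}
        for symbol in sorted(symbols):
            target = successor(state, symbol)
            if target and target not in family:
                family.append(target)
                work.append(target)
    return family


FAMILY = build_family()
# Dropping lookaheads leaves the LR(0) cores that SLR(1) would use.
CORES = {frozenset((l, b, d) for l, b, d, _ in state) for state in FAMILY}
```

In this sketch `len(FAMILY)` is 10 and `len(CORES)` is 7, which agrees with the ten canonical LR sets listed and the seven states an SLR parser would have.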
The SLR parser will eventually notice the syntax error, too, but the LR parser figures it out a bit sooner. To fill in the entries in the action and goto tables, we use an algorithm similar to the one we used for SLR(1), but instead of assigning reduce actions using the follow set, we use the specific lookaheads. Here are the steps to build an LR(1) parse table: