COT 4210 Top-down Parsing with LL(k) Grammars Fall 2001

Deterministic Top-down Parsing

A top-down parsing algorithm is one that attempts to simulate a leftmost derivation of the input. Algorithms of this kind use a stack to hold the unexpanded part of the sentential form produced at each step of the parse. At the beginning of a parse, the stack is initialized with the start symbol of the underlying grammar. The parser uses two pieces of information to decide what action to take. The first is the lookahead, a string composed of the next k tokens in the input stream, where k ≥ 1 is a bound on the length of the lookahead string. A token is essentially a terminal symbol of the underlying grammar; more precisely, it is a lexical category such as identifier, reserved word, or real literal. The second is the contents of the parse stack, of which usually only the top symbol is needed. Figure 1 illustrates a conceptual model of top-down parsing.

Figure 1. Conceptual Model of Top-down Parsing Algorithms

Top-down parsing algorithms may use a technique known as backtracking when the parser reaches a configuration from which the parse cannot complete successfully. Recursive descent with backtracking is an example of such an algorithm. The idea is that the parser "backs up" to a previous parsing configuration where an alternative action is possible. It then takes one of the alternative actions and continues. Backtracking may occur many times before the parser either exhausts all alternatives or succeeds in finding a parse. Top-down algorithms that use backtracking are very general and can be used with a very large class of context-free grammars (left-recursive grammars are the notable exception, since they can send a top-down parser into an infinite loop). Almost all context-free languages can be parsed with this approach.
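The backtracking idea can be made concrete with a small sketch. The grammar S → a S b | a b below is a hypothetical toy grammar chosen only for illustration (it does not come from these notes), and the names GRAMMAR, derive, and accepts are likewise assumptions. The list `symbols` plays the role of the parse stack, and "backing up" happens whenever one alternative is exhausted and the loop moves on to the next.

```python
from typing import Dict, Iterator, List

# Hypothetical toy grammar (not from the notes): S -> a S b | a b
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["a", "S", "b"], ["a", "b"]],
}


def derive(symbols: List[str], tokens: List[str], pos: int) -> Iterator[int]:
    """Yield every input position reachable after matching `symbols` from `pos`.

    `symbols` is the unexpanded part of the sentential form (the parse stack),
    with its leftmost symbol first.
    """
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal on top: try each alternative in turn. When one
        # alternative yields nothing more, control returns here and the
        # next alternative is tried; this is the backtracking step.
        for alternative in GRAMMAR[head]:
            yield from derive(alternative + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        # Terminal on top that matches the next token: consume it.
        yield from derive(rest, tokens, pos + 1)
    # Otherwise the terminal does not match: yield nothing (dead end).


def accepts(tokens: List[str], start: str = "S") -> bool:
    """True if some sequence of choices consumes the whole input."""
    return any(end == len(tokens) for end in derive([start], tokens, 0))
```

For example, `accepts(list("aabb"))` first tries S → a S b for the inner S, hits a dead end, and only then falls back to S → a b. This sketch assumes a grammar without left recursion; with left recursion the recursion would never terminate.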
The central problem with backtracking algorithms is that they can be very slow (exponential time in the worst case), which makes them impractical for implementing production-quality compilers for real programming languages. To avoid backtracking, one must design the grammar to satisfy certain constraints so that the information available to the parser in any configuration leads to only one possible action; that is, there is never a valid alternative at any step. Because no alternative actions exist in any configuration, backtracking is neither possible nor needed. Consequently, each parsing decision either leads to success or fails outright.

11/29/2020 Page 1 July 20, 2000

LL(k) Parsing

One class of top-down parsing algorithms (and grammars) that do not use backtracking is the class of LL(k) algorithms. The "LL" denotes the direction of the input scan (Left-to-right) and the type of derivation produced (Leftmost). The "k" is a positive integer that bounds the length of the lookahead string. The parsing algorithm based on LL(k) grammars is given below. As k increases, the family of LL(k) languages that can be parsed deterministically top-down grows; that is, LL(1) is properly included in LL(2), and so on. The downside is that as k increases, the size of the parser must also increase, and the space required can grow exponentially with k.

Algorithm

Given: An LL(k) grammar G = (N, Σ, P, S) for some k > 0. A parse stack holding some string over the vocabulary of G and initialized with the start symbol, S.

For each production X → α ∈ P, let

    Pref(X → α, k) = { y ∈ Σ* | |y| ≤ k and y = Pref_k(z), where Xw ⇒ αw ⇒* z is a leftmost derivation for some z ∈ Σ* and some string w ∈ Follow(X, k) }

where Pref_k(z) denotes the prefix of z of length k (all of z if |z| < k).

1. Compute the value of the lookahead, Look.
2. If stack = ε and Look = ε then "Accept" and halt!
3. If stack = ε then report "Syntax Error" and halt!
4. If X = Top(stack) ∈ N then: if Look ∈ Pref(X → α, k) for some X → α ∈ P, then replace X by α in the stack and goto 2; else report "Syntax Error" and halt!
5. If a = Top(stack) ∈ Σ then: if Look[1] = "a", Pop(stack), Advance(Input), and goto 1; else report "Syntax Error" and halt!

Example (LL(1) Grammar for Arithmetic Expressions). Construct a deterministic parser for the language L(G) defined by the CFG, G, given below. By deterministic parser we mean any DPDA that will accept L(G) by empty stack.

G = (N, Σ, P, S), where Σ = { n, v, ~, +, -, *, /, (, ), EOF }, EOF denotes the end-of-file mark, the operator "~" denotes negation, N = { S, S', X, X', Y, Z }, and P = {

     1: S  → X S'     lookahead { ~, n, v, ( }
     2: S' → + S      lookahead { + }
     3: S' → - S      lookahead { - }
     4: S' → ε        lookahead { ), EOF }
     5: X  → Y X'     lookahead { ~, n, v, ( }
     6: X' → * X      lookahead { * }
     7: X' → / X      lookahead { / }
     8: X' → ε        lookahead { +, -, ), EOF }
     9: Y  → ~ Z      lookahead { ~ }
    10: Y  → Z        lookahead { n, v, ( }
    11: Z  → n        lookahead { n }
    12: Z  → v        lookahead { v }
    13: Z  → ( S )    lookahead { ( }
}

We now illustrate the LL(1) algorithm for the input "((n))". The reader should parse "n*~v+n" as an exercise. A configuration of the parse is a triple [w, u, γ], where w is the remaining input, u = Pref_k(w) is the lookahead, and γ is the stack contents, with Top(γ) being its leftmost symbol. Each configuration is followed by the action applied to it: a production number, or pop/rd when the terminal on top of the stack is popped and the matching token is read.

    [ ((n)), (, S ]                  1
    [ ((n)), (, XS' ]                5
    [ ((n)), (, YX'S' ]              10
    [ ((n)), (, ZX'S' ]              13
    [ ((n)), (, (S)X'S' ]            pop/rd
    [ (n)), (, S)X'S' ]              1
    [ (n)), (, XS')X'S' ]            5
    [ (n)), (, YX'S')X'S' ]          10
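The LL(1) algorithm and the example grammar can be sketched as a small table-driven parser. This is a minimal illustration under stated assumptions, not a definitive implementation: the names PRODUCTIONS, LOOKAHEAD, TABLE, ParseError, and parse are hypothetical, an exhausted input is assumed to present EOF as its lookahead, the multiplication operator is written "*", and the lookahead sets of productions 1 and 5 are taken to include "~" because Y can derive a string beginning with "~".

```python
from typing import Dict, List, Set, Tuple

EOF = "EOF"  # assumption: lookahead reported once the input is exhausted

# Productions numbered as in the notes; an empty list stands for ε.
PRODUCTIONS: Dict[int, Tuple[str, List[str]]] = {
    1: ("S", ["X", "S'"]),
    2: ("S'", ["+", "S"]),
    3: ("S'", ["-", "S"]),
    4: ("S'", []),
    5: ("X", ["Y", "X'"]),
    6: ("X'", ["*", "X"]),
    7: ("X'", ["/", "X"]),
    8: ("X'", []),
    9: ("Y", ["~", "Z"]),
    10: ("Y", ["Z"]),
    11: ("Z", ["n"]),
    12: ("Z", ["v"]),
    13: ("Z", ["(", "S", ")"]),
}

# Lookahead set of each production (assumption: 1 and 5 include "~").
LOOKAHEAD: Dict[int, Set[str]] = {
    1: {"~", "n", "v", "("}, 2: {"+"}, 3: {"-"}, 4: {")", EOF},
    5: {"~", "n", "v", "("}, 6: {"*"}, 7: {"/"}, 8: {"+", "-", ")", EOF},
    9: {"~"}, 10: {"n", "v", "("}, 11: {"n"}, 12: {"v"}, 13: {"("},
}

NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}

# Parse table: (nonterminal, lookahead token) -> production number.
TABLE: Dict[Tuple[str, str], int] = {}
for num, (lhs, _) in PRODUCTIONS.items():
    for tok in LOOKAHEAD[num]:
        # Two productions sharing an entry would mean the grammar is not LL(1).
        assert (lhs, tok) not in TABLE, "lookahead sets overlap"
        TABLE[(lhs, tok)] = num


class ParseError(Exception):
    """Raised where the algorithm reports "Syntax Error"."""


def parse(tokens: List[str]) -> List[int]:
    """Return the production numbers applied, in order, or raise ParseError."""
    stack = ["S"]  # top of the stack is the last list element
    pos = 0
    applied: List[int] = []
    while True:
        look = tokens[pos] if pos < len(tokens) else EOF  # step 1
        if not stack:
            if look == EOF:
                return applied  # step 2: accept
            raise ParseError(f"unexpected {look!r} after end of parse")  # step 3
        top = stack.pop()
        if top in NONTERMINALS:  # step 4: expand by the selected production
            num = TABLE.get((top, look))
            if num is None:
                raise ParseError(f"no production for {top} on {look!r}")
            applied.append(num)
            stack.extend(reversed(PRODUCTIONS[num][1]))
        elif top == look:  # step 5: pop the terminal and read the token
            pos += 1
        else:
            raise ParseError(f"expected {top!r}, found {look!r}")
```

Calling `parse(list("((n))"))` returns a list that begins 1, 5, 10, 13, 1, 5, 10, matching the production numbers in the trace above. Storing the stack with its top at the end of the list is only a convenience; the notes write the top as the leftmost symbol.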