CMSC 330: Organization of Programming Languages (Prof. Atif M. Memon)
Context-Free Grammars: Pushdown Automaton

Reminders
• Project 2 due Oct. 12

Regular expressions and CFGs
• Programming languages are not regular
  – Example: matching an arbitrary number of brackets so that they are balanced
• They are usually almost context-free, with some hacks

  Machine                    Description              Language class
  DFAs, NFAs                 regular expressions      regular languages
  pushdown automata (PDAs)   context-free grammars    context-free languages

Equivalence of DFA and regular grammars

Pushdown Automaton (PDA)
• A pushdown automaton (PDA) is an abstract machine similar to a DFA
  – It has a finite set of states
  – It also has a pushdown stack
• Each move of the PDA works as follows:
  – The next input symbol and the symbol on top of the stack are read
  – Based on both, the machine
    • enters a new state, and
    • either pushes zero or more symbols onto the stack or pops zero or more symbols off it
• A string is accepted if the input has ended AND the stack is empty

Power of PDAs
• PDAs are more powerful than DFAs
  – a^n b^n, which cannot be recognized by a DFA, is easily recognized by a PDA
    • Push every a onto the stack and, for each b, pop an a off the stack (reject if an a follows a b, or if a b arrives when the stack is already empty)
    • If the end of input is reached at the same time that the stack becomes empty, the string is accepted

Context-free Grammars in Practice
• Regular expressions are used to turn raw text into a sequence of tokens
  – E.g., "if", "then", "identifier", etc.
  – Whitespace and comments are simply skipped
  – These tokens are the input for the next phase of compilation
  – Standard tools used include lex and flex
    • Many others exist for Java
• CFGs are used to turn tokens into parse trees
  – This process is called parsing
  – Standard tools used include yacc and bison
• Those trees are then analyzed by the compiler, which eventually produces object code

Parsing
• There are many efficient techniques for turning token strings into parse trees
  – They all have strange names, like LL(k), SLR(k), LR(k)...
  – Take CMSC 430 for more details
• We will look at one very simple technique: recursive descent parsing
  – This is a "top-down" parsing algorithm because it begins at the start symbol and tries to derive the input string

Example
  E → id = n | { L }
  L → E ; L | ε
  – Here n is an integer and id is an identifier
• One input might be
    { x = 3; { y = 4; }; }
  – The lexer turns this into the token list
    { x = 3 ; { y = 4 ; } ; }
  – We want to turn that token list into a parse tree

Example (cont'd)
Parse tree for { x = 3; { y = 4; }; }:

  E
    {
    L
      E
        x  =  3
      ;
      L
        E
          {
          L
            E
              y  =  4
            ;
            L
              ε
          }
        ;
        L
          ε
    }

Parsing Algorithm
• Goal: determine whether the input string can be derived from the grammar's start symbol
• At each step, we keep track of two facts
  – Which tree node are we trying to match?
  – What is the next token (the lookahead) of the input string?
• There are three cases:
  – If we're trying to match a terminal and the lookahead is that token, then succeed, advance the lookahead, and continue
  – If we're trying to match a nonterminal, then pick which production to apply based on the lookahead
  – Otherwise, fail with a parsing error

Example (cont'd)
(Figure: the parse tree for { x = 3 ; { y = 4 ; } ; } built top-down, with an arrow marking the current lookahead token in the input.)
• For example, when trying to match L with lookahead x or {, the parser applies L → E ; L, since both tokens can begin an E; with lookahead }, it applies L → ε
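The three-case parsing algorithm above can be sketched as a recursive-descent recognizer for the example grammar. This is a minimal sketch in OCaml, not the course's reference code: the token type, the hand-written `tokenize` lexer (assumed to accept letter-only identifiers and unsigned integers), and the names `accepts`, `expect`, `parse_e`, and `parse_l` are all hypothetical.

```ocaml
(* Hypothetical tokens for E -> id = n | { L } and L -> E ; L | epsilon *)
type token = Id of string | Num of int | Eq | Semi | LBrace | RBrace

(* Minimal hand-written lexer (an assumption; in practice lex/flex would be
   used). Identifiers are letters only, integers are unsigned, whitespace is
   skipped, and any other character raises Failure. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let is_digit c = c >= '0' && c <= '9' in
  let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') in
  (* first index at or after i where predicate p fails *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '=' -> go (i + 1) (Eq :: acc)
      | ';' -> go (i + 1) (Semi :: acc)
      | '{' -> go (i + 1) (LBrace :: acc)
      | '}' -> go (i + 1) (RBrace :: acc)
      | c when is_digit c ->
          let j = span is_digit i in
          go j (Num (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_alpha c ->
          let j = span is_alpha i in
          go j (Id (String.sub s i (j - i)) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []

exception Parse_error

(* Recursive-descent recognizer: true iff the whole input derives from E *)
let accepts (s : string) : bool =
  let toks = ref (tokenize s) in
  let lookahead () = match !toks with t :: _ -> Some t | [] -> None in
  let advance () = toks := List.tl !toks in
  (* Terminal case: succeed and advance only if the lookahead is t *)
  let expect t = if lookahead () = Some t then advance () else raise Parse_error in
  (* Nonterminal E: the lookahead selects the production *)
  let rec parse_e () =
    match lookahead () with
    | Some (Id _) ->
        advance (); expect Eq;
        (match lookahead () with
         | Some (Num _) -> advance ()
         | _ -> raise Parse_error)
    | Some LBrace -> advance (); parse_l (); expect RBrace
    | _ -> raise Parse_error
  (* Nonterminal L: use E ; L if the lookahead can start an E, else epsilon *)
  and parse_l () =
    match lookahead () with
    | Some (Id _) | Some LBrace -> parse_e (); expect Semi; parse_l ()
    | _ -> ()
  in
  try parse_e (); !toks = [] with Parse_error -> false
```

For instance, `accepts "{ x = 3; { y = 4; }; }"` returns true, while `accepts "{ x = 3 }"` returns false because the `;` required by L → E ; L is missing. Returning an option from `lookahead` lets the end of input be represented simply as `None`.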
What's Wrong with Parse Trees?
• Parse trees contain too much information
  – E.g., they have parentheses and they have extra nonterminals for precedence
  – This extra stuff is needed for parsing
• But when we want to reason about languages, it gets in the way (it's too much detail)

Abstract Syntax Trees (ASTs)
• An abstract syntax tree is a more compact, abstract representation of a parse tree, with only the essential parts
(Figure: a parse tree shown beside the corresponding AST.)

ASTs (cont'd)
• Intuitively, ASTs correspond to the data structure you'd use to represent strings in the language
  – Note that grammars describe trees (so do OCaml datatypes, which we'll see later)
  – E → a | b | c | E+E | E-E | E*E | (E)

The Compilation Process

Producing an AST
• To produce an AST, we modify the parse functions to construct the AST along the way

Producing an AST (cont'd)

    (* Tokens produced by the lexer (assumed) *)
    type token = Id of string | Num of int | Eq | Semi | LBrace | RBrace | EOF

    type ast = Assn of string * int | Block of ast list

    exception Parse_error of string

    (* Remaining input tokens; the head is the lookahead *)
    let tokens : token list ref = ref []

    let lookahead () = match !tokens with [] -> EOF | t :: _ -> t

    (* Consume the lookahead if it is t; otherwise fail *)
    let parse_term t =
      if lookahead () = t then tokens := List.tl !tokens
      else raise (Parse_error "unexpected token")

    (* E -> id = n | { L } *)
    let rec parse_E () =
      match lookahead () with
      | Id id ->
          parse_term (Id id);
          parse_term Eq;
          (match lookahead () with
           | Num n -> parse_term (Num n); Assn (id, n)
           | _ -> raise (Parse_error "expected an integer"))
      | LBrace ->
          parse_term LBrace;
          let l = parse_L () in
          parse_term RBrace;
          Block l
      | _ -> raise (Parse_error "expected id or {")

Producing an AST (cont'd)

    (* L -> E ; L | epsilon
       Choose E ; L whenever the lookahead can start an E (id or {) *)
    and parse_L () =
      match lookahead () with
      | Id _ | LBrace ->
          let e = parse_E () in
          parse_term Semi;
          let l = parse_L () in
          e :: l
      | _ -> []
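The remark that OCaml datatypes describe trees just as grammars do can be made concrete for the expression grammar E → a | b | c | E+E | E-E | E*E | (E). The following is a minimal sketch, not the course's own definition; the type names `var` and `expr` and the functions `to_string` and `eval` are hypothetical.

```ocaml
(* One constructor per production of E -> a | b | c | E+E | E-E | E*E | (E).
   There is deliberately no constructor for (E): parentheses only guide
   parsing, so they do not appear in the AST. *)
type var = A | B | C

type expr =
  | Var of var
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr

(* Print an AST fully parenthesized, so its grouping is explicit *)
let rec to_string (e : expr) : string =
  match e with
  | Var A -> "a"
  | Var B -> "b"
  | Var C -> "c"
  | Add (l, r) -> "(" ^ to_string l ^ "+" ^ to_string r ^ ")"
  | Sub (l, r) -> "(" ^ to_string l ^ "-" ^ to_string r ^ ")"
  | Mul (l, r) -> "(" ^ to_string l ^ "*" ^ to_string r ^ ")"

(* Evaluate an AST, given a value for each variable *)
let rec eval (env : var -> int) (e : expr) : int =
  match e with
  | Var v -> env v
  | Add (l, r) -> eval env l + eval env r
  | Sub (l, r) -> eval env l - eval env r
  | Mul (l, r) -> eval env l * eval env r

(* a*(b+c) and a*((b+c)) have different parse trees but this same AST *)
let example = Mul (Var A, Add (Var B, Var C))
```

Here `eval (function A -> 2 | B -> 3 | C -> 4) example` gives 14. Because this grammar is ambiguous (a+b*c has two parse trees), an AST simply records whichever grouping the parser chose.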