Docsity
Exploring Programming Language Design: A Mini-Language Approach, Study notes of Programming Languages

An introduction to programming language design, focusing on the elements of syntax, semantics, and pragmatics using a mini-language called PostFix. PostFix is a simple stack-based language inspired by PostScript, Forth, and HP calculators. The notes cover the syntax and semantics of PostFix commands, errors, and program arguments, as well as the utility of natural-language descriptions for learning programming languages.

Typology: Study notes

2021/2022

Uploaded on 09/12/2022

bairloy
Design Concepts in Programming Languages
Chapter 1: Introduction
Franklyn Turbak and David Gifford with Mark A. Sheldon
The MIT Press, Cambridge, Massachusetts; London, England

1.2 Syntax, Semantics, and Pragmatics

[...] or as a graphical tree:

    +
    ├── *
    │   ├── v
    │   └── w
    └── /
        ├── y
        └── z

Although these concrete notations are superficially different, they all designate the same abstract phrase structure (the sum of a product and a quotient). The syntax of a programming language specifies which concrete notations (strings of characters, lines on a page) in the language are legal and which tree-shaped abstract phrase structure is denoted by each legal notation.

Semantics

Semantics specifies the mapping between the structure of a programming language phrase and what the phrase means. Such phrases have no inherent meaning: their meaning is determined only in the context of a system for interpreting their structure. For example, consider the following expression tree:

    *
    ├── +
    │   ├── 1
    │   └── 11
    └── 10

Suppose we interpret the nodes labeled 1, 10, and 11 as the usual decimal notation for numbers, and the nodes labeled + and * as the sum and product of the values of their subnodes. Then the root of the tree stands for (1 + 11) · 10 = 120. But there are many other possible meanings for this tree. If * stands for exponentiation rather than multiplication, the meaning of the tree could be 12^10. If the numerals are in binary notation rather than decimal notation, the tree could stand for (in decimal notation) (1 + 3) · 2 = 8. Alternatively, suppose that odd integers stand for the truth value true, even integers stand for the truth value false, and + and * stand for, respectively, the logical disjunction (∨) and conjunction (∧) operators on truth values; then the meaning of the tree is false.
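The four readings of the tree can be checked mechanically. The following is a minimal Python sketch, not code from the book: the names `Node`, `TREE`, and `interpret` are hypothetical, and each interpretation is simply a pair consisting of a function for numerals and a table of functions for operators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A labeled tree node; leaves have no children."""
    label: str
    children: tuple = ()


# The tree from the text: * at the root, with subtrees (+ 1 11) and 10.
TREE = Node("*", (Node("+", (Node("1"), Node("11"))), Node("10")))


def interpret(node, leaf, ops):
    """Fold the tree: `leaf` gives meaning to numerals, `ops` to operators."""
    if not node.children:
        return leaf(node.label)
    args = [interpret(child, leaf, ops) for child in node.children]
    return ops[node.label](*args)


# Three ways to read a numeral.
decimal = lambda s: int(s, 10)
binary = lambda s: int(s, 2)
odd_is_true = lambda s: int(s, 10) % 2 == 1

# Three ways to read the operator labels.
arithmetic = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
power = {"+": lambda a, b: a + b, "*": lambda a, b: a ** b}
logic = {"+": lambda a, b: a or b, "*": lambda a, b: a and b}

print(interpret(TREE, decimal, arithmetic))  # 120
print(interpret(TREE, decimal, power))       # 12 ** 10 = 61917364224
print(interpret(TREE, binary, arithmetic))   # 8
print(interpret(TREE, odd_is_true, logic))   # False
```

The tree itself never changes; only the arguments passed to `interpret` do, which is the sense in which meaning depends on the system used to interpret the structure.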
Perhaps the tree does not indicate an evaluation at all, and only stands for a property intrinsic to the tree, such as its height (3), its number of nodes (5), or its shape (perhaps it describes a simple corporate hierarchy). Or maybe the tree is an arbitrary encoding for a particular object of interest, such as a person or a book.

This example illustrates how a single program phrase can have many possible meanings. Semantics describes the relationship between the abstract structure of a phrase and its meaning.

Pragmatics

Whereas semantics deals with what a phrase means, pragmatics focuses on the details of how that meaning is computed. Of particular interest is the effective use of various resources, such as time, space, and access to shared physical devices (storage devices, network connections, video monitors, printers, speakers, etc.).

As a simple example of pragmatics, consider the evaluation of the following expression tree (under the first semantic interpretation described above):

    /
    ├── -
    │   ├── +
    │   │   ├── a
    │   │   └── b
    │   └── *
    │       ├── 2
    │       └── 3
    └── +
        ├── a
        └── b

Suppose that a and b stand for particular numeric values. Because the phrase (+ a b) appears twice, a naive evaluation strategy will compute the same sum twice. An alternative strategy is to compute the sum once, save the result, and use the saved result the next time the phrase is encountered. The alternative strategy does not change the meaning of the program, but does change its use of resources; it reduces the number of additions performed, but may require extra storage for the saved result. Is the alternative strategy better? The answer depends on the details of the evaluation model and the relative importance of time and space.

Another potential improvement in the example involves the phrase (* 2 3), which always stands for the number 6. If the sample expression is to be evaluated many times (for different values of a and b), it may be worthwhile to replace (* 2 3) by 6 to avoid unnecessary multiplications.
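Both improvements can be made concrete with a short Python sketch. It assumes the tree is encoded as nested tuples and that / means ordinary (non-integer) division; the names `EXPR`, `evaluate`, and `fold_constants` are illustrative choices, not taken from the book.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

# The tree from the text as nested tuples: (operator, left, right).
EXPR = ("/", ("-", ("+", "a", "b"), ("*", 2, 3)), ("+", "a", "b"))


def evaluate(expr, env, counts, cache=None):
    """Evaluate expr; counts tallies how often each operator is applied.

    If cache is a dict, the value of every compound phrase is saved in it
    and reused when the same phrase is encountered again.
    """
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    if cache is not None and expr in cache:
        return cache[expr]
    op, left, right = expr
    a = evaluate(left, env, counts, cache)
    b = evaluate(right, env, counts, cache)
    counts[op] = counts.get(op, 0) + 1
    result = OPS[op](a, b)
    if cache is not None:
        cache[expr] = result
    return result


def fold_constants(expr):
    """Replace phrases whose operands are both numbers by their value."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)


env = {"a": 10, "b": 2}
naive, cached = {}, {}
print(evaluate(EXPR, env, naive))             # 0.5
print(evaluate(EXPR, env, cached, cache={}))  # 0.5
print(naive["+"], cached["+"])                # 2 1
print(fold_constants(EXPR))
# ('/', ('-', ('+', 'a', 'b'), 6), ('+', 'a', 'b'))
```

Both strategies return the same value; what differs is the number of additions performed and the extra dictionary kept by the caching version, which is the time-versus-space trade-off the text describes.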
Again, this is a purely pragmatic concern that does not change the meaning of the expression.

1.3 Goals

The goals of this book are to explore the semantics of a comprehensive set of programming language design idioms, show how they can be combined into complete practical programming languages, and discuss the interplay between semantics and pragmatics.

Because syntactic issues are so well covered in standard compiler texts, we won’t say much about syntax except for establishing a few syntactic conventions at the outset. We will introduce a number of tools for describing the semantics of programming languages, and will use these tools to build intuitions about programming language features and study many of the dimensions along which languages can vary. Our coverage of pragmatics is mainly at a high level. We will study some simple programming language implementation techniques and program improvement strategies rather than focus on squeezing the last ounce of performance out of a particular computer architecture.

We will discuss programming language features in the context of several mini-languages. Each of these is a simple programming language that captures the essential features of a class of existing programming languages. In many cases, the mini-languages are so pared down that they are hardly suitable for serious programming activities. Nevertheless, these languages embody all of the key ideas in programming languages. Their simplicity saves us from getting bogged down in needless complexity in our explorations of semantics and pragmatics. And like good modular building blocks, the components of the mini-languages are designed to be “snapped together” to create practical languages.

Issues of semantics and pragmatics are important for reasoning about properties of programming languages and about particular programs in these languages.
We will also discuss them in the context of two fundamental strategies for programming language implementation: interpretation and translation. In the interpretation approach, a program written in a source language S is directly executed by an S-interpreter, which is a program written in an implementation language. In the translation approach, an S program is translated to a program in the target language T, which can be executed by a T-interpreter. The translation itself is performed by a translator program written in an implementation language. A translator is also called a compiler, especially when it translates from a high-level language to a low-level one. We will use mini-languages for our source and target languages. For our implementation language, we will use the mathematical metalanguage described in Appendix A. However, we strongly encourage readers to build working interpreters and translators for the mini-languages in their favorite real-world programming languages. Metaprogramming — writing programs that manipulate other programs — is perhaps the most exciting form of programming!

[...]

1.4.2 Semantics

N: Push the numeral N onto the stack.

sub: Call the top stack value v1 and the next-to-top stack value v2. Pop these two values off the stack and push the result of v2 - v1 onto the stack. If there are fewer than two values on the stack or the top two values aren’t both numerals, signal an error. The other binary arithmetic operators — add (addition), mul (multiplication), div (integer division [a]), and rem (remainder of integer division) — behave similarly. Both div and rem signal an error if v1 is zero.

lt: Call the top stack value v1 and the next-to-top stack value v2. Pop these two values off the stack. If v2 < v1, then push a 1 (a true value) on the stack, otherwise push a 0 (false). The other binary comparison operators — eq (equals) and gt (greater than) — behave similarly. If there are fewer than two values on the stack or the top two values aren’t both numerals, signal an error.

pop: Pop the top element off the stack and discard it. Signal an error if the stack is empty.

swap: Swap the top two elements of the stack. Signal an error if the stack has fewer than two values.

sel: Call the top three stack values (from top down) v1, v2, and v3. Pop these three values off the stack. If v3 is the numeral 0, push v1 onto the stack; if v3 is a nonzero numeral, push v2 onto the stack. Signal an error if the stack does not contain three values, or if v3 is not a numeral.

nget: Call the top stack value vindex and the remaining stack values (from top down) v1, v2, ..., vn. Pop vindex off the stack. If vindex is a numeral i such that 1 ≤ i ≤ n and vi is a numeral, push vi onto the stack. Signal an error if the stack does not contain at least one value, if vindex is not a numeral, if i is not in the range [1..n], or if vi is not a numeral.

(C1 ... Cn): Push the executable sequence (C1 ... Cn) as a single value onto the stack. Executable sequences are used in conjunction with exec.

exec: Pop the executable sequence from the top of the stack, and prepend its component commands onto the sequence of currently executing commands. Signal an error if the stack is empty or the top stack value isn’t an executable sequence.

[a] The integer division of n and d returns the integer quotient q such that n = qd + r, where r (the remainder) is such that 0 ≤ r < |d| if n ≥ 0 and -|d| < r ≤ 0 if n < 0.

Figure 1.1 English semantics of PostFix commands.

(postfix 2) -[3,4]→ 3 {Initial stack has 3 on top with 4 below.}
(postfix 2 swap) -[3,4]→ 4
(postfix 3 pop swap) -[3,4,5]→ 5

It is an error if the actual number of arguments does not match the number of parameters specified in the program.
(postfix 2 swap) -[3]→ error {Wrong number of arguments.}
(postfix 1 pop) -[4,5]→ error {Wrong number of arguments.}

Note that program arguments must be integers — they cannot be executable sequences.

Numerical operations are expressed in postfix notation, in which each operator comes after the commands that compute its operands. add, sub, mul, and div are binary integer operators. lt, eq, and gt are binary integer predicates returning either 1 (true) or 0 (false).

(postfix 1 4 sub) -[3]→ -1
(postfix 1 4 add 5 mul 6 sub 7 div) -[3]→ 4
(postfix 5 add mul sub swap div) -[7,6,5,4,3]→ -20
(postfix 3 4000 swap pop add) -[300,20,1]→ 4020
(postfix 2 add 2 div) -[3,7]→ 5 {An averaging program.}
(postfix 1 3 div) -[17]→ 5
(postfix 1 3 rem) -[17]→ 2
(postfix 1 4 lt) -[3]→ 1
(postfix 1 4 lt) -[5]→ 0
(postfix 1 4 lt 10 add) -[3]→ 11
(postfix 1 4 mul add) -[3]→ error {Not enough numbers to add.}
(postfix 2 4 sub div) -[4,5]→ error {Divide by zero.}

In all the above examples, each stack value is used at most once. Sometimes it is desirable to use a number two or more times or to access a number that is not near the top of the stack. The nget command is useful in these situations; it puts at the top of the stack a copy of a number located on the stack at a specified index. The index is 1-based, from the top of the stack down, not counting the index value itself.

(postfix 2 1 nget) -[4,5]→ 4 {4 is at index 1, 5 at index 2.}
(postfix 2 2 nget) -[4,5]→ 5

It is an error to use an index that is out of bounds or to access a nonnumeric stack value (i.e., an executable sequence) with nget.
(postfix 2 3 nget) -[4,5]→ error {Index 3 is too large.}
(postfix 2 0 nget) -[4,5]→ error {Index 0 is too small.}
(postfix 1 (2 mul) 1 nget) -[3]→ error {Value at index 1 is not a number but an executable sequence.}

The nget command is particularly useful for numerical programs, where it is common to reference arbitrary parameter values and use them multiple times.

(postfix 1 1 nget mul) -[5]→ 25 {A squaring program.}
(postfix 4 4 nget 5 nget mul mul swap 4 nget mul add add) -[3,4,5,2]→ 25 {Given a, b, c, x, calculates ax^2 + bx + c.}

As illustrated in the last example, the index of a given value increases every time a new value is pushed onto the stack. The final stack in this example contains (from top down) 25 and 2, showing that the program may end with more than one value on the stack.

Executable sequences are compound commands like (2 mul) that are pushed onto the stack as a single value. They can be executed later by the exec command. Executable sequences act like subroutines in other languages; execution of an executable sequence is similar to a subroutine call, except that transmission of arguments and results is accomplished via the stack.

(postfix 1 (2 mul) exec) -[7]→ 14 {(2 mul) is a doubling subroutine.}
(postfix 0 (0 swap sub) 7 swap exec) -[ ]→ -7 {(0 swap sub) is a negation subroutine.}
(postfix 0 (2 mul)) -[ ]→ error {Final top of stack is not an integer.}
(postfix 0 3 (2 mul) gt) -[ ]→ error {Executable sequence where number expected.}
(postfix 0 3 exec) -[ ]→ error {Number where executable sequence expected.}
(postfix 0 (7 swap exec) (0 swap sub) swap exec) -[ ]→ -7
(postfix 2 (mul sub) (1 nget mul) 4 nget swap exec swap exec) -[-10,2]→ 42 {Given a and b, calculates b - a·b^2.}

The last two examples illustrate that evaluations involving executable sequences can be rather contorted.
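Tracing one of these programs by hand shows where the contortion comes from. The worked trace below applies the rules of Figure 1.1 to (postfix 0 (7 swap exec) (0 swap sub) swap exec) on an empty argument list. The two-column layout (commands still to be executed, and the stack written top first) is a convenience chosen here, not notation from the text.

```text
Remaining commands                       Stack (top first)
(7 swap exec) (0 swap sub) swap exec     (empty)
(0 swap sub) swap exec                   (7 swap exec)
swap exec                                (0 swap sub), (7 swap exec)
exec                                     (7 swap exec), (0 swap sub)
7 swap exec                              (0 swap sub)
swap exec                                7, (0 swap sub)
exec                                     (0 swap sub), 7
0 swap sub                               7
swap sub                                 0, 7
sub                                      7, 0
(none)                                   -7
```

Each exec step removes a sequence from the stack and places its commands at the front of the remaining commands, so the negation subroutine ends up being called from inside the first executable sequence. The final sub computes v2 - v1 = 0 - 7 = -7.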
The sel command selects between two values based on a test value, where zero is treated as false and any nonzero integer is treated as true. It can be used in conjunction with exec to conditionally execute one of two executable sequences.

(postfix 1 2 3 sel) -[1]→ 2
(postfix 1 2 3 sel) -[0]→ 3
(postfix 1 2 3 sel) -[17]→ 2 {Any nonzero number is “true.”}
(postfix 0 (2 mul) 3 4 sel) -[ ]→ error {Test not a number.}
(postfix 4 lt (add) (mul) sel exec) -[3,4,5,6]→ 30
(postfix 4 lt (add) (mul) sel exec) -[4,3,5,6]→ 11
(postfix 1 1 nget 0 lt (0 swap sub) () sel exec) -[-7]→ 7 {An absolute value program.}
(postfix 1 1 nget 0 lt (0 swap sub) () sel exec) -[6]→ 6

[...] teams might resolve the ambiguity in incompatible ways. What’s needed in this case is an unambiguous specification of the language as well as a means of proving that an implementation meets that specification.

The problem with informal descriptions of a programming language is that they’re neither concise nor precise enough for these kinds of situations. English is often verbose, and even relatively simple ideas can be unduly complicated to explain. Moreover, it’s easy for the writer of an informal specification to underspecify a language by forgetting to cover all the special cases (e.g., error situations in PostFix). It isn’t that covering all the special cases is impossible; it’s just that the natural-language framework doesn’t help much in pointing out what the special cases are.

It is possible to overspecify a language in English as well. Consider the PostFix programming model introduced above. The current state of a program is captured in two entities: the stack and the current command sequence. To programmers and implementers alike, this might imply that a language implementation must have explicit stack and command sequence elements in it.
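For concreteness, here is what an implementation with an explicit stack and an explicit command sequence might look like. This is a minimal Python sketch written from the English rules of Figure 1.1, not the book's implementation. The names `parse`, `run`, `trunc_div`, and `PostFixError` are hypothetical, the parser does not check for unbalanced parentheses, and treating an empty final stack as an error is an assumption, since the excerpt only shows the non-integer case.

```python
class PostFixError(Exception):
    """Signals any of the error situations listed in Figure 1.1."""


def parse(text):
    """Read "(postfix 1 (2 mul) exec)" into nested lists of ints and strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                item, pos = read(pos + 1)      # pos now points at the matching ")"
            else:
                tok = tokens[pos]
                item = int(tok) if tok.lstrip("-").isdigit() else tok
            items.append(item)
            pos += 1
        return items, pos

    return read(0)[0][0]


def trunc_div(n, d):
    """Quotient rounded toward zero, as in footnote a (Python's // floors)."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q


ARITH = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": trunc_div,
    "rem": lambda a, b: a - b * trunc_div(a, b),
}
COMPARE = {"lt": lambda a, b: a < b, "eq": lambda a, b: a == b,
           "gt": lambda a, b: a > b}


def run(program, args):
    """Run a parsed PostFix program on integer arguments (first = top)."""
    if program[:1] != ["postfix"] or len(program) < 2 or not isinstance(program[1], int):
        raise PostFixError("malformed program")
    if len(args) != program[1] or not all(isinstance(a, int) for a in args):
        raise PostFixError("wrong number or kind of arguments")
    stack = list(reversed(args))    # top of the stack is the END of this list
    commands = list(program[2:])    # commands still to be executed

    def pop(n, ints=False):
        """Pop n values, returned top first (v1, v2, ...)."""
        if len(stack) < n:
            raise PostFixError("not enough values on the stack")
        values = [stack.pop() for _ in range(n)]
        if ints and not all(isinstance(v, int) for v in values):
            raise PostFixError("numeral expected")
        return values

    while commands:
        cmd = commands.pop(0)
        if isinstance(cmd, (int, list)):        # numeral or executable sequence
            stack.append(cmd)
        elif cmd in ARITH:
            v1, v2 = pop(2, ints=True)
            if cmd in ("div", "rem") and v1 == 0:
                raise PostFixError("divide by zero")
            stack.append(ARITH[cmd](v2, v1))
        elif cmd in COMPARE:
            v1, v2 = pop(2, ints=True)
            stack.append(1 if COMPARE[cmd](v2, v1) else 0)
        elif cmd == "pop":
            pop(1)
        elif cmd == "swap":
            v1, v2 = pop(2)
            stack += [v1, v2]                   # old top goes underneath
        elif cmd == "sel":
            v1, v2, v3 = pop(3)
            if not isinstance(v3, int):
                raise PostFixError("test value is not a numeral")
            stack.append(v1 if v3 == 0 else v2)
        elif cmd == "nget":
            (i,) = pop(1, ints=True)
            if not 1 <= i <= len(stack) or not isinstance(stack[-i], int):
                raise PostFixError("bad nget index or non-numeral at index")
            stack.append(stack[-i])
        elif cmd == "exec":
            (seq,) = pop(1)
            if not isinstance(seq, list):
                raise PostFixError("executable sequence expected")
            commands[:0] = seq                  # prepend its commands
        else:
            raise PostFixError(f"unknown command {cmd!r}")
    if not stack or not isinstance(stack[-1], int):
        raise PostFixError("final top of stack is not an integer")
    return stack[-1]


print(run(parse("(postfix 2 add 2 div)"), [3, 7]))                           # 5
print(run(parse("(postfix 1 1 nget 0 lt (0 swap sub) () sel exec)"), [-7]))  # 7
```

The variables `stack` and `commands` mirror the two entities of the English description directly. They are a representation choice of this sketch rather than something the language definition demands.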
Although these would indeed appear in a straightforward implementation, they are not in any way required; there are alternative models and implementations for PostFix (e.g., see Exercise 3.12 on page 70). It would be desirable to have a more abstract definition of what constitutes a legal PostFix implementation so that a would-be implementer could be sure that an implementation was faithful to the language definition regardless of the representations and algorithms employed.

1.5 Overview of the Book

The remainder of Part I introduces a number of tools that address the inadequacies outlined above and that form an essential foundation for the study of programming language design. Chapter 2 presents s-expression grammars, a simple specification for syntax that we will use to describe the structure of all of the mini-languages we will explore. Then, using PostFix and a simple expression language as our objects of study, we introduce two approaches to formal semantics:

• An operational semantics (Chapter 3) explains the meaning of programming language constructs in terms of the step-by-step process of an abstract machine.
• A denotational semantics (Chapter 4) explains the meaning of programming language constructs in terms of the meaning of their subparts.

These approaches support the unambiguous specification of programming languages and provide a framework in which to reason about properties of programs and languages. Our discussion of tools concludes in Chapter 5 with a presentation of a technique for determining the meaning of recursive specifications. Throughout the book, and especially in these early chapters, we formalize concepts in terms of a mathematical metalanguage described in Appendix A. Readers are encouraged to familiarize themselves with this language by skimming this appendix early on and later referring to it in more detail on an “as needed” basis.
Part II focuses on dynamic semantics, the meaning of programming language constructs and the run-time behavior of programs. In Chapter 6, we introduce FL, a mini-language we use as a basis for investigating dimensions of programming language design. By extending FL in various ways, we then explore programming language features along key dimensions: naming (Chapter 7), state (Chapter 8), control (Chapter 9), and data (Chapter 10). Along the way, we will encounter several programming paradigms, high-level approaches for viewing computation: function-oriented programming, imperative programming, and object-oriented programming.

In Part III, we shift our focus to static semantics, properties of programs that can be determined without executing them. In Chapter 11, we introduce the notion of type — a description of what an expression computes — and develop a simple type-checking system for a dialect of FL such that “well-typed” programs cannot encounter certain kinds of run-time errors. In Chapter 12, we study some more advanced features of typed languages: subtyping, universal polymorphism, bounded quantification, and kind systems. A major drawback to many of our typed mini-languages is that programmers are required to annotate programs with significant amounts of explicit type information. In some languages, many of these annotations can be eliminated via type reconstruction, a technique we study in Chapter 13. Types can be used as a mechanism for enforcing data abstraction, a notion that we explore in Chapter 14. In Chapter 15, we show how many of the dynamic and static semantics features we have studied can be combined to yield a mini-language in which program modules with both value and type components can be independently type-checked and then linked together in a type-safe way. We wrap up our discussion of static semantics in Chapter 16 with a study of effect systems, which describe how expressions compute rather than what they compute.
The book culminates, in Part IV, in a pragmatics segment that illustrates how concepts from dynamic and static semantics play an important role in the implementation of a programming language. Chapter 17 presents a compiler that translates from a typed dialect of FL to a low-level language that resembles assembly code. The compiler is organized as a sequence of meaning-preserving translation steps that construct explicit representations for the naming, state, control, and data aspects of programs. In order to automatically reclaim memory in a type-safe way, the run-time system for executing the low-level code generated by the compiler uses garbage collection, a topic that is explored in Chapter 18.

While we will emphasize formal tools throughout this book, we do not imply that formal tools are a panacea or that formal approaches are superior to informal ones in an absolute sense. In fact, informal explanations of language features are usually the simplest way to learn about a language. In addition, it’s very easy for formal approaches to get out of control, to the point where they are overly obscure, or require too much mathematical machinery to be of any practical use on a day-to-day basis. For this reason, we won’t cover material as a dry sequence of definitions, theorems, and proofs. Instead, our goal is to show that the concepts underlying the formal approaches are indispensable for understanding particular programming languages as well as the dimensions of language design. The tools, techniques, and features introduced in this book should be in any serious computer scientist’s bag of tricks.