Table of Contents
1. Introduction
This document specifies and describes the syntax and the static semantics of the language Compila 22. The dynamic semantics, i.e., the description of the language's behavior when being executed, should be fairly clear even without explicit formal specification.
1.1. Notational conventions and syntax of this document
In the description of the grammar later, we use capital letters for non-terminals. As meta-symbols for the grammar, we use the following:
->, |, (, ), {, }, [, ], "
Commas in the previous line are used as ``meta-meta symbols'' in the enumeration of the meta-symbols.
When writing down the grammar in some variant of EBNF, {...}
represents
iteration of zero or more times, [...]
represents optional
clauses. Everything else, written as contiguous sequences, are terminal
symbols. Those with only small letters are reserved keywords of the
meta-language.
Note that terminal symbols of the Compila-language are written in
``string-quotes'' (i.e., with a "
at the beginning and at the end) to
distinguish them from symbols from the meta-language. Some specific
terminal symbols are written in capitals, and without quotes. Those are
NAME
,INT_LITERAL
,FLOAT_LITERAL
andSTRING_LITERAL
.
See the following section about lexical aspects for what those terminal symbols represent.
2. Lexical aspects
2.1. Identifiers and literals
NAME
must start with a letter, followed by a (possibly empty) sequence of numeric characters, letters, and underscore characters; the underscore is not allowed to occur at the end. Capital and small letters are considered different.- All keywords of the languages are written in with lower-case letters. Keyword cannot be used for standard identifiers.
INT_LITERAL
contains one or more numeric characters.FLOAT_LITERAL
contains one or more numeric characters, followed by a decimal point sign, which is followed by one or more numeric characters.STRING_LITERAL
consists of a string of characters, enclosed in quotation marks ("
). The string is not allowed to contain line shift, new-line, carriage return, or similar. The semantic value of aSTRING_LITERAL
is only the string itself, the quotation marks are not part of the string value itself.
2.2. Comments
Compila supports single line and multi-line comments.
- Single-line comments start with
//
and the comment extends until the end of that line (as in, for instance, Java, C++, and most modern C-dialects). - Multi-line comments start with
(*
and end with*)
.
The latter form cannot be nested. The first one is allowed to be ``nested''
(in the sense that a commented out line can contain another //
or the
multi-line comment delimiters, which are then ignored).
3. Data types
3.1. Built in data types
The language has four built-in types and user-defined types:
- Built-in types
- floating point numbers (
"float"
), - integers (
"int"
), - strings (
"string"
), and - booleans (
"bool"
).
- floating point numbers (
- User-defined types:
- Each (name of a) record represents a type.
- Reference types, representing references to elements of the specified types. The reference type constructor can be nested.
3.2. Records
The language supports records. For people coming from Java/C++ etc.,
records can be seen as (very) simple form of classes, containing only
instance variables as members, but support neither methods nor
inheritance nor explicitly programmable constructors. ``Instance
variables'' are more commonly called record fields or just fields when
dealing with records. Records do support instantiation, here via the
new
keyword. Another aspect which resembles classes as in Java is that
variables of record type contains either a pointer to an element
(``object'') of that record type or the special pointer value
null
.1
3.3. References
The language allows that variables can be declared to be of reference type ("pointers", if you will) by writing, e.g.,
var x: ref(int);
Variables, like x
, can be assigned values that are references, so, e.g.,
the following is allowed:
var y: int; y := 42; x := ref(y);
Correspondingly, one can ``follow'' a reference r
by using deref(r)
, so
that, given the previous definitions, the following is legal:
y := deref(x);
Expressions with deref
can also be used as L-values, so that we can
assign values to the location that they are pointing to, e.g.:
deref(x) := y;
See also later the swap procedure example later in the context of parameter passing.
4. Syntax
4.1. Grammar
The following productions in EBNF describe the syntax of the language. For precedences and associativity of various constructs, see later.
PROGRAM -> "program" NAME "begin" [ DECL {";" DECL}] "end" DECL -> VAR_DECL | PROC_DECL | REC_DECL VAR_DECL -> "var" NAME ":" TYPE [ ":=" EXP ] | "var" NAME ":=" EXP PROC_DECL -> "procedure" NAME "(" [ PARAMFIELD_DECL { "," PARAMFIELD_DECL } ] ")" [ ":" TYPE ] "begin" [[DECL{";" DECL}] "in"] STMT_LIST "end" REC_DECL -> "struct" NAME "{" [ PARAMFIELD_DECL {";" PARAMFIELD_DECL }] "}" PARAMFIELD_DECL -> NAME ":" TYPE STMT_LIST -> [STMT {";" STMT}] EXP -> EXP LOG_OP EXP | "not" EXP | EXP REL_OP EXP | EXP ARITH_OP EXP | LITERAL | CALL_STMT | "new" NAME | VAR | REF_VAR | DEREF_VAR | "(" EXP ")" REF_VAR -> "ref" "(" VAR ")" DEREF_VAR -> "deref" "(" VAR ")" | "deref" "(" DEREF_VAR ")" VAR -> NAME | EXP "." NAME LOG_OP -> "&&" | "||" REL_OP -> "<" | "<=" | ">" | ">=" | "=" | "<>" ARITH_OP -> "+" | "-" | "*" | "/" | "^" LITERAL -> FLOAT_LITERAL | INT_LITERAL | STRING_LITERAL | BOOL_LITERAL | "null" BOOL_LITERAL -> "true" | "false" STMT -> ASSIGN_STMT | IF_STMT | WHILE_STMT | RETURN_STMT | CALL_STMT ASSIGN_STMT -> VAR ":=" EXP | DEREF_VAR ":=" EXP IF_STMT -> "if" EXP "then" { STMT_LIST } [ "else" { STMT_LIST } ] "fi" WHILE_STMT -> "while" EXP "do" { STMT_LIST } "od" RETURN_STMT -> "return" [ EXP ] CALL_STMT -> NAME "(" [ EXP { "," EXP } ] ")" TYPE -> "float" | "int" | "string" | "bool" | NAME | "ref" "(" TYPE ")"
4.2. Precedence
The precedence of the following constructs is ordered as follows (from lowest precedence to the highest):
||
&&
not
- All relational symbols
+
and-
*
and/
^
(exponentiation).
(``dot'', to access fields of a record)
4.3. Associativity
- The binary operations
||
,&&
,+
,-
,*
, and.
are left-associative, but exponentiation^
is right-associative. - Relation symbols are non-associative. That means that for instance
a < b + c < d
is illegal. - It's legal to write
not not not b
and it stands for(not (not (not b)))
.
5. Parameter passing
When describing the parameter passing mechanisms of the language, this document distinguishes (as is commonly done) between
- actual parameters and
- formal parameters.
The actual parameters are the expressions (which include among other things variables) as part of a procedure call. The formal parameters are the variables mentioned as part of procedure definition. The language supports supports as only parameter passing mechanism call-by-value:
5.1. Call-by-value
The formal parameters are local variables in the procedure definition. When a procedure is being called, the values of the local parameters are copied into the corresponding formal parameters.2
5.2. Small example for call-by-reference
program swapexample begin procedure swap (a : ref(int), b : ref(int)) begin var tmp : int in tmp := deref(a); deref(a) := deref(b); deref(b) := tmp end; procedure main ( ) begin var x : int; var y : int; var xr : ref (int); var yr : ref (int) in x := 42; y := 84; xr := ref (x); yr := ref(y); swap (xr,yr) end end
6. Standard library
The programming language comes with a standard library which offers a number of IO-procedures. All reading, i.e., input, is done from standard input (``stdin''). All writing, i.e., output, is to standard output (``stdout'')
proc readint(): int |
read one integer |
proc readfloat(): float |
read one float |
proc readchar(): int |
read one character and return its ASCII value. Return -1 for EOF |
proc readstring(): string |
read a string (until first whitespace |
proc readline(): string |
read one line |
proc printint(i:int) |
write an integer |
proc printfloat(f:float) |
write one floating point number |
proc printstr(s:string) |
write one string |
proc printline(s:string) |
write one string, followed by a ``newline'' |
7. Static semantics / typing / evaluation.
7.1. Binding of names
The using occurence of an identifier (without a preceding dot) is bound in the common way to a declaration. This association of the use of an identifier to a declaration (``binding'') can be described informally as follows: Look through the block or scope which encloses the use-occurence of the identifer (where the block refers to the procedure body or program). The binding occurrence corresponding to the use occurence is the first declaration found in this way. If no binding occurence is found that way, the program is erroneous. Formal parameters count as declarations local to the procedure body.
Use occurences of a name preceded by a dot correspond to the clause EXP
"." NAME
in the production for the non-terminal VAR
in the grammar.
Those names are bound by looking at the type of EXP
(which is required to
be a record-type) and look up the field with name NAME
in that record. It's
an error, if EXP
is not of record-type or else, there is not such field in
that record,
7.2. Typing of compound constructs
- expressions: expressions need to be checked for type correctness in the obvious manner. The whole expression (if it type-checks) will thus carry a type.
- assignments: Both sides of an assignment must be of the same type. Note: it is allowed to assign to the formal parameters of a procedure. That applies to both call-by-value and call-by-reference parameters. Of course, the effect of an assignment in these two mechanisms is different.
- conditionals and while loop: the condition (i.e., expression) in the
conditional construct must be of type
bool
. Same for the condition in the while loop. - field selection:
- the expression standing in front of a dot must be of record type.
- the name standing after a dot are the name of a field/attribute of the record type of the expression in front of the dot. The type of the field selection expression (if it type checks) is the type as declared for the field of the record.
7.3. Types and implicit type conversion
It is allowed to assign an expression of type int
to a variable of type
float
. The inverse situation is not allowed. There is no type cast
operator. If an arithmetic expression has at least one operand of type
float
, the operation is evaluated using floating point arithmetic and the
result is of type float
. Exponentiation is always considered done with
floating point arithmetic and the result is of type float
.
7.4. Type inference
The language requires a tiny bit of type inference (so small that one might
not even call type inference). Variables need to be declared, using the
keyword var
. In particular, variables must have a type at the point when
they are declared. The declaration can be combined with a definition of an
initial value of introduced variable (using the syntax like var x : int :=
42
). When provided with an initial expression, the type can be omitted
(using a syntax like var x := 42
), in which case an appropriate type
needs to be inferred.
7.5. Short-circuit evaluation
The logical operators &&
and ||
use so-called short-circuit
evaluation. That means that if the value of the logical expression can
be determined after one has evaluated the first part, only, the rest of
the expression is not evaluated.
8. Procedures
- In a procedure, all declarations are required to occur before executable code (statements). In a procedure, the same declarations are allowed as on the outermost, global scope, i.e., procedure-local declarations of variables, procedures, and records are allowed.
- Procedures called within an expression must have a defined return type. That type must match with the way the call is used in an expression.
- Concerning the number and types of the parameters of a procedure: they
must coincide comparing the declaration/definition of the procedure and
the use of a procedure. That requirement applies also to the parameter
passing mechanism (i.e., whether the variable resp. actual parameter is
marked as ``by
ref
''. Return statements:
- A
return
-statment is allowed only in procedure-definitions. Such a statement marks that the procedure terminates (and returns). In addition, the return statement gives an expression for the value to be returned to the caller. - If a procedure is declared without return type, the procedure does not need a return statement. In that case, the procedure returns (without a return value) when the last statement in the procedure body has been executed.
- If a procedure has declared a return type, its body is required to have a return statement (with corresponding expression of the correct type). That statement need to be the last statement in the procedure's body.
- A
9. Further conditions
- Declarations must be unique per block. Two declarations (within one block) of a procedure, a record, or a variable with the same name are considered as double declarations, which are forbidden.
- The name of a formal parameter must not collide with names of local declarations within the procedure. Besides, the names of all formal parameters of one procedure must by distinct.
- All names being used must be declared.
- Each program must have a procedure named
main
. This procedure is the one called upon start of the program.
Footnotes:
Records are sometimes also called ``structs''.
The language supports reference types. Variables or expressions of reference type are passed by value. That leads to a behavior resembling pretty much ``call-by-reference'', but the parameter passing mechanism proper is call-by-value (of reference data). One may speak of call-by-value-reference.