Miniproject.
The first section of the course Programming Language Implementation and Formalisation
, is about programming. In particular, we have learned some
advanced programming techniques that are commonly used in interpreter and
compiler implementations.
In this assignment, you are asked to write a parser for the Boa
programming langauge, using the monadic combinator library for constructing
parsers called Parsec
.
Credits
The Boa
programming language was designed by Andrzej Filinski, and this
project is based on two weekly assignments from the course Advanced
Programming, where Andrzej is
course responsible at the time of writing.
Grammar of Boa.
In Oblig 1, we introduced the abstract syntax of Boa, and we wrote an interpreter for it. The concrete syntax of Boa is given by the following grammar, written in EBNF form:
Program ::= Statements
Statements ::= Statement
| Statement ';' Statements
Statement ::= ident '=' Expr
| Expr
Expr ::= numConst
| stringConst
| 'None' | 'True' | 'False'
| ident
| Expr Oper Expr
| 'not' Expr
| '(' Expr ')'
| ident '(' Exprz ')'
| '[' Exprz ']'
| '[' Expr ForClause Clausez ']'
Oper ::= '+' | '-' | '*' | '//' | '%'
| '==' | '!=' | '<' | '<=' | '>' | '>='
| 'in' | 'not' 'in'
ForClause ::= 'for' ident 'in' Expr
IfClause ::= 'if' Expr
Clausez ::= eps
| ForClause Clausez
| IfClause Clausez
Exprz ::= eps
| Exprs
Exprs ::= Expr
| Expr ',' Exprs
ident ::= (see text)
numConst ::= (see text)
stringConst ::= (see text)
In naming the nonterminals, we use the informal mnemonic convention that a Foos is a sequence containing at least one Foo, whereas a Fooz sequence may also be of length zero. Supplementing the formal grammar, we also specify the following:
-
ident :
Identifiers (used for variable and function names) consist of one or more letters, digits, and underscores, where the first character must not be a digit. Additionally, an identifier must not be one of the Boa reserved words:
None
,True
,False
,for
,if
,in
, andnot
. (For compatibility, it may be advisable to also avoid using other Python reserved words in Boa programs, but your parser should not treat those specially.) -
numConst :
Numeric constants consist of an optional negative sign
-
, followed (without any intervening whitespace) by one or more decimal digits. The first digit must not be a zero unless it is the only digit. Thus,-0
or100
are well formed _numConst_s, while007
,+2
'', or- 4
are not. -
stringConst :
String constants in Boa are written between single quotes
'
. (Unlike in Python, neither the"
nor"""
delimiters are supported.) Inside a string constant, all printable ASCII characters are allowed, except for single quotes and backslashes, which must be escaped as\'
and\\
, respectively. Raw newline characters (being non-printable) are not allowed, but can be included by the sequence\n
. On the other hand, to allow string constants to be written over multiple lines, a backslash followed immediately by a newline causes both characters to be ignored. Thus, the Boa string constant'fo\\o\ b\na\'r'
should be parsed as the 9 characters "
fo\ob?a'r
" (where '?' represents a single newline character). All other escape sequences (i.e., a backslash followed by anything other than a single quote, another backslash, the lowercase letter 'n', or a newline) are illegal. -
General lexical conventions :
All identifiers and keywords are case sensitive.
Tokens of the grammar may be surrounded by arbitrary whitespace (spaces, tabs, and newlines).
Comments start with a hash character (
#
) and run until the end of the line (or the input, if there is no final newline).Unlike Python, Boa is not indentation-sensitive. Thus, there might be Boa programs that would not be valid Python programs due to, e.g., leading whitespace on a line.
-
Disambiguation :
The raw grammar of Expr is highly ambiguous. To remedy this, we specify the following operator precedences, from loosest to tightest grouping:
-
The logical-negation operator
not
. Nesting is allowed, sonot not x < 3
parses likenot (not (x < 3))
. -
All relational operators (
==
, etc., includingin
andnot``in
). These are all non-associative, i.e., chains likex < y < z
are syntactically illegal (unlike in Python). -
Additive arithmetic operators (
+
and-
). These are left-associative, e.g.,x-y+z
parses like(x-y)+z
. -
Multiplicative arithmetic operators (
*
,//
, and\%
). These are also left-associative.
-
What to hand in
Code
Please fill in the blanks, in the handed out stack project. That is, any
function which is undefined
should be defined by you.
-
In your test suite, consider including any relevant negative test cases, i.e., tests verifying that your code correctly detects and reports error conditions.
-
If some functionality is known to be missing or wrong in your code, the corresponding test cases should still compare against the correct expected output.
-
Your code should ideally give no warnings when compiled with
ghc(i) -W
.
Report
Your handin should include a short (1--2 pages) report. In your report, we want you to address the following
-
Did you make any (non-trivial) design and implementation choices? Focus on high-level aspects and ideas, and explain why you did something non-obvious, not only what you did.
-
Give an honest, justified assessment of the quality of your submitted code, and the degree to which it fulfills the requirements of the assignment. Consult
06-guide-on-writing-an-assesment.md
on the course page if you need inspiration. -
If at all, we want you to explain where and why you introduced backtracking/lookahead (
try
) combinator in your parser.