This is a big specification and there is a lot of code for you to
pick through! Don't hesitate to ask for clarification on any part
of the project (up to and including requests for better documentation
for some part of the provided code).
Extend your control flow compiler to support static typechecking,
mutually recursive functions, constants and arrays. Your compiler will
accept a program of the form:
= 65 && char <= 90) (
65 + ((char - 65 + 13) % 26)
) else if (char >= 97 && char <= 122) (
97 + ((char - 97 + 13) % 26)
) else char
)
con_init();
con_clear();
con_attrset(con_color(2));
con_print(rot13("Hello, world!"));
con_getch();
con_cleanup();
0]]>
and, for this example, will produce a program that prints
Uryyb, jbeyq!
colored green to a blank terminal.
(10/23 1:14) T-For should check that the
body of the for loop is well-typed.
(10/22 1:02) Added example programs:
rfk.ap
(under the GPL);
web.ap
project4.tar.gz
contains all that you
need to get started with this assignment. Extract it to your
projects directory such that apast.ml is located at
cis341/projects/arrproc/apast.ml. (All files are stored in the
tarball relative to cis341, so switching to your cis341
directory and running tar xvf project4.tar.gz should do the trick.)
Edit the OMakefile at cis341/projects/OMakefile by
adding the new directory to the .SUBDIRS declaration:
PROJECT_SUBDIRS[] =
x86interp
expressions
controlflow
arrproc
Remember that you may change the provided code so long as you keep
the interfaces the same (so that we can use everyone's testers on
everyone's assignments). If you invested a lot of time into writing a
good algorithm for code generation for the last assignment, you should
definitely bring it over for this assignment.
arrproc programs depend on the fruntime library
rather than the runtime library (as did the previous two
projects). You should read
more about fruntime before starting this assignment.
IL operands are now split into doperands (destination
operands for writing) and soperands (source operands for reading).
Any destination operand can become a source operand using the Dest
constructor. For example, an svar s as a destination operand appears
as Svar s. As a source operand, it would look like
Dest (Svar s).
A complete lexer and parser are provided for this assignment in
lexer.mll and parser.mly. You should understand these
files, but do not modify them.
This assignment's lexer and parser keep track of token locations from
source files. This will allow you to raise helpful error messages that
point to the exact tokens that cause problems.
Notice the new toplevel rule in the grammar for this language:
An arrproc program is a list of includes, followed by
a list of declarations and finished with a single 'main' or
'top' exp.
Previously, programs consisted only of an exp.
The includes section of a program can contain zero or more
files to inline. The compiler will topologically sort included files
according to the dependency graph before doing any inlining. Files are
kept track of only by name, so trying to include both a/a.ap
and b/a.ap will have undefined results. When the compiler merges
file B in after A (because B includes A), B's declarations are appended
to A's declarations and B's expression is appended to A's expression using
a sequencing operator. Consider the following example:
/* inlineA.ap */
unit afunc() (())
"A"
/* inlineB.ap */
include "inlineC.ap"
include "inlineA.ap"
unit bfunc() (())
"B"
/* inlineC.ap */
include "inlineA.ap"
unit cfunc() (())
"C"
/* inlineD.ap */
include "inlineB.ap"
unit dfunc() (())
"D"; 42
Compiling inlineD.ap will yield the following program:
unit afunc () (())
unit cfunc () (())
unit bfunc () (())
unit dfunc () (())
"A"; "C"; "B"; "D"; 42
By default, the compiler searches in the lib and .
directories for includes. You can add more include directories with
the -i option.
AT's typechecker requires explicit typing annotations.
The relevant rules in the grammar for types are:
That is, you can form regular types (unit, string, and
int). You can also form arrays using Java-style syntax: a
single-dimensional array of int is int[]; a
multi-dimensional array is int[][]. Some types, like
unit[], are not well-formed. These are caught during
typechecking.
A declaration can either declare a new function or a new constant. Names
for functions and constants must be unique (you may not have a function named
foo and a constant named foo, or two functions named foo or two constants
named foo). The order of declaration does not matter for functions, but
it does matter for constants (see the information on mutual recursion below).
The declaration of a constant looks like:
KW_CONST ty IDENT OP_ASSIGN E1 OP_SEMI
Which, in a source file, would turn into:
const int aint = 1;
Note the mandatory semicolon. A function declaration looks like:
ty IDENT OP_LPAREN arglist OP_RPAREN OP_LPAREN exp OP_RPAREN
And in the source file:
int plusone(int a, int b) ( a + 1 )
Note that the parenthesis surrounding the function body are mandatory
and that there is no semicolon following the declaration.
Some functions may be declared as external (because they appear in the
runtime):
ty IDENT OP_LPAREN arglist OP_RPAREN KW_EXTERN
And in code:
int printinc(int a, int b) extern
There are several new syntactic forms beyond those that were in
the control flow language. These include:
String literals: "Like this."
The escape codes \n, \t, \\ and
\" are supported.
The null keyword: null can take on
any reference type.
The unit value: As in OCaml, unit looks like
() and has the type unit. This value is useful
only where you need a unit return value (as in a function
with side effects) or for ending a file meant only to be included
by other files.
Function calls: These look like C or Java function calls:
foo(a, b, c) calls function foo with the arguments
a, b, and c.
Array indexing: Like C or Java: arr[i].
Array allocation: Like Java, but with a twist: you must
provide an initial value that is copied to every element of the
array. For example, to allocate a new array of ints
(all equal to zero), use new int[42](0).
The topmost expression of a program must return something of type
int. This becomes the return code of the program.
There's quite a bit of code in the provided package. To get an idea of
how it all fits together, we can track the progress of a program being
compiled.
arrproc.run is invoked on the command line.
Arrproc_main parses the command-line arguments and configures
the compiler settings in Arrproc_driver. It then invokes
Arrproc_driver.execute ().
Arrproc_driver does its work in do_one_file. At this
point, the program is represented only by its filename.
Arrproc_driver.preprocess opens the program file and passes
it to the lexer and parser. These components work together to turn the
program into a Range.t Apast.preprog. The Range.t means
that the program does not currently have type information, but it does
have information that will help us print error messages to the user.
An Apast.preprog contains the program's include file list,
its declaration list and its expression.
preprocess uses the include component of the preprog
to determine which other files to load (and in what order). Once all of
this is worked out, it combines the files together (in the manner described
above in the 'Includes' section) to produce a Range.t Apast.prog.
A prog is much like a preprog except a prog
does not have any includes.
do_one_file then passes the Range.t prog to
Aptypecheck.typecheck (the first function you are responsible
for building). This function endeavors to associate types with each
node in the prog's AST. This change is reflected in the type
of the program. Consider the signature of typecheck:
val typecheck : Range.t Apast.prog -> typed Apast.progtypecheck takes a Range.t Apast.prog (or an untyped
program) and produces a typed Apast.prog (or a typed program).
typed here means (Range.t * Apast.typ). This is a pair
of the original location information with the type associated with an
AST node.
Now that it has a typed prog, do_one_file passes
the program on to Arrproc.il_of_ast (the second function that
you are responsible for completing by finishing some cases in
rev_il_of_ast). il_of_ast's type looks like:
val il_of_ast : Aptypecheck.typed Apast.prog ->
(Apil.Il.label,unit) Apil.IlCfg.appil_of_ast takes a typed program and produces an app
which uses regular labels and does not contain any additional tagging
information on blocks. An app contains one or more procs,
each of which may refer to one another by label.
Now that the app has been created, do_one_file
lowers the IL to X86lite and passes it to the compiler backend,
which builds the executable and links it with the runtime system.
This is a group project. Groups of size 1 or 2 are acceptable.
Constants and functions both exist in the same global namespace and all
global names must be unique (there is no 'shadowing'). A function may not
share its name with a constant. Locally bound variables may shadow
constants; you may have a locally bound variable with the same name
as a global function, but that variable will not shadow the function.
Function declarations are mutually recursive: the body of any function can
refer to the name of any other function irrespective of their order of
declaration.
Constant declarations are not mutually recursive. A constant C2
may refer to a constant C1 only if C1 was declared before C2. In addition,
constants may not refer to any functions.
Complete typecheck as defined in aptypecheck.mli. There
is some code in aptypecheck.ml to get you started. Part of your
grade for this section depends on the quality of the error messages you
raise for the user. Simply printing Type error. for badly-formed
programs is insufficient. Give more details about what kind of error
occurred.
Typechecking constants occurs under the empty binding context.
The signature context contains nothing but constants that have
been declared previously to the constant being checked.
well-formed(T)e : EE = Tconst T v = e; is OK
Typechecking functions occurs under the binding context consisting
only of that function's argument names. The signature context contains
all constants and all functions (including the one being checked).
well-formed(T) or T is TUnita1 ... aN are uniqueA1 ... AN are well-formeda1:A1, a2:A2, aN:ANe : EE = TT v(A1 a1, A2 a2 ... AN aN) (e) is OK
If the function body is None (as in an extern
function), just pass on the None through the typechecker.
You are given functions for testing type equality,
well-formedness (of types for arguments, arrays and variables), and
join ().
The following inference rules apply to expressions under a variable
binding context and an implied signature context S. is read
as 'any operator'.
integer-literal : TIntstring-literal : TRef (TString)() : TUnitnull : TRef (TNull)x : Xx : Xx
constant x : X Sx : Xa : Ab : Ba; b : Bwell-formed(T)i : II = T, v:Te : Elet T v = i in e : Ev : Te : ET = Ev = e : Ta : TRef (TArray T)i : TInte : ET = Ea[i] = e : Ta : TIntb : TInt is a binop or a scbinopab : TInta : Ab : BA is TRef A'B is TRef B'A = B is == or !=ab : TInte : TInt is a unope : TIntc : TIntl : Lr : RT is LRif(c) l else r : Tc : TInte : Ewhile(c) e : TUnitc : TInti : Ie : Efor(c,i) e : TUnitc1 ... cN uniquee : TInteM : TM (M
[1,N])eD : TDT is T1T2 ... TNTDswitch(e) c1 => e1 ... cN => eN default => eD :
Tf has return type T and arg types A1 ... ANeM : TM (M
[1,N])TM = AM (M [1,N])f(e1 ... eN) : Ta : TRef (TArray T)i : TInta[i] : Twell-formed(T)i : TInte : ET = Enew T[i](e) : TRef (TArray T)
Variables, constants, and function names will appear as
Ident AST nodes.
For T-Assign-Var, you can assign the variable v
its type T.
For T-Assign-Arr, you can assign the array a
its type TRef (TArray T)
For T-For, if either i or c are None
, just pass None into the typed AST. (It is not a type error
to be missing those components.)
In T-Invoke-Name, the name of the function to invoke will
appear as an Ident AST node.
The return type of a program's expression (the code that does not
appear in any function or constant) must be TInt.
Complete il_of_ast as defined in arrproc.mli by
filling in the missing cases for rev_il_of_ast.
To compile code that deals with arrays, you will need to use the new
InvokeCdecl, CheckedDeref and CheckedMove
IL instructions.
For array indexing (a[i]), evaluate a, then
evaluate i.
For array creation (new T[i](e)), evaluate i,
then evaluate e. The runtime system exposes a function
called newarr that takes two arguments: the first is
the length of the array and the second is the array's initialization
value. You must call this function to allocate the array.
Assignment always evaluates its right hand side first. If the
left hand side is a variable, set the variable. If the left hand
side is an index expression a[i], first evaluate
a, then evaluate i.
All array operations should be checked for nullness (eg, you
should always set the null check flag on the IL instructions).
To compile code that deals with procedures, you will need to use
the new InvokeCdecl IL instruction. Instead of building
a single proc as you did in the previous project, this
time around you will map each non-extern function to its own
proc and pull everything together into an app. You
will need to declare at least one special proc that first
initializes the program's constants in order and then executes the
program's expression. The provided code takes care of this.
Function arguments in Invoke AST nodes are evaluated
from left to right.
The cdecl calling convention gives certain rules for
decorating function names. When you want to create a name for
or call a non-extern function, create a label using
Apil.decorate_internal_function. If
you want to call an extern function (including functions from
the runtime system), create a label using
Apil.decorate_external_function. Both of these functions
use mk_label_named, so you need to make sure (for internal
functions) that function names are distinct. The provided code
takes care of decorating functions in the code generation
context.
Binary operators always evaluate their left-hand sides first,
then their right-hand sides.
Variable bindings in for loops appear (and are executed)
in order. The expression
for(int x = 1, int y = 2, int z = 3;;) ()
is equivalent to
(int x = 1; int y = 2; int z = 3; for (; ; ) (()))
null has a runtime representation of 0.
() (unit) has no runtime representation.
The return values of && and || are only defined
up to zero or nonzero. 0 || 6 could be 1 or
6 (or even 42), but it can't be 0.
You can receive extra credit for this project for any of the
following:
Improvements to the runtime system.
An especially clever or clear typechecker implementation.
An especially clever or clear IL generator.
A rigorous test suite.
An interesting program implemented in the AT language that
demonstrates new features. (We're looking for something closer to
the Game of Life or robotfindskitten examples and farther from
factorial.)
If you have an ambitious idea, you should check with us to see how much
we think it could add to your grade.
Edit the included readme.txt file, making sure to fill in your
name(s) and email address(es). If you want any part of your project
to count for extra credit, be sure to document it.
You should also spend some time to ensure that your code is readable and
well-commented. We have some suggestions
and requirements for your programming style. You will definitely lose
points for the file submission requirements (code must compile, 80 column
limit, no tabs); however, barring any particularly egregious cases, you may
take the remaining rules as suggestions.
Part of your grade will be based on testing. Some example tests and
utility functions are provided in arrproc_test.ml. You must
use the testing framework provided and explicitly note any modifications in
your readme.txt file. We ask that you name your test cases
regularly: choose a username from those in your group. All test names and
files should contain that username followed by an underscore as a prefix.
(For example, if your username is cat and you have written a
test case called alive, that test should be called
cat_alive.)
For this assignment, we require that you define at least 10 new unit tests.
(These must be sufficiently different from one another to count— tests
like 1 + 1 is 2, 1 + 2 is 3, 1 + 3 is 4 are too similar to be useful).
In addition, you should write (and identify in your documentation) an
eleventh more difficult test. Although anything will do, it is better if
this test exercises most of the project's functionality. We will collect
everyone's difficult tests and run them as part of our standard test suite
during grading.
You should contribute at least one small test to the mailing list
and are encouraged to send in test cases to highlight parts of
the specification about which you may have questions. It's OK to
count tests from the mailing list toward your ten (but you must write
your own unique larger test case).
If you have built additional functionality that is beyond
what is explicitly specified in the assignment description, make sure
that the tests you count toward this test requirement do not depend
on those extensions.
All test cases that you count toward the testing portion of your grade
must be fully automatic and must check for some result. The tests must
run when we call omake test from the arrproc
directory.
The grading breakdown for this project is as follows:
~60%: Typechecking.
~20%: New code generation.
~10%: Code style, readability and documentation.
~10%: Testing.
Extra credit will vary depending on what you implement. Remember
to contact us before putting in the time!
Remember that your raw score will be scaled— that is, more difficult
assignments will be worth more than less difficult assignments, even if
the more difficult assignment has fewer available points.
arrproc.run, the executable the OMakefile will produce,
offers several helpful command line flags that might make debugging easier.
These are:
-c: cleans the temporary directories c_bin and c_obj.
--printasm: prints the generated assembler (in AT&T
syntax) to the console.
-p: prints the generated IL to the console.
-r: runs the resulting executable.
-d: runs the resulting executable in the debugger, setting a
breakpoint at the start of your generated code.
--ddd: runs the program in DDD, a graphical interface to
gdb.
A good combination of flags for this assignment is:
./arrproc.run -c -p -r testfile.ap
First, make sure to clean your project with omake clean. All of
the files you've written should be in projects/arrproc. Submit
your assignment on ENIAC. Your project must run on ENIAC, including all
test cases. Be sure to submit from ENIAC and not from some other
Linux machine; otherwise you will get an error.
Switch to the projects directory and type:
turnin -c cis341 -p project4 arrproc
Verify that you've submitted your files with:
turnin -c cis341 -v -p arrproc
You can re-submit your work any number of times, but only the last
submission will be counted. New submissions completely overwrite old
submissions— so if you're missing a single file, make sure to
resubmit your entire project. Check the
main course page for information about late policies.