Project 4 Index

This is a big specification and there is a lot of code for you to pick through! Don't hesitate to ask for clarification on any part of the project (up to and including requests for better documentation for some part of the provided code).

Extend your control flow compiler to support static typechecking, mutually recursive functions, constants and arrays. Your compiler will accept a program of the form: = 65 && char <= 90) ( 65 + ((char - 65 + 13) % 26) ) else if (char >= 97 && char <= 122) ( 97 + ((char - 97 + 13) % 26) ) else char ) con_init(); con_clear(); con_attrset(con_color(2)); con_print(rot13("Hello, world!")); con_getch(); con_cleanup(); 0]]> and, for this example, will produce a program that prints Uryyb, jbeyq! colored green to a blank terminal.
  • (10/23 1:14) T-For should check that the body of the for loop is well-typed.
  • (10/22 1:02) Added example programs: rfk.ap (under the GPL); web.ap
project4.tar.gz contains all that you need to get started with this assignment. Extract it to your projects directory such that apast.ml is located at cis341/projects/arrproc/apast.ml. (All files are stored in the tarball relative to cis341, so switching to your cis341 directory and running tar xvf project4.tar.gz should do the trick.) Edit the OMakefile at cis341/projects/OMakefile by adding the new directory to the .SUBDIRS declaration: PROJECT_SUBDIRS[] = x86interp expressions controlflow arrproc

This package depends on the files from Project 3.

Remember that you may change the provided code so long as you keep the interfaces the same (so that we can use everyone's testers on everyone's assignments). If you invested a lot of time into writing a good algorithm for code generation for the last assignment, you should definitely bring it over for this assignment.

arrproc programs depend on the fruntime library rather than the runtime library (as did the previous two projects). You should read more about fruntime before starting this assignment.

IL operands are now split into doperands (destination operands for writing) and soperands (source operands for reading). Any destination operand can become a source operand using the Dest constructor. For example, an svar s as a destination operand appears as Svar s. As a source operand, it would look like Dest (Svar s).

A complete lexer and parser are provided for this assignment in lexer.mll and parser.mly. You should understand these files, but do not modify them.

This assignment's lexer and parser keep track of token locations from source files. This will allow you to raise helpful error messages that point to the exact tokens that cause problems.

Notice the new toplevel rule in the grammar for this language:

toplevel: | includes decls exp { ($1, List.rev $2, $3) }

An arrproc program is a list of includes, followed by a list of declarations and finished with a single 'main' or 'top' exp. Previously, programs consisted only of an exp.

The includes section of a program can contain zero or more files to inline. The compiler will topologically sort included files according to the dependency graph before doing any inlining. Files are kept track of only by name, so trying to include both a/a.ap and b/a.ap will have undefined results. When the compiler merges file B in after A (because B includes A), B's declarations are appended to A's declarations and B's expression is appended to A's expression using a sequencing operator. Consider the following example:

/* inlineA.ap */ unit afunc() (()) "A" /* inlineB.ap */ include "inlineC.ap" include "inlineA.ap" unit bfunc() (()) "B" /* inlineC.ap */ include "inlineA.ap" unit cfunc() (()) "C" /* inlineD.ap */ include "inlineB.ap" unit dfunc() (()) "D"; 42

Compiling inlineD.ap will yield the following program:

unit afunc () (()) unit cfunc () (()) unit bfunc () (()) unit dfunc () (()) "A"; "C"; "B"; "D"; 42

By default, the compiler searches in the lib and . directories for includes. You can add more include directories with the -i option.

AT's typechecker requires explicit typing annotations. The relevant rules in the grammar for types are:

T2: | KW_INT { Apast.TInt } | KW_UNIT { Apast.TUnit } | R { Apast.TRef $1 } | OP_LPAREN ty OP_RPAREN { $2 } R: | T2 OP_LBRACKET OP_RBRACKET { Apast.TArray $1 } | KW_STRING { Apast.TString }

That is, you can form regular types (unit, string, and int). You can also form arrays using Java-style syntax: a single-dimensional array of int is int[]; a multi-dimensional array is int[][]. Some types, like unit[], are not well-formed. These are caught during typechecking.

A declaration can either declare a new function or a new constant. Names for functions and constants must be unique (you may not have a function named foo and a constant named foo, or two functions named foo or two constants named foo). The order of declaration does not matter for functions, but it does matter for constants (see the information on mutual recursion below).

The declaration of a constant looks like:

KW_CONST ty IDENT OP_ASSIGN E1 OP_SEMI Which, in a source file, would turn into: const int aint = 1;

Note the mandatory semicolon. A function declaration looks like:

ty IDENT OP_LPAREN arglist OP_RPAREN OP_LPAREN exp OP_RPAREN And in the source file: int plusone(int a, int b) ( a + 1 )

Note that the parenthesis surrounding the function body are mandatory and that there is no semicolon following the declaration. Some functions may be declared as external (because they appear in the runtime):

ty IDENT OP_LPAREN arglist OP_RPAREN KW_EXTERN

And in code:

int printinc(int a, int b) extern

There are several new syntactic forms beyond those that were in the control flow language. These include:

  • String literals: "Like this." The escape codes \n, \t, \\ and \" are supported.
  • The null keyword: null can take on any reference type.
  • The unit value: As in OCaml, unit looks like () and has the type unit. This value is useful only where you need a unit return value (as in a function with side effects) or for ending a file meant only to be included by other files.
  • Function calls: These look like C or Java function calls: foo(a, b, c) calls function foo with the arguments a, b, and c.
  • Array indexing: Like C or Java: arr[i].
  • Array allocation: Like Java, but with a twist: you must provide an initial value that is copied to every element of the array. For example, to allocate a new array of ints (all equal to zero), use new int[42](0).

The topmost expression of a program must return something of type int. This becomes the return code of the program.

There's quite a bit of code in the provided package. To get an idea of how it all fits together, we can track the progress of a program being compiled.

  • arrproc.run is invoked on the command line.
  • Arrproc_main parses the command-line arguments and configures the compiler settings in Arrproc_driver. It then invokes Arrproc_driver.execute ().
  • Arrproc_driver does its work in do_one_file. At this point, the program is represented only by its filename.
  • Arrproc_driver.preprocess opens the program file and passes it to the lexer and parser. These components work together to turn the program into a Range.t Apast.preprog. The Range.t means that the program does not currently have type information, but it does have information that will help us print error messages to the user. An Apast.preprog contains the program's include file list, its declaration list and its expression.
  • preprocess uses the include component of the preprog to determine which other files to load (and in what order). Once all of this is worked out, it combines the files together (in the manner described above in the 'Includes' section) to produce a Range.t Apast.prog. A prog is much like a preprog except a prog does not have any includes.
  • do_one_file then passes the Range.t prog to Aptypecheck.typecheck (the first function you are responsible for building). This function endeavors to associate types with each node in the prog's AST. This change is reflected in the type of the program. Consider the signature of typecheck: val typecheck : Range.t Apast.prog -> typed Apast.prog typecheck takes a Range.t Apast.prog (or an untyped program) and produces a typed Apast.prog (or a typed program). typed here means (Range.t * Apast.typ). This is a pair of the original location information with the type associated with an AST node.
  • Now that it has a typed prog, do_one_file passes the program on to Arrproc.il_of_ast (the second function that you are responsible for completing by finishing some cases in rev_il_of_ast). il_of_ast's type looks like: val il_of_ast : Aptypecheck.typed Apast.prog -> (Apil.Il.label,unit) Apil.IlCfg.app il_of_ast takes a typed program and produces an app which uses regular labels and does not contain any additional tagging information on blocks. An app contains one or more procs, each of which may refer to one another by label.
  • Now that the app has been created, do_one_file lowers the IL to X86lite and passes it to the compiler backend, which builds the executable and links it with the runtime system.

This is a group project. Groups of size 1 or 2 are acceptable.

Constants and functions both exist in the same global namespace and all global names must be unique (there is no 'shadowing'). A function may not share its name with a constant. Locally bound variables may shadow constants; you may have a locally bound variable with the same name as a global function, but that variable will not shadow the function.

Function declarations are mutually recursive: the body of any function can refer to the name of any other function irrespective of their order of declaration.

Constant declarations are not mutually recursive. A constant C2 may refer to a constant C1 only if C1 was declared before C2. In addition, constants may not refer to any functions.

Complete typecheck as defined in aptypecheck.mli. There is some code in aptypecheck.ml to get you started. Part of your grade for this section depends on the quality of the error messages you raise for the user. Simply printing Type error. for badly-formed programs is insufficient. Give more details about what kind of error occurred.

Typechecking constants occurs under the empty binding context. The signature context contains nothing but constants that have been declared previously to the constant being checked.

well-formed(T) e : E E = T const T v = e; is OK

Typechecking functions occurs under the binding context consisting only of that function's argument names. The signature context contains all constants and all functions (including the one being checked).

well-formed(T) or T is TUnit a1 ... aN are unique A1 ... AN are well-formed a1:A1, a2:A2, aN:AN e : E E = T T v(A1 a1, A2 a2 ... AN aN) (e) is OK
  • If the function body is None (as in an extern function), just pass on the None through the typechecker.

You are given functions for testing type equality, well-formedness (of types for arguments, arrays and variables), and join ().

The following inference rules apply to expressions under a variable binding context and an implied signature context S. is read as 'any operator'.

integer-literal : TInt string-literal : TRef (TString) () : TUnit null : TRef (TNull) x : X x : X x constant x : X S x : X a : A b : B a; b : B well-formed(T) i : I I = T , v:Te : E let T v = i in e : E v : T e : E T = E v = e : T a : TRef (TArray T) i : TInt e : E T = E a[i] = e : T a : TInt b : TInt is a binop or a scbinop a b : TInt a : A b : B A is TRef A' B is TRef B' A = B is == or != a b : TInt e : TInt is a unop e : TInt c : TInt l : L r : R T is L R if(c) l else r : T c : TInt e : E while(c) e : TUnit c : TInt i : I e : E for(c,i) e : TUnit c1 ... cN unique e : TInt eM : TM ( M [1,N]) eD : TD T is T1 T2 ... TN TD switch(e) c1 => e1 ... cN => eN default => eD : T f has return type T and arg types A1 ... AN eM : TM ( M [1,N]) TM = AM ( M [1,N]) f(e1 ... eN) : T a : TRef (TArray T) i : TInt a[i] : T well-formed(T) i : TInt e : E T = E new T[i](e) : TRef (TArray T)
  • Variables, constants, and function names will appear as Ident AST nodes.
  • For T-Assign-Var, you can assign the variable v its type T.
  • For T-Assign-Arr, you can assign the array a its type TRef (TArray T)
  • For T-For, if either i or c are None , just pass None into the typed AST. (It is not a type error to be missing those components.)
  • In T-Invoke-Name, the name of the function to invoke will appear as an Ident AST node.
  • The return type of a program's expression (the code that does not appear in any function or constant) must be TInt.

Complete il_of_ast as defined in arrproc.mli by filling in the missing cases for rev_il_of_ast.

To compile code that deals with arrays, you will need to use the new InvokeCdecl, CheckedDeref and CheckedMove IL instructions.

  • For array indexing (a[i]), evaluate a, then evaluate i.
  • For array creation (new T[i](e)), evaluate i, then evaluate e. The runtime system exposes a function called newarr that takes two arguments: the first is the length of the array and the second is the array's initialization value. You must call this function to allocate the array.
  • Assignment always evaluates its right hand side first. If the left hand side is a variable, set the variable. If the left hand side is an index expression a[i], first evaluate a, then evaluate i.
  • All array operations should be checked for nullness (eg, you should always set the null check flag on the IL instructions).

To compile code that deals with procedures, you will need to use the new InvokeCdecl IL instruction. Instead of building a single proc as you did in the previous project, this time around you will map each non-extern function to its own proc and pull everything together into an app. You will need to declare at least one special proc that first initializes the program's constants in order and then executes the program's expression. The provided code takes care of this.

  • Function arguments in Invoke AST nodes are evaluated from left to right.
  • The cdecl calling convention gives certain rules for decorating function names. When you want to create a name for or call a non-extern function, create a label using Apil.decorate_internal_function. If you want to call an extern function (including functions from the runtime system), create a label using Apil.decorate_external_function. Both of these functions use mk_label_named, so you need to make sure (for internal functions) that function names are distinct. The provided code takes care of decorating functions in the code generation context.
  • Binary operators always evaluate their left-hand sides first, then their right-hand sides.
  • Variable bindings in for loops appear (and are executed) in order. The expression for(int x = 1, int y = 2, int z = 3;;) () is equivalent to (int x = 1; int y = 2; int z = 3; for (; ; ) (()))
  • null has a runtime representation of 0.
  • () (unit) has no runtime representation.
  • The return values of && and || are only defined up to zero or nonzero. 0 || 6 could be 1 or 6 (or even 42), but it can't be 0.

You can receive extra credit for this project for any of the following:

  • Improvements to the runtime system.
  • An especially clever or clear typechecker implementation.
  • An especially clever or clear IL generator.
  • A rigorous test suite.
  • An interesting program implemented in the AT language that demonstrates new features. (We're looking for something closer to the Game of Life or robotfindskitten examples and farther from factorial.)

If you have an ambitious idea, you should check with us to see how much we think it could add to your grade.

Edit the included readme.txt file, making sure to fill in your name(s) and email address(es). If you want any part of your project to count for extra credit, be sure to document it.

You should also spend some time to ensure that your code is readable and well-commented. We have some suggestions and requirements for your programming style. You will definitely lose points for the file submission requirements (code must compile, 80 column limit, no tabs); however, barring any particularly egregious cases, you may take the remaining rules as suggestions.

Part of your grade will be based on testing. Some example tests and utility functions are provided in arrproc_test.ml. You must use the testing framework provided and explicitly note any modifications in your readme.txt file. We ask that you name your test cases regularly: choose a username from those in your group. All test names and files should contain that username followed by an underscore as a prefix. (For example, if your username is cat and you have written a test case called alive, that test should be called cat_alive.)

  • For this assignment, we require that you define at least 10 new unit tests. (These must be sufficiently different from one another to count— tests like 1 + 1 is 2, 1 + 2 is 3, 1 + 3 is 4 are too similar to be useful).
  • In addition, you should write (and identify in your documentation) an eleventh more difficult test. Although anything will do, it is better if this test exercises most of the project's functionality. We will collect everyone's difficult tests and run them as part of our standard test suite during grading.
  • You should contribute at least one small test to the mailing list and are encouraged to send in test cases to highlight parts of the specification about which you may have questions. It's OK to count tests from the mailing list toward your ten (but you must write your own unique larger test case).
  • If you have built additional functionality that is beyond what is explicitly specified in the assignment description, make sure that the tests you count toward this test requirement do not depend on those extensions.
  • All test cases that you count toward the testing portion of your grade must be fully automatic and must check for some result. The tests must run when we call omake test from the arrproc directory.

The grading breakdown for this project is as follows:

  • ~60%: Typechecking.
  • ~20%: New code generation.
  • ~10%: Code style, readability and documentation.
  • ~10%: Testing.

Extra credit will vary depending on what you implement. Remember to contact us before putting in the time!

Remember that your raw score will be scaled— that is, more difficult assignments will be worth more than less difficult assignments, even if the more difficult assignment has fewer available points.

arrproc.run, the executable the OMakefile will produce, offers several helpful command line flags that might make debugging easier. These are:
  • -c: cleans the temporary directories c_bin and c_obj.
  • --printasm: prints the generated assembler (in AT&T syntax) to the console.
  • -p: prints the generated IL to the console.
  • -r: runs the resulting executable.
  • -d: runs the resulting executable in the debugger, setting a breakpoint at the start of your generated code.
  • --ddd: runs the program in DDD, a graphical interface to gdb.
A good combination of flags for this assignment is: ./arrproc.run -c -p -r testfile.ap

First, make sure to clean your project with omake clean. All of the files you've written should be in projects/arrproc. Submit your assignment on ENIAC. Your project must run on ENIAC, including all test cases. Be sure to submit from ENIAC and not from some other Linux machine; otherwise you will get an error.

Switch to the projects directory and type: turnin -c cis341 -p project4 arrproc Verify that you've submitted your files with: turnin -c cis341 -v -p arrproc You can re-submit your work any number of times, but only the last submission will be counted. New submissions completely overwrite old submissions— so if you're missing a single file, make sure to resubmit your entire project. Check the main course page for information about late policies.