Project 5 Index

This is a big specification and there is a lot of code for you to pick through! Don't hesitate to ask for clarification on any part of the project (up to and including requests for better documentation for some part of the provided code).

Extend your arrays-and-procedures compiler to support subtyping, nullable types, and objects. Your compiler will accept a program of the form: and, for this example, will produce a program that prints arf moo This dog is stray! to the terminal.
  • (11/7 0:45) The subtyping relation now includes the rule S-Ref-NullRef, which will allow you to check programs like provided_notnull.oat without problems. You will not be penalized if you do not implement this rule.
  • (11/6 22:00) There is a bug in the compiler framework that affects division of negative numbers. You can download a patch here. Extract it at your cis341 root. You will not be penalized if you do not apply the patch; we will apply it to your project before grading.
  • (11/2 23:09) We've uploaded binaries of our solution. Extract these as you normally extract project packages. Since the make scripts don't run omake, you will have to be sure that you can compile your own full implementation before linking the files we provide.
project5.tar.gz contains all that you need to get started with this assignment. Extract it to your projects directory such that full.ml is located at cis341/projects/full/full.ml. (All files are stored in the tarball relative to cis341, so switching to your cis341 directory and running tar xvf project5.tar.gz should do the trick.) Edit the OMakefile at cis341/projects/OMakefile by adding the new directory to the .SUBDIRS declaration: PROJECT_SUBDIRS[] = x86interp expressions controlflow arrproc full

This package depends on the files from Project 4.

Remember that you may change the provided code so long as you keep the interfaces the same (so that we can use everyone's testers on everyone's assignments). If you invested a lot of time into writing a good algorithm for code generation for the last assignment, you should definitely bring it over for this assignment.

In Oat, identifiers that begin with uppercase letters are types; identifiers beginning with lowercase letters are non-types. In particular, in all of the places you saw an IDENT in the previous assignment, you are now required to use a lower-case identifier.

Programs in Oat have the following top-level structure:

toplevel: | includes consts decls exp

It is now mandatory that all constant definitions be floated to the top of the file. Constants are still non-mutually-recursive; they may also not refer to any classes save for Object. Class declarations, which are covered in detail below, are mutually recursive (modulo some details) with other class declarations and function declarations.

new[len](exp)'s semantics have changed: exp is now evaluated for every location a0..aLEN-1 in the new array (in order). Code for this is provided.

The types from the previous assignment return, along with some new ones:

T2: | KW_INT { Fullast.TInt } | KW_UNIT { Fullast.TUnit } | R { Fullast.TRef $1 } | R OP_QUESTION { Fullast.TNullRef (Fullast.TMaybe $1) } | OP_LPAREN ty OP_RPAREN { $2 } R: | T2 OP_LBRACKET OP_RBRACKET { Fullast.TArray $1 } | KW_STRING { Fullast.TString } | UC_IDENT { Fullast.TClass (snd $1) }

Note that references (R) have been split between not-null references and nullable references. A regular string may now not take on the value null; however, the nullable string? may do so.

Oat provides a safe way to cast from a nullable type to a regular type. This is demonstrated in the example code above:

if?(string owned = adog.owner) ( print_string(string_cat("This dog is owned by ", owned)) ) else ( adog.owner = "Timmy"; print_string("This dog is stray!") ) At runtime, adog.owner is checked for nullness. If it is not null, then owned is set to its current value and the first branch of the if? is evaluated. If it is null, then owned is not bound and the second branch is evaluated.

Classes must have superclasses, constructors, and unique names:

class Dog <: Animal { string? owner; new(string name)(name) ( () ) string noise() ( "arf" ) }

Classes may also have zero or more methods and zero or more fields. The semicolon after a field definition is mandatory (as with constants) and the parenthesis around methods are mandatory (as with toplevel functions). Methods may be overridden provided that the return type of the derived method is narrower and the argument types of the derived method are wider. A field name, once declared in some ancestor class, may not be re-declared in any child of that ancestor.

Since Oat does not support overloading or static methods, every class must have exactly one constructor (which is declared with the new keyword). This constructor can accept arguments (the first pair of parenthesis) and pass those arguments to a superclass constructor.

At runtime, Oat always checks that notnull members of a new object are initialized correctly when the object's constructor exits.

Oat allows checked downcasts for objects at runtime:

cast(Dog adog = d) ( "It's a dog!" ) else ( "It's not a dog!" )

If at runtime the Object d is discovered to have type Dog, adog is set to d's current value and the left-hand branch of the cast is evaluated. If not, the right-hand branch is evaluated (and adog is not bound).

There are several new syntactic forms beyond those that were in the arrproc language. These include:

  • Field access: foo.bar.baz. Associates left to right, just as in Java. While inside a class member function, you must refer to fields through this (this.foo).
  • Member invocation: foo.bar(). As with fields, to call methods within a member function you must call them through this (or super): this.foo()
  • Calling constructors: new Foo(bar, baz).
  • Nullness and dynamic type checks: covered above.
  • Special functions: there are two builtin functions, arrlen (which returns the length of any notnull array) and fail (which takes a type and a single string argument, returns something of that type, and immediately ends the program printing an error at runtime). Since Oat's type system is not polymorphic, we need to bake these into the typing rules.

Aside from some updates to the core record types to support objects, the story of compilation hasn't changed very much since Project 4.

This is a group project. Groups of size 1 or 2 are acceptable.

Complete typecheck as defined in fulltypecheck.mli. There is some code in fulltypecheck.ml to get you started. Part of your grade for this section depends on the quality of the error messages you raise for the user. Simply printing Type error. for badly-formed programs is insufficient. Give more details about what kind of error occurred.

is pronounced 'join'; is read as 'any operator'. Type equality = is equivalent to OCaml structural equality on type ASTs.

There are three different judgement types :

  • TsNormal is the regular .
  • TsMethod C is for typechecking inside methods of some class C.
  • TsCtor (P,C) is for typechecking inside the constructor of some class C with parent class P.

Oat's rules for subtyping () follow. Subtyping depends on a signature context .

T = S T S T TTop class_subtype T S TRef (TClass T) TRef (TClass S) TNullRef TNull TNullRef (TMaybe T) class_subtype T S TNullRef (TMaybe (TClass T)) TNullRef (TMaybe (TClass S)) class_subtype T S TRef (TClass T) TNullRef (TMaybe (TClass S)) TRef T TNullRef (TMaybe T)
  • class_subtype T S determines if class S is a supertype of class T based on the signatures in .

Function names must be unique and they may not share names with constants. Class names must also be unique. Parent classes must be declared before child classes (so you don't have to check for cycles in the inheritance hierarchy).

We suggest the following multi-pass algorithm for typechecking:

  • Check all constants in turn, adding each one to .
  • Check and register all class names and supertype relationships.
  • Check and register all class and function signatures.
  • Check all code.

Typechecking constants occurs under the empty binding context. The signature context contains nothing but constants that have been declared previously to the constant being checked and no classes but Object. Constant names should be unique. should be TsNormal.

well-formed(T) prev-constantse : E E T const T v = e; is OK

Registering a class name requires that you verify that the class name is unique and that the class's superclass exists. Checking the well-formedness of a class signature is a little more involved:

  • A field F in class C may not shadow any other fields declared in C, nor may it shadow any fields declared in any superclass of C.
  • All fields must have well-formed types. Fields may not have type TUnit.
  • A member function with name M may be declared up to once in a given class C.
  • Overriding member functions must follow the Override rule (where is defined on ): T S SM TM ( M [1,N]) T f(T1 t1 ... TN tN) may override S f(S1 s1 ... SN sN)
  • Member functions are subject to the same well-formedness requirements as top-level functions, but member functions may not be extern.
  • Every class must have exactly one constructor.

Typechecking functions occurs under the binding context consisting only of that function's argument names. The signature context contains all constants, all functions (including the one being checked), and all classes. This is nearly the same as it was in the last assignment but with the addition of subtyping and a parameterized judgment.

well-formed'(T) a1 ... aN are unique A1 ... AN are well-formed a1:A1, a2:A2, ..., aN:AN e : E E T or T = TUnit T v(A1 a1, A2 a2 ... AN aN) (e) is OK with return type T

Checking a DeclFunction uses rule T-DFuncMethodCtor and takes place under = TsNormal.

Checking a MemberFunction on a class C uses rule T-DFuncMethodCtor and takes place under = TsMethod C

Checking a MemberConstructor on a class C with parent P uses rule T-DFuncMethodCtor (ignoring the initialization list and the return type) and takes place under = TsCtor (P,C). In addition, the constructor's initialization list must follow the rule CtorInit with = TsNormal:

P defines a constructor with argument types P1 ... PN ,a1:A1, ..., aX:AXeM : TM ( M [1,N]) TM PM ( M [1,N]) new (A1 a1 ... AX aX)(e1 ... eN) is a valid constructor header for class C with parent P

Oat's rules for typing expressions depend on a signature context , a binding context , and a judgment type . Most rules behave the same under any judgment type. Take here to mean defined on the context .

The following rules are brand new for this assignment.

well-formed'(T) ,s : S S TRef TString ,fail T (s) : T ,a : TRef (TArray A) ,arrlen(a) : TInt ,l : TRef (TClass C) holds that C has method f with return type T and arg types A1 ... AN ,eM : TM ( M [1,N]) TM AM ( M [1,N]) ,l . f(e1 ... eN) : T ,l : TRef (TClass L) holds that L has field f with type F ,l . f : F well-formed(TRef N) ,i : TNullRef (TMaybe I) TRef I TRef N ,, v : TRef Nl : L ,r : R T is TUnit if L or R is TUnit; else L R well-formed'(T) ,if?((TRef N) v = i) l else r : T well-formed(TRef (TClass N)) ,i : TRef (TClass I) TRef (TClass N) TRef (TClass I) ,, v:TRef (TClass N)l : L ,r : R T is TUnit if L or R is TUnit; else L R well-formed'(T) ,cast(TRef (TClass N) v = i) l else r : T holds that C's constructor has arg types A1 ... AN ,eM : TM ( M [1,N]) TM AM ( M [1,N]) ,new C(e1 ... eN) : TRef (TClass C) ,l : TRef (TClass L) ,e : E holds that L has field f with type F E F ,l . f = e : F

The following rules differ in behavior depending on the judgment type.

= TsMethod C:

shows that C has superclass S ,super : TRef (TClass S) ,this : TRef (TClass C)

= TsCtor (P,C):

,e : E holds that C has field f with type F E F ,this . f = e : F ,super : TRef (TClass P)

The next group of rules are old rules that have been updated for this assignment.

,integer-literal : TInt ,null : TNullRef TNull ,string-literal : TRef (TString) ,() : TUnit x : X , x : X x constant x : X , x : X ,a : A ,b : B ,a; b : B well-formed(T) ,i : I I T ,, v:Te : E ,let T v = i in e : E v : T ,e : E E T ,v = e : T ,a : TRef (TArray T) ,i : I ,e : E I TInt E T ,a[i] = e : T ,a : A ,b : B A TInt B TInt is a binop or a scbinop ,a b : TInt ,a : A ,b : B is == or != A is TRef A' or TNullRef A' B is TRef B' or TNullRef B' ,a b : TInt ,e : E E TInt is a unop ,e : TInt ,c : C C TInt ,l : L ,r : R T is TUnit if L or R is TUnit; else L R well-formed'(T) ,if(c) l else r : T ,c : C C TInt ,e : E ,while(c) e : TUnit ,c : C C TInt ,i : I ,e : E ,for(c,i) e : TUnit c1 ... cN unique ,e : E E TInt ,eM : TM ( M [1,N]) ,eD : TD T is T1 T2 ... TN TD or TUnit if any of T1 ... TN, TD is TUnit well-formed'(T) ,switch(e) c1 => e1 ... cN => eN default => eD : T holds that f has return type T and arg types A1 ... AN ,eM : TM ( M [1,N]) TM AM ( M [1,N]) ,f(e1 ... eN) : T ,a : TRef (TArray T) ,i : I I TInt ,a[i] : T well-formed(T) ,i : I I TInt ,e : E E T ,new T[i](e) : TRef (TArray T)
  • well_formed is the standard well-formedness relation; well_formed' allows TUnit to be well-formed.
  • The typechecker should convert Fail nodes to Invoke nodes (that invoke the function named fail).

Fullil defines the record classtbl, which you should fill out with the vtable and runtime type information for each class:

type classtbl = { ct_name_lbl : Il.label; (** .cname.CLASSNAME *) ct_super_lbl : Il.label; (** .super.CLASSNAME *) ct_dispatch_lbl : Il.label; (** .dispatch.CLASSNAME *) ct_class_name : string; (** class name *) ct_supertbl_lbl : Il.label option; (** super's dispatch table label *) ct_dispatch_tbl : Il.label list (** dispatch table *) }

For example, the fully-specified classtbl for the class:

class WindupDog <: Dog { new(string name)(name) ( () ) int wound; unit wind() ( this.wound = this.wound + 1 ) }

is:

ct_name_lbl = __.cname.WindupDog74 ct_super_lbl = __.super.WindupDog73 ct_dispatch_lbl = .dispatch.WindupDog ct_class_name = "WindupDog" ct_supertbl_lbl = .dispatch.Dog ct_dispatch_tbl = [__fun__Dog.noise; __fun__WindupDog.wind]

Which, in assembled form, becomes:

__.cname.WindupDog74: .long 9 .ascii "WindupDog\0" __.super.WindupDog73: .long .dispatch.Dog .dispatch.WindupDog: .long __fun__Dog.noise .long __fun__WindupDog.wind

An object of type WindupDog will at runtime occupy 3 words. It is first allocated using the newcls runtime service routine, which returns a pointer to the second word. Thus, the field numbers (for FieldDeref, FieldMove and FieldNullCheck) look like:

.dispatch.WindupDog; Animal.name, WindupDog.wound -1 0 1
  • The ordering of the vtable is significant. Methods are called by slot number using the InvokeVirtual instruction.
  • The constructor does not appear anywhere in the classtbl. Constructors are always called statically.
  • The pointer to the superclass comes as the word directly before the first entry of the dispatch table. If you have a pointer to the dispatch table, you can access this pointer using the FieldDeref instruction with an offset of -1. This pointer will be null (0) for the classtbl of Object.

The Oat calling convention for methods requires that the this pointer be passed on the stack as though it were the first parameter to a cdecl function. The argument is given the name .this so as to prevent possible clashes with user-provided argument names. Method labels should be decorated with the decorate_method_name function.

Calling a method C.f() requires that you look up the index for f in C's vtable at compile time. You can then use this index to emit an InvokeVirtual instruction, which takes care of both the runtime lookup in the vtable and adding the this pointer to the argument list.

Oat allows a child class C to invoke any of its parent class's member functions, even if C overrides one of these functions. For this it provides the super keyword. super.f() requires that you make a regular InvokeCdecl call to f. You should use get_static_decorated_method to get the label for the method from the superclass (which should be available from the typing information on the Super node). You will need to explicitly pass the this pointer as the first argument to the InvokeCdecl instruction.

Assigning into fields and reading from fields requires the use of the FieldMove and FieldDeref instructions. For some field C.f, you will need to look up the index of f in C's runtime object representation.

As stated before, objects are allocated at runtime using the newcls service routine. newcls takes a single argument that represents the number of words to allocate including the word for the vtable pointer. It returns a pointer to the word directly following the vtable pointer.

Constructors are always statically linked and called with InvokeCdecl. If you have a cgcxt_class record, you can get the label for its constructor using the decorate_ctor_name function. Like member functions, their first argument is always .this, which starts out as a pointer into an allocated but uninitialized object record. The return value of a constructor is always ignored.

The base case for any object construction is always the constructor for class Object. This constructor needs only set the vtable of the passed-in .this to Object's vtable.

For any other constructor on class C with parent P:

  • Evaluate all of the initializer arguments in the context including the constructor's arguments. Use these initializer arguments to call P's constructor. Remember to pass the .this pointer as the first argument. (This is how we passed the name into Animal in the introductory demo.)
  • When P's constructor exits, the vtable set on .this will be P's vtable. Evaluate the expression in C's constructor.
  • After evaluating the expression, check that all non-nullable reference fields have been initialized. The IL provides an instruction FieldNullCheck that takes a list of non-nullable reference field indices specifically for this purpose.
  • Finally, set .this's vtable pointer to C's vtable.
  • fail: fail should have been converted to a regular Invoke during typechecking.
  • super.f(): Covered above. Arguments to the function should be evaluated from left to right.
  • a.f(): Covered above. Evaluate the object (a) first, then evaluate the arguments from left to right.
  • a.f = b: Evaluate the object (a) first, then evaluate the value to set (b). You can move values into fields using the FieldMove instruction.
  • new C(): Covered above. Evaluate the arguments from left to right.
  • a.f: Evaluate the object (a), then use the FieldDeref instruction to retrieve the data stored in f.
  • if?(T v = a) (l) else (r): First evaluate a. If a is not null, evaluate l with v bound to a and use the result. Otherwise evaluate r and use the result. If the if? was assigned the type TUnit, then you should discard any result after evaluation.
  • cast(T v = a) (l) else (r): First evaluate a. Determine if the runtime type of a is a subtype of T. (Remember that only objects can be used in cast.) If so, evaluate l with v bound to a and use the result. Otherwise evaluate r and use the result. If the cast was assigned the type TUnit, then you should discard any result after evaluation.

You can receive extra credit for this project by extending the language to support first-class functions. We have provided AST nodes for function types and function expressions. You do not need full support for closures to be awarded extra credit points, but you should let us know about your plans so that we can tell you how many points you might be given.

The different parts that you can complete include:

  • Written-out rules (in your readme.txt) for typechecking function expressions (fun(int x, int y) (x + y)) and for typechecking closure application (a(1,2), where a is a closure with the appropriate type).
  • Implementation of these rules in fulltypecheck.ml.
  • Closure conversion and lifting. We provide a function lift_typed in the module Fulllift that is called by the compiler after typechecking but before IL generation. From there you should transform the closure-containing code to closure-free code.

You should target using only explicit fun expressions to introduce closures; don't try to give anything that can be executed a function type. You do not need to support, for instance, storing a methods (some_fun = obj.method) or top-level functions (some_fun = string_cat). While you should allow closures to participate in subtyping as was discussed in class, you should not attempt to allow closures to participate in runtime type tests (cast()).

Edit the included readme.txt file, making sure to fill in your name(s) and email address(es). If you want any part of your project to count for extra credit, be sure to document it.

You should also spend some time to ensure that your code is readable and well-commented. We have some suggestions and requirements for your programming style. You will definitely lose points for the file submission requirements (code must compile, 80 column limit, no tabs); however, barring any particularly egregious cases, you may take the remaining rules as suggestions.

Part of your grade will be based on testing. Some example tests and utility functions are provided in full_test.ml. You must use the testing framework provided and explicitly note any modifications in your readme.txt file. We ask that you name your test cases regularly: choose a username from those in your group. All test names and files should contain that username followed by an underscore as a prefix. (For example, if your username is cat and you have written a test case called alive, that test should be called cat_alive.)

  • For this assignment, we require that you define at least 10 new unit tests. (These must be sufficiently different from one another to count— tests like 1 + 1 is 2, 1 + 2 is 3, 1 + 3 is 4 are too similar to be useful).
  • In addition, you should write (and identify in your documentation) an eleventh more difficult test. Although anything will do, it is better if this test exercises most of the project's functionality. We will collect everyone's difficult tests and run them as part of our standard test suite during grading.
  • You should contribute at least one small test to the mailing list and are encouraged to send in test cases to highlight parts of the specification about which you may have questions. It's OK to count tests from the mailing list toward your ten (but you must write your own unique larger test case).
  • If you have built additional functionality that is beyond what is explicitly specified in the assignment description, make sure that the tests you count toward this test requirement do not depend on those extensions.
  • All test cases that you count toward the testing portion of your grade must be fully automatic and must check for some result. The tests must run when we call omake test from the full directory.

The grading breakdown for this project is as follows:

  • ~40%: Code generation.
  • ~40%: Typechecking.
  • ~10%: Code style, readability and documentation.
  • ~10%: Testing.

Remember that your raw score will be scaled— that is, more difficult assignments will be worth more than less difficult assignments, even if the more difficult assignment has fewer available points.

full.run, the executable the OMakefile will produce, offers several helpful command line flags that might make debugging easier. These are:
  • -c: cleans the temporary directories c_bin and c_obj.
  • --printasm: prints the generated assembler (in AT&T syntax) to the console.
  • -p: prints the generated IL to the console.
  • -r: runs the resulting executable.
  • -d: runs the resulting executable in the debugger, setting a breakpoint at the start of your generated code.
  • --ddd: runs the program in DDD, a graphical interface to gdb.
A good combination of flags for this assignment is: ./full.run -c -p -r testfile.oat

First, make sure to clean your project with omake clean. All of the files you've written should be in projects/full. Submit your assignment on ENIAC. Your project must run on ENIAC, including all test cases. Be sure to submit from ENIAC and not from some other Linux machine; otherwise you will get an error.

Switch to the projects directory and type: turnin -c cis341 -p project5 full Verify that you've submitted your files with: turnin -c cis341 -v -p project5 You can re-submit your work any number of times, but only the last submission will be counted. New submissions completely overwrite old submissions— so if you're missing a single file, make sure to resubmit your entire project. Check the main course page for information about late policies.