Project 3 Index
Build a compiler that supports integer expressions, lexically-scoped integer variables and local control flow. Your compiler will accept source files of the form int acc = 1; int f = 6; while (f > 0) ( acc = acc * f; f = f - 1 ); acc and will produce executables that print (on standard out): Result: 720
  • (10/4 13:21) switch should be in exp_noseq, not exp_seq.
  • (10/1 21:22) Rules for ; and var_decl in the summary grammar were made more explicit with regard to exp_noseq. This has the effect of forcing the required right-associativity of ;.
project3.tar.gz contains all that you need to get started with this assignment. Extract it to your projects directory such that cfast.ml is located at cis341/projects/controlflow/cfast.ml. (All files are stored in the tarball relative to cis341, so switching to your cis341 directory and running tar xvf project3.tar.gz should do the trick.) Edit the OMakefile at cis341/projects/OMakefile by adding the new directory to the .SUBDIRS declaration: PROJECT_SUBDIRS[] = x86interp expressions controlflow

This package depends on the files from Project 2.

This is a group project. Groups of size 1 or 2 are acceptable.

A complete lexer for this project is included in lexer.mll (incuding rules for the extra credit). You should understand this file, but do not modify it.

Find in cfast.ml a specification of the AST for this assignment. There are already rules in parser.mly and lexer.mll for arithmetic. You must add the following rules, summarized below:

int tyIDENT=exp_noseq exp_noseq;exp var_decl;exp exp_noseq IDENT exp_noseq=exp_noseq old_arithmetic_rules ifexp then exp_noseq elseexp_noseq whileexpexp_noseq for var_decl; exp_noseq; exp_noseq exp_noseq (exp)
  • For hints on precedence levels, look at the function prec_of_exp in cfast.ml.
  • = is right-associative: a = b = c is a = (b = c)
  • ; from the sequencing rule is also right-associative. This is consistent with the ; in variable binding expressions. The ;s in a for loop are used purely as delimiters and therefore do not have any associativity.
  • The var_decl part of for can be zero or more comma-separated variable declarations.
  • The trickiest part of the grammar is the solution to the dangling-else ambiguity introduced by if. Try to figure this out first before adding rules for the other control flow structures. An else should always bind to the closest if.
  • If you do come across an if without an else, fill in the else branch of the AST node with a Cfast.Unit.
  • You may have noticed that the AST node for for has no space for storing variable declarations. During parsing, you should lift any variables declared in the for loop outside the For node as Let nodes.
  • As in the expressions assignment, you may not use ocamlyacc's precedence directives.

Instead of directly building X86 assembler as you did in the expressions assignment, this time you'll be generating intermediate code. The abstract instructions are defined in cfil.mli. These are mapped to X86 instructions in the file cfilerase.ml, which you may read at your leisure. All of your code generation logic belongs in controlflow.ml, where you must implement il_of_ast as defined in controlflow.mli.

We have provided some code to get you started. You don't need to use our definitions if you have a different design in mind.

To store temporary values or the values of variables in this assignment, you should use space on the stack. The module Cfil defines the type svar to represent a single 4-byte stack location. You can generate unique svars with mk_svar (). You'll have to keep track of the svars you use, as when you finally construct the proc containing your IL, you will need to specify all of the stack locations you use as an svar list.

You don't have to recycle svars after you're finished with them, but doing so will result in better code (since we aren't doing any optimization).

The let node, like OCaml's let, introduces a scoped variable binding. A variable's name bound by some let is only valid inside the expression wrapped by that let. It is not an error for one let to shadow another let's binding; that is, a program like this:

int i = 1; int j = 2; ( int i = 3; i + j ) + i

is perfectly valid (and has a return value of 6).

Note that, as in C and Java, the expression a = 3 returns the value 3 (and not unit as would a := 3 in OCaml).

Module Cfil expects you to generate program code in basic blocks. Since this can be inconvenient, we suggest that you first generate your program as a flat list of instructions. After this step, you can easily convert the list to a control flow graph. Look at the type cgext we have provided and the IL pseudo-instruction Ext for hints. Also remember that you can create fresh (that is, uniquely named to the current run of the compiler) labels using X86struct.mk_label ().

Seq evaluates its left hand side, discards the result if there is any, then evaluates its right hand side.

If evaluates its guard. If the guard is nonzero, the If evaluates and uses the result of its left branch. If the guard is zero, it evaluates and uses the result of its right branch.

Since this language does not require any static checking, it is possible to write unsafe programs. In particular, you can write a program of the form int a = ( if (0) 1 ); a Your code will not be tested for this case; you may assume that if one branch of an if returns unit, the other branch also returns unit (ditto with ints). We will investigate methods for static checking in future assignments; this is the last assignment for which you can assume well-formed input.

While evaluates its guard. If the guard is nonzero, the body of the While is executed and its result is discarded; control flow then returns to the start of the loop. If the guard is zero, then the body of the While is skipped.

For evaluates its condition (or uses 1 as the value of its condition if none was given). If the condition is nonzero, then the body of the For is evaluated and its result is discarded; For then evaluates its increment expression (or does nothing if none was given) and returns to the start of the loop. If the condition is zero, then the body of the loop (and its increment expression) are skipped.

The while and for loops do not ever return values. Seq and Let return the value of their right-hand expressions. if may return values or it may return unit (or no value). Again, assume that if one branch of the if returns a value, the other branch will do the same.

Unless you're doing the extra credit, you don't need to generate code for the Switch or the ScBinop AST nodes. For those nodes, you can simply failwith "No extra credit."

Implement some combination of switch and short-circuit evaluation. switch looks up an integer in a table of labels. If the integer is bound to a label in the table, control flow passes to that label. If nothing in the table matches, control flow passes to the default label. Switches can return values (and it is expected that every branch will consistently return or not return a value). In short-circuit evaluation, the right-hand side of a logical operator is evaluated only if necessary. If the operator is ||, the right-hand side need not be evaluated if the left-hand side is nonzero. If the operator is &&, the right-hand side need not be evaluated if the left-hand side is zero. Make sure to return the correct value!

For switch, you will need to add the following to the grammar:

switchexp INT=>exp_noseq default=>exp_noseq

There are no delimiters between switch branches: switch(4) 1 => 2 200 => 3 300 => 4 4 => 5 7 => 8 default => 4 A switch may have no branches, but must always have a default case. It is illegal for a branch to be duplicated.

No new grammar rules for short-circuit evaluation are necessary. You will need to generate code for the ScBinop node. There are no special instructions for short-circuit evaluation.

You can choose to implement switch in terms of If instructions or in terms of the Switch control-flow IL instruction. If you choose to do the latter, you will need to implement the Il.Switch case in cfilerase.ml. This will allow you to use different dispatch techniques, including jump tables (in which case you are permitted to use LInsn.Ix (X86.Insn.ExtInlineJumpTable (table_label, branch_labels)) which emits inline with the rest of your X86 code the labels in the branch_labels list).

We will also award extra credit points for interesting and clever solutions. This will be based in part on timings for benchmarks collected on ENIAC.

Edit the included readme.txt file, making sure to fill in your name(s) and email address(es).

You should also spend some time to ensure that your code is readable and well-commented. We have some suggestions and requirements for your programming style. You will definitely lose points for the file submission requirements (code must compile, 80 column limit, no tabs); however, barring any particularly egregious cases, you may take the remaining rules as suggestions.

Part of your grade will be based on testing. Some example tests and utility functions are provided in controlflow_test.ml. For this assignment, we require that you define at least 10 new unit tests. (These must be sufficiently different from one another to count— tests like 1 + 1 is 2, 1 + 2 is 3, 1 + 3 is 4 are too similar to be useful). In addition, you should write (and identify in your documentation) an eleventh more difficult test. Although anything will do, it is better if this test exercises most of the project's functionality. We will collect everyone's difficult tests and run them as part of our standard test suite during grading.

You should contribute at least one small test to the mailing list and are encouraged to send in test cases to highlight parts of the specification about which you may have questions. It's OK to count tests from the mailing list toward your ten (but you must write your own unique larger test case).

The grading breakdown for this project is as follows:

  • ~20%: Amendments to the parser.
  • ~30%: Code generation for control flow.
  • ~30%: Code generation for variables.
  • ~10%: Code style, readability and documentation.
  • ~10%: Testing.

Extra credit breaks down like so:

  • up to 15%: Implementing switch.
  • up to 5%: Implementing short-circuit operators.
  • up to 5%: Impressive overall project.

Remember that your raw score will be scaled— that is, more difficult assignments will be worth more than less difficult assignments, even if the more difficult assignment has fewer available points.

controlflow.run, the executable the OMakefile will produce, offers several helpful command line flags that might make debugging easier. These are:
  • --stdin: runs in a loop listening for lines of input at the terminal, then parses those lines and prints out a representation of the resulting AST.
  • --showast: like --stdin, but shows the AST for files.
  • -c: cleans the temporary directories c_bin and c_obj.
  • --printasm: prints the generated assembler (in AT&T syntax) to the console.
  • -p: prints the generated IL to the console.
  • -r: runs the resulting executable.
  • -d: runs the resulting executable in the debugger, setting a breakpoint at the start of your generated code.
  • --ddd: runs the program in DDD, a graphical interface to gdb.
A good combination of flags for this assignment is: ./controlflow.run -c --showast --printasm -p -r testfile.cf
First, make sure to clean your project with omake clean. All of the files you've written should be in projects/controlflow. Submit your assignment on ENIAC by switching to the projects directory and typing: turnin -c cis341 -p project3 controlflow Verify that you've submitted your files with: turnin -c cis341 -v -p project3 You can re-submit your work any number of times, but only the last submission will be counted. New submissions completely overwrite old submissions— so if you're missing a single file, make sure to resubmit your entire project. Check the main course page for information about late policies.