Project 2 Index
Build a compiler for simple integer expressions. Your program will accept source files of the form 1 + 2 + 3 + 4 + 32 and will produce executables that print (on standard out): Result: 42
project2.tar.gz contains all that you need to get started with this assignment. Extract it over your projects directory such that expressions.ml is located at cis341/projects/expressions/expressions.ml. (All files are stored in the tarball relative to cis341, so switching to your cis341 directory and running tar xvf project2.tar.gz should do the trick.) Edit the OMakefile at cis341/projects/OMakefile by adding the new directory to the .SUBDIRS declaration: PROJECT_SUBDIRS[] = x86interp expressions This package depends on the files from Project 1.

Note that the archive includes some patches for the assembler generation libraries that should fix issues on 64-bit machines.

This is a group project. Groups of size 1 or 2 are acceptable.

Implement an ocamllex parser for integer arithmetic expressions:

  • Whitespace is permitted anywhere before, between or after tokens. Whitespace is defined as the characters '\t' ' ' '\n' '\r'.
  • Numeric literals should be parsed as int32s. Use Int32.of_string to convert lexemes to their int32 representation. Remember that it might throw a Failure (which you should translate to a Lexer_error). A numeric literal may contain one or more of the ten decimal digits 0123456789. Negative numeric literals should be handled by the parser as a combination of a positive literal and the unary negation operator.
  • The operators you must implement have the forms:
    • + binary signed addition
    • - binary signed subtraction OR signed unary negation
    • * binary signed multiplication
    • == binary equality (returns 1 or 0)
    • << binary shift left
    • >> binary arithmetic shift right
    • >>> binary logical shift right
    • != binary not-equal (returns 1 or 0)
    • < binary signed less-than (returns 1 or 0)
    • <= binary signed less-than or equals (returns 1 or 0)
    • > binary signed greater-than (returns 1 or 0)
    • >= binary signed greater-than or equals (returns 1 or 0)
    • ! unary logical not (returns 1 for a zero input, 0 for nonzero input)
    • ~ unary bitwise not
    • & binary bitwise and
    • | binary bitwise or
    • ( left parenthesis
    • ) right parenthesis
  • All other input is invalid and should raise a Lexer_error.

You will need to declare new tokens in parser.mly to support your lexer rules.

Implement an ocamlyacc parser for the AST defined in east.ml. Precedences for operators are specified in that file. Parenthesis should have the highest precedence. Don't worry about preserving information for error reporting (besides, where would you use it?). Do not use ocamlyacc's precedence declarations.

Implement rev_lite_of_ast from the module Expressions. This function will build a reversed list of X86lite instructions that Expressions_main will reverse and package into a module.

Since we don't have very much by way of compiler infrastructure at this point, there aren't too many options for storing temporary values. For this assignment, you should store your temporaries on the stack and return results in Eax. You should only have to push new values to the stack when compiling an expression containing a binary operator. In that case, first generate code for the left-hand branch, then push the result, then generate code for the right-hand branch. Remember to pop this value when you're done with it.

You can access the value at the top of the stack with the indirect operand Ind (stk_ofs 0l). To pop an item off the stack without being forced to store it somewhere, just manipulate Esp directly with LInsn.Add (esp, CImm 4l). Remember that the shift instructions put special requirements on the location of the shift amount; also note that you will have to generate some extra code for the comparison and logical negation operators.

You may use only the registers Eax Ecx Edx (and memory) for computation.

Edit the included readme.txt file, making sure to fill in your name(s) and email address(es).

You should also spend some time to ensure that your code is readable and well-commented. We have some suggestions and requirements for your programming style. You will definitely lose points for the file submission requirements (code must compile, 80 column limit, no tabs); however, barring any particularly egregious cases, you may take the remaining rules as suggestions.

Part of your grade will be based on testing. Some example tests and utility functions are provided in expressions_test.ml. For this assignment, we require that you define at least 10 new unit tests. (These must be sufficiently different from one another to count— tests like 1 + 1 is 2, 1 + 2 is 3, 1 + 3 is 4 are too similar to be useful). A good way to hit and surpass this requirement is to write tests for each operator, but remember to test combinations of operators (to check that you have precedence correct and that you aren't doing unusual things to the stack).

Since the tests are a little less tricky this time around, please do not share them on the mailing list.

The grading breakdown for this project is as follows:

  • ~40%: Lexing and Parsing.
  • ~40%: Code generation (independent of lexing/parsing).
  • ~10%: Code style, readability and documentation.
  • ~10%: Testing.

Remember that your raw score will be scaled— that is, more difficult assignments will be worth more than less difficult assignments, even if the more difficult assignment has fewer available points.

expressions.run, the executable the OMakefile will produce, offers several helpful command line flags that might make debugging easier. These are:
  • --stdin: runs in a loop listening for lines of input at the terminal, then parses those lines and prints out a representation of the resulting AST.
  • --showast: like --stdin, but shows the AST for files.
  • -c: cleans the temporary directories c_bin and c_obj.
  • -p: prints the generated assembler (in AT&T syntax) to the console.
  • -r: runs the resulting executable.
  • -d: runs the resulting executable in the debugger, setting a breakpoint at the start of your generated code.
  • --ddd: runs the program in DDD, a graphical interface to gdb.
A good combination of flags for this assignment is: ./expressions.run -c --showast -p -r testfile.e
First, make sure to clean your project with omake clean. All of the files you've written should be in projects/expressions. Submit your assignment on ENIAC by switching to the projects directory and typing: turnin -c cis341 -p project2 expressions Verify that you've submitted your files with: turnin -c cis341 -v -p project2 You can re-submit your work any number of times, but only the last submission will be counted. New submissions completely overwrite old submissions— so if you're missing a single file, make sure to resubmit your entire project. Check the main course page for information about late policies.