Due: Thursday, Feburary 23 at 11:59pm

Download: hw3.zip

LLVM Lite Specification

To Submit: Go Here


In this project you will implement a non-optimizing compiler for subset of the LLVM IR language. The source language consists of a 64-bit, simplified subset of the LLVM IR that we call LLVMlite. The target language is X86lite as defined in HW2.
Note: You may work alone or in pairs for this project.
  • Submit the backend.ml file to the CIS 341 submission website for testing.
  • Post a non-trivial LLVMlite test case to the course Piazza.

Getting Started

To get started, download the project source files (and create an Eclipse project whose executable is main.native or main.byte if choose to use Eclipse). The files included in hw3.zip are briefly described below. Those marked with [*] are the only ones you should need to modify while completing this assignment.
Note: You'll need to install Menhir for the LLVMlite parser. The easiest way to do this is to use OPAM via the command opam install menhir. You'll also need to add the nums and unix libraries to compile this project. If you use OCamlBuild, you can compile the project from the command line by running ocamlbuild -Is x86,util -libs nums,unix,str -use-menhir main.native
util/assert.ml(i) - the assertion framework

/x86/x86.ml(i)    - the X86lite in
/x86/platform.ml  - platform-specific compilation support
llprograms/*.ll   - example .ll programs used in testing 

README            - help about the main test harness
Makefile          - basic make support for invoking ocamlbuild

ll/ll.ml          - the abstract syntax for LLVMlite
ll/lllexer.mll    - lexer for LLVMlite syntax
ll/llparser.mly   - parser generator for LLVMlite syntax
ll/llruntime.c    - simple runtime support

main.ml           - main test harness
driver.ml         - utilities for invoking the compiler

cinterop.c        - c code for testing interoperability
gradedtests.ml    - graded test cases that we provide  

[*]providedtests.ml - add your own test cases here [submit one on Piazza]
[*]backend.ml       - where you implement the LLVMlite to X86 compiler

LLVM Lite Specification

The source language for this 'backend' part of the full compiler is a subset of the LLVM IR called LLVM Lite. You may find the LLVM Language Reference to be a useful resource for this project, though we are concerned with only a small portion of the full LLVM feature set.

The LLVM Lite Specification describes the behavior of LLVM programs in terms of an abstract semantics that is not target specific. This semantics is intended to be faithful to the LLVM Language Reference.

Implementing the Compiler

The code we provide in backend.ml is a minimal skeleton of the basic structure of the compiler. To a first approximation, for each part foo of the abstract syntax (such as prog or fdecl), there is a corresponding compile_foo function (i.e. compile_prog or compile_fdecl). Most of these defintions have been left unimplemented (and a few have been left out). Your job is to complete this translation. Our reference solution is well under 350 lines of documented code, so if your implementation is significantly longer than this, you may wish to seek help.

The file backend.ml contains additional hints an explanations about the compilation strategy that we suggest you use.

We suggest that you stage the development of your compiler like this:

  1. First be sure you understand the general compilation strategy, especially the way that LLVM uid's are handled.

  2. Then get a minimal implementation of compile_fdecl working so that you can compile functions with empty bodies but varying numbers of input parameters. To do this, you'll need to understand the System V AMD64 ABI calling conventions (see Wikipedia for one explanation), the understand the notion of a layout and also complete the arg_loc function. At this point, the X86 code you generate won't be able to run because the code for the compiled function does not exit propertly. (But you can still look at the generated assembly code to see whether it looks reasonable.)

  3. Next implement enough of the compile_fbody function to handle the terminator Ret case for void functions (that return no results). At this point, your compiler should be able to generate working code for an LLVM function like that found in returnvoid.ll:
        define void @main(i64 %argc, i8** %argv) {
          ret void

    (Note, this isn't part of the test suite, since the value "returned" to the shell when this program runs isn't well defined.)

  4. Double check that you understand the notion of the layout type and develop a strategy for storing uid locals. See the comments in the backend.ml file. Implement the compile_operand function.

  5. Implement the Binop case for instructions processed compile_fbody (which, if you follow the suggested method of compiling locals, will use compile_operand).

  6. At this point, you probably want to revisit compile_fdecl to appropriately compute the layout information for the function body's uids.

  7. Work on control flow. Adjust compile_fbody it to deal properly with labels so it can handle non-trivial control-flow-graphs. Implement the rest of the terminator cases for compile_fbody. At this point, your compiler should be able to handle functions that return i64 values and that contain simple arithmetic and direct jumps.

  8. Implement the translation of Icmp, followed by Alloca, Load, and Store. (You should be into the swing of things by this point...)

  9. Next tackle the Call instruction. The code you generate must properly handle the System V AMD64 ABI calling conventions (see Wikipedia for one explanation). After successfully completing this step, your compiler should be able to handle the recursive factorial function definition.

  10. Breathe a sigh of relief at how easy it is to implement Bitcast, because the target language is untyped.

  11. Finally, gather your courage, and implement the Gep (getelementptr) instruction. We have provided you with a suggested helper function; look to the documentation in backend.ml.

  12. Rejoice! You're finished.

Testing and Debugging Strategies

Testing and debugging a compiler is quite difficult. There are many correct potential translations of a given source program, and there are many incidental changes (such as the choice of label names) that do not affect the semantics of the generated code. It is also difficult to test parts of the translation independently, since simple inputs may depend on almost all of the compilation pipeline.

The test harness provided by main.ml gives several ways to assess your code. See the README file for a full description of the flags.

We have provided a (minimally-featured) parser for LLVMlite code. It is sufficiently complete to parse the examples in the llprograms directory, and we expect you to create additional test cases yourself. For examples of how to use the test driver infrastructure, see the gradedtests.ml file.

You may also find it helpful to run the LLVMlite code by compiling it via clang (with the --clang flag).

Graded Test Cases

As part of this project, you must post an interesting test case for the compiler to the course Piazza site. This test case should be non-trivial and might take the form of a .ll file along with expected outputs (as in our automated tests), or it might start from hand-generated LLVMlite abstract syntax. There are several example such test files in the llprograms subdirectory. Use them for inspiration, but don't duplicate their functionality.

The test case you submit to Piazza will not count if it is too similar to previously-posted tests! (Note that this policy encourages you to submit test cases early!) Tests that stress parts of the language that aren't well exercised by the provided tests are particularly encouraged.

Your submitted test should be easy to drop in to the testing harness: ideally, it's a small amount of OCaml code, plus a single LLVMlite file. If your test case requires supporting C code (as some of our larger tests do), you can post that to Piazza along with your test.

We will validate these tests against our own implementation of the compiler (and clang). A second component of your grade will be determined by how your compiler fairs against the test cases submitted by the other groups in the class.

Any ll file that you create should (1) be compilable using the llc compiler (this is a good way to check that your llvm program is well formed and (2) adhere to the LLVMlite subset of the language (i.e. don't use non-64 bit operations).


Projects that do not compile will receive no credit!

Your team's grade for this project will be based on:

Last modified: Tue Feb 14 08:42:06 EST 2017