Project 7 Index

Implement key algorithms to perform register allocation on OAT programs. You will take a program like this:

int a = 1; int b = 2; int c = 3; int d = 4; if (d) c = c + 32 else c = c + 33; a+b+c+d

Which would have ordinarily turned into something like this:

_program: pushl %ebp movl %esp, %ebp subl $24, %esp __7: call __init_consts2 addl $0, %esp movl $1, -24(%ebp) movl $2, -20(%ebp) movl $3, -16(%ebp) movl $4, -12(%ebp) cmpl $0, -12(%ebp) jNE __13 jmp __14 __13: movl -16(%ebp), %ecx movl %ecx, -8(%ebp) addl $32, -8(%ebp) movl -8(%ebp), %ecx movl %ecx, -16(%ebp) movl -16(%ebp), %ecx movl %ecx, -8(%ebp) __12: movl -24(%ebp), %ecx movl %ecx, -8(%ebp) movl -20(%ebp), %ecx addl %ecx, -8(%ebp) movl -16(%ebp), %ecx addl %ecx, -8(%ebp) movl -12(%ebp), %ecx addl %ecx, -8(%ebp) movl -8(%ebp), %eax movl %ebp, %esp popl %ebp ret __14: movl -16(%ebp), %ecx movl %ecx, -4(%ebp) addl $33, -4(%ebp) movl -4(%ebp), %ecx movl %ecx, -16(%ebp) movl -16(%ebp), %ecx movl %ecx, -8(%ebp) jmp __12

But which should now look like this:

_program: pushl %ebp movl %esp, %ebp subl $4, %esp __21: movl %ebx, -4(%ebp) call __init_consts2 movl $1, %ecx movl $2, %edx movl $3, %ebx movl $4, %eax cmpl $0, %eax jNE __19 jmp __20 __19: addl $32, %ebx __18: addl %edx, %ecx addl %ebx, %ecx addl %ecx, %eax movl -4(%ebp), %ebx movl %ebp, %esp popl %ebp ret __20: addl $33, %ebx jmp __18
project7.tar.gz contains all that you need to get started with this assignment. Extract it to your projects directory such that full.ml is located at cis341/projects/ralloc/full.ml. (All files are stored in the tarball relative to cis341, so switching to your cis341 directory and running tar xvf project7.tar.gz should do the trick.) Edit the OMakefile at cis341/projects/OMakefile by adding the new directory to the .SUBDIRS declaration: PROJECT_SUBDIRS[] = x86interp expressions controlflow arrproc full fullopt ralloc

This package depends on the files from Project 6.

Remember that you may change the provided code so long as you keep the interfaces the same (so that we can use everyone's testers on everyone's assignments).

This project adds four new modules to the OAT compiler. Together these form an optimizing backend. They are:

  • Fullilopt: Drives the optimization process. Look here to get an idea of how data moves from stage to stage and how spilling is handled.
  • Fullildom: Calculates dominance information over CFGs.
  • Fullilssa: Performs SSA transformations and graph-coloring register allocation.
  • Fullilopterase: Lowers optimized IL to X86 instructions. Note that the optimization process will change many IL instructions. For instance, the CheckedDeref instruction may now become a sequence of NullCheck, FieldDeref, BoundCheck and UncheckedDeref instructions.

This is a group project. Groups of size 1 or 2 are acceptable.

You must first implement the functions dominators and dominance_frontier in Fullildom. We suggest that you use the algorithm due to Cooper, Harvey and Kennedy that was discussed in class. The relevant paper is online.

We provide the code that makes the initial SSA transformation (using your dominance algorithms). For reference, here is what the above program looks like in fully-expanded form:

proc()[#__16, #__15, #__11, #__10, #__9, #__8] y:[entry] s:3 []{ block 0 { @Svar(#__10):3 := phi(2=>@Svar(#__10):1, 1=>@Svar(#__10):2) @Svar(#__15):4 := @Svar(#__8):0 @Svar(#__15):5 := += @Svar(#__15):4 @Svar(#__9):0 @Svar(#__15):6 := += @Svar(#__15):5 @Svar(#__10):3 @Svar(#__15):7 := += @Svar(#__15):6 @Svar(#__11):0 @EAX:2 := @Svar(#__15):7 @EDI:1 := @Svar(#saved_edi):0 @ESI:1 := @Svar(#saved_esi):0 @EBX:1 := @Svar(#saved_ebx):0 ret [void] } block 1 { @Svar(#__15):1 := @Svar(#__10):0 @Svar(#__15):2 := += @Svar(#__15):1 32 @Svar(#__10):2 := @Svar(#__15):2 @Svar(#__15):3 := @Svar(#__10):2 jump 0 } block 2 { @Svar(#__16):0 := @Svar(#__10):0 @Svar(#__16):1 := += @Svar(#__16):0 33 @Svar(#__10):1 := @Svar(#__16):1 @Svar(#__15):0 := @Svar(#__10):1 jump 0 } block 3 { @Svar(#saved_ebx):0 := @EBX:0 @Svar(#saved_esi):0 := @ESI:0 @Svar(#saved_edi):0 := @EDI:0 [cdecl] __init_consts2(0) [D:@EDX:1; @ECX:1; @EAX:1, U:] @Svar(#__8):0 := 1 @Svar(#__9):0 := 2 @Svar(#__10):0 := 3 @Svar(#__11):0 := 4 if (@Svar(#__11):0 != 0) then 1 else 2 } }

There are several things to note:

  • There is a phi-function at the beginning of node 3.
  • Machine registers are handled as SSA resources.
  • Function calls implicitly define EAX, ECX and EDX.
  • The callee-save registers EBX, ESI and EDI are explicitly saved and restored.
  • The return value is explicitly placed in EAX. ret merely exits the function.

Complete the function get_liveness in Fullilssa. This function calculates the live-in and live-out sets for CFG nodes in a particular procedure. You should use the dataflow analysis algorithm that was presented in class, though your implementation will differ because the program representation allows for more than one (non-branch) instruction in a basic block.

Handling phi-functions in get_liveness can be tricky. If you look at get_defs_uses, you will find that phi-functions are defined inside the blocks to which they are attached. Unlike a regular instruction, however, a phi-function's right-hand side does not add any uses to the def-use table. If this were the case, then it's easy to see that the liveness algorithm would float all of the phi-function's arguments to the top node of the program! To solve this problem, you should add to the live-out sets of the immediate predecessors of nodes containing phi-functions. For example, the live-in and live-out sets for the code above are:

live_in: bk0:@Svar(#__11):0; @Svar(#__8):0; @Svar(#__9):0; @Svar(#saved_ebx):0; @Svar(#saved_edi):0; @Svar(#saved_esi):0; bk1:@Svar(#__10):0; @Svar(#__11):0; @Svar(#__8):0; @Svar(#__9):0; @Svar(#saved_ebx):0; @Svar(#saved_edi):0; @Svar(#saved_esi):0; bk2:@Svar(#__10):0; @Svar(#__11):0; @Svar(#__8):0; @Svar(#__9):0; @Svar(#saved_ebx):0; @Svar(#saved_edi):0; @Svar(#saved_esi):0; bk3: live_out: bk0: bk1:@Svar(#__10):2; @Svar(#__11):0; @Svar(#__8):0; @Svar(#__9):0; @Svar(#saved_ebx):0; @Svar(#saved_edi):0; @Svar(#saved_esi):0; bk2:@Svar(#__10):1; @Svar(#__11):0; @Svar(#__8):0; @Svar(#__9):0; @Svar(#saved_ebx):0; @Svar(#saved_edi):0; @Svar(#saved_esi):0; bk3:@Svar(#__10):0; @Svar(#__11):0; @Svar(#__8):0; @Svar(#__9):0; @Svar(#saved_ebx):0; @Svar(#saved_edi):0; @Svar(#saved_esi):0;

Notice that bk1 does not have 10:1 on its live-out set while bk2 does not have 10:2; also note that bk0 has neither 10:1 nor 10:2 on its in-set.

Once you have the live-in and live-out sets, you'll want to be able to query them to see whether a particular doperand is live at a given block:instruction pair. Complete the functions is_live_at_block and is_live_at.

Now that you can query for liveness, you can build up a graph of those doperands that interfere with one another. We provide code for doing so in build_interference_graph. The graph keeps track of doperands that cannot coexist in a register as well as doperands that are move-related. These are operands that may be simultaneously live but are only copies of each other. (Assuming that we can store both in the same place works due to SSA's guarantee that each assignment gets a unique name.)

You can see an example (colored) interference graph here. Thick edges are actual interferences and mark nodes that cannot be stored to the same physical register at the same time. Dotted edges mark nodes that are move-related and therefore may be stored in the same register simultaneously.

Complete the function color_stack, which takes a color map (a hash table with doperand keys and igraphcolor values) and a stack of graph nodes and attempts to give all nodes in the stack a color. It may be the case that you are not able to find a coloring for a particular node; you should then raise an exception with the node that you failed to color. This is caught by the spilling code, which breaks up live ranges when necessary.

You must also provide the function pick_spill_candidate, which takes an interference graph and chooses the best (for some reasonable definition of best) node to spill. Some nodes should never be spilled; you can check this by using the is_heavy_node function with the node's ig_symbol. If no spillable nodes remain, raise No_spill_candidate.

Often it is the case that the left-hand side of a phi-function is mapped to the same register as all of the possible operands on its right-hand side. In certain cases, though, the sides may make inconsistent assumptions. It is sometimes necessary to insert a small amount of bookkeeping code that shuffles operands around in registers between a predecessor node and its phi-function bearing successor. This process is made slightly difficult by the fact that these thunk routines have no scratch space.

Complete the function make_thunk_code, which takes a list of (dest-opnd,source-opnd) mappings and produces thunk code to be run on the edge between two nodes dest and source where dest contains one or more phi-functions that use nonhomogeneous registers. You will find the new instruction Il.Ext (Swap (a,b)) very useful.

Edit the included readme.txt file, making sure to fill in your name(s) and email address(es).

You should also spend some time to ensure that your code is readable and well-commented. We have some suggestions and requirements for your programming style. You will definitely lose points for the file submission requirements (code must compile, 80 column limit, no tabs); however, barring any particularly egregious cases, you may take the remaining rules as suggestions.

Part of your grade will be based on testing. Some example tests and utility functions are provided in fullilssa_test.ml and fullildom_test.ml. You must use the testing framework provided and explicitly note any modifications in your readme.txt file. We ask that you name your test cases regularly: choose a username from those in your group. All test names and files should contain that username followed by an underscore as a prefix. (For example, if your username is cat and you have written a test case called alive, that test should be called cat_alive.)

  • For this assignment, we require that you define at least two unit tests per category (where each category is one of the 20% slices of the project as outlined below).
  • Tests must be sufficiently different from one another to count.
  • All test cases that you count toward the testing portion of your grade must be fully automatic and must check for some result. The tests must run when we call omake test from the ralloc directory.
  • Please share at least one test from any category on the mailing list. You can count others' tests toward your total.

The grading breakdown for this project is as follows:

  • ~20%: Dominance/Dominance Frontier.
  • ~20%: Liveness/Liveness Queries.
  • ~20%: Graph Coloring and Spilling.
  • ~20%: Leaving SSA.
  • ~10%: Code style, readability and documentation.
  • ~10%: Testing.

In addition, you can receive extra credit for using information about move-related operands to drive graph coloring (+5%) and for improving the provided code (either by making it generate better IL/X86 or by improving the speed of the compiler itself). Contact us to see how much a change is worth and make note of it in your readme.txt.

Remember that your raw score will be scaled— that is, more difficult assignments will be worth more than less difficult assignments, even if the more difficult assignment has fewer available points.

ralloc.run, the executable the OMakefile will produce, offers several helpful command line flags that might make debugging easier. This time around, we are providing some scripts to run the compiler with debugging settings. The scripts include:
  • igraph example-name: builds, but does not execute, the program at examples/example-name.oat. Dumps intermediate graphs to the local directory as .gv files and records a large amount of debugging output to log.txt.
  • igraphr example-name: builds and runs the program at examples/example-name.oat with no graph output and no debug output.
  • igraphq example-name: builds the program at examples/example-name.oat with no graph output and no debug output.
  • gv graph-name.gv: uses Graphviz to produce a graph from the given graph description file.

Running ralloc.run --help will give you a comprehensive documented list of supported flags. Remember that the optimizing backend will not run without the -O flag being provided.

First, make sure to clean your project with omake clean. All of the files you've written should be in projects/ralloc. Submit your assignment on ENIAC. Your project must run on ENIAC, including all test cases. Be sure to submit from ENIAC and not from some other Linux machine; otherwise you will get an error.

Switch to the projects directory and type: turnin -c cis341 -p project7 ralloc Verify that you've submitted your files with: turnin -c cis341 -v -p project7 You can re-submit your work any number of times, but only the last submission will be counted. New submissions completely overwrite old submissions— so if you're missing a single file, make sure to resubmit your entire project. Check the main course page for information about late policies.