Homework 8: LC-3 Disassembler
CSE240 - Introduction to Computer Architecture
Autumn 2006

Due: Wednesday, Nov 22 at 11:59PM

Now that you've mastered the art of assembly language programming (and no doubt improved your Snake-playing skills!), let's simplify our lives and use a "high-level" programming language, C. Although the C compiler will manage details like register usage, C still gives the programmer considerable control over the manipulation of data. For example, a C program can easily manipulate the binary representation of a program, which is exactly what we will do in this assignment!

You will write a disassembler (let's call it lc3dis). While an assembler converts ASCII assembly programs (i.e., .asm files) to binary machine language programs (i.e., .obj files), a disassembler does the reverse.

Important note: Your disassembler needs to deal with LC-3 instructions defined in the textbook as well as the MUL and SUB instructions. However, you do NOT need to deal with the other instructions that we have added (e.g., RTT and JMPT).

Functions

Just as with your Snake code, we have broken the task at hand into several manageable pieces (in this case C functions). The file lc3dis.c is a template that includes much of the code you'll need.

Function: main()

This is the entry point into the disassembler. The code for this function is pretty simple, so we provide it.

Function: get_zext_field(int bits, int hi_bit, int lo_bit)

This function gets the value of the bit field in the integer bits, beginning with bit hi_bit and ending with bit lo_bit. The resulting value is zero-extended. For example, to get the opcode of an instruction in ir, we would call this function as follows:

    opcode = get_zext_field(ir, 15, 12);
Note that hi_bit and lo_bit are zero-based (i.e., they must be between 0 and 15). This code is really quite tricky, so we've provided it for you. Please look at the code and try to understand the logic.
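The provided version may be written differently, but one common way to extract a zero-extended field is to shift it down to bit 0 and then mask off everything above it. The sketch below is only for comparison with the code in lc3dis.c; it assumes the field is at most 16 bits wide and that int is at least 32 bits, so building the mask cannot overflow.

```c
#include <assert.h>

/* Sketch: return bits[hi_bit:lo_bit], zero-extended. */
int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    assert(0 <= lo_bit && lo_bit <= hi_bit && hi_bit <= 15);
    int width = hi_bit - lo_bit + 1;   /* number of bits in the field */
    int mask = (1 << width) - 1;       /* 'width' low-order ones, e.g. 0xF for width 4 */
    return (bits >> lo_bit) & mask;    /* move the field to bit 0, drop higher bits */
}
```

For example, 0x5283 encodes AND R1, R2, R3, so get_zext_field(0x5283, 15, 12) yields 5 (the AND opcode) and get_zext_field(0x5283, 11, 9) yields 1 (the destination register).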

Function: get_sext_field(int bits, int hi_bit, int lo_bit)

This function is very similar to get_zext_field(), except that it sign-extends the resulting field. You will want to use get_zext_field() to select unsigned values like opcodes or register fields (e.g., DR, SR1, etc.), but you will want to use get_sext_field() to select signed immediate fields (e.g., imm5) or signed PC offset fields (e.g., PCoffset9). This code is also tricky; we provide it, but read through it until you understand it.
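Independent of the provided code, one way to sign-extend is to extract the field as get_zext_field() does and then, if its top bit is set, subtract 2^width. For example, the imm5 field 11111 becomes -1. This is a sketch for comparison, not necessarily how lc3dis.c implements it.

```c
#include <assert.h>

/* Sketch: return bits[hi_bit:lo_bit], sign-extended (two's complement). */
int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    assert(0 <= lo_bit && lo_bit <= hi_bit && hi_bit <= 15);
    int width = hi_bit - lo_bit + 1;
    int value = (bits >> lo_bit) & ((1 << width) - 1);  /* zero-extended field */
    int sign_bit = 1 << (width - 1);                    /* top bit of the field */
    if (value & sign_bit)
        value -= 1 << width;                            /* negative: subtract 2^width */
    return value;
}
```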

Function: get_bit(int bits, int bit_number)

This function is similar to get_zext_field() except that it selects and returns a single zero-extended bit. In fact, it's implemented by calling get_zext_field() with hi_bit and lo_bit set to the same value (bit_number). We provide this code.

Function: get_word_from_file(FILE* f)

This function extracts the next 16-bit word from the input file. We provide this code.
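As a rough idea of what such a function involves, here is a sketch that assumes (as with the standard LC-3 tools) each word in the .obj file is stored high byte first. Returning -1 at end of file is an assumed convention for this sketch only; check what the provided get_word_from_file() actually returns.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch: read one big-endian 16-bit word from f.
 * Returns -1 if the file ends before two full bytes are read (assumed convention). */
int get_word_from_file(FILE *f)
{
    assert(f != NULL);
    int hi = fgetc(f);          /* high-order byte comes first */
    if (hi == EOF)
        return -1;
    int lo = fgetc(f);
    if (lo == EOF)
        return -1;
    return (hi << 8) | lo;      /* combine into a value in 0x0000..0xFFFF */
}
```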

Function: print_instruction(int ir)

This is the core of the disassembler. This function is passed an integer (ir) that may have a value from 0x0000 to 0xffff, representing an LC-3 instruction. This function calls get_zext_field() to extract the opcode from the instruction. It then switches on that opcode. Within the switch there is a case for each opcode (e.g., ADD, AND, BR, JMP, etc.). Each case examines additional instruction bits (determined by the opcode) and prints an appropriate string representing the instruction.
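The overall shape of that switch can be sketched as follows. The helper opcode_name() is hypothetical: it only maps the opcode to a mnemonic, whereas each case in print_instruction() must also decode and print the operands. The opcode values are those of the textbook LC-3; the MUL and SUB encodings are deliberately omitted here, so take them from the course's description of those instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map an instruction's opcode (bits 15..12) to a mnemonic.
 * Returns NULL for opcodes not handled here. */
const char *opcode_name(int ir)
{
    assert(0 <= ir && ir <= 0xFFFF);
    int opcode = (ir >> 12) & 0xF;   /* same as get_zext_field(ir, 15, 12) */
    switch (opcode) {
    case 0x0: return "BR";
    case 0x1: return "ADD";
    case 0x2: return "LD";
    case 0x3: return "ST";
    case 0x4: return "JSR";          /* JSRR when bit 11 is 0 */
    case 0x5: return "AND";
    case 0x6: return "LDR";
    case 0x7: return "STR";
    case 0x8: return "RTI";
    case 0x9: return "NOT";
    case 0xA: return "LDI";
    case 0xB: return "STI";
    case 0xC: return "JMP";          /* RET when the base register is R7 */
    case 0xE: return "LEA";
    case 0xF: return "TRAP";
    default:  return NULL;           /* 0xD is reserved in the textbook ISA */
    }
}
```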

For example, in the case for the AND instruction, we call get_zext_field(ir,11,9) to get the destination register and get_zext_field(ir,8,6) to get the first source register. Next, we examine bit 5 (via get_bit(ir,5)) to determine whether the final operand is an immediate or a register. If bit 5 is 0 (i.e., register operand), we call get_zext_field(ir,4,3) and check that the result is 0 (i.e., bits 4 and 3 are both 0). If they are not, this is not a legal AND instruction, so we call print_fill(ir) to generate a .FILL assembler directive for this word. Otherwise, we use get_zext_field(ir,2,0) to get the second source register and print the AND instruction via printf(). If bit 5 is 1, we use get_sext_field(ir,4,0) to get the imm5 field and print the AND instruction. Some of this code is provided to get you started.
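Putting the AND steps together, a sketch might look like the following. format_and() is a hypothetical helper that writes into a buffer instead of calling printf() directly, which makes it easy to test. The operand spelling (e.g., AND R1, R2, R3 and #-1) is an assumption, so match whatever the provided .asm files use. The shifts and masks are written inline so the block compiles on its own; the comments show the equivalent calls to the provided helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format an AND instruction into buf.
 * Returns 1 on success, or 0 if bits 4..3 are nonzero in register mode
 * (an illegal encoding that should be printed as .FILL instead). */
int format_and(int ir, char *buf, size_t len)
{
    assert(((ir >> 12) & 0xF) == 0x5);      /* opcode must be AND */
    int dr  = (ir >> 9) & 0x7;              /* get_zext_field(ir, 11, 9) */
    int sr1 = (ir >> 6) & 0x7;              /* get_zext_field(ir, 8, 6) */
    if (((ir >> 5) & 1) == 0) {             /* get_bit(ir, 5) == 0: register mode */
        if (((ir >> 3) & 0x3) != 0)         /* get_zext_field(ir, 4, 3) must be 0 */
            return 0;
        int sr2 = ir & 0x7;                 /* get_zext_field(ir, 2, 0) */
        snprintf(buf, len, "AND R%d, R%d, R%d", dr, sr1, sr2);
    } else {
        int imm5 = ir & 0x1F;               /* get_sext_field(ir, 4, 0) ... */
        if (imm5 & 0x10)
            imm5 -= 0x20;                   /* ... sign-extended from 5 bits */
        snprintf(buf, len, "AND R%d, R%d, #%d", dr, sr1, imm5);
    }
    return 1;
}
```

In print_instruction(), a return value of 0 would be handled by calling print_fill(ir); otherwise the buffer would be printed with printf().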

Function: print_fill(int ir)

This function prints a .FILL assembler directive. We provide this code.

Helpful Information

Testing

We will provide a number of .obj and matching .asm files you can use to test your disassembler (but you should also generate your own test cases). To confirm that your code is correct, use the Unix diff utility to compare the output of your program with the given .asm file:
    ./lc3dis t00.obj | diff -w -i - t00.asm
If diff produces no output, your program produces the same output as the given .asm file. Note that -w instructs diff to ignore whitespace and -i instructs it to ignore case. If the files are different, diff will indicate how they are different (type "man diff" for more information on diff). We will be using this testing method for our automatic testing scripts, so make sure diff produces no output. To run all the tests, we've given you a shell script in hw8code.zip. To make it executable, use the following command:
  chmod +x all-tests.sh
You only need to do that once. Now you can run all the tests with:
  ./all-tests.sh
Also note that the output of the disassembler cannot be directly assembled because the assembler doesn't know what to do with absolute addresses (it wants labels).

Please test your disassembler thoroughly. The tests we provide are far from complete, so you will need to write your own. Note that if your .asm files contain labels, these naturally won't appear in the corresponding disassembled code, so you'll have to confirm by hand that the addresses your disassembler generates are correct.
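When checking addresses by hand, remember how LC-3 PC-relative operands work: the offset is added to the already-incremented PC, so the target is the instruction's own address plus 1 plus the sign-extended offset (PCoffset9 for BR, LD, ST, LDI, STI, and LEA; PCoffset11 for JSR). The sketch below computes that value; pc_relative_target() is a hypothetical helper, and it assumes the caller tracks each instruction's address starting from the .ORIG value.

```c
#include <assert.h>

/* Hypothetical helper: absolute target of a PC-relative instruction at 'address'.
 * offset_bits is 9 (PCoffset9) or 11 (PCoffset11). Result wraps to 16 bits. */
int pc_relative_target(int address, int ir, int offset_bits)
{
    assert(offset_bits == 9 || offset_bits == 11);
    int offset = ir & ((1 << offset_bits) - 1);   /* zero-extended offset field */
    if (offset & (1 << (offset_bits - 1)))
        offset -= 1 << offset_bits;               /* sign-extend */
    return (address + 1 + offset) & 0xFFFF;       /* incremented PC plus offset */
}
```

For instance, a BRnzp encoded as 0x0FFE (PCoffset9 = -2) at address x3000 targets x2FFF.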

Submission

Please submit your code in a file called lc3dis.c in the usual way.
    turnin -c cse240 -p hw8 lc3dis.c

Due Date

Note that this assignment is due the Wednesday before Thanksgiving break. Given that this assignment only requires an additional 70 lines of code, we could have made it due on Monday, but we decided to give you a little flexibility. We suspect many of you will want to turn it in on Monday or Tuesday so you are not working on it right before break. As always, early submissions are fine!