Homework 8: LC-3 Disassembler
CSE240 - Introduction to Computer Architecture
Autumn 2006
Due: Wednesday, Nov 22 at 11:59PM
Now that you've mastered the art of assembly language programming (and
no doubt improved your Snake-playing skills!), let's simplify our lives
and use a "high-level" programming language, C. Although the C compiler
will manage details like register use, C still gives the programmer
considerable control over the manipulation of data. For example, a C program
can easily manipulate the binary representation of a program, which is
exactly what we will do in this assignment!
You will write a disassembler (let's call it lc3dis). While an
assembler converts ASCII assembly programs (i.e., .asm files)
to binary machine language programs (i.e., .obj files), a
disassembler does the reverse.
Important note: Your disassembler needs to deal with LC-3 instructions defined
in the textbook as well as the MUL and SUB instructions.
However, you do NOT need to deal with the other instructions that we have added
(e.g., RTT and JMPT).
Functions
Just as with your Snake code, we have broken the task at hand into several
manageable pieces (in this case C functions). The file lc3dis.c
is a template that includes much of the code you'll need.
Function: main()
This is the entry point into the disassembler. It does the following.
- Opens (for reading) a file specified on the command line.
- Reads the first 2 bytes from the file to determine the .ORIG
address (more on this later).
- Outputs a .ORIG assembler directive with the address computed
above.
- Until the end-of-file is reached, reads each 2-byte instruction from
the file and calls print_instruction() on it.
- Outputs a .END assembler directive.
- Closes the file.
The code for this function is pretty simple, so we provide it.
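The steps above can be sketched as a small loop. This is only an illustration, not the main() shipped in lc3dis.c: the names read_word, emit_word, and disassemble are hypothetical, and the sketch assumes that each 16-bit word is stored high byte first, that end-of-file is signalled by -1, and that addresses are printed in the form x3000.

```c
#include <stdio.h>

/* Stand-in for get_word_from_file(): assumes the high byte comes first
   and returns -1 at end-of-file (both are assumptions of this sketch). */
static int read_word(FILE *f)
{
    int hi = fgetc(f);
    int lo = fgetc(f);
    if (hi == EOF || lo == EOF)
        return -1;
    return (hi << 8) | lo;
}

/* Stand-in for print_instruction(): emits every word as a .FILL. */
static void emit_word(FILE *out, int ir)
{
    fprintf(out, ".FILL x%04X\n", (unsigned int)ir);
}

/* Hypothetical helper covering the steps between open and close.
   Returns the number of words emitted after .ORIG, or -1 if the
   input is too short to contain an origin address. */
int disassemble(FILE *in, FILE *out)
{
    int count = 0;
    int word = read_word(in);          /* first word is the .ORIG address */

    if (word < 0)
        return -1;
    fprintf(out, ".ORIG x%04X\n", (unsigned int)word);

    while ((word = read_word(in)) >= 0) {   /* one word per instruction */
        emit_word(out, word);
        count++;
    }

    fprintf(out, ".END\n");
    return count;
}
```

A main() built on this sketch would open the file named on the command line in binary mode, call disassemble(f, stdout), and close the file.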
Function: get_zext_field(int bits, int hi_bit, int lo_bit)
This function gets the value of the bit field in integer bits
beginning with bit hi_bit and ending with bit lo_bit.
The resulting value is zero-extended. For example, to get the opcode of
an instruction in ir, we would call this function as
follows.
opcode = get_zext_field(ir, 15, 12);
Note that hi_bit and lo_bit are zero-based
(i.e., they must be between 0 and 15). This code is really quite
tricky, so we've provided it for you. Please look at the code and try
to understand the logic.
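If it helps while reading the provided code, here is one common way to extract a zero-extended field: shift the field down to bit 0, then mask off everything above it. This is a sketch under that assumption and may not match the version in lc3dis.c line for line.

```c
/* Sketch of a shift-and-mask field extractor (may differ from lc3dis.c).
   Assumes 0 <= lo_bit <= hi_bit <= 15. */
int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;            /* number of bits in the field */
    unsigned int mask = (1u << width) - 1u;     /* `width` low-order 1 bits */

    /* Shift the field down to bit 0, then clear everything above it. */
    return (int)(((unsigned int)bits >> lo_bit) & mask);
}
```

For example, the word 0x5642 is 0101 011 001 000 010 in binary, so bits 15 to 12 give opcode 5 (AND) and bits 11 to 9 give register 3.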
Function: get_sext_field(int bits, int hi_bit, int lo_bit)
This function is very similar to get_zext_field(), except that
it sign extends the resulting field. You will want to use
get_zext_field() to select unsigned values like opcodes or
register fields (e.g., DR, SR1, etc.),
but you will want to use get_sext_field() to select signed
immediate fields (e.g., imm5) or signed PC offset fields
(e.g., PCoffset9). This code is also tricky. We
provide this code, but take a look at it in order to understand it.
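One way to sign extend a field is to extract it as an unsigned value and then flip and subtract its top bit. The sketch below uses that trick; it is an illustration only, and the provided code may take a different route to the same result.

```c
/* Sketch of a sign-extending field extractor (may differ from lc3dis.c).
   Assumes 0 <= lo_bit <= hi_bit <= 15. */
int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;
    unsigned int mask = (1u << width) - 1u;
    int field = (int)(((unsigned int)bits >> lo_bit) & mask);  /* zero-extended */
    int sign = 1 << (width - 1);                /* weight of the field's top bit */

    /* If the top bit is set, this subtracts 2^width; otherwise it is a no-op. */
    return (field ^ sign) - sign;
}
```

With a 5-bit field, the pattern 11111 comes out as -1 and 01111 as 15. With a 9-bit field, 0x1EF comes out as -17.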
Function: get_bit(int bits, int bit_number)
This function is similar to get_zext_field() except that it
selects and returns a single zero-extended bit. In fact, it's
implemented by calling get_zext_field() with hi_bit
and lo_bit set to the same value (bit_number). We
provide this code.
Function: get_word_from_file(FILE* f)
This function extracts the next 16-bit word from the input file. We
provide this code.
Function: print_instruction(int ir)
This is the core of the disassembler. This function is passed an
integer (ir) that may have a value from 0x0000 to 0xffff,
representing an LC-3 instruction. This function calls
get_zext_field() to extract the opcode from the instruction.
It then switches on that opcode. Within the switch there is a case for
each opcode (e.g., ADD, AND, BR,
JMP, etc.). Each case examines additional instruction bits
(determined by the opcode) and prints an appropriate string representing
the instruction.
For example, in the case for the AND instruction, we must call
get_zext_field(ir,11,9) to get the destination register and
get_zext_field(ir,8,6) to get the first source operand
register. Next we examine bit 5 (via get_bit(ir,5)) in
order to determine whether the final operand is an immediate or
register. If bit 5 is 0 (i.e., register operand), we call
get_zext_field(ir,4,3) and we check that the result is 0
(i.e., bits 4 and 3 are 0). If bits 4 and 3 are not 0, this is
not a legal AND instruction, so we call print_fill(ir)
to generate a .FILL assembler directive for this word.
Otherwise, we use get_zext_field(ir,2,0) to get the second
source operand register. Finally, the AND assembly instruction
is printed via printf(). If bit 5 is 1, we use
get_sext_field(ir,4,0) to get the imm5 field, and we
print the AND instruction. Some of this code is provided to
get you started.
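The AND walk-through above can be sketched as follows. To keep the example self-contained and easy to check, it formats the instruction into a buffer instead of calling printf(), and it returns 0 where the text says to call print_fill(ir). The name format_and and the three helper bodies are assumptions made for this sketch, not the code in lc3dis.c.

```c
#include <stdio.h>

/* Stand-ins for the provided helpers; the versions in lc3dis.c may differ. */
static int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;
    return (int)(((unsigned int)bits >> lo_bit) & ((1u << width) - 1u));
}

static int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int sign = 1 << (hi_bit - lo_bit);          /* weight of the field's top bit */
    return (get_zext_field(bits, hi_bit, lo_bit) ^ sign) - sign;
}

static int get_bit(int bits, int bit_number)
{
    return get_zext_field(bits, bit_number, bit_number);
}

/* Hypothetical helper: writes the AND instruction encoded by ir into buf
   and returns 1, or returns 0 if ir is not a legal AND (emit .FILL). */
int format_and(int ir, char *buf, size_t size)
{
    int dr  = get_zext_field(ir, 11, 9);        /* destination register */
    int sr1 = get_zext_field(ir, 8, 6);         /* first source register */

    if (get_bit(ir, 5) == 0) {                  /* register operand */
        if (get_zext_field(ir, 4, 3) != 0)
            return 0;                           /* fixed bits 4..3 not zero */
        snprintf(buf, size, "AND R%d, R%d, R%d",
                 dr, sr1, get_zext_field(ir, 2, 0));
    } else {                                    /* imm5 operand, in decimal */
        snprintf(buf, size, "AND R%d, R%d, #%d",
                 dr, sr1, get_sext_field(ir, 4, 0));
    }
    return 1;
}
```

In print_instruction() itself, the same logic would sit inside the AND case of the switch, with printf() in place of snprintf() and a call to print_fill(ir) in place of return 0.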
Function: print_fill(int ir)
This function prints a .FILL assembler directive. We provide
this code.
Helpful Information
- Getting started. We assume most of you will want to work on
eniac.seas.upenn.edu. If you have a C compiler on your personal
machine you would like to use, that's fine. But you should confirm your code
compiles and runs on eniac because that's where we'll be testing it.
Begin by creating a directory to work in and copying the files we
provide. These files are available on the Linux machines in the lab
(or eniac.seas.upenn.edu) or they can
be found in hw8code.zip.
cd ~
mkdir cse240hw8
cd cse240hw8
cp ~cse240/public_html/handouts/hw8/* .
This will give you a bunch of .obj and corresponding .asm
files to use in testing (below). Also, it will give you a file
called lc3dis.c to use as a starting point.
- Multiply. The encoding for multiply does not appear in the book.
It is exactly the same as ADD or AND except the opcode is xD (13 in
decimal). The book describes this as a reserved opcode.
- Subtract. The encoding for subtract also does not appear in the book.
Use PennSim to figure out ("reverse engineer") the encoding of the subtract
instruction.
- Resources. Appendix A and the table on the inside back cover of your
textbook will be extremely useful! You will find all the answers there!
- Immediate fields. Please output all of your immediate fields in decimal
(rather than hexadecimal). This is necessary so our automatic testing scripts
will not get confused. For example, the following is fine.
LDR R1, R2, #10
TRAP #10
While the following is equivalent to the above, it will not be accepted
by our testing scripts.
LDR R1, R2, xA
TRAP xA
- Make sure you check the fixed fields in instructions. For example,
in an AND instruction with a register operand (bit 5 is 0), bits 4 and 3 must be 0. If
they are not, it is not an AND instruction at all. It's not
any instruction, so it must be data. Similarly, in a JMP
instruction, bits 5 to 0 must be 0. And in a NOT instruction,
bits 5 to 0 must all be 1. If you discover that you are looking at data
(not an instruction), call print_fill() to generate a
.FILL assembler directive.
- One or more of the n, z, or p fields in a
BR instruction must be set. If all of n/z/p are set, the opcode
should just be BR (not BRnzp). If none of n/z/p are set, it is
not a BR instruction (i.e., it must be data and
print_fill() should be called).
- PC-relative offsets. Do not try to generate assembly code that
contains labels! This would make things much harder. Instead, simply
specify your PC-relative offsets directly (in base 10, so you can
specify negatives). For example, if the PC-offset of some LD
instruction is -17 (and the destination register is R1), you would
generate the following assembly instruction.
LD R1, #-17
- Compiling. Use gcc on the Moore 100 machines
or eniac.seas.upenn.edu to build your code. You may want to use
the -o flag to specify the name of the generated program.
The -Wall flag turns on warnings (a very good idea), and -O1 turns
on optimizations. Here's an example:
gcc -Wall -O1 -o lc3dis lc3dis.c
To run your disassembler:
./lc3dis foo.obj
- Object file format. For the curious, we'll describe the
.obj file format. The first 2 bytes contain the .ORIG
address of the program. Subsequent byte pairs (16 bits) encode each
instruction in the program.
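Several of the output rules in the list above (decimal immediates, fixed-field checks, BR condition codes, and numeric PC-relative offsets) can be sketched in one place. The example below covers only BR, LD, NOT, and TRAP. The name format_word, the helper bodies, and the exact spacing of the output are assumptions made for illustration. The check on TRAP bits 11 to 8 follows the textbook's TRAP encoding. A return value of 0 marks the places where print_fill() would be called.

```c
#include <stdio.h>

/* Stand-ins for the provided helpers; the versions in lc3dis.c may differ. */
static int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;
    return (int)(((unsigned int)bits >> lo_bit) & ((1u << width) - 1u));
}

static int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int sign = 1 << (hi_bit - lo_bit);          /* weight of the field's top bit */
    return (get_zext_field(bits, hi_bit, lo_bit) ^ sign) - sign;
}

/* Hypothetical helper: writes assembly text for ir into buf and returns 1,
   or returns 0 when the caller should emit a .FILL instead. Opcodes other
   than BR, LD, NOT, and TRAP are not handled here and also return 0. */
int format_word(int ir, char *buf, size_t size)
{
    int n, z, p;

    switch (get_zext_field(ir, 15, 12)) {
    case 0x0:                                   /* BR */
        n = get_zext_field(ir, 11, 11);
        z = get_zext_field(ir, 10, 10);
        p = get_zext_field(ir, 9, 9);
        if (!n && !z && !p)
            return 0;                           /* no condition bits: data */
        if (n && z && p)                        /* all set: plain BR */
            snprintf(buf, size, "BR #%d", get_sext_field(ir, 8, 0));
        else
            snprintf(buf, size, "BR%s%s%s #%d", n ? "n" : "", z ? "z" : "",
                     p ? "p" : "", get_sext_field(ir, 8, 0));
        return 1;
    case 0x2:                                   /* LD: signed PCoffset9, no label */
        snprintf(buf, size, "LD R%d, #%d",
                 get_zext_field(ir, 11, 9), get_sext_field(ir, 8, 0));
        return 1;
    case 0x9:                                   /* NOT: bits 5..0 must all be 1 */
        if (get_zext_field(ir, 5, 0) != 0x3F)
            return 0;
        snprintf(buf, size, "NOT R%d, R%d",
                 get_zext_field(ir, 11, 9), get_zext_field(ir, 8, 6));
        return 1;
    case 0xF:                                   /* TRAP: bits 11..8 must be 0 */
        if (get_zext_field(ir, 11, 8) != 0)
            return 0;
        snprintf(buf, size, "TRAP #%d", get_zext_field(ir, 7, 0));  /* decimal */
        return 1;
    default:
        return 0;                               /* not covered by this sketch */
    }
}
```

Note that every numeric operand is printed with %d, which yields the decimal form (including a leading minus sign for negative offsets) that the testing scripts expect.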
Testing
We will provide a number of .obj and matching .asm files you
can use to test your disassembler (but you should also generate your own test
cases). To confirm that your code is correct, use the
Unix diff utility to compare the output of your program with the
given .asm file:
./lc3dis t00.obj | diff -w -i - t00.asm
If diff produces no output, your program produces the same output as
the given .asm file. Note that -w instructs diff to
ignore whitespace and -i instructs it to ignore case. If the files
are different, diff will indicate how they are different (type
"man diff" for more information on diff).
We will be using this testing method for our automatic testing scripts,
so make sure diff produces no output.
To run all the tests, we've given you a shell script
in hw8code.zip. To make it executable, use the
following command:
chmod +x all-tests.sh
You only need to do that once. Now you can run all the tests with:
./all-tests.sh
Also note that the output of the disassembler cannot be directly
assembled because the assembler doesn't know what to do with numeric
PC-relative offsets (it wants labels).
Please test your disassembler thoroughly with your own tests.
The tests we provide are not at all complete, so you will have to create your
own tests. Note that if your .asm files contain labels, these
naturally won't appear in the corresponding disassembled code. You'll have to
confirm that the addresses your disassembler generates are correct.
Submission
Please submit your code in a file called lc3dis.c in the usual
way.
turnin -c cse240 -p hw8 lc3dis.c
Due Date
Note that this assignment is due the Wednesday before Thanksgiving
break. Given that this assignment only requires an additional 70 lines of
code, we could have made it due on Monday. But we decided to give you a
little flexibility. We suspect many of you will want to turn it in on
Monday or Tuesday, so you are not working on it right before break. As
always, early submissions are fine!