Lab 1 - Combinational Logic and ALUs

CSE 372 (Spring 2006): Digital Systems Organization and Design Lab

Preliminary Demo by 7pm Friday, February 3rd.

Final Demo by 7pm Friday, February 10th.

Writeup due before class on Monday, February 13th.

This lab is to be done individually.

This lab is worth 25 points.

Overview

In this lab, you will construct the ALU (Arithmetic/Logical Unit) for a P37X ISA processor. Before you can build the ALU, you need to create a few building blocks (4-bit adder, 16-bit adder, 16-bit multiplier, 16-bit shifter) which you will then combine to form an ALU.

Preliminaries

Before you begin, we have another tutorial for you to walk through: ModelSim simulation tutorial. This tutorial covers simulation of designs for verifying they are correct and debugging them when they are not.

Specifics

Design and test the following combinational logic structures.

Important

Before you write any Verilog code, first draw a schematic diagram of the circuit by hand, with all wires, inputs, and outputs labeled. Why? When designing hardware, even when using Verilog, you need to be thinking explicitly about the structure and interconnectedness of the circuits. Only when the diagram is complete should you write the Verilog code that corresponds to the circuit. As described below, you need to turn in both the hand-drawn schematic and a printout of the Verilog code.

1. 4-bit Ripple-Carry Adder

Before creating a 16-bit adder, first create a signed 4-bit ripple-carry adder as a basic building block. It has three inputs: two 4-bit signed values and a 1-bit carry-in signal. It has two outputs: the 4-bit sum and a 1-bit carry-out signal. You might want to use the 3-input, 2-output single-bit "full adder" you designed in Lab 0 (or an improved version of it) as a building block.
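When checking simulation results, it can help to have a small software reference model to compare against. The following is a minimal sketch in Python, not Verilog, and the names full_adder and ripple_adder4 are hypothetical. It mirrors the ripple structure using only bitwise operations; the Python shifts stand in for Verilog bit selection.

```python
def full_adder(a, b, cin):
    """Single-bit full adder model: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_adder4(a, b, cin):
    """4-bit ripple-carry adder model: returns (4-bit sum, carry_out)."""
    total = 0
    carry = cin
    for i in range(4):
        # Each stage consumes the carry produced by the previous stage.
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= bit << i
    return total, carry


# Exhaustive check against ordinary integer addition (512 cases).
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            s, cout = ripple_adder4(a, b, cin)
            assert (cout << 4) | s == a + b + cin
```

Because the adder has only 9 input bits, all 512 input combinations can be checked this way, and the same table can serve as expected values in simulation.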

Testing: Test the adder both in simulation and on the board. To test the adder on the board, hook your 4-bit adder inputs to two sets of four input switches on the extension board; hook the outputs to five LEDs on the extension board.

2. 16-bit Carry-Select Adder

The 16-bit adder takes in two 16-bit signed values and a single-bit carry-in signal. It has a single 16-bit signed output.

Implementation: For comparison purposes, create three different adder implementations (using the 4-bit adder specified above):

A 16-bit ripple-carry adder made up of four 4-bit adders.
A 16-bit carry-select adder made up of two 8-bit select segments.
A 16-bit carry-select adder made up of four 4-bit select segments.

In the lab writeup, compare the delay (in nanoseconds) and area (in terms of lookup tables or "LUTs") of these three different adder implementations.

See the CSE371 lecture notes for more information on carry-select adders.
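As a rough illustration of the carry-select idea, here is a Python reference model (a sketch, not Verilog; the names add_segment and carry_select_adder16 and the seg_width parameter are assumptions made for this example). Each segment computes its sum twice, once for carry-in 0 and once for carry-in 1, and the real carry picks between them.

```python
def add_segment(a, b, cin, width):
    """Ripple-carry model of one segment: returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_adder16(a, b, cin, seg_width=4):
    """16-bit carry-select adder model: returns (16-bit sum, carry_out)."""
    mask = (1 << seg_width) - 1
    result, carry = 0, cin
    for lo in range(0, 16, seg_width):
        sa, sb = (a >> lo) & mask, (b >> lo) & mask
        # In hardware both candidates are computed in parallel.
        # (The lowest segment could use a single adder, since its
        # carry-in is known immediately; this model does not bother.)
        sum0, c0 = add_segment(sa, sb, 0, seg_width)
        sum1, c1 = add_segment(sa, sb, 1, seg_width)
        # The incoming carry acts as the select line of a 2-to-1 mux.
        seg_sum, carry = (sum1, c1) if carry else (sum0, c0)
        result |= seg_sum << lo
    return result, carry


# seg_width=8 models the two-segment design, seg_width=4 the four-segment one.
for width in (4, 8):
    total, cout = carry_select_adder16(0x1234, 0x4321, 1, seg_width=width)
    assert (cout << 16) | total == 0x1234 + 0x4321 + 1
```

The model only captures function, not timing: the delay benefit comes from the segments working in parallel, which is what the synthesis reports will show.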

Testing: Test the adder both in simulation and on the board. Unfortunately, the extension boards do not have enough switches to represent two 16-bit inputs. As an incomplete workaround, test the adder on the board by sign extending the two sets of four input switches on the extension board; hook the eight low-order bits of the 16-bit output to the eight LEDs on the extension board. This setup will give you partial test coverage (enough to demonstrate the design is basically working).
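To predict what the LEDs should show for a given switch setting, a small Python sketch like the following may help (the names sign_extend4 and led_byte are hypothetical, and the model ignores the active-low wiring of the main board). It sign extends each 4-bit switch value to 16 bits, adds, and keeps the low eight bits.

```python
def sign_extend4(x):
    """Replicate bit 3 of a 4-bit value into bits 15..4."""
    sign = (x >> 3) & 1
    return (0xFFF0 if sign else 0x0000) | (x & 0xF)


def led_byte(a4, b4, cin=0):
    """Low eight bits of the 16-bit sum of two sign-extended 4-bit values."""
    total = (sign_extend4(a4) + sign_extend4(b4) + cin) & 0xFFFF
    return total & 0xFF


# Switches set to -8 and -8 give -16, whose low byte is 1111 0000.
assert led_byte(0b1000, 0b1000) == 0b11110000
```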

3. 16-bit Multiplier

The 16-bit multiplier takes in two 16-bit signed values. It has a single 16-bit signed output. The multiplier is single-cycle and fully combinational (in contrast, a sequential multiplier takes multiple cycles and latches intermediate values).

Implementation: The most straightforward implementation uses a chain or tree of 15 of the sixteen-bit adders you just created to add up the 16 partial products. You'll also need to use some multiplexors, ranged bit selection, and/or other combinational logic. For comparison purposes, create three different multiplier implementations:

A 16-bit multiplier using the 16-bit ripple-carry adder from above.
A 16-bit multiplier using the 16-bit two-segment carry-select adder from above.
A 16-bit multiplier using the 16-bit four-segment carry-select adder from above.

Note: As you'll be including the 16-bit adder as a structural component, the textual differences between these multipliers should be minor.

In the lab writeup, explain your general multiplier design and compare its delay using these three adders.
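One way to picture the adder-chain multiplier is the following Python reference model (a sketch under stated assumptions, not Verilog; multiply16, add16, and to_signed16 are hypothetical names, and add16 merely stands in for whichever 16-bit adder you instantiate). Partial product i is the first operand shifted left by the constant i and gated by bit i of the second operand; 15 additions then combine the 16 partial products. Because only the low 16 bits are kept, the same result holds for signed and unsigned operands.

```python
MASK16 = 0xFFFF


def add16(a, b):
    """Stand-in for the 16-bit adder module (carry-out discarded)."""
    return (a + b) & MASK16


def multiply16(a, b):
    """Low 16 bits of a * b, built from 16 partial products."""
    partials = [((a << i) & MASK16) if (b >> i) & 1 else 0 for i in range(16)]
    acc = partials[0]
    for p in partials[1:]:  # exactly 15 additions
        acc = add16(acc, p)
    return acc


def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement value."""
    return x - 0x10000 if x & 0x8000 else x


# -2 * 3 = -6 when the 16-bit patterns are read as signed values.
assert to_signed16(multiply16(0xFFFE, 3)) == -6
```

In the model, the shift by i is a constant for each partial product, which corresponds to plain wiring (bit selection and concatenation) in hardware rather than a shifter module.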

Testing: Test the multiplier much like you tested the 16-bit adder.

4. 16-bit Shifter

The shifter unit has three inputs: a 16-bit value, a 4-bit shift amount, and a 2-bit shift type (00 is left shift, 01 is logical right shift, 10 is arithmetic right shift, 11 is no shift). It has a single 16-bit output.

Implementation: There are several ways to implement this shifter. You could create three different shifters (one per shift type) using 2-to-1 MUXes at each level, and then use a 4-to-1 MUX to select among their outputs and the unshifted value at the end. An alternative implementation would use four stages, each built from a 4-to-1 MUX that selects between the three kinds of shifts and no shift at all.
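The staged structure can be modeled in Python as follows. This is a sketch of one plausible reading, assuming four stages that shift by 1, 2, 4, and 8 bits, each enabled by one bit of the shift amount; the names shift_stage and shifter16 and the constant names are hypothetical. The type encodings match the ones given above.

```python
MASK16 = 0xFFFF
SLL, SRL, SRA, NONE = 0b00, 0b01, 0b10, 0b11


def shift_stage(value, dist, kind):
    """Shift a 16-bit value by a constant distance according to kind."""
    if kind == SLL:
        return (value << dist) & MASK16
    if kind == SRL:
        return value >> dist
    if kind == SRA:
        # Fill the vacated high bits with copies of the sign bit.
        fill = (MASK16 << (16 - dist)) & MASK16 if value & 0x8000 else 0
        return (value >> dist) | fill
    return value  # kind == NONE: pass the value through unchanged


def shifter16(value, amount, kind):
    """16-bit shifter model: stages of 1, 2, 4, and 8 bits."""
    for k in range(4):
        if (amount >> k) & 1:  # bit k of the amount enables stage k
            value = shift_stage(value, 1 << k, kind)
    return value


assert shifter16(0x8000, 15, SRA) == 0xFFFF
assert shifter16(0x8000, 15, SRL) == 0x0001
```

Each stage shifts by a fixed distance, so in hardware a stage is just a mux choosing between the input and a rewired copy of it.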

Testing: Test the shifters much like you tested the 16-bit adder. Use an additional two switches to specify the specific shift operation.

5. ALU

The ALU has three inputs: two 16-bit signed values and a 4-bit control signal that determines which operation the ALU should perform. The ALU has a single 16-bit signed output, which is the result of the operation. The ALU can perform ten operations:

Description	Insn	Control
Addition	ADD	0 100
Subtraction	SUB	0 101
Multiplication	MUL	0 110
Bitwise or	OR	1 000
Bitwise not	NOT	1 001
Bitwise and	AND	1 010
Bitwise xor	XOR	1 011
Shift left logical	SLL	1 100
Shift right logical	SRL	1 101
Shift right arithmetic	SRA	1 110

A few notes:

The control signal corresponds directly to the P37X ISA encoding for opcodes 0000 and 0001. The first control bit is the last bit of the 4-bit opcode; the remaining three bits are the last three bits in the instruction.
If any control signal other than those specified is given to the ALU, the ALU should set all 16 output bits to zero.
The NOT operation returns the bitwise inverse of the first ALU input, and it ignores the second input.
The SUB operation computes output = input1 - input2.
For the shift operations, input1 is the value to be shifted and the four lower bits of input2 determine the amount of the shift (0 to 15 binary digits).
The arithmetic shift right performs sign extension; in contrast, the logical shift right performs zero extension.

Implementation: The ALU should instantiate a single 16-bit adder (also used for subtract), a 16-bit multiplier, and a left/right shifter. Use the outputs from these modules and some combinational logic to generate all ten possible values. Finally, use a 16-to-1 multiplexer to select the correct signal.
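For generating expected outputs to compare against in simulation, a behavioral reference model along these lines may be useful. It is a Python sketch, not Verilog, so it freely uses operators that the lab's Verilog restrictions forbid; the function name alu is hypothetical. The control encodings are taken from the table above, and subtraction is modeled as input1 plus the inverse of input2 plus one, which is one common way to reuse a single adder.

```python
MASK16 = 0xFFFF


def alu(a, b, control):
    """Reference model of the ALU: returns the 16-bit result."""
    a &= MASK16
    b &= MASK16
    amount = b & 0xF  # the four lower bits of input2 give the shift amount
    sign_fill = (MASK16 << (16 - amount)) & MASK16 if a & 0x8000 else 0
    results = {
        0b0100: a + b,                     # ADD
        0b0101: a + (~b & MASK16) + 1,     # SUB as a + ~b + 1
        0b0110: a * b,                     # MUL (low 16 bits kept below)
        0b1000: a | b,                     # OR
        0b1001: ~a,                        # NOT (ignores input2)
        0b1010: a & b,                     # AND
        0b1011: a ^ b,                     # XOR
        0b1100: a << amount,               # SLL
        0b1101: a >> amount,               # SRL
        0b1110: (a >> amount) | sign_fill, # SRA
    }
    # Any unspecified control value yields all zeros.
    return results.get(control, 0) & MASK16


assert alu(3, 5, 0b0101) == 0xFFFE  # 3 - 5 = -2 in two's complement
assert alu(1, 2, 0b1111) == 0       # unspecified control value
```

The dictionary lookup plays the role of the final multiplexer: every candidate value is computed, and the control signal picks one.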

Testing: Test the ALU much like you tested the 16-bit adder, but use an additional four switches (the small switches on the main FPGA board) as the 4-bit control input.

Lab Logistics

Simulation Tutorial

Don't forget to walk through the ModelSim simulation tutorial before you begin.

Verilog Restrictions

This lab should be implemented using only low-level structural Verilog and the assign statement. You are not allowed to use the following Verilog operators: +, -, *, /, <<, >>, etc. However, you are allowed to use the following operators: ~, &, |, ^, ==, !=, ?:, {}, etc. If you're not sure if you're allowed to use a certain Verilog construct, just ask (post a message on the newsgroup, send an e-mail, etc.).

LEDs and Switches

We'll be using an extension board that contains additional LEDs and switches. See lab1.v and lab1.ucf for a top-level Verilog module and mappings for the LED and switch pins.

Note

The LEDs and switches on the extension boards are "active high", but (as described in Lab 0) the LEDs and switches on the main board are "active low" signals.

Delay and Resource Usage

The delay and resource usage of your design can be found in various reports:

Synthesis-only Timing: An approximate timing report can be found in the Timing Summary section of the "View Synthesis Report" information.
Post Place and Route Timing: For a more accurate timing report, in the ISE Project Navigator, double-click on the View Design Summary process. Under the Detailed Reports section, click on Post Place and Route Static Timing Report.
Timing Tuning/Debugging: To better understand the source of delay in your design, use the "Timing Analyzer" tool from within ISE.

When reporting timing results, use the "Post Place and Route Timing" information.

Demos

For this lab, there is a preliminary and final demo.

Preliminary demo: demonstrate your 4-bit adder to the TAs both in simulation and by using the switches and LEDs.
Final demo: demonstrate the entire ALU both in simulation and with the boards.

What to Turn In

For each of the designs, turn in:

Hand-drawn schematics. It is okay if these are a little messy, but they should accurately represent the Verilog code you turned in. Do not waste your time making pretty computerized schematics (the whole point of using Verilog is to save the tedium of making picture-perfect schematics).
Verilog code. Your Verilog code should be well-formatted, easy to understand, and include comments where appropriate (for example, use a comment to describe all the inputs and outputs to your Verilog modules). Some part of the project grade will be dependent on the style and readability of your Verilog, including formatting, comments, good signal names, and proper use of hierarchy.

Please interleave the schematics with the Verilog code for each module.

In addition, answer the following questions. When reporting timing results, use the information from the "Post Place and Route Timing" report.

Make a table of the delay (in nanoseconds) and resource usage (in LUTs) for the three 16-bit adder designs, the three 16-bit multiplier designs, and the entire ALU using just the fastest adder and multiplier.
How much faster is the fastest adder than the slowest adder? How much more area does it require?
How much faster is the fastest multiplier than the slowest multiplier? How much more area does it require?
Is the difference between these two speedups surprising to you? What might explain why the two speedups are not more similar?
How much larger (in terms of resources consumed) is your multiplier than the corresponding adder? Is this ratio higher or lower than you expected? What might explain the difference from the expected ratio?
What problems, if any, did you encounter while doing this lab?
How many hours did it take you to complete this assignment? On a scale of 1 (least) to 5 (most), how difficult was this assignment?

Note

Because part of your grade will be determined by your lab writeups, they should be clear, concise, and neat (preferably typed). You could have the greatest design in the world, but if you cannot convey your idea clearly to the graders and convince them that it works, you will not get good marks. Your lab writeups should include a brief explanation of what the circuits are supposed to do and how they do it.

Honor Points

To earn honors points on this assignment, design a faster adder and multiplier. You can use any of the various techniques discussed in class (e.g., non-uniform segment carry-select adders, carry-lookahead adders, carry-save tree multipliers, modified multi-bit Booth multipliers, etc.). Feel free to search on-line for other ideas and techniques not discussed in CSE371 (although you're not allowed to directly copy any code found on-line). In addition, you can't use the Xilinx primitives.

Anyone who creates a faster adder and/or multiplier will earn 10 honor points for each. However, if you make a faster adder, your multiplier must be better than the original multiplier with the newer adder (that is, you have to actually improve the multiplier's design, not just the adder sub-component).

In addition, the designers of the five fastest adders and five fastest multipliers will be given an additional 5 honor points each.

As such, 30 honor points is the maximum for this assignment.

Note

To receive these points, you must describe your implementation in the lab writeup.

Addendum

[Jan 30] The LEDs and switches on the extension board are not "active low" signals (they are the more traditional active high signals). As such, only the LEDs and switches on the main board are active low.
[Feb 1] See lab1.v and lab1.ucf for a top-level Verilog module and mappings for the LED and switch pins.
[Feb 5] Fixed typo that specified two shifters for the ALU (the shifter described above does left and right shifts, so only one shifter is needed).
[Feb 5] Clarified that the 16-bit multiplier uses 15 sixteen-bit adders as well as multiplexors, ranged bit selection, and/or other combinational logic.
[Feb 5] When reporting timing results, use the information from the "Post Place and Route Timing" report.
[Feb 7] New testbench code released: see testing, and lab1_testbench.v and lab1.input.test.

[Feb 8] To receive the honors points, you must describe how your fast adder and/or multiplier works in the lab writeup.
[In Retrospect] - Do not use the shifter module to shift the bits in the multiplier. Shifting by a constant number of bits can be done much more easily using the bit selection and concatenation operations.
[In Retrospect] - The multiplier implementation that uses a simple chain of 15 adders works reasonably well, so using a tree-based multiplier implementation is best left for the honors points.