Instructor: | Prof. Milo Martin |
---|---|

Due: | Monday, January 25th (at the start of class) |

**Summary**: Consider the computations below. Briefly describe ways of
efficiently performing the specified computations. The primary goal is
to minimize runtime; the secondary goal is to minimize memory footprint.
**For now, focus only on sequential code**. Consider many possible
algorithmic enhancements and data structure organizations.

**To Consider...:** Consider the number of instructions executed, but
also how efficiently the compiler can translate your code into
instructions and how efficiently those instructions will be executed by
the processor. Effects to consider included memory effects (such as
cache misses, TLB misses, temporal locality, spatial locality, reducing
overall memory footprint, and prefetching) and processor execution (type
of operations, branch mis-predictions, and amount of instruction-level
parallelism). For example, consider "pointer chasing" may use few
instructions but it may have less instruction-level parallelism and
worse locality than scanning an array. Finally, consider some low-level
ways the code could be written to help the compiler generate better
code.

Not all of the ideas need to be "good" ideas. I'm just looking for some creative brainstorming as to many different ways to tackle these variants.

**What to turn in**: write up **no more than one page** of your ideas and
**why** they might speed things up. As this is mostly a planning stage,
I'm not going to read them very closely. We're going to discuss your
ideas in class on Monday, so the goal of the exercise facilitate a
lively discussion.

**Individual work**: Each student must submit their own writeup, but I
encourage you to discuss this general problem with each other outside of
class.

**Disclaimer**: There probably isn't any "right" solution. If there is,
I don't know what it is, as I haven't tried this out myself. It could
be the obvious way is the fastest way of doing it; or it could be more
subtle.

A set of N "points" is defined by three input arrays and one output array:

location: | An unsigned integer value for the one-dimensional location of the point. Each location is unique. |
---|---|

weight: | An unsigned integer weight of the point |

threshold: | An unsigned integer distance threshold for the point |

answer: | An unsigned integer output of the computation, starts initialized to all zeros |

The above information for N points is specified as four size N arrays. The distance between two points is calculated as:

distance(i, j): return absolute_value(location[i] - location[j])

**Computation 1**:

foreach i in [0 ... N-1] foreach j in [0 ... N-1] if (distance(i, j) < threshold[i]) answer[i] += weight[j]

Computation 1 has two variants:

**Variant 1A**: In variant A, the locations space is sparsely packed. That is, the locations are unique and uniformly distributed from 1 and MAX_LOCATION, where MAX_LOCATION is much larger than N.**Variant 1B**: In variant B, the locations space is densely packed. That is, the locations are unique and uniformly distributed from 1 and MAX_LOCATION, where MAX_LOCATION is a small multiple of N.

**Computation 2**:

foreach i in [0 ... N-1] foreach j in [0 ... N-1] if (distance(i, j) < threshold[i]) answer[i] += (weight[i] + weight[j]) * distance(i, j)

Computation 2 has two variants:

**Variant 2A**: In variant A, the thresholds range widely.**Variant 2B**: In variant B, the minimum threshold for any point is at least the maximum location divided by some medium-sized constant (for example, larger than 100 but smaller than 10,000).