Homework 1 - CIS534 Spring 2010

Instructor:Prof. Milo Martin
Due:Monday, January 25th (at the start of class)

Summary: Consider the computations below. Briefly describe ways of efficiently performing the specified computations. The primary goal is to minimize runtime; the secondary goal is to minimize memory footprint. For now, focus only on sequential code. Consider many possible algorithmic enhancements and data structure organizations.

To Consider...: Consider the number of instructions executed, but also how efficiently the compiler can translate your code into instructions and how efficiently those instructions will be executed by the processor. Effects to consider included memory effects (such as cache misses, TLB misses, temporal locality, spatial locality, reducing overall memory footprint, and prefetching) and processor execution (type of operations, branch mis-predictions, and amount of instruction-level parallelism). For example, consider "pointer chasing" may use few instructions but it may have less instruction-level parallelism and worse locality than scanning an array. Finally, consider some low-level ways the code could be written to help the compiler generate better code.

Not all of the ideas need to be "good" ideas. I'm just looking for some creative brainstorming as to many different ways to tackle these variants.

What to turn in: write up no more than one page of your ideas and why they might speed things up. As this is mostly a planning stage, I'm not going to read them very closely. We're going to discuss your ideas in class on Monday, so the goal of the exercise facilitate a lively discussion.

Individual work: Each student must submit their own writeup, but I encourage you to discuss this general problem with each other outside of class.

Disclaimer: There probably isn't any "right" solution. If there is, I don't know what it is, as I haven't tried this out myself. It could be the obvious way is the fastest way of doing it; or it could be more subtle.

A set of N "points" is defined by three input arrays and one output array:

location:An unsigned integer value for the one-dimensional location of the point. Each location is unique.
weight:An unsigned integer weight of the point
threshold:An unsigned integer distance threshold for the point
answer:An unsigned integer output of the computation, starts initialized to all zeros

The above information for N points is specified as four size N arrays. The distance between two points is calculated as:

distance(i, j):
  return absolute_value(location[i] - location[j])

Computation 1:

foreach i in [0 ... N-1]
  foreach j in [0 ... N-1]
    if (distance(i, j) < threshold[i])
      answer[i] += weight[j]

Computation 1 has two variants:

Computation 2:

foreach i in [0 ... N-1]
  foreach j in [0 ... N-1]
    if (distance(i, j) < threshold[i])
      answer[i] += (weight[i] + weight[j]) * distance(i, j)

Computation 2 has two variants: