CS294-7 Reconfigurable Computing -- Lecture #28 (5/1/97), Programming Styles


This lecture looked at different programming styles for reconfigurable devices. We began by enumerating a list of desirable properties for a programming style, pointing out how programming reconfigurable devices differs from programming on conventional, sequential machines, and commenting on why programming for these types of devices is hard. We then looked at a variety of styles and examined pros and cons in the context of reconfigurable computing, beginning with low-level netlist/RTL programming and working our way up towards higher levels of programming abstraction.

There are two classes of goals for a programming style for reconfigurable devices.

The first class includes properties of a programming style that are independent of reconfigurable computing. These include properties such as ease of development, portability, and debugging support. Fast translation is an issue for any kind of programming, but is particularly important for reconfigurable computing, where mapping from a specification to a bitstream is typically a time-consuming process (compare compilers for conventional processors to the time spent during mapping with CAD tools).

Being able to adapt to varying performance requirements and resource availability is also important. For example, suppose a program is written for a device that consists of 100 CLBs; being able to use the same program, unmodified or with little change, for a device that scales up and uses 1000 CLBs is desirable. Being able to take advantage of reconfigurable devices' high computational density by achieving a fraction of a device's density is also important. Finally, it would be convenient to have an abstract performance model that allows us to achieve predictable performance. Having the option to make the reconfigurable hardware transparent to the programmer might also be desirable in some cases; in many cases, though, we will probably want some aspects of the hardware exposed, either for the programmer or the compiler writer.

Before diving into the issues that make programming reconfigurable devices difficult, we first reviewed the topics already covered in the course that are well studied and for which we already have adequate solutions.

Overspecification occurs when the programmer specifies more of a problem than the task requires, which limits the transformations a tool can easily apply. Take a bubble sort: many faster sorting algorithms exist, but it may be difficult for the tools to abstract away the specifics the programmer has provided and uncover the actual task (i.e., "sort this list of elements in an efficient way" rather than "sort this list of elements using a bubble sort").
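
To make the bubble sort point concrete, here is a small Python sketch (illustrative only; the function names are made up for these notes, and Python's built-in sorted() merely stands in for whatever a tool might choose). The first function fixes the algorithm step by step; the second states only the result that is wanted; the third is the actual requirement both must meet.

```python
def bubble_sort(items):
    """Overspecified: dictates the exact sequence of compare/swap steps."""
    data = list(items)
    n = len(data)
    for i in range(n):
        # After pass i, the largest i+1 elements are in their final positions.
        for j in range(n - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data


def sort_any_way(items):
    """Intent only: 'return the elements in nondecreasing order'.

    The implementation is free to pick the algorithm; sorted() is a stand-in.
    """
    return sorted(items)


def is_sorted_permutation(result, original):
    """The requirement itself: same elements, nondecreasing order."""
    same_elements = sorted(original) == sorted(result)
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return same_elements and in_order
```

A tool handed only bubble_sort has to rediscover is_sorted_permutation before it can safely substitute a different (for instance, spatial) sorter.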

At other times, a programmer provides too little information, underspecifying the problem and making it difficult for tools to produce an efficient solution in reasonable time.

Some examples of overspecification:

Sequentialization -- undoing sequentialization of conventional programming languages and machines and trying to find opportunities to take advantage of reconfigurable logic (spatial computing, specialization, etc.)

Segregating memory pools

Changing visible timing -- pipelining, etc.

Excessive data width -- data widths specified by the application don't match actual usage. Example: compute using 4-bit-wide data, but specify 32-bit-wide variables.
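
The data-width item can be quantified with a short Python sketch. This is a hedged illustration, not a tool from the course: the helper names are hypothetical, and using observed values (say, from profiling) is an assumption; a real tool would need a sound range analysis rather than samples.

```python
def bits_needed(max_value):
    """Minimum unsigned width that can hold every value in 0..max_value."""
    if max_value < 0:
        raise ValueError("unsigned values only in this sketch")
    # int.bit_length() returns 0 for 0; a signal still needs at least 1 bit.
    return max(1, max_value.bit_length())


def wasted_bits(declared_width, observed_values):
    """Bits of a declared width that the observed values never use."""
    required = bits_needed(max(observed_values))
    return declared_width - required
```

For the 4-bit example above, wasted_bits(32, range(16)) reports 28 unused bits per variable, all of which would otherwise turn into datapath logic on a spatial device.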

and underspecification:

Picking adequate data size -- no information from the programmer -> how many bits to assign to data? Probably better to overspecify and work backwards.
Locality clustering -- no information from the programmer -> how to place computation to get locality (e.g. how to co-locate producers and consumers).
Discover structure/commonality
FSM partitioning -- optimally partitioning a large, central control unit into a set of smaller, cooperating control units still appears to be a hard problem; there's evidence that a collection of smaller control units can do better than a single, monolithic controller, but how do we find the decomposition? It is feasible to compose a collection of controllers into a single description, giving a canonical, though perhaps not efficient, representation of the control state space. However, it's unclear how best to partition this central control into subcontrollers.
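
The easy direction of the FSM item, composing controllers into one canonical machine, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the controllers are input-free, they step in lockstep, and ring() is a hypothetical modulo-n counter standing in for a real control unit.

```python
from itertools import product


def compose(fsm_a, fsm_b):
    """Product machine of two controllers that step in lockstep.

    Each FSM is a dict mapping state -> next state (input-free for brevity).
    A composite state is the pair (state_a, state_b).
    """
    return {
        (sa, sb): (fsm_a[sa], fsm_b[sb])
        for sa, sb in product(fsm_a, fsm_b)
    }


def ring(n):
    """Hypothetical controller: a modulo-n counter with states 0..n-1."""
    return {s: (s + 1) % n for s in range(n)}
```

Composing ring(3), ring(4), and ring(5) yields a single controller with 3 * 4 * 5 = 60 states, where the separate controllers hold 3 + 4 + 5 = 12 states between them. Going the other way, from the 60-state machine back to three small ones, is the partitioning problem that remains unclear.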

Time constraints prevented us from exploring the full list presented here. Styles 1-4 and 9 are commented on.

The user specifies a netlist (via text file or schematic capture), which is then mapped to LUTs, scheduled, placed, and routed. Project 1 is an example of programming at this level. Those who did their own floorplanning or used placement directives went one level further down.
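
To give a feel for this level, here is a toy Python model of a LUT-level netlist. It is a sketch under stated assumptions, not any vendor's netlist format: the dictionary layout and the names lut, FULL_ADDER, and evaluate are invented for these notes, and every LUT reads only primary inputs, so no ordering or routing is modeled.

```python
def lut(truth_table, *inputs):
    """Evaluate a k-input LUT: the inputs form an index into the table bits."""
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position
    return (truth_table >> index) & 1


# Hypothetical netlist for a 1-bit full adder:
# output net -> (truth table, input nets), with input 0 as the index LSB.
FULL_ADDER = {
    "sum":   (0x96, ("a", "b", "cin")),  # 3-input XOR
    "carry": (0xE8, ("a", "b", "cin")),  # majority of three
}


def evaluate(netlist, nets):
    """Evaluate every LUT once; assumes LUTs read only primary inputs."""
    values = dict(nets)
    for out, (table, ins) in netlist.items():
        values[out] = lut(table, *(nets[name] for name in ins))
    return values
```

Note how much the programmer has fixed by hand: the decomposition into two LUTs, each truth table, and the wiring. None of it says "add three bits", which is what a higher-level description would state directly.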

Some pros and cons of programming at this level.

Moving one level up in abstraction, we could have the user specify the behavior of the circuit using a language like Verilog or VHDL.

Some pros and cons of programming using a behavioral HDL.

Christoforos likes the idea of being able to make RC transparent by programming in a "high-level" language like C. ;) This is a nice goal. Extracting any parallelism by programming this way is probably going to be hard though. Consider that, even after many years of research in the parallel computing community, we still don't have ways to extract substantial parallelism from sequential applications. I can't imagine the situation being much easier for RC.
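
One reason the extraction is hard can be seen in a two-function Python sketch (my own illustration, not from the lecture; the function names are made up). Both loops look alike in sequential code, but only the first exposes its independence.

```python
def scale(xs, k):
    """Each output depends only on its own input: trivially parallel in space."""
    return [k * x for x in xs]


def running_sum(xs):
    """Each output depends on the previous one: a loop-carried dependence."""
    out, acc = [], 0
    for x in xs:
        acc += x  # iteration i needs the result of iteration i-1
        out.append(acc)
    return out
```

A tool that wants to parallelize running_sum has to recognize it as a prefix sum and know that addition is associative. The sequential text states neither fact, so the tool must rediscover what the programmer already knew.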

With no objections, André notes that, in this context, there are no significant differences between programming in C or x86 assembly, so 3 and 4 in our original list are merged.

Summary: Hard to improve substantially from pure sequential execution (note the "subtle" imbalance of bullets in the Bad category). With no programmer assistance and no exposure of the hardware to the programmer, overspecification and finding parallelism are challenging problems.

Libraries of solutions for different problem domains are provided. The user specifies an instance of a supported problem domain, which is then mapped directly to an implementation.

Assuming enough people write mappers for important problems that show up frequently enough, this seems like a reasonable approach for programming certain applications.

5/1/97, BNC <bnc@cs.berkeley.edu>