CS294-7 Scribe Notes

Day 17: Tuesday, March 18

Interconnect

Philip Chong

Routing

The routing problem for FPGAs is the assignment of physical wires on the FPGA to individual nets in a design. Recall that several of the goals of placement are meant to minimize or simplify the routing problem; it is thus natural to consider routing after placement is performed. One goal of placement is to minimize the amount of interconnect (wiring) needed; by placing connected logic blocks close together, we hope to reduce the area consumed by the wiring. Another goal is to distribute the routing evenly across the chip, so that the largest channel required is as small as possible. Note that the FPGA channel capacity is fixed at fabrication, so the placement and routing tools must ensure that no channel is overused by any design which is to be implemented.

Much of the work in routing has been empirical in nature, with heuristics chosen by intuition rather than derived from theory.

To simplify the routing process, we can identify three subproblems which can be independently dealt with:

The Shortest Path problem is to find the shortest path between two endpoints of a net; the distance metric here is some weighting assigned to each routing resource on the FPGA, related to the size (length) of the wire, the delay associated with the wire, or both. Note that this problem can be solved in polynomial time using standard graph algorithms (e.g. Dijkstra's algorithm for a single source, or Floyd-Warshall for all-pairs shortest paths).
The Steiner Tree problem is to find a minimum-length routing network between all endpoints of a given net. Note that this problem is NP-complete in general; however, for small nets (with 5 or fewer endpoints), the problem is fairly easy. Note that for a k-LUT based FPGA, the average fanout is only k, and so we expect to be able to solve the average case of this problem instantiated on such an FPGA. Also note that most FPGAs have dedicated resources for high-fanout signals (clocks, resets, etc.), so we do not have to build Steiner Trees for these signals.
The Compatibility problem is to find satisfactory routes for several nets simultaneously; this problem is NP-complete.
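The Shortest Path subproblem above can be sketched with Dijkstra's algorithm. This is only an illustration: the graph `ROUTING_GRAPH`, its node names, and its edge weights are hypothetical stand-ins for routing resources and their costs, not data from the lecture.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph[u][v] is the non-negative cost of edge u->v."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue            # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]

# Hypothetical routing-resource graph: nodes are wires/pins, weights are costs.
ROUTING_GRAPH = {
    "src": {"w1": 1.0, "w2": 4.0},
    "w1": {"w2": 1.0, "w3": 5.0},
    "w2": {"w3": 1.0},
    "w3": {"dst": 1.0},
    "dst": {},
}

path, cost = shortest_path(ROUTING_GRAPH, "src", "dst")
print(path, cost)
```

The edge weights could encode wire length, delay, or a mix of both, as described above; the algorithm itself does not care which.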

We can see an example of a compatibility problem in the following slide. Here, we must satisfy three simultaneous net connections (A-A, B-B, C-C) given three tracks in a hypothetical FPGA structure. Note that the A-A route cannot use the middle track, as this will block either B-B or C-C.

In general, note that a purely greedy approach will not be sufficient; at the least, we should try to reduce the utilization of routing resources which are shared (i.e. could be used in many other possible routes), as using them may block another route. Also note that the consequences of having a connection blocked are much more dire for an FPGA than for a standard cell/custom implementation; in the latter case we may simply add more channels at a relatively small penalty in chip area. For the FPGA, no such luxury exists; such a design simply cannot be implemented on an architecture without sufficiently wide channels.

A standard approach to routing only considers the minimum length routes as possible candidates for a solution. This reduces the search space to a manageable size; however, since this approach only considers locally optimal solutions, some global optimum may be missed. That is, the best overall solution may have some nets of longer than minimum length. The number of bends in the routes can also be taken into consideration here; we will see shortly how minimizing the number of bends may help improve the quality of the routes.

Often, nets with more than two endpoints will be routed by taking the vertices in a successive, pairwise fashion. Again, this is a step taken primarily to simplify the routing problem.
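One way to take the endpoints of a net pairwise is sketched below. It is an assumption of this sketch, not something stated in the lecture, that each new endpoint is attached to the nearest endpoint already connected and that distance is Manhattan distance on a grid; the function names are hypothetical. The result is a spanning tree over the endpoints, which is generally longer than a true Steiner tree.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) grid locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def decompose_net(terminals):
    """Split a multi-terminal net into two-point connections.

    The first terminal is treated as the source. Repeatedly attach the
    unconnected terminal that is closest to any already-connected terminal.
    """
    connected = [terminals[0]]
    remaining = list(terminals[1:])
    pairs = []
    while remaining:
        _, near, new = min(
            (manhattan(c, r), c, r) for c in connected for r in remaining
        )
        pairs.append((near, new))
        connected.append(new)
        remaining.remove(new)
    return pairs

pairs = decompose_net([(0, 0), (3, 0), (3, 2), (0, 1)])
total = sum(manhattan(a, b) for a, b in pairs)
print(pairs, total)
```

Each resulting pair can then be handed to a two-point shortest-path router.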

Iterative improvement techniques are used to further reduce the search space; only simple transformations on an existing routing are allowed. One possibility is to deal with single nets at a time, ripping up existing routes and rerouting to try to reduce the cost of the solution. Another method is to use a simulated annealing algorithm to swap pairs of nets.

The routing task can be divided into a global routing stage and a detailed routing stage.

Global Routing

Global routing is an abstraction; channels are considered to have some capacity (number of wires) associated with them; global routing assigns nets to channels, but not to individual wires within those channels. Switching resources are also abstracted; these are considered to be nonblocking crossbar networks, allowing a signal on any incoming channel to be routed to any outgoing channel.
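The channel abstraction can be illustrated with a small bookkeeping class. This is a sketch under assumptions: the class name `GlobalRouting`, the channel names, and the uniform per-channel capacity are all hypothetical. Nets are assigned to channels only, never to individual wires, and switches are not modeled at all, which matches the nonblocking-crossbar assumption.

```python
from collections import Counter

class GlobalRouting:
    """Track how many nets use each channel (not which wire they use)."""

    def __init__(self, capacity):
        self.capacity = capacity   # wires per channel, assumed uniform
        self.usage = Counter()     # channel name -> number of nets assigned
        self.routes = {}           # net name -> list of channels

    def assign(self, net, channels):
        """Record a global route for a net not yet routed."""
        self.routes[net] = list(channels)
        self.usage.update(channels)

    def overused(self):
        """Channels whose assigned nets exceed the fixed capacity."""
        return sorted(ch for ch, n in self.usage.items() if n > self.capacity)

    def max_density(self):
        """Largest number of nets sharing any one channel."""
        return max(self.usage.values(), default=0)

g = GlobalRouting(capacity=2)
g.assign("n1", ["h0", "v1"])
g.assign("n2", ["h0"])
g.assign("n3", ["h0", "v1"])
print(g.overused(), g.max_density())
```

A global route is acceptable in this model only if `overused()` is empty; choosing wires within each channel is left to the detailed router.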

Minimizing the number of bends in a routed net makes good sense when dealing with printed circuit boards or custom/semi-custom IC designs; bends often require that vias be inserted, since routing is typically directional on each signal layer and thus changing directions requires changing signal layers. For a segmented FPGA, bend reduction is beneficial in a different manner. Longlines are a typical routing resource available in such devices, and making full use of these requires that a net be routed in a single direction for a relatively long distance. Thus, reducing bends allows longlines to be used more effectively.

Bend reduction does have some pitfalls, however. The routing congestion might increase when we minimize bends, since this imposes ``artificial'' routing constraints. This might force routes to use already congested channels, instead of adding a few bends to avoid these situations.

An iterative approach to global routing is shown in the slide below. Here, we route nets one at a time, ordered by some criterion (such as criticality or a congestion measure based on the network's current routing). Critical nets can thus get priority to use certain limited resources, such as longlines, which might benefit performance. When all nets are routed, the entire process is repeated to try to improve the results; in repeating, we may choose just to rip up and reroute individual nets sequentially, or reroute the entire design at once. Of course, some information must be kept between passes, as we do not want to find the same routes each iteration. Should no suitable global route be found (e.g. due to channel capacity limits), the placement process must be redone. Again, some of the information obtained in the routing procedure should be used to avoid implementing the same placement again.
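The sequential rip-up-and-reroute variant of this loop can be sketched as follows. Several things here are assumptions made for illustration and are not from the lecture: each net comes with a precomputed list of candidate routes, nets are ordered alphabetically as a placeholder for a criticality or congestion ordering, and the cost function (one unit per channel plus an arbitrary penalty of 10 per unit of overflow) is invented. The channel usage left behind by the other nets is the information carried between passes.

```python
from collections import Counter

def route_cost(route, usage, capacity):
    """Channel count plus a penalty for each channel this route would overfill."""
    return sum(1 + 10 * max(0, usage[ch] + 1 - capacity) for ch in route)

def rip_up_and_reroute(candidates, capacity, passes=3):
    """candidates maps net name -> list of candidate routes (lists of channels)."""
    usage = Counter()
    chosen = {}
    for _ in range(passes):
        for net in sorted(candidates):       # placeholder ordering
            if net in chosen:                # rip up the old route
                for ch in chosen[net]:
                    usage[ch] -= 1
            best = min(candidates[net],
                       key=lambda r: route_cost(r, usage, capacity))
            chosen[net] = best
            for ch in best:
                usage[ch] += 1
    overflow = sum(max(0, n - capacity) for n in usage.values())
    return chosen, overflow

# Hypothetical example: net B can only use channel c1; net A has a detour.
NETS = {"A": [["c1"], ["c2", "c3"]], "B": [["c1"]]}
print(rip_up_and_reroute(NETS, capacity=1, passes=1))
print(rip_up_and_reroute(NETS, capacity=1, passes=3))
```

In this toy case the first pass leaves channel c1 overfilled, and the second pass moves net A onto its detour once it can see that net B also needs c1. A nonzero overflow after the final pass corresponds to the case where placement would have to be redone.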

It was noted that three or four complete rerouting iterations can achieve about a 10% improvement over the original routing. Also, it was pointed out that global routing is simple given an FPGA with a hierarchical network; given a placement, there exists only a single minimal-length global route between any two logic cells.

Another approach to global routing is to use simulated annealing techniques. Allowed moves for the annealing are pairwise swaps of nets, while the cost of a network could be measured in terms of channel densities, delays, or some combination of both.

An example of this approach can be found in [1]; see especially Figures 6--8. The authors demonstrate an example where simulated annealing gave better than 45% reduction in routing over a purely random process.

Detailed Routing

Detailed routing entails taking the nets in a design and assigning them to physical wires, based on the channel assignments obtained during global routing. Thus, the global routing serves to reduce the search space of the detailed router.

The SEGA detailed router [2] takes single connections (paths) and routes them sequentially, in order of some cost function. The cost can be related to routability or delay. Since paths are routed sequentially, it is possible that a particular path may block the routing for other subsequent paths. This is true if the routing network has restrictive switch matrices (e.g. the XC4000 switch box); the universal switch matrix grants more flexibility to prevent blocking.

The measure of congestion used in SEGA is based on the number of alternate routes available for a given path. Let k be the number of unused wires which run in parallel with a given wire segment. The demand on a wire segment is defined as the sum of (1/k) over all connections which use that wire segment. The demand cost function for a path is the sum of the demands of the wire segments which make up that path. Note that as wires are routed, the demand of the wire segments used will go up, since there will be fewer free wires running parallel to that segment. Thus a high demand indicates fewer routing alternatives and increased congestion.
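The demand computation described above can be written out as a short sketch. The function and variable names are hypothetical, and two details are assumptions of this sketch rather than statements about SEGA itself: every connection on a segment sees the same k, and a segment with no free parallel wires (k = 0) is given infinite demand.

```python
def segment_demand(num_connections, free_parallel_wires):
    """Sum of 1/k over the connections using a segment (same k for each)."""
    if free_parallel_wires <= 0:
        return float("inf")    # assumption: no alternatives left
    return num_connections / free_parallel_wires

def path_demand(path, connections_on, free_parallel):
    """Demand cost of a path: the sum of the demands of its segments."""
    return sum(
        segment_demand(connections_on[s], free_parallel[s]) for s in path
    )

# Hypothetical two-segment path.
connections_on = {"s1": 2, "s2": 1}
free_parallel = {"s1": 4, "s2": 2}
before = path_demand(["s1", "s2"], connections_on, free_parallel)

# Routing other nets consumes two of the wires parallel to s1.
free_parallel["s1"] = 2
after = path_demand(["s1", "s2"], connections_on, free_parallel)
print(before, after)
```

The demand of the same path rises as the free parallel wires are used up, which is the behavior the cost function is meant to capture.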

The delay of a path is measured in SEGA using two metrics:

The number of segments used for the path. This is normalized, so that the measure is the fraction of the segments actually used which exceed the minimum required number of segments; this is 1-(minimum required segments)/(actual segments used).
The segment lengths used for the path. Again this is a normalized figure, being the wasted segment length (i.e. the portions of the wire segments used which do not contribute to getting the signal from its source to the destination) divided by the total of the segment lengths used by the route.

In SEGA, the overall measure for delay on a path can be taken as either of the two metrics above, or the sum of the two.
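The two normalized delay metrics and their combination can be sketched as below. The function names, the `mode` parameter, and the example numbers are hypothetical; only the formulas follow the descriptions above.

```python
def segment_count_cost(min_segments, used_segments):
    """1 - (minimum required segments) / (actual segments used)."""
    return 1.0 - min_segments / used_segments

def wasted_length_cost(segment_lengths, useful_length):
    """Wasted wire length divided by the total length of the segments used."""
    total = sum(segment_lengths)
    return (total - useful_length) / total

def delay_cost(min_segments, segment_lengths, useful_length, mode="both"):
    """Either metric alone, or the sum of the two."""
    count = segment_count_cost(min_segments, len(segment_lengths))
    length = wasted_length_cost(segment_lengths, useful_length)
    if mode == "count":
        return count
    if mode == "length":
        return length
    return count + length

# A connection spanning 6 logic blocks could use two length-3 segments,
# but this hypothetical route uses four segments of total length 8.
print(delay_cost(2, [1, 2, 2, 3], 6, mode="count"))
print(delay_cost(2, [1, 2, 2, 3], 6, mode="length"))
print(delay_cost(2, [1, 2, 2, 3], 6))
```

Both metrics are zero for a route that uses the minimum number of segments with no wasted length, and both approach one as the route becomes more wasteful.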

The slide below shows the results obtained using the various metrics for SEGA; the delays listed in the table correspond to an RC delay model of the routed network. The first row corresponds to using just the demand cost function described above. Results from using the Segment Length, Number of Segments and the combination of these two metrics are listed in the table as well. The penultimate row shows the results obtained when the analytic RC model is used directly for delay estimation, and the last row gives the results when the RC delay model is used along with some consideration for reusing wire segments which belong to the same net; this gets some of the benefits of routing entire nets at once, rather than single two-point paths. Note that simply using the number of wire segments as the cost function gives results comparable to using a more complex RC delay model.

The FPGA structure assumed here consists of length 1, length 2, and length 3 wires. Channels are assumed to have 30 tracks each. Also, the figures in the table below correspond to the average delay across a set of benchmarks; nothing is said about the critical path delay.

Figures 6 and 7 and Tables 5 and 6 from [2] were presented. Figure 6 shows the effects of applying bend reduction when routing on FPGA structures with homogeneous (i.e. all equal) wire segment lengths; as segment length increases, applying bend reduction seems to have greater benefit. This seems in concordance with the intuition that reducing bends allows better use of longlines.

Table 5 shows the effects of varying the relative distribution of length 1, length 2 and length 3 segments. The table suggests that, for minimizing delay, the best mixture of segment lengths is to have mostly length 3 segments, with some (about 30%) length 2 segments, and no length 1 segments. Also, it is clear from this table that routing using the path cost function for delay does indeed give better delay figures than using the area (congestion) metric.

Figure 7 and Table 6 illustrate the area costs associated with increasing wire segment lengths. Here, the measure of area inefficiency is the number of tracks used in excess of the number estimated by the global router. It is fairly clear that having longer wire segments exacts a penalty in that more tracks per channel are required. Also, the excess tracks required when routing using the delay metric seem to grow more quickly than when using the area/congestion metric as the segment lengths increase.

A possible scheme for balancing area and delay considerations would be to route the delay-critical nets using the delay metric, while routing the remaining nets using the congestion metric. This should keep the track counts low while not exacting a performance penalty on the overall design.

Summary

References

[1] S. Kirkpatrick, C.D. Gelatt, Jr., M.P. Vecchi, ``Optimization by Simulated Annealing'', Science, v. 220, n. 4598, pp. 671--680, May 1983.
[2] S. Brown, G. Lemieux, M. Khellah, ``Segmented Routing for Speed-Performance and Routability in Field-Programmable Gate Arrays'', Journal of VLSI Design, v. 4, n. 4, pp. 275--291, 1996.

pchong@cad.eecs.berkeley.edu
$Id: index.html,v 1.1 1997/03/11 18:00:54 pchong Exp pchong $