Computer Science 294-7 Lecture #14

Notes by Christoforos Kozyrakis and William Tsu

Slide 3: Long lines: highly capacitive (2-4 pF, according to Prof. Jan Rabaey), because many source/drain junction capacitances are attached to them.
Capacitive stubs: mostly the source/drain capacitances of the attached transistors.
Resistive switches: the channel of a transistor is resistive.

Slide 6: Model the driver as a turned-on PMOS transistor connecting Vdd to the output line. The receiving gate presents only its input capacitance. Thus, the interconnect exhibits RC delay.
R(driver transistor) = 27.5 kOhm * 0.6 um / Wp, with Wp = 12 um, so R = 1.38 kOhm
C(receiver) = C(gate) = C(gate of NMOS) + C(gate of PMOS) = (3 um * 0.6 um + 6 um * 0.6 um) * 0.0038 pF/um^2 = 0.0205 pF
RC time constant = 1.38 kOhm * 0.0205 pF = 0.028 ns
Note: the lecture notes have a different C(receiver), because they calculate the driver's gate capacitance instead.
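The Slide 6 arithmetic can be reproduced in a few lines. This is a minimal sketch: the constant and function names are ours, and the only inputs are the numbers above (27.5 kOhm for a square device with W = L, L = 0.6 um, Wp = 12 um, receiver widths 3 um and 6 um, 0.0038 pF/um^2).

```python
# Reproduce the Slide 6 numbers: driver on-resistance, receiver gate C, and RC.
R_SQUARE = 27.5e3     # ohms: PMOS on-resistance when W = L (from the notes' formula)
L_CH = 0.6            # um: channel length
C_OX = 0.0038e-12     # F/um^2 (0.0038 pF/um^2)

def driver_resistance(w_p):
    """On-resistance scales as L/W: R = 27.5 kOhm * L / Wp."""
    return R_SQUARE * L_CH / w_p

def gate_capacitance(w_n, w_p):
    """Receiver input capacitance: NMOS plus PMOS gate area times Cox."""
    return (w_n * L_CH + w_p * L_CH) * C_OX

r = driver_resistance(12.0)
c = gate_capacitance(3.0, 6.0)
tau = r * c
print(f"R = {r / 1e3:.3f} kOhm, C = {c * 1e12:.4f} pF, RC = {tau * 1e9:.4f} ns")
```

The unrounded values (1.375 kOhm, 0.02052 pF) give an RC of about 0.0282 ns, matching the 0.028 ns quoted above.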

Slide 7: A wire can be modeled as a distributed RC line, i.e., a chain of small RC segments. The bottom-most schematic shows this model.

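Slide 8: The Elmore delay model. k denotes the number of segments, each with resistance R and capacitance C. The Elmore time constant at the far end is RC * k(k+1)/2, which approaches 0.5RC * k^2 as k grows large. The time constant therefore grows quadratically with the length of the wire.
Note that a lumped model (total resistance kR driving total capacitance kC) would predict RC * k^2, which is too pessimistic by about a factor of two.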
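A minimal sketch comparing the Elmore delay of a k-segment RC ladder with the lumped estimate, assuming every segment is a series R followed by a shunt C (function names and unit values are ours, for illustration):

```python
def elmore_ladder(k, r_seg, c_seg):
    """Elmore delay at the far end of a k-segment RC ladder.

    Capacitor i sees the i upstream resistors, so the delay is
    sum_i c_seg * (i * r_seg) = r_seg * c_seg * k * (k + 1) / 2.
    """
    return sum(c_seg * (i * r_seg) for i in range(1, k + 1))

def lumped(k, r_seg, c_seg):
    """Lumped model: total resistance k*R driving total capacitance k*C."""
    return (k * r_seg) * (k * c_seg)

for k in (1, 2, 10, 100, 1000):
    d = elmore_ladder(k, 1.0, 1.0)
    print(k, d, d / lumped(k, 1.0, 1.0))  # the ratio tends to 0.5 as k grows
```

The ratio falls from 1 (a single segment is the lumped case) toward 0.5, which is the factor-of-two pessimism of the lumped model.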

Slide 9: Again, some typical values of sheet resistance and capacitance, for the HP CMOS 14 process.

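Slide 10: The 0.4 factor belongs to the distributed interconnect delay model: it multiplies the wire's own RC, since the 50% delay of a distributed RC line is roughly 0.4 RC.
The 0.7 factor (about ln 2) comes from the conventional lumped delay model: a single lumped RC reaches 50% of its final value after 0.69 RC.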
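As a numerical check of where the two factors come from (not part of the lecture), the sketch below steps an RC ladder with explicit Euler integration and measures when the far end reaches 50% of a unit step. It assumes an ideal step source and an open far end; the function name is ours. With k = 1 (a lumped RC) the result should land near 0.69 RC, and with 20 segments (approximating a distributed wire) near 0.4 RC (about 0.38 RC in the continuous limit).

```python
def far_end_50pct_delay(k, r_total=1.0, c_total=1.0):
    """Time for the far end of a k-segment RC ladder to reach 0.5 after a unit
    step at the input, normalized by r_total * c_total."""
    r = r_total / k
    c = c_total / k
    dt = 0.01 * r * c          # well inside the explicit-Euler stability limit (~r*c/2)
    v = [0.0] * (k + 1)
    v[0] = 1.0                 # node 0 is held at the step input (ideal driver)
    t = 0.0
    while v[k] < 0.5:
        new_v = v[:]
        for i in range(1, k + 1):
            i_in = (v[i - 1] - v[i]) / r
            i_out = (v[i] - v[i + 1]) / r if i < k else 0.0  # far end is open
            new_v[i] = v[i] + dt * (i_in - i_out) / c
        v = new_v
        t += dt
    return t / (r_total * c_total)

print(f"lumped RC:       {far_end_50pct_delay(1):.3f} RC")
print(f"20-segment wire: {far_end_50pct_delay(20):.3f} RC")
```

Explicit Euler keeps the sketch dependency-free; a finer ladder or a proper ODE solver would approach the distributed limit more closely.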

Slide 11: The diagram shows details of the connection block and the switch block.

Slide 12: A configuration memory bit controls whether the pass transistor is on or off.

Slide 13: Open switches connected to a wire introduce capacitance. This is their drain (or source) capacitance.

Slide 14: Drain capacitance has two basic components: junction (area) capacitance, which grows with the drain area, and sidewall capacitance, which grows with the drain perimeter.
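The two components add up as area times an area coefficient plus perimeter times a sidewall coefficient. A minimal sketch, assuming a rectangular drain of width w and diffusion length l_diff; the coefficient values below are hypothetical placeholders, not HP CMOS14 data, and the full perimeter is counted (some models exclude the edge facing the gate).

```python
CJ_AREA = 0.3e-15   # F/um^2, area junction capacitance (hypothetical value)
CJ_SW = 0.25e-15    # F/um, sidewall capacitance per unit perimeter (hypothetical value)

def drain_capacitance(w, l_diff, cj=CJ_AREA, cjsw=CJ_SW):
    """Drain capacitance of a w x l_diff (um) diffusion: area term + sidewall term."""
    area = w * l_diff
    perimeter = 2.0 * (w + l_diff)
    return cj * area + cjsw * perimeter

for w in (3.0, 6.0, 12.0):
    print(f"W = {w:4.1f} um -> Cdrain = {drain_capacitance(w, 1.5) * 1e15:.2f} fF")
```

Both terms grow with W, which is why wider switches load a wire more.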

Slide 15: The drain capacitance of a single open transistor is small and has very little effect on interconnect delay, but many of them attached to a wire do matter. Assuming Nsw/ch open switches per channel, the interconnect delay is given by the equation on the slide. Note that the open-switch capacitance is multiplied by the quadratic term l^2.
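The l^2 scaling of the stub term can be seen by extending the Elmore ladder with extra capacitance at every segment. This is a sketch under our own assumptions (each logic-block segment adds wire capacitance c_seg plus n_open drain capacitances c_drain; all values hypothetical), not the exact equation on the slide.

```python
def delay_with_stubs(l, r_seg, c_seg, n_open, c_drain):
    """Elmore delay across l logic-block segments of a routing wire, where each
    node carries wire capacitance plus n_open open-switch drain capacitances."""
    c_node = c_seg + n_open * c_drain
    return sum(c_node * (i * r_seg) for i in range(1, l + 1))

def stub_part(l, r_seg=1.0, c_seg=1.0, n_open=4, c_drain=0.2):
    """Extra delay caused only by the open-switch stubs (hypothetical values)."""
    return (delay_with_stubs(l, r_seg, c_seg, n_open, c_drain)
            - delay_with_stubs(l, r_seg, c_seg, 0, c_drain))

for l in (4, 8, 16):
    print(l, round(stub_part(l), 2))  # roughly quadruples each time l doubles
```

The stub contribution equals r_seg * n_open * c_drain * l(l+1)/2, so even a small per-switch capacitance becomes significant on long paths.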

Slide 16: On the other hand, closed switches introduce series resistance along the signal path.

Slide 17: This is the new delay model, which also accounts for the closed-switch resistance. It assumes Nsw closed switches on the path from one logic block to the other.
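The slide's equation is not reproduced in these notes, so the sketch below assumes one simple Elmore-style form that shows the same scaling: the closed switches are placed at the driving end, so their total resistance charges the whole line capacitance (a term linear in l), while the wire and its open-switch stubs form an l-segment ladder (a term quadratic in l). The function name and all numbers are hypothetical.

```python
def path_delay_terms(l, r_seg, c_seg, n_open, c_drain, n_closed, r_on):
    """Return (switch_term, wire_term) of an Elmore-style delay estimate.

    Assumption: the n_closed series switches (r_on each) sit at the driving end.
    """
    c_node = c_seg + n_open * c_drain               # wire + open-switch stubs per segment
    switch_term = n_closed * r_on * (l * c_node)     # grows linearly with l
    wire_term = r_seg * c_node * l * (l + 1) / 2     # grows quadratically with l
    return switch_term, wire_term

for l in (2, 8, 32):
    s, w = path_delay_terms(l, r_seg=1.0, c_seg=1.0, n_open=4, c_drain=0.2,
                            n_closed=2, r_on=3.0)
    print(f"l={l:2d}  switch term={s:7.1f}  wire term={w:7.1f}")
```

With these illustrative values the switch-resistance term dominates for short paths and the quadratic wire term takes over as l grows.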

Slide 18: Increasing the switch width W increases its drain capacitance but reduces its resistance. Since the capacitance is multiplied by l^2 while the resistance is multiplied only by l, minimizing C matters more for large values of l.
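The width trade-off can be made concrete with the same assumed delay form used for Slide 17, taking switch resistance as r_unit / W and drain capacitance as c_unit * W (both hypothetical scaling constants). The W-dependent part of the delay then has the shape A/W + B*W, minimized at W* = sqrt(A/B). Because B grows like l^2 and A only like l, the best width shrinks as paths get longer. A grid search confirms the closed form.

```python
import math
from dataclasses import dataclass

@dataclass
class Params:
    r_seg: float = 1.0    # wire resistance per logic-block segment (hypothetical units)
    c_seg: float = 1.0    # wire capacitance per segment
    n_open: int = 4       # open switches hanging on each segment
    n_closed: int = 2     # closed switches in the path
    r_unit: float = 2.0   # on-resistance of a unit-width switch
    c_unit: float = 0.05  # drain capacitance of a unit-width switch

def delay(w, l, p):
    """Assumed model: closed switches at the driver end plus an l-segment ladder."""
    r_on = p.r_unit / w                          # wider switch -> lower resistance
    c_node = p.c_seg + p.n_open * p.c_unit * w   # wider switch -> more stub capacitance
    return p.n_closed * r_on * l * c_node + p.r_seg * c_node * l * (l + 1) / 2

def best_width(l, p):
    """Closed-form minimizer of the W-dependent part A/W + B*W."""
    a = p.n_closed * p.r_unit * l * p.c_seg
    b = p.r_seg * p.n_open * p.c_unit * l * (l + 1) / 2
    return math.sqrt(a / b)

p = Params()
for l in (2, 8, 32):
    grid = min((w / 10 for w in range(1, 400)), key=lambda w: delay(w, l, p))
    print(f"l={l:2d}  W*={best_width(l, p):.2f}  (grid search: {grid:.1f})")
```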

Slide 19: For connecting distant logic blocks, one can use segmented long wires, which are not attached to a switch matrix at every logic block. Reducing the number of switch matrices per segmented line (i.e., increasing the number of logic blocks between two switch matrices) reduces delay.

Slide 20: Delay on segmented lines can be further reduced by reducing the number of switches connected to each one. After all, according to Rent's rule, wires that interconnect distant components on a die do not need many connections. Alternatively, one can have a single connection on the long segmented wire and then use that wire to distribute the signal locally (through a buffer) to more places.
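Both levers from Slides 19 and 20 (fewer switch matrices along the line, fewer switches per matrix) reduce the stub capacitance the signal must charge. A minimal Elmore sketch, assuming a switch matrix every s logic blocks, each hanging n_per_matrix open switches on the line; for simplicity it ignores the series-switch savings, and all values are hypothetical.

```python
def segmented_delay(l, s, n_per_matrix, r_seg=1.0, c_seg=1.0, c_drain=0.2):
    """Elmore delay along a line spanning l logic blocks, with stubs only at
    every s-th block (where a switch matrix attaches to the line)."""
    total = 0.0
    for i in range(1, l + 1):
        c_i = c_seg + (n_per_matrix * c_drain if i % s == 0 else 0.0)
        total += c_i * (i * r_seg)   # node capacitance times upstream resistance
    return total

print(segmented_delay(16, s=1, n_per_matrix=4))  # matrix at every block
print(segmented_delay(16, s=4, n_per_matrix=4))  # matrix every 4 blocks
print(segmented_delay(16, s=4, n_per_matrix=1))  # ... and fewer switches per matrix
```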

Slide 21: The Altera interconnect model provides local connections for elements within a LAB (Logic Array Block) and rows and columns of fast interconnect for inter-LAB connections. Connections within a single row or column are much faster than connections that involve both rows and columns. The reported delays for Altera components can be used to check the delay model developed so far, which proves to be quite accurate.

Slide 22: So far we have assumed that each signal has just one receiver. With a fanout larger than 1, however, the signal has to travel through multiple branches, and the resistance and capacitance of these branches increase delay.
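With fanout, the wiring becomes an RC tree, and the Elmore delay to a sink sums, over every node capacitance in the tree, that capacitance times the resistance it shares with the path to the sink. A minimal sketch with hypothetical node names and unit values shows that adding a branch slows the original receiver even though the branch is off its path:

```python
def elmore_tree(parent, r, c, sink):
    """Elmore delay from the root to `sink` in an RC tree.

    parent[n]: parent of node n (the root has parent None)
    r[n]: resistance of the edge from parent[n] to n
    c[n]: capacitance at node n
    """
    def upstream_edges(n):
        edges = set()
        while parent[n] is not None:
            edges.add(n)          # an edge is identified by its child node
            n = parent[n]
        return edges

    sink_path = upstream_edges(sink)
    return sum(c[k] * sum(r[e] for e in upstream_edges(k) & sink_path) for k in c)

# Single receiver: driver -> a -> rx1
parent = {"drv": None, "a": "drv", "rx1": "a"}
r = {"a": 1.0, "rx1": 1.0}
c = {"drv": 0.0, "a": 1.0, "rx1": 1.0}
single = elmore_tree(parent, r, c, "rx1")

# Fanout of 2: add a branch a -> rx2
parent2 = dict(parent, rx2="a")
r2 = dict(r, rx2=1.0)
c2 = dict(c, rx2=1.0)
fanout = elmore_tree(parent2, r2, c2, "rx1")
print(single, fanout)
```

The branch capacitance at rx2 must be charged through the shared edge to a, which is why it appears in the delay to rx1.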

Slide 23: Interconnect delay can be reduced by adding buffers along the way. Buffering signals at each switch matrix eliminates some RC effects: instead of going through l pass transistors from driver to receiver, the signal only goes through one from buffer to buffer.

Slide 24: Since RC effects are isolated between buffers, the delay is l times the delay from one switch matrix to the next. The signal goes through just one pass transistor and sees the stub capacitance of Nsw pass transistors. Therefore, the quadratic factor l^2 is avoided.

Slide 24: The graph shows that buffering makes delay grow linearly with l instead of quadratically. However, for signals that only travel across a small number of logic blocks (l < 4), buffering increases delay: for such small l the quadratic factor is not important, so the added buffer delay dominates.
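The crossover can be sketched by comparing an unbuffered l-segment chain (Elmore, quadratic in l) with a buffered one (l stages, each a buffer plus one segment). The per-segment time constant and buffer delay below are hypothetical values chosen for illustration; with them, the curves cross at l = 2 * t_buf / tau_seg + 1 = 4.

```python
def unbuffered(l, tau_seg):
    """Elmore delay of an l-segment chain, each segment with time constant tau_seg."""
    return tau_seg * l * (l + 1) / 2

def buffered(l, tau_seg, t_buf):
    """l isolated stages, each paying one buffer delay plus one segment."""
    return l * (t_buf + tau_seg)

TAU_SEG, T_BUF = 1.0, 1.5   # hypothetical values
for l in range(1, 9):
    print(l, unbuffered(l, TAU_SEG), buffered(l, TAU_SEG, T_BUF))
```

Below the crossover the fixed buffer delay per stage costs more than the quadratic term it removes; above it, buffering wins by a growing margin.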

Slide 28: Here we calculate the optimal number of buffers (k) to use along a path. Starting with the unbuffered path delay T and adding k buffers, the new total delay is Tbuf. Minimizing Tbuf with respect to k gives the optimal number of buffers as a function of l and the RC characteristics of the design.
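A minimal sketch of this optimization, assuming the unbuffered delay has the quadratic form T = a * l^2 (for example a = 0.5 RC per logic block, from the Elmore result) and that k equally spaced buffers, each with delay t_b, split it into k stages. Then Tbuf(k) = k * t_b + a * l^2 / k, and setting dTbuf/dk = 0 gives k = l * sqrt(a / t_b). This is our assumed form, not necessarily the slide's exact expression; a and t_b are hypothetical values.

```python
import math

def t_buffered(k, l, a, t_b):
    """k stages, each paying one buffer delay t_b plus a * (l / k)**2."""
    return k * (t_b + a * (l / k) ** 2)

def k_opt(l, a, t_b):
    """Continuous optimum of k * t_b + a * l**2 / k."""
    return l * math.sqrt(a / t_b)

a, t_b = 0.5, 2.0   # hypothetical values
for l in (8, 16, 32):
    k_int = min(range(1, l + 1), key=lambda k: t_buffered(k, l, a, t_b))
    print(f"l={l:2d}  k_opt={k_opt(l, a, t_b):.2f}  best integer k={k_int}")
```

The optimal buffer count grows linearly with l, so the optimally buffered delay also grows only linearly with path length.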

Slide 27: Buffers could be added to the outputs of each switch matrix. However, adding a buffer turns a wire from bidirectional to unidirectional. Newer Xilinx designs provide buffering of switch matrix outputs that can be bypassed: one can either use the matrix as in older designs (ignoring the buffers) or route signals through them.