Slide 3:
Long lines: highly capacitive (2-4 pF, per Prof. Jan Rabaey), because many
source/drain junction capacitances are attached to them.
Capacitive stubs: mostly source/drain capacitances of the attached transistors.
Resistive switches: the channel of a transistor is resistive.
Slide 6:
Model the drive transistor as a turned-on PMOS connecting Vdd to the output line.
The receiving gate presents only input capacitance.
Thus, the interconnect exhibits RC delay.
R(driver transistor) = 27.5 KOhm * 0.6u / Wp (Wp = 12u) = 1.38 KOhm
C(receiver) = C(gate) = C(gate of NMOS) + C(gate of PMOS)
            = (3u * 0.6u + 6u * 0.6u) * 0.0038 pF/um^2
            = 0.0205 pF
RC time constant = 0.028 ns
Note: the lecture notes have a different C(receiver), as they calculate the
driver gate capacitance instead.
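The arithmetic above can be rechecked with a short script. This is only a
sketch: the numbers are the ones quoted in these notes, the variable names are
made up for illustration, and reading 27.5 KOhm as a per-square channel
resistance is an assumption based on the L/W scaling used above.

```python
# Slide 6 arithmetic, using the values quoted in the notes.
# All names here are illustrative, not taken from the lecture.
R_PER_SQUARE_OHM = 27.5e3          # assumed: PMOS channel resistance per square
L_UM = 0.6                         # channel length, um
WP_DRIVER_UM = 12.0                # driver PMOS width, um
WN_RX_UM, WP_RX_UM = 3.0, 6.0      # receiver NMOS / PMOS widths, um
C_GATE_PF_PER_UM2 = 0.0038         # gate capacitance per unit area, pF/um^2

# R = R_square * L / W
r_driver_ohm = R_PER_SQUARE_OHM * L_UM / WP_DRIVER_UM

# C = (NMOS gate area + PMOS gate area) * capacitance per area
c_receiver_pf = (WN_RX_UM * L_UM + WP_RX_UM * L_UM) * C_GATE_PF_PER_UM2

# ohm * pF = 1e-12 s; multiply by 1e9 to express the result in ns
tau_ns = r_driver_ohm * c_receiver_pf * 1e-12 * 1e9

print(f"R = {r_driver_ohm:.0f} ohm, C = {c_receiver_pf:.4f} pF, RC = {tau_ns:.3f} ns")
```

The unrounded driver resistance is 1375 Ohm, which the notes round to
1.38 KOhm.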
Slide 7:
A wire can be modeled by distributed RCs.
The bottom-most schematic shows the model for it.
Slide 8:
The Elmore delay model.
k denotes the number of segments, each with resistance R and capacitance C.
For large k, the time constant approaches 0.5 * RC * k^2.
The time constant grows quadratically with the length of the wire.
Note that the lumped model would predict RC * k^2, which is wrong: it is
too pessimistic by a factor of two.
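A small numerical check of the claim above, as a sketch. It assumes a uniform
ladder of k identical segments, each with series resistance r_seg and shunt
capacitance c_seg (these names and the unit values are illustrative only), and
compares the Elmore sum with the single lumped RC.

```python
def elmore_time_constant(k, r_seg, c_seg):
    """Elmore time constant at the far end of a k-segment RC ladder.

    Capacitor i (counting from the driver) charges through i series
    resistors, so it contributes (i * r_seg) * c_seg to the sum.
    """
    return sum(i * r_seg * c_seg for i in range(1, k + 1))


def lumped_time_constant(k, r_seg, c_seg):
    """Single lumped RC: total resistance times total capacitance."""
    return (k * r_seg) * (k * c_seg)


# The ratio equals (k + 1) / (2k), which tends to 0.5 as k grows.
for k in (1, 10, 100, 1000):
    ratio = elmore_time_constant(k, 1.0, 1.0) / lumped_time_constant(k, 1.0, 1.0)
    print(k, ratio)
```

The Elmore sum is RC * k(k+1)/2, so the 0.5 * RC * k^2 form is the large-k
limit, and the lumped estimate overshoots it by a factor approaching two.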
Slide 9:
Again, some typical values of sheet resistance and capacitance for the
HP CMOS 14 process.
Slide 10:
The 0.4 part represents the distributed interconnect delay model.
The 0.7 part is the conventional lumped delay model.
Slide 11:
The diagram shows some details about the connection and the switch block.
Slide 12:
A configuration memory bit controls whether the pass transistor is on or
off.
Slide 13:
Open switches connected to a wire introduce capacitance. This is their
drain (or source) capacitance.
Slide 14:
Drain capacitance has two basic components: junction capacitance, which grows
with the drain area, and sidewall capacitance, which grows with the drain
perimeter.
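The two components can be sketched as an area term plus a perimeter term. The
function below is an illustration only: the rectangular drain geometry, the
choice to count all four sides in the perimeter, and the parameter names are
assumptions, and the coefficients in the example call are placeholders rather
than process data.

```python
def drain_capacitance_pf(width_um, drain_len_um, cj_pf_per_um2, cjsw_pf_per_um):
    """Drain capacitance = junction (area) term + sidewall (perimeter) term.

    Assumes a rectangular drain region of width_um by drain_len_um and
    counts all four sides in the perimeter (a simplification).
    """
    area_um2 = width_um * drain_len_um
    perimeter_um = 2 * (width_um + drain_len_um)
    return cj_pf_per_um2 * area_um2 + cjsw_pf_per_um * perimeter_um


# Placeholder coefficients, chosen only to show the call shape.
print(drain_capacitance_pf(width_um=3.0, drain_len_um=1.5,
                           cj_pf_per_um2=0.0003, cjsw_pf_per_um=0.0004))
```

Widening the transistor raises both terms, which is why a wider switch loads
the wire with more stub capacitance.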
Slide 15:
While the drain capacitance of a single open transistor is insignificant and
has a very small effect on interconnection delay, having many of them
connected to a wire does matter. Assuming Nsw/ch open switches per channel,
the interconnect delay is given by the equation on the slide. The open switch
capacitance is multiplied by the quadratic term l^2.
Slide 16:
On the other hand, closed switches introduce series resistance along the
signal path.
Slide 17:
This is the new delay model, accounting for the closed switch resistance as
well. It assumes Nsw closed switches from one logic block to the other.
Slide 18:
Increasing the switch width W increases the drain capacitance but reduces the
resistance. Since the capacitance is multiplied by l^2 while the resistance
is multiplied by l, minimizing C is more important for large values of l.
Slide 19:
For connecting distant logic blocks, one can use segmented long wires. In this
case you don't attach the wire to a switch matrix at every logic block.
Reducing the number of switch matrices per segmented line (that is, increasing
the number of logic blocks between two switch matrices) reduces delay.
Slide 20:
Delay on segmented lines can be further reduced by reducing the number of
switches connected to each one. After all, according to Rent's rule, you
don't need that many connections to wires interconnecting distant
components on a die. Alternatively, one can have a single connection on the long
segmented wire and then use that wire to distribute the signal (through
a buffer) to more places locally.
Slide 21:
The Altera interconnection model provides local connections for elements
within a LAB (Logic Array Block) and rows and columns of fast interconnect
for inter-LAB connections. Connections within a single row or column are much
faster than connections that involve both rows and columns. One can use the
reported delays for Altera components to check the validity of the delay
model we have used so far, which proves to be quite accurate.
Slide 22:
Up to now, we assumed that each signal has just one receiver. Yet, in the
case of fanout larger than 1, the signal has to go through multiple
branches. The resistance and capacitance of these branches increase delay.
Slide 23:
Interconnection delay can be reduced by adding buffers along the way.
By buffering signals at each switch matrix, we eliminate some RC effects.
Instead of having the signal go through l pass transistors from driver
to receiver, it only has to go through one from buffer to buffer.
Slide 24:
Since RC effects are isolated between buffers, the delay is l times that
from one switch matrix to the next. The signal goes through just one
pass transistor and is affected by the stub capacitance of Nsw pass
transistors. Therefore, the quadratic factor l^2 is avoided.
Slide 24:
One can see from the graph that buffering makes delay grow linearly with l
instead of quadratically. Yet, for signals that just have to travel along a small
number of logic blocks (l<4), buffering increases delay. This is because
for such small l, the quadratic factor is not important.
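The linear-versus-quadratic behaviour can be illustrated with a toy model.
This is a sketch under stated assumptions, not the equation from the slides:
the unbuffered path is treated as an l-segment RC ladder with an Elmore-style
delay, each buffered stage costs one segment RC plus a fixed buffer delay, and
the two constants are hypothetical values picked so that the crossover lands
near l = 4.

```python
def unbuffered_delay(l, rc_seg):
    """Elmore-style delay through l unbuffered segments (quadratic in l)."""
    return 0.5 * rc_seg * l * (l + 1)


def buffered_delay(l, rc_seg, t_buf):
    """One buffer per switch matrix: l isolated single-segment stages."""
    return l * (rc_seg + t_buf)


RC_SEG_NS = 0.10   # hypothetical per-segment RC, ns
T_BUF_NS = 0.15    # hypothetical intrinsic buffer delay, ns

for l in (1, 2, 4, 8, 16):
    print(l, unbuffered_delay(l, RC_SEG_NS), buffered_delay(l, RC_SEG_NS, T_BUF_NS))
```

In this model the buffer delay is pure overhead for short paths, and it only
pays off once the quadratic term of the unbuffered path dominates.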
Slide 28:
Here we try to calculate the optimal number of buffers (k) to use across
a path. Starting with the path delay T and assuming that we add k buffers,
the new total delay is Tbuf. Minimizing Tbuf with respect to k leads to the
optimal number of buffers as a function of l and the RC characteristics of
the design.
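A sketch of this kind of minimization, under an assumed model that may differ
from the one on the slide: the unbuffered delay is taken as a * RC * l^2, and
k buffers split the path into k + 1 equal sections, giving
Tbuf(k) = a * RC * l^2 / (k + 1) + k * t_buf. Setting dTbuf/dk = 0 yields
k + 1 = l * sqrt(a * RC / t_buf). The function names and numeric values below
are hypothetical.

```python
import math


def delay_with_k_buffers(l, k, rc_seg, t_buf, a=0.5):
    """Assumed model: k buffers split the path into k + 1 equal sections."""
    return a * rc_seg * l ** 2 / (k + 1) + k * t_buf


def optimal_buffer_count(l, rc_seg, t_buf, a=0.5):
    """Continuous minimizer of the assumed model, clamped at zero."""
    return max(0.0, l * math.sqrt(a * rc_seg / t_buf) - 1.0)


L_BLOCKS = 20      # hypothetical path length in logic blocks
RC_SEG_NS = 0.10   # hypothetical per-segment RC, ns
T_BUF_NS = 0.15    # hypothetical buffer delay, ns

k_star = optimal_buffer_count(L_BLOCKS, RC_SEG_NS, T_BUF_NS)

# Brute-force check over integer buffer counts.
k_best = min(range(0, L_BLOCKS + 1),
             key=lambda k: delay_with_k_buffers(L_BLOCKS, k, RC_SEG_NS, T_BUF_NS))

print(k_star, k_best)
```

The closed form gives a real number, so in practice one would compare the two
neighbouring integers, which is what the brute-force search does.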
Slide 27:
Buffers could be added to the outputs of each switch matrix. Yet, adding a
buffer turns a wire from bidirectional to unidirectional. Xilinx (in newer
designs) provides buffering of switch matrix outputs that can be bypassed. In
other words, one can choose to use the matrix as in older designs (ignoring
the buffers) or route signals through them.