# ESE535: Electronic Design Automation

Day 19: April 8, 2009 Placement (Intro, Constructive)

Penn ESE535 Spring 2009 -- DeHon



# Today

- Placement Problem
- Partitioning→Placement
- Quadrisection
- Refinement


#### **Placement**

- Problem: Pick locations for all building blocks
  - minimizing energy, delay, area
  - really:
    - minimize wire length
    - minimize channel density
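As a concrete reading of "minimize wire length", here is a minimal sketch that scores a placement by half-perimeter bounding-box wire length. The metric, the function names `hpwl` and `total_wirelength`, and the toy netlist are assumptions chosen for illustration; the slides do not say which wire-length estimate is used at this point.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def hpwl(net: List[str], loc: Dict[str, Point]) -> int:
    """Half-perimeter of the bounding box around one net's cells."""
    xs = [loc[c][0] for c in net]
    ys = [loc[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(nets: List[List[str]], loc: Dict[str, Point]) -> int:
    """Sum of the per-net estimates; this is the cost a placer would minimize."""
    return sum(hpwl(net, loc) for net in nets)

# Hypothetical 4-cell netlist placed two ways.
nets = [["a", "b"], ["b", "c", "d"]]
good = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
bad = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}

print(total_wirelength(nets, good), total_wirelength(nets, bad))
```

The same netlist costs four times as much wire in the spread-out placement, which is the effect the following slides quantify asymptotically.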


#### **Bad Placement**

- How bad can it be?
  - Area
  - Delay
  - Energy


#### Bad: Area

- All wires cross bisection
- O(N²) area
- good: O(N)




## Bad: Delay

- All critical path wires cross chip
- Delay = O(|PATH| × 2 × L<sub>side</sub>) [and L<sub>side</sub> is O(N)]
- good: O(|PATH| × L<sub>cell</sub>)
- compare 50ps gates to many nanoseconds to cross chip




# Bad: Energy

- All wires cross chip:
  - $O(L_{side})$  long  $\rightarrow O(L_{side})$  capacitance per wire
  - Recall Area→O(N²)
  - So L<sub>side</sub> → O(N)
  - $\times O(N)$  wires  $\rightarrow O(N^2)$  capacitance
- Good:
  - $O(1)$  long wires  $\rightarrow O(N)$  capacitance


#### Distance

- Can we place everything close?




## "Closeness"

- Try placing "everything" close

| Manhattan distance | Places | Transitive fanin |
|--------------------|--------|------------------|
| 1                  | 4      | 4                |
| 2                  | 8      | 16               |
| 3                  | 12     | 64               |
| ⋮                  | ⋮      | ⋮                |
| n                  | 4n     | 4<sup>n</sup>    |
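The table can be checked by brute force. This sketch counts grid cells at exact Manhattan distance d from a gate and compares that with the transitive fanin at depth d, assuming (as the table's last column implies) a fanin of 4 per gate. The function names are made up for this example.

```python
def places_at_distance(d: int) -> int:
    """Count grid cells at Manhattan distance exactly d from the origin."""
    return sum(
        1
        for x in range(-d, d + 1)
        for y in range(-d, d + 1)
        if abs(x) + abs(y) == d
    )

def fanin_at_depth(d: int, fanin: int = 4) -> int:
    """Gates d levels back in the transitive fanin, assuming `fanin` inputs per gate."""
    return fanin ** d

for d in range(1, 7):
    print(d, places_at_distance(d), fanin_at_depth(d))
```

Places grow linearly (4d) while the fanin cone grows exponentially (4<sup>d</sup>), so beyond the first level most of the cone cannot sit within distance d.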




#### Illustration

- Consider a complete tree
  - nand2's, no fanout
  - N nodes
- Logical circuit depth?
- Circuit Area?
- Side Length?
- Average wire length between nand gates? (lower bound)
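A rough sketch of the first three questions, assuming each nand2 occupies one unit cell and the layout is square (both assumptions are mine, not stated on the slide). The helper name `tree_stats` is hypothetical.

```python
import math

def tree_stats(n: int):
    """Depth, area, and side length for a complete binary tree of n nand2 gates."""
    depth = n.bit_length()      # levels of gates; exact when n = 2**depth - 1
    area = n                    # assumption: each gate occupies one unit cell
    side = math.sqrt(area)      # assumption: square layout
    return depth, area, side

for n in (15, 1023, 2**20 - 1):
    depth, area, side = tree_stats(n)
    print(f"N={n}: depth={depth}, area={area}, side={side:.1f}")
```

Depth grows as log N while the side length grows as √N, so for large N the chip is far wider than the circuit is deep.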


# Another Example

- Consider a cut size  $F(N) > \sqrt{N}$
- If optimally place all F(N) producers right next to bisection
  - How many cells deep is producer farthest from the bisection?
- Lower bound on wire length?
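One way to work these questions numerically, under assumptions that are mine rather than the slide's: a square chip of √N by √N unit cells, a straight bisection line √N cells long, and producers packed row by row on one side of it. The function names are hypothetical.

```python
import math

def producer_depth(n: int, cut: int) -> int:
    """Rows of producers needed when only sqrt(n) cells fit along the bisection."""
    side = math.isqrt(n)
    return math.ceil(cut / side)

def crossing_wire_lower_bound(n: int, cut: int) -> int:
    """Total length of the cut wires if a producer in row r needs at least r cells of wire."""
    side = math.isqrt(n)
    total, remaining, row = 0, cut, 1
    while remaining > 0:
        in_row = min(side, remaining)   # fill the row nearest the cut first
        total += in_row * row
        remaining -= in_row
        row += 1
    return total

n, cut = 1024, 256                      # cut size well above sqrt(1024) = 32
print(producer_depth(n, cut), crossing_wire_lower_bound(n, cut))
```

With F(N) = 256 and √N = 32, the farthest producer is 8 cells from the cut and the average crossing wire is at least 4.5 cells long, so wire length grows with F(N)/√N even under optimal packing.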


## **Problem Characteristics**

- Familiar
  - NP-complete
  - local, greedy approaches do not work
  - greedy gets stuck in local minima


#### Constructive Placement


#### Basic Idea

- Partition (bisect) to define halves of chip
  - minimize wire crossing
- Recurse to refine
- When we get down to a single component, done
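The steps above can be sketched as follows. This is a toy version under stated assumptions: the cell count is a power of two that exactly fills the grid, and the partitioner is a naive greedy pair-swap improver standing in for KL/FM. All names (`cut_size`, `bisect`, `place`) are hypothetical.

```python
from typing import Dict, List, Tuple

Net = List[str]

def cut_size(nets: List[Net], a: List[str], b: List[str]) -> int:
    """Number of nets with at least one cell on each side of the cut."""
    sa, sb = set(a), set(b)
    return sum(
        1 for net in nets
        if any(c in sa for c in net) and any(c in sb for c in net)
    )

def bisect(cells: List[str], nets: List[Net]) -> Tuple[List[str], List[str]]:
    """Balanced split, improved by greedy pair swaps (a stand-in for KL/FM)."""
    half = len(cells) // 2
    a, b = list(cells[:half]), list(cells[half:])
    best = cut_size(nets, a, b)
    improved = True
    while improved:
        improved = False
        for i in range(len(a)):
            for j in range(len(b)):
                a[i], b[j] = b[j], a[i]          # try a swap
                cost = cut_size(nets, a, b)
                if cost < best:
                    best, improved = cost, True  # keep it
                else:
                    a[i], b[j] = b[j], a[i]      # undo it
    return a, b

def place(cells: List[str], nets: List[Net], x0: int, y0: int,
          w: int, h: int, loc: Dict[str, Tuple[int, int]]) -> None:
    """Recursively bisect `cells` into the w-by-h region whose corner is (x0, y0)."""
    if len(cells) == 1:
        loc[cells[0]] = (x0, y0)                 # single component: done
        return
    a, b = bisect(cells, nets)
    if w >= h:                                   # vertical cut: left and right halves
        place(a, nets, x0, y0, w // 2, h, loc)
        place(b, nets, x0 + w // 2, y0, w - w // 2, h, loc)
    else:                                        # horizontal cut: bottom and top halves
        place(a, nets, x0, y0, w, h // 2, loc)
        place(b, nets, x0, y0 + h // 2, w, h - h // 2, loc)

# Hypothetical example: an 8-cell chain, listed in scrambled order, on a 4x2 grid.
cells = ["c0", "c4", "c1", "c5", "c2", "c6", "c3", "c7"]
nets = [[f"c{i}", f"c{i + 1}"] for i in range(7)]
loc: Dict[str, Tuple[int, int]] = {}
place(cells, nets, 0, 0, 4, 2, loc)
print(loc)
```

Each recursive call only counts cuts among the cells it was handed, so it sees nothing about where connected cells outside its region ended up.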





# Adequate?

- Does recursive bisection capture the primary constraints of two-dimensional placement?


## **Problems**

- Greedy, top-down cuts
  - maybe better to pay cost early?
- Two-dimensional problem
  - (often) no real cost difference between H and V cuts
- Interaction between subtrees
  - not modeled by recursive bisect








#### **Problem**

- Need to keep track of where things are
  - outside of current partition
  - include costs induced by above
- ...but don't necessarily know where things are
  - still solving problem



# Improvement: Ordered

- Order operations
- Keep track of existing solution
- Use to constrain or pass costs to next subproblem
- Flow cut
  - use existing in src/sink
  - A nets = src, B nets = sink
- FM: start with fixed, unmovable nets for side-biased inputs
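A minimal sketch of passing costs from the existing solution into the next subproblem: cells already placed outside the current region are treated as fixed terminals on side A or side B, and the cut count includes them. The function name `cut_with_fixed` and the tiny netlist are hypothetical.

```python
from typing import Dict, List

def cut_with_fixed(nets: List[List[str]], side: Dict[str, str],
                   fixed: Dict[str, str]) -> int:
    """Count cut nets, treating already-placed outside cells as fixed terminals.

    side:  movable cell -> "A" or "B" (the tentative bisection)
    fixed: outside cell -> the side of the cut it lies on (never moves)
    """
    where = {**fixed, **side}
    cut = 0
    for net in nets:
        sides = {where[c] for c in net if c in where}
        if len(sides) == 2:
            cut += 1
    return cut

# x was placed earlier on the A side, y on the B side; p and q are still movable.
nets = [["x", "p"], ["p", "q"], ["q", "y"]]
fixed = {"x": "A", "y": "B"}
option1 = {"p": "A", "q": "B"}
option2 = {"p": "B", "q": "A"}

print(cut_with_fixed(nets, option1, {}), cut_with_fixed(nets, option2, {}))
print(cut_with_fixed(nets, option1, fixed), cut_with_fixed(nets, option2, fixed))
```

Without the fixed terminals the two options look identical (one cut net each); with them, the second option is clearly worse, which is the information plain recursive bisection throws away.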




# Improvement: Constrain

- Partition once
- Constrain movement within existing partitions
- Account for both H and V crossings
- Partition next
  - (simultaneously work parallel problems)
  - easy modification to FM




# Improvement: Quadrisect

- Solve more of problem at once
- Quadrisection:
  - partition into 4 bins simultaneously
  - keep track of costs all around


#### Quadrisect

- Modify FM to work on multiple buckets
- k-way has:
  - k(k-1) buckets
  - $|\mathsf{from}| \times |\mathsf{to}|$
  - quad → 12
- reformulate gains
- update still O(1)
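The bucket count can be confirmed by enumerating the ordered (from, to) pairs of distinct partitions. The helper name `move_buckets` is made up for this sketch.

```python
from itertools import permutations
from typing import List, Tuple

def move_buckets(k: int) -> List[Tuple[int, int]]:
    """One gain bucket per ordered (from, to) pair of distinct partitions."""
    return list(permutations(range(k), 2))

for k in (2, 4):
    print(k, len(move_buckets(k)))
```

For k = 2 this gives the two buckets of ordinary FM bisection; for k = 4 it gives the 12 buckets of quadrisection.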


#### Quadrisect

- Cases (15):
  - (1 partition) → 4
  - (2 part) → 6 = (4 choose 2)
  - (3 part) → 4 = (4 choose 3)
  - (4 part) → 1
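Reading the cases as the distinct sets of partitions a net can touch (an interpretation, since the slide only gives the counts), the 15 follows from enumerating the non-empty subsets of 4 partitions. The helper name `span_cases` is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def span_cases(k: int = 4) -> Dict[int, List[Tuple[int, ...]]]:
    """Group the non-empty subsets of k partitions by how many partitions they contain."""
    return {r: list(combinations(range(k), r)) for r in range(1, k + 1)}

cases = span_cases(4)
counts = {r: len(subsets) for r, subsets in cases.items()}
print(counts, sum(counts.values()))
```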












# Iteration/Cycling

- General technique to deal with phase-ordering problem
  - what order do we perform transformations, make decisions?
  - how do we get accurate information to everyone?
- Still basically greedy



#### Possible Refinement

- Allow unbalanced cuts
  - most things still work
  - just distort refinement groups
  - allowing unbalance using FM quadrisection looks a bit tricky
  - gives another 5-10% improvement


#### Runtime

- Each gain update still O(1)
  - (bigger constants)
  - so, FM partition pass still O(N)
- O(1) iterations expected
- assume O(1) overlaps exploited
- O(log(N)) levels
- Total: O(N log(N))
  - very fast compared to typical annealing
    - (annealing next time)
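The total can be sanity-checked with a toy cost model: charge n units for partitioning n cells (one O(N) pass times O(1) iterations, constants ignored) and recurse on four quadrants. The model and the name `total_work` are assumptions made for this sketch.

```python
def total_work(n: int) -> int:
    """Work units for recursive quadrisection of n cells under the toy cost model."""
    if n <= 1:
        return 0                          # a single cell needs no partitioning
    return n + 4 * total_work(n // 4)     # one linear pass, then four quadrants

for k in (3, 5, 8):
    n = 4 ** k
    print(n, total_work(n), n * k)        # recursion total vs. n * log4(n)
```

Every level touches all N cells once and there are log<sub>4</sub>(N) levels, which gives the O(N log(N)) total.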


# Quality: Area

| Case   | GORD-L (MST×100) | DOMINO (MST×100) | QUAD (MST×100) | Impr. vs. GORD-L | Impr. vs. DOMINO |
|--------|---------|---------|--------|-------|-------|
| prim1  | 10500   | 10059   | 10208  | 2.8%  | -1.5% |
| prim2  | 45994   | 43705   | 44478  | 3.3%  | -1.8% |
| ind2   | 436300  | 417264  | 380194 | 12.9% | 8.9%  |
| ind3   | 1121000 | 1048673 | 970068 | 13.5% | 7.5%  |
| fract  | 400     | 383     | 380    | 5.0%  | 0.8%  |
| C1908  | 1858    | 1767    | 1830   | 1.5%  | -3.6% |
| C5315  | 6220    | 5922    | 6185   | 0.6%  | -4.4% |
| C6288  | 8794    | 8339    | 8312   | 5.5%  | 0.3%  |
| s1423  | 2334    | 2208    | 2265   | 3.0%  | -2.6% |
| s1488  | 2680    | 2558    | 2470   | 7.8%  | 3.4%  |
| s5378  | 8609    | 8182    | 8208   | 4.7%  | -0.3% |
| s9234  | 14848   | 14023   | 13848  | 6.7%  | 1.3%  |
| s13207 | 31284   | 29995   | 28161  | 9.9%  | 6.1%  |
| s15850 | 37020   | 35591   | 33625  | 9.2%  | 5.5%  |
| struct | 4160    | 3967    | 4196   | -0.9% | -5.8% |
| biomed | 34677   | 33712   | 33787  | 2.6%  | -0.2% |
| avq_s  | 95648   | 92355   | 95867  | -0.2% | -3.8% |
| avq_l  | 100650  | 97825   | 101930 | -1.3% | -4.2% |
| Avg. impr. |         |         |        | 4.8%  | 0.3%  |


[Huang&Kahng/ISPD1997]
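The improvement columns in the table are consistent with the relative reduction in wire length, (baseline − QUAD) / baseline. The sketch below recomputes three rows from the table; the formula is inferred from the numbers rather than stated on the slide.

```python
# (GORD-L, DOMINO, QUAD) wire lengths for three rows of the table above.
rows = {
    "prim1": (10500, 10059, 10208),
    "ind2": (436300, 417264, 380194),
    "fract": (400, 383, 380),
}

def improvement(baseline: int, quad: int) -> float:
    """Percent reduction of QUAD relative to a baseline placer."""
    return 100.0 * (baseline - quad) / baseline

for case, (gordl, domino, quad) in rows.items():
    print(case, round(improvement(gordl, quad), 1), round(improvement(domino, quad), 1))
```

A negative value means QUAD produced more wire than the baseline on that case.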

# Quality: Delay

- Weight edges based on criticality
  - Periodic, interleaved timing analysis

| Case   | Measure | Max intrinsic path delay | TW7.0 | Timing-QUAD |
|--------|---------|--------------------------|-------|-------------|
| fract  | Delay   | 10.6                     | 17.9  | 18.1        |
| fract  | MST×100 |                          | 349   | 347         |
| struct | Delay   | 40.0                     | 78.8  | 79.3        |
| struct | MST×100 |                          | 5130  | 5103        |
| avq_s  | Delay   | 37.3                     | 61.4  | 60.9        |
| avq_s  | MST×100 |                          | 46763 | 47153       |
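One possible way to weight edges by criticality, shown only as an illustration: map each net's timing slack to a weight so that low-slack nets cost more to cut. The linear formula, the `base` and `boost` parameters, and the name `net_weights` are all assumptions; the slide does not give the weighting actually used.

```python
from typing import Dict

def net_weights(slack: Dict[str, float], base: float = 1.0,
                boost: float = 4.0) -> Dict[str, float]:
    """Hypothetical weighting: the least slack gets base + boost, the most slack gets base."""
    worst, best = min(slack.values()), max(slack.values())
    span = (best - worst) or 1.0          # avoid dividing by zero when all slacks match
    return {net: base + boost * (best - s) / span for net, s in slack.items()}

# Slack values as a timing analysis pass might report them (made-up numbers).
slack = {"n1": 0.0, "n2": 2.0, "n3": 4.0}
print(net_weights(slack))
```

Because placement changes wire delays, the slacks go stale, which is why the timing analysis is rerun periodically and the weights refreshed between partitioning passes.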


#### Uses

- Good by itself
- Starting point for simulated annealing
  - speed convergence
- With synthesis (both high level and logic)
  - get a quick estimate of physical effects
  - (play role in estimation/refinement at larger level)
- Early/fast placement
  - before willing to spend time looking for best
- For fast placement where time matters
  - FPGAs, online placement?


# Summary

- Partition to minimize cut size
- Additional constraints to do well
  - Improving constant factors
- Quadrisection
- Keep track of estimated placement
- Relax/iterate/Refine


# Admin

- Reading for Monday
  - Online (JSTOR): classic paper on Simulated Annealing
- Assignment 5 out
  - Retiming
  - Programming: 1D Placement
    - Channel width optimization


# Big Ideas:

- Potential dominance of interconnect
- Divide-and-conquer
- Successive Refinement
- Phase ordering: estimate/relax/iterate
