# ESE535: Electronic Design Automation

Day 9: February 14, 2011
Placement
(Intro, Constructive)

Penn ESE535 Spring 2011 – DeHon



# Today

- 2D Placement Problem
- Partitioning→Placement
- Quadrisection
- Refinement

Design flow (figure): Behavioral (C, MATLAB, ...) → Arch. Select → RTL → FSM assign → Two-level, Multilevel opt. → Covering → Retiming → Gate Netlist → Placement → Routing → Layout → Masks

### Placement

- Problem: Pick locations for all building blocks
  - minimizing energy, delay, area
  - really:
    - minimize wire length
    - minimize channel density
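
The two proxy costs above can be sketched for a toy case. This is a minimal sketch, assuming a single row of slots and two-pin nets (both simplifications, not from the slides); the names `wire_length` and `channel_density` are hypothetical.

```python
def wire_length(order, nets):
    """Total wire length of two-pin nets for cells placed left to right in a row."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def channel_density(order, nets):
    """Maximum number of nets crossing any boundary between adjacent slots."""
    pos = {cell: i for i, cell in enumerate(order)}
    best = 0
    for cut in range(len(order) - 1):  # boundary between slot `cut` and `cut + 1`
        crossing = sum(
            1 for a, b in nets
            if min(pos[a], pos[b]) <= cut < max(pos[a], pos[b])
        )
        best = max(best, crossing)
    return best


# Hypothetical four-cell example: same nets, two different orderings.
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
good = ["a", "b", "c", "d"]
bad = ["a", "c", "b", "d"]
```

Swapping two cells in the row raises both costs here, which is why the two measures are usually minimized together.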


# **Bad Placement**

- How bad can it be?
  - Area
  - Delay
  - Energy

# **Preclass Channel Widths**

- Channel Width for Problem 1?
- Channel Width for Problem 2?




# Bad: Delay

- All critical path wires cross chip
- Delay $= O(|PATH| \times 2 \times L_{side})$
  - [and $L_{side}$ is $O(N)$]
- Good: $O(|PATH| \times L_q)$
- Compare 20 ps gates to many nanoseconds to cross chip



# Bad: Energy

- All wires cross chip:
  - $O(L_{side})$ long $\rightarrow O(L_{side})$ capacitance per wire
  - Recall Area $\rightarrow O(N^2)$
  - So $L_{side} \rightarrow O(N)$
  - $\times O(N)$ wires $\rightarrow O(N^2)$ capacitance
- Good:
  - $O(1)$ long wires $\rightarrow O(N)$ capacitance
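
The scaling argument above, written out in one place. This assumes capacitance proportional to wire length and $O(N)$ wires, as on the slide; the symbols $C_{bad}$ and $C_{good}$ are labels introduced here for the two totals.

$$
\begin{aligned}
C_{bad} &= O(N)\ \text{wires} \times O(L_{side})\ \text{per wire} = O(N) \times O(N) = O(N^2) \\
C_{good} &= O(N)\ \text{wires} \times O(1)\ \text{per wire} = O(N)
\end{aligned}
$$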




# Manhattan Distance



 $|X_i-X_j|+|Y_i-Y_j|$ 

- Contrast: Euclidean distance
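
A small sketch of the two distance measures for a pair of placed blocks. The function names and the sample coordinates are illustrative only.

```python
import math


def manhattan(p, q):
    """|X_i - X_j| + |Y_i - Y_j| for points p = (X_i, Y_i), q = (X_j, Y_j)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def euclidean(p, q):
    """Straight-line distance between the same two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


a, b = (0, 0), (3, 4)
print(manhattan(a, b))  # 7
print(euclidean(a, b))  # 5.0
```

Manhattan distance is never shorter than Euclidean distance, and it matches wires that run only along horizontal and vertical tracks.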






# Distance

- Can we place everything close?



# Illustration

- Consider a complete tree
  - nand2's, no fanout
  - N nodes
- Logical circuit depth?
- Circuit area?
- Side length?
- Average wire length between nand gates? (lower bound)





# Placement Problem Characteristics

- Familiar
  - NP-complete
  - local, greedy approaches do not work
  - greedy gets stuck in local minima

# Constructive Placement

# Basic Idea

- Partition (bisect) to define halves of chip
  - minimize wire crossings
- Recurse to refine
- When we get down to a single component, done
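
The recursion above can be sketched as follows. This is a minimal sketch under stated assumptions: two-pin nets, a cell count equal to the grid size `w * h` and a power of two, and an exhaustive-search `bisect` standing in for a real partitioner such as KL or FM (only usable on tiny inputs). All function names are hypothetical.

```python
from itertools import combinations


def cut_size(left, nets):
    """Number of two-pin nets with exactly one endpoint in `left`."""
    return sum(1 for a, b in nets if (a in left) != (b in left))


def bisect(cells, nets):
    """Balanced bisection by exhaustive search (stand-in for KL/FM)."""
    cells = sorted(cells)
    # Only nets entirely inside this subproblem are visible to the cut.
    local = [(a, b) for a, b in nets if a in cells and b in cells]
    half = len(cells) // 2
    best = min(combinations(cells, half),
               key=lambda left: cut_size(set(left), local))
    left = set(best)
    return [c for c in cells if c in left], [c for c in cells if c not in left]


def place(cells, nets, x0, y0, w, h, out):
    """Recursively bisect `cells` into the w-by-h grid region at (x0, y0)."""
    if len(cells) == 1:
        out[cells[0]] = (x0, y0)  # single component: done
        return
    a, b = bisect(cells, nets)
    if w >= h:  # vertical cut: split the region left/right
        place(a, nets, x0, y0, w // 2, h, out)
        place(b, nets, x0 + w // 2, y0, w - w // 2, h, out)
    else:       # horizontal cut: split the region bottom/top
        place(a, nets, x0, y0, w, h // 2, out)
        place(b, nets, x0, y0 + h // 2, w, h - h // 2, out)


nets = [("a", "b"), ("c", "d"), ("a", "c")]
placement = {}
place(["a", "b", "c", "d"], nets, 0, 0, 2, 2, placement)
print(placement)
```

Note that `bisect` drops any net with a pin outside the current subset, so each subproblem is blind to cells that were assigned elsewhere.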



# Adequate?

 Does recursive bisection capture the primary constraints of two-dimensional placement?


# **Problems**

- Greedy, top-down cuts
  - maybe better to pay cost early?
- Two-dimensional problem
  - (often) no real cost difference between H and V cuts
- Interaction between subtrees
  - not modeled by recursive bisection







# Problem

- Need to keep track of where things are
  - outside of current partition
  - include costs induced by above
- ...but don't necessarily know where things are
  - still solving problem

# Improvement: Ordered

- Order operations
- Keep track of existing solution
- Use it to constrain or pass costs to the next subproblem
- Flow cut
  - use existing in src/sink
  - A nets = src, B nets = sink
- FM: start with fixed, unmovable nets for side-biased inputs
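
One way to derive the fixed, side-biased nets for the next subproblem is sketched below. This is a sketch under assumptions not in the slides: a vertical cut at `cut_x`, side A on the left and side B on the right, and a simple majority vote over the estimated positions of pins outside the current region. The names `side_bias` and `fixed_terminals` are hypothetical.

```python
def side_bias(external_pins, cut_x):
    """Pick the side ('A' = left, 'B' = right) that external pins pull a net toward.

    external_pins: (x, y) estimates for pins outside the region being cut.
    Returns None when there are no external pins or the pull is balanced.
    """
    left = sum(1 for x, _ in external_pins if x < cut_x)
    right = sum(1 for x, _ in external_pins if x > cut_x)
    if left == right:
        return None
    return "A" if left > right else "B"


def fixed_terminals(nets_external, cut_x):
    """Map each net to a fixed pseudo-terminal side, skipping unbiased nets."""
    biased = {}
    for net, pins in nets_external.items():
        side = side_bias(pins, cut_x)
        if side is not None:
            biased[net] = side
    return biased


# Hypothetical estimates from the solution found so far.
estimates = {"n1": [(0, 3)], "n2": [(7, 1), (6, 5)], "n3": [], "n4": [(1, 0), (9, 0)]}
terminals = fixed_terminals(estimates, cut_x=4)
print(terminals)  # {'n1': 'A', 'n2': 'B'}
```

The resulting map would seed the partitioner: nets marked A behave as if they already have an unmovable pin on side A, and likewise for B.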


# Improvement: Constrain

- Partition once
- Constrain movement within existing partitions
- Account for both H and V crossings
- Partition next
  - (simultaneously work parallel problems)
  - easy modification to FM

# Constrain Partition

- Solve AB and CD concurrently.

# Improvement: Quadrisect

- Solve more of problem at once
- Quadrisection:
  - partition into 4 bins simultaneously
  - keep track of costs all around



# Quadrisect

- Cases (15):
  - (1 partition) → 4
  - (2 part) → 6 = (4 choose 2)
  - (3 part) → 4 = (4 choose 3)
  - (4 part) → 1
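
The 15 cases are the non-empty subsets of the four bins that a net can touch. A short enumeration, with quadrant labels chosen here only for illustration:

```python
from itertools import combinations
from math import comb

quadrants = ("NW", "NE", "SW", "SE")  # hypothetical bin labels

# For each k, list the ways a net can span exactly k of the 4 bins.
cases = {k: list(combinations(quadrants, k)) for k in range(1, 5)}
counts = {k: len(v) for k, v in cases.items()}
total = sum(counts.values())

print(counts)  # {1: 4, 2: 6, 3: 4, 4: 1}
print(total)   # 15
```

The total is $2^4 - 1$, since every non-empty subset of the bins is one case.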











# Iteration/Cycling

- General technique to deal with phase-ordering problem
  - in what order do we perform transformations, make decisions?
  - how do we get accurate information to everyone?
- Still basically greedy



# Possible Refinement

- Allow unbalanced cuts
  - most things still work
  - just distort refinement groups
  - allowing unbalance using FM quadrisection looks a bit tricky
  - gives another 5-10% improvement

# Runtime

- Each gain update still O(1)
  - (bigger constants)
  - so, FM partition pass still O(N)
- O(1) iterations expected
- assume O(1) overlaps exploited
- O(log(N)) levels
- Total: O(N log(N))
  - very fast compared to typical annealing
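
The total follows by multiplying the factors listed above. Here $T(N)$ is a label introduced for the total runtime.

$$
T(N) = \underbrace{O(\log N)}_{\text{levels}} \times \underbrace{O(1)}_{\text{iterations}} \times \underbrace{O(1)}_{\text{overlaps}} \times \underbrace{O(N)}_{\text{FM pass}} = O(N \log N)
$$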

(annealing next time)

# Quality: Area

- Gordian-L: Analytic global placer
- DOMINO: network flow detail

| Benchmark | | | |
|-----------|--------|---------|---------|
| prim1     | 10208  | 10500   |         |
| prim2     | 44478  | 45994   | 43705   |
| ind2      | 380194 | 436300  | 417264  |
| ind3      | 970068 | 1121000 | 1048673 |
| fract     | 380    | 400     | 383     |
| C1908     | 1830   | 1858    | 1767    |
| C5315     | 6185   | 6220    | 5922    |
| C6288     | 8312   | 8794    | 8339    |
| s1423     | 2265   | 2334    | 2208    |
| s1488     | 2470   | 2680    | 2558    |
| s5378     | 8208   | 8609    | 8182    |
| s9234     | 13848  | 14848   | 14023   |
| s13207    | 28161  | 31284   | 29995   |
| s15850    | 33625  | 37020   | 35591   |
| struct    | 4196   | 4160    | 3967    |
| biomed    | 33787  | 34677   | 33712   |
| avq_s     | 95867  | 95648   | 92355   |
| avq_l     | 101930 | 100650  | 97825   |

[Huang&Kahng/ISPD1997]

# Quality: Delay

- Weight edges based on criticality
- Periodic, interleaved timing analysis

| Case   | Measure | Max Intrinsic Path Delay | TW7.0 | Timing-QUAD |
|--------|---------|--------------------------|-------|-------------|
| fract  | Delay   | 10.6                     | 17.9  | 18.1        |
| fract  | MSTx100 |                          | 349   | 347         |
| struct | Delay   | 40.0                     | 78.8  | 79.3        |
| struct | MSTx100 |                          | 5130  | 5103        |
| avq_s  | Delay   | 37.3                     | 61.4  | 60.9        |
| avq_s  | MSTx100 |                          | 46763 | 47153       |
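
Criticality-based edge weighting could look roughly like the sketch below. The slides do not give a weighting formula, so the linear slack-to-weight mapping, the `alpha` knob, and both function names are assumptions made purely for illustration; nets are two-pin.

```python
def criticality_weights(slacks, clock_period, alpha=2.0):
    """Net weight grows as slack shrinks (hypothetical linear mapping)."""
    weights = {}
    for net, slack in slacks.items():
        # Clamp slack to [0, clock_period], then map to criticality in [0, 1].
        criticality = 1.0 - max(0.0, min(slack, clock_period)) / clock_period
        weights[net] = 1.0 + alpha * criticality
    return weights


def weighted_cut(left, nets, weights):
    """Sum of weights of two-pin nets with exactly one endpoint in `left`."""
    return sum(weights[name] for name, (a, b) in nets.items()
               if (a in left) != (b in left))


# Hypothetical slacks from a timing analysis pass.
slacks = {"n1": 0.0, "n2": 5.0, "n3": 10.0}
weights = criticality_weights(slacks, clock_period=10.0)
nets = {"n1": ("a", "b"), "n2": ("b", "c"), "n3": ("c", "d")}
print(weighted_cut({"a"}, nets, weights))       # 3.0: cuts the critical net n1
print(weighted_cut({"a", "b", "c"}, nets, weights))  # 1.0: cuts only n3
```

With weights like these, a partitioner prefers to cut nets with plenty of slack, and rerunning timing analysis between passes refreshes the slacks.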


### Uses

- Good by itself
- Starting point for simulated annealing
  - speed convergence
- With synthesis (both high level and logic)
  - get a quick estimate of physical effects
  - (play role in estimation/refinement at larger level)
- Early/fast placement
  - before willing to spend time looking for best
- For fast placement where time matters
  - FPGAs, online placement?

# Summary

- Partition to minimize cut size
- Additional constraints to do well
  - Improving constant factors
- Quadrisection
- Keep track of estimated placement
- Relax/iterate/refine

# Admin

- Reading for Wednesday
  - Online (JSTOR): classic paper on Simulated Annealing
- Assignment 3 out
- Assignment 2b
  - Don't expect it graded as fast as 2a
- Drop Day is Friday
  - I leave Thurs. aft., out Friday
- Office Hours Tuesday

# Big Ideas:

- Potential dominance of interconnect
- Divide-and-conquer
- Successive Refinement
- Phase ordering: estimate/relax/iterate