### ESE5320: System-on-a-Chip Architecture

Day 5: September 19, 2022

**Dataflow Process Model**

Penn ESE5320 Fall 2022 -- DeHon

**Today**

Dataflow Process Model

- Terms (part 1)
- Issues
- Abstraction
- Performance Prospects (part 2)
- Basic Approach
- As time permits (part 3)
  - Dataflow variants
  - Motivations/demands for variants










## Process

- Has own state, including
  - Program Counter (PC)
  - Memory
  - Input/output
- May not actually run on processor
  - Could be specialized hardware block
- May share a processor

## Thread

- Has a separate control location (PC)
- May share memory (contrast process)
  - Run in common address space with other threads
- May not actually run on processor
  - Could be specialized hardware block
- May share a processor




If run together in lock step:

- Either can stall: $P = P_f + P_g - P_f P_g$
- $T \approx 1 \cdot (1 - P) + 100 \cdot P$
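The two formulas above can be checked numerically. The sketch below assumes that the two operators (called f and g here, following the subscripts) stall independently with probabilities `p_f` and `p_g`, that a non-stalled step costs 1 cycle, and that a stalled step costs 100 cycles, as in the expression for T. The sample probabilities of 0.01 are hypothetical values chosen only for illustration.

```python
def lockstep_stall_probability(p_f: float, p_g: float) -> float:
    """Probability that at least one of two independent operators stalls."""
    return p_f + p_g - p_f * p_g


def expected_cycles(p_stall: float,
                    normal_cycles: float = 1.0,
                    stall_cycles: float = 100.0) -> float:
    """Expected cycles per step: T ~= 1*(1-P) + 100*P with the default costs."""
    return normal_cycles * (1.0 - p_stall) + stall_cycles * p_stall


# Hypothetical example: each operator stalls 1% of the time.
p = lockstep_stall_probability(0.01, 0.01)
t = expected_cycles(p)
print(f"P = {p:.4f}, T ~= {t:.4f} cycles")
```

With these assumed inputs, running the two operators in lock step roughly doubles the chance of a stall compared with either operator alone, and the expected time per step is dominated by the 100-cycle stall term.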



# Model (from Day 4): Communicating Threads

- Computation is a collection of sequential/control-flow "threads"
- Threads may communicate
  - Through dataflow I/O
  - (Through shared variables)
- View as hybrid or generalization
- CSP (Communicating Sequential Processes) → canonical model example
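As a rough illustration of threads communicating through dataflow I/O, the sketch below connects two Python threads with FIFO queues from the standard `queue` module. The names `producer`, `squarer`, and the `DONE` end-of-stream sentinel are illustrative choices, not part of the lecture material.

```python
import queue
import threading

DONE = object()  # illustrative end-of-stream marker


def producer(out_q: queue.Queue) -> None:
    """Sequential thread that writes a stream of tokens."""
    for i in range(5):
        out_q.put(i)
    out_q.put(DONE)


def squarer(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Sequential thread that reads tokens, squares them, and writes them on."""
    while True:
        item = in_q.get()  # blocking read: waits until a token is present
        if item is DONE:
            out_q.put(DONE)
            return
        out_q.put(item * item)


def run_pipeline() -> list:
    a, b = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(a,)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


pipeline_output = run_pipeline()
print(pipeline_output)
```

Each thread has its own control flow, and the only interaction between them is the token stream, so the output order does not depend on how the operating system interleaves the threads.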






















**Dataflow Abstracts Timing**

- Doesn't say
  - On which cycle calculation occurs
- Does say
  - What order operations occur in
  - How data interacts
    - i.e., which inputs get mixed together
- Permits
  - Scheduling on different numbers and types of resources
  - Operators with variable delay
  - Variable delay in interconnect
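One way to see that a dataflow description fixes the result but not the timing is to fire enabled operators in an arbitrary order and compare outputs. The sketch below is a toy simulator under stated assumptions: the three-operator graph (`f`, `g`, `add`), the channel names, and the seeded random scheduler are all hypothetical, chosen only to illustrate the point.

```python
import random
from collections import deque


def run_graph(xs, seed):
    """Fire enabled operators in a seed-dependent order; return (outputs, firing order)."""
    rng = random.Random(seed)
    # FIFO channels between operators; the input stream is copied to f and g.
    q = {name: deque() for name in ("x_f", "x_g", "f_out", "g_out", "out")}
    q["x_f"].extend(xs)
    q["x_g"].extend(xs)
    # operator name -> (input channels, output channel, function)
    ops = {
        "f": (("x_f",), "f_out", lambda a: a + 1),
        "g": (("x_g",), "g_out", lambda a: a * 2),
        "add": (("f_out", "g_out"), "out", lambda a, b: a + b),
    }
    firing_order = []
    while True:
        # An operator is enabled when every one of its inputs holds a token.
        enabled = [n for n, (ins, _, _) in ops.items() if all(q[c] for c in ins)]
        if not enabled:
            break
        name = rng.choice(enabled)  # stands in for an arbitrary schedule
        ins, out, fn = ops[name]
        q[out].append(fn(*(q[c].popleft() for c in ins)))
        firing_order.append(name)
    return list(q["out"]), firing_order


runs = [run_graph([1, 2, 3], seed) for seed in range(5)]
for outputs, order in runs:
    print(outputs, order)
```

The firing order may differ from seed to seed, but every run produces the same output stream, because each operator only ever consumes tokens in FIFO order from its own inputs.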


























**Refine Pipeline**

- If operation internally pipelineable, break out pipeline into separate tasks

[Figure: task pipeline with stages Windowed FFT1 through Windowed FFT4, Select Freq., Quantize, and Entropy Encode; the stage annotations in the figure read 6,000, 7,500, 3,000, and 2,000.]

- Performance with one processor per operation?
- Achieve same performance with how many processors?








## Heterogeneous Processor

| Operation      | GPU   | Fast CPU | Slow CPU |
|----------------|-------|----------|----------|
| Windowed FFT   | 3,000 | 15,000   | 30,000   |
| Select Freq. 1 |       | 3,750    | 7,500    |
| Select Freq. 2 |       | 3,750    | 7,500    |
| Quantize       |       | 1,500    | 3,000    |
| Entropy Encode |       | 1,000    | 2,000    |
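The table can be used to compare mappings. The sketch below assumes each operation runs on its own processor in a pipeline, so the time between successive outputs is set by the slowest stage. It also assumes that a blank GPU cell means the operation is not offered on the GPU, and it treats the table entries as unit-less times, since the table does not state a unit. The function name `pipeline_interval` and the three example mappings are illustrative.

```python
# Times copied from the table; None marks a blank cell (assumed unavailable).
TIME = {
    "Windowed FFT":   {"GPU": 3000, "Fast CPU": 15000, "Slow CPU": 30000},
    "Select Freq. 1": {"GPU": None, "Fast CPU": 3750,  "Slow CPU": 7500},
    "Select Freq. 2": {"GPU": None, "Fast CPU": 3750,  "Slow CPU": 7500},
    "Quantize":       {"GPU": None, "Fast CPU": 1500,  "Slow CPU": 3000},
    "Entropy Encode": {"GPU": None, "Fast CPU": 1000,  "Slow CPU": 2000},
}


def pipeline_interval(mapping: dict) -> int:
    """Time between outputs with one processor per operation: the slowest stage."""
    stage_times = []
    for op, proc in mapping.items():
        t = TIME[op][proc]
        if t is None:
            raise ValueError(f"{op} has no entry for {proc}")
        stage_times.append(t)
    return max(stage_times)


all_slow = {op: "Slow CPU" for op in TIME}
all_fast = {op: "Fast CPU" for op in TIME}
gpu_fft = {**all_slow, "Windowed FFT": "GPU"}  # only the FFT moves to the GPU

for label, mapping in [("all slow", all_slow), ("all fast", all_fast), ("GPU FFT", gpu_fft)]:
    print(label, pipeline_interval(mapping))
```

Under these assumptions, moving only the Windowed FFT to the GPU shifts the bottleneck to the Select Freq. stages, which is why that mapping beats five fast CPUs in this model.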

































Motivations and Demands for Dataflow Options

**Time Permitting** 




## Non-Blocking Stream Operations

- Blocking
  - Only operations are read, write
  - If data not present, block until data is available
- Non-blocking
  - Add operations to ask if data is available (or if stream is ready for write)

`if (not(empty(in1))) next_pkt = in1.read()`

`else if (not(empty(in2))) next_pkt = in2.read()`
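A minimal runnable sketch of the polling pattern above, using Python's standard `queue.Queue` as a stand-in for a stream. The helper name `poll_next_packet` and the convention of returning `None` when both streams are empty are assumptions made for this example.

```python
import queue


def poll_next_packet(in1: queue.Queue, in2: queue.Queue):
    """Return a packet from in1 if one is present, else from in2, else None."""
    for stream in (in1, in2):
        try:
            return stream.get_nowait()  # non-blocking read
        except queue.Empty:
            continue  # nothing on this stream right now; try the next one
    return None


in1, in2 = queue.Queue(), queue.Queue()
in2.put("b0")
first = poll_next_packet(in1, in2)   # in1 is empty, so this takes from in2
in1.put("a0")
in2.put("b1")
second = poll_next_packet(in1, in2)  # in1 now has data and is checked first
third = poll_next_packet(in1, in2)
fourth = poll_next_packet(in1, in2)  # both streams are empty
print(first, second, third, fourth)
```

The sketch uses `get_nowait()` rather than a separate `empty()` check followed by a read, so the test and the read happen in one step.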


# Non-Blocking

- Removed model restriction
  - Can ask if token present
- Gained expressive power
  - Can grab data as it shows up
- Weaken our guarantees
  - Possible to get non-deterministic behavior
  - Depends on timing, which we've said may vary with mapping
- Use when necessary, avoid if possible
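The timing dependence can be made concrete with a small discrete-time simulation. The sketch below is an illustration under assumptions, not a model of any particular platform: the `simulate` function, the two arrival schedules, and the alternating blocking merge used for comparison are all hypothetical. Both schedules deliver the same tokens on the same streams and differ only in arrival time.

```python
from collections import deque


def simulate(arrivals, policy, steps=20):
    """Merge two streams; arrivals is a list of (time, stream number, value)."""
    in1, in2 = deque(), deque()
    out = []
    turn = 0  # used only by the blocking policy: 0 reads in1 next, 1 reads in2
    for t in range(steps):
        for when, stream, value in arrivals:
            if when == t:
                (in1 if stream == 1 else in2).append(value)
        if policy == "non-blocking":
            # Take whatever is present, preferring in1.
            if in1:
                out.append(in1.popleft())
            elif in2:
                out.append(in2.popleft())
        else:
            # Blocking alternating merge: wait for the stream whose turn it is.
            src = in1 if turn == 0 else in2
            if src:
                out.append(src.popleft())
                turn = 1 - turn
    return out


# Same tokens, different arrival times (e.g. two different mappings).
stream1_early = [(0, 1, "a0"), (1, 1, "a1"), (2, 2, "b0"), (3, 2, "b1")]
stream2_early = [(0, 2, "b0"), (1, 2, "b1"), (2, 1, "a0"), (3, 1, "a1")]

for policy in ("non-blocking", "blocking"):
    print(policy, simulate(stream1_early, policy), simulate(stream2_early, policy))
```

In this sketch the non-blocking merge emits the tokens in a different order for each schedule, while the blocking merge produces the same order for both, at the cost of sometimes waiting on an empty stream.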






## Process Network Roundup

| Model                                           | Deterministic Result | Deterministic Timing | Turing Complete                 |
|-------------------------------------------------|----------------------|----------------------|---------------------------------|
| SDF + fixed-delay operators                     | Y                    | Y                    | N                               |
| SDF + variable (data-dependent) delay operators | Y                    | N                    | N                               |
| Dynamic Rate DF, blocking                       | Y                    | N                    | Y                               |
| Dynamic Rate DF, non-blocking                   | N                    | N                    | Y                               |
| *Property is good for*                          | Correctness          | Real-time            | Completeness (compute anything) |

## Admin

- Remember feedback
  - Today's lecture and HW2
- Reading for Day 6 on web
- HW3 due Friday
  - Implementing multiprocessor solutions on homogeneous (ARM) processor cores


# Big Ideas

- Capture gross parallel structure with Process Network
- Use dataflow synchronization for determinism
- Abstract out timing of implementations
- Give freedom of implementation
- Exploit freedom to refine mapping to optimize performance
- Minimally use non-determinism as necessary