









## Previously

- Want data in small memories
  - Low latency, high bandwidth
- FPGA has many memories all over the fabric
- Want C arrays in small memories
  - Partitioned so enough reads (writes) can be performed in a cycle to avoid a memory bottleneck


Penn ESE532 Fall 2018 -- DeHon






- Need to move data
- Shared interconnect to make physical connections
- Useful to move data as separate thread of control
  - Dedicating a processor is inefficient
  - Useful to have dedicated data-movement hardware: Direct Memory Access (DMA)





















































# Day 3, Preclass 2

- 10-cycle latency to memory
- If we must wait for data to return, latency can degrade throughput
- 10-cycle latency + 10-cycle op + (assorted overhead)
  - More than 20 cycles / result

```c
for (int i = 0; i < MAX; i++) {
    int in = a[i];    // memory read (10-cycle latency)
    int out = f(in);  // 10-cycle compute
    b[i] = out;       // memory write
}
```















[Figure: I/O devices: USB, A/D, HDMI, Ethernet]



















# Programmable DMA Engine


- What to copy from?
- How much?
- Where to copy to?
- Stride?
- What size data?
- Loop?
- Transfer rate?





- 1, 2, 3, ..., K, 1, 2, 3, ..., K, 1, ...








## **Big Ideas**

- Need to move data
- Shared interconnect to make physical connections: can tune area/bandwidth/locality
- Useful to
  - move data as separate thread of control
  - Have dedicated data-movement hardware: DMA


## Admin

- Day 13
  - Chapter 9 of *Parallel Programming for FPGAs* (available on web)


- DRAM reading if not read on Day 3
- HW5 due Friday
- HW6 out
- Clear room for recitation at noon
- Turn in feedback sheets
