# ESE532: System-on-a-Chip Architecture

Day 8: February 8, 2017

Data Movement

(Interconnect, DMA)

Penn ESE532 Spring 2017 -- DeHon



# Today

- Interconnect Infrastructure
- Data Movement Threads
- Peripherals
- DMA


# Message

- Need to move data
- Shared interconnect to make physical connections
- Useful to move data as a separate thread of control
- Dedicating a processor to move data is inefficient
- Useful to have dedicated data-movement hardware: DMA


# Memory and I/O Organization

- Architecture contains
  - Large memories
    - For density, necessary sharing
  - Small memories local to compute
    - For high bandwidth, low latency, low energy
  - Peripherals for I/O
- Need to move data
  - Among memories and I/O
    - Large to small and back
    - Among small
    - From inputs, to outputs


#### How to move data?

- Abstractly, using stream links.
- Connect stream between producer and consumer.
- Ideally: dedicated wires


#### **Dedicated Wires?**

 Why might we not be able to have dedicated wires?


# **Making Connections**

- Cannot always be dedicated wires
  - Programmable
  - Wires take up area
  - Don't always have enough traffic to consume the bandwidth of a point-to-point wire
  - May need to serialize use of a resource
    - E.g., one memory read per cycle




# Simple Realization: Shared Bus

- Write to bus with address of destination
- When address matches, take value off bus
- Pros?
- Cons?
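The address-matching behavior of a shared bus can be sketched in software. This is a minimal model, not a description of any particular bus standard: the names `bus_txn_t`, `bus_device_t`, and `bus_broadcast` are hypothetical, and each call stands in for one transaction occupying the bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one transaction occupies the shared bus at a time. */
typedef struct {
    uint32_t addr;  /* address of the destination device */
    uint32_t data;  /* value being written */
} bus_txn_t;

typedef struct {
    uint32_t my_addr;    /* address this device answers to */
    uint32_t last_data;  /* last value taken off the bus */
    int      received;   /* number of values accepted so far */
} bus_device_t;

/* Every device observes the transaction; only a device whose address
 * matches takes the value. Returns how many devices accepted it. */
int bus_broadcast(bus_device_t *devs, size_t n, bus_txn_t t)
{
    assert(devs != NULL);
    int accepted = 0;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].my_addr == t.addr) {
            devs[i].last_data = t.data;
            devs[i].received++;
            accepted++;
        }
    }
    return accepted;
}
```

Because every writer goes through the same `bus_broadcast` call, only one transfer happens at a time in this model, which is one way to see the serialization cost of sharing.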



















# Locality in Interconnect

How can we allow physically local items to be closer?








#### Masters and Slaves

- Regardless of form, potentially have two kinds of entities on the interconnect
- Masters can initiate requests
  - E.g., a processor that can perform a read or write
- Slaves can only respond to requests
  - E.g., a memory that can return the read data from a read request


# Long Latency Memory Operations


#### Last Time

- Large memories are slow
  - Latency increases with memory size
- Distant memories are high latency
  - Multiple clock cycles to cross chip
  - Off-chip memories even higher latency


### Day 7, Preclass 4

- 10 cycle latency to memory
- If must wait for data return, latency can degrade throughput
- 10 cycle latency + 10 op + (assorted)
  - More than 20 cycles / result

```
for (i = 0; i < MAX; i++) {
  in = a[i];    // memory read
  out = f(in);  // 10 cycle compute
  b[i] = out;
}
```


#### Preclass 2

- Throughput using 3 threads?

```
P1: for(i=0;i<MAX;i++) write_fifoA(a[i]);
P2: while(1) write_fifoB(f(read_fifoA()));
P3: for(i=0;i<MAX;i++) b[i]=read_fifoB();
```
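The two organizations can be compared with a back-of-the-envelope cycle estimate. The sketch below is a simplified model under stated assumptions, not the worked preclass answer: the function names are hypothetical, the FIFOs are assumed deep enough to hide latency, and each thread is assumed to be characterized by a fixed number of cycles between successive items.

```c
#include <assert.h>

/* Single-thread loop: each iteration waits for the read to return and
 * then computes, so the per-result cost is the sum of the parts. */
unsigned serial_cycles_per_result(unsigned mem_latency,
                                  unsigned compute,
                                  unsigned overhead)
{
    return mem_latency + compute + overhead;
}

/* Three decoupled threads (fetch, compute, store) joined by FIFOs:
 * in steady state the slowest stage sets the rate. Arguments are the
 * assumed cycles between successive items for each stage. */
unsigned threaded_cycles_per_result(unsigned fetch_interval,
                                    unsigned compute_interval,
                                    unsigned store_interval)
{
    assert(fetch_interval > 0 && compute_interval > 0 && store_interval > 0);
    unsigned worst = fetch_interval;
    if (compute_interval > worst) worst = compute_interval;
    if (store_interval > worst) worst = store_interval;
    return worst;
}
```

With the 10-cycle latency and 10-cycle compute from the Day 7 example and an assumed 1 cycle of loop overhead, `serial_cycles_per_result(10, 10, 1)` gives 21 cycles per result. If the fetch and store threads could each issue one item per cycle (an assumption), `threaded_cycles_per_result(1, 10, 1)` gives 10, limited by the compute thread.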


# Fetch (Write) Threads

- Potentially useful to move data in a separate thread
- Especially when
  - Long (potentially variable) latency to data source (memory)
- Useful to split request/response
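One way to see why splitting request from response helps is to count cycles when a fetch thread may have several reads in flight. The following is a rough sketch under simplifying assumptions: fixed memory latency, in-order responses, at most one request issued per cycle, and a hypothetical function name `fetch_cycles`.

```c
#include <assert.h>

#define MAX_OUTSTANDING 64  /* illustrative cap on in-flight requests */

/* Cycle at which the last of n reads returns, when at most
 * max_outstanding requests may be in flight at once. */
unsigned fetch_cycles(unsigned n, unsigned latency, unsigned max_outstanding)
{
    assert(max_outstanding >= 1 && max_outstanding <= MAX_OUTSTANDING);
    unsigned ret_time[MAX_OUTSTANDING] = {0};  /* return time per slot */
    unsigned next_issue = 0;  /* earliest cycle the next request can issue */
    unsigned finish = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned slot = i % max_outstanding;
        unsigned issue = next_issue;
        /* A slot is reusable only after its previous request returned. */
        if (i >= max_outstanding && ret_time[slot] > issue)
            issue = ret_time[slot];
        ret_time[slot] = issue + latency;
        finish = ret_time[slot];
        next_issue = issue + 1;
    }
    return finish;
}
```

In this model, four reads at 10-cycle latency take 40 cycles when each request must wait for the previous response (`max_outstanding` of 1), but only 13 cycles when all four can be outstanding together.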


# Peripherals


# Input and Output

- Typical SoC has I/O with external world
  - Sensors
  - Actuators
  - Keyboard/mouse, display
  - Communications
- Also accessible from interconnect









#### Preclass 3

- How much hardware to support fetch thread:
  - Counter bits?
  - Registers?
  - Comparators?
  - Other gates?
- Compare to MicroBlaze
  - (minimum config 630 6-LUTs)


#### Observe

- Modest hardware can serve as data movement thread
  - Much less hardware than a processor
  - Offload work from processors
- Small hardware allows peripherals to be Master devices on the interconnect
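To make "modest hardware" concrete, a fetch thread can be modeled as a few registers, an adder, and a comparator. The sketch below is an illustrative software model, not the preclass solution: the type `fetch_unit_t`, the function `fetch_tick`, and the 32-bit widths are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State a simple fetch thread might hold (widths are assumptions). */
typedef struct {
    uint32_t addr;   /* counter register: next address to request */
    uint32_t limit;  /* register: one past the last address */
    uint32_t step;   /* increment applied after each request */
} fetch_unit_t;

/* One clock tick. If the comparator says work remains, emit the current
 * address and advance the counter. Returns 1 if an address was emitted,
 * 0 once the transfer is complete. */
int fetch_tick(fetch_unit_t *u, uint32_t *addr_out)
{
    assert(u != NULL && addr_out != NULL && u->step > 0);
    if (u->addr >= u->limit)  /* comparator */
        return 0;
    *addr_out = u->addr;
    u->addr += u->step;       /* adder */
    return 1;
}
```

Everything in this model maps to a handful of registers plus one adder and one comparator, which is the sense in which such a unit is much smaller than a processor.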




# **DMA Engine**

- Data Movement Thread
  - Specialized processor that moves data
- Acts independently
- Implements data movement
- Can build to move data between memories (Slave devices)
- E.g., implement P1, P3 in Preclass 3




# Programmable DMA Engine

- What to copy from?
- Where to copy to?
- Stride?
- How much?
- What size data?
- Loop?
- Transfer rate?
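Several of these questions map naturally onto fields of a transfer descriptor. The sketch below is a software model with hypothetical names (`dma_desc_t`, `dma_run`); it does not reflect the register map of any real DMA engine, and it leaves out looping and transfer-rate control for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: one field per question a programmable
 * DMA engine must answer before it can start a transfer. */
typedef struct {
    const unsigned char *src;  /* what to copy from */
    unsigned char *dst;        /* where to copy to */
    size_t elem_size;          /* what size data, in bytes */
    size_t count;              /* how much: number of items */
    size_t src_stride;         /* bytes between successive source items */
    size_t dst_stride;         /* bytes between successive destination items */
} dma_desc_t;

/* Software model of the transfer a descriptor describes.
 * Returns the number of bytes moved. */
size_t dma_run(const dma_desc_t *d)
{
    assert(d != NULL && d->src != NULL && d->dst != NULL && d->elem_size > 0);
    for (size_t i = 0; i < d->count; i++) {
        memcpy(d->dst + i * d->dst_stride,
               d->src + i * d->src_stride,
               d->elem_size);
    }
    return d->count * d->elem_size;
}
```

For example, a source stride of 4 bytes with a destination stride of 1 gathers one column of a 4-byte-wide row-major byte matrix into a contiguous buffer.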


# Multithreaded DMA Engine

- One copy task does not necessarily saturate the bandwidth of the DMA engine
- Share engine performing many transfers (channels)
- Separate transfer state for each
  - Hence thread
- Swap among threads
  - E.g., round-robin
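Round-robin sharing of one engine among several channels can be sketched as follows. This is an illustrative model only: `dma_channel_t`, `dma_engine_t`, `dma_engine_step`, the four-channel limit, and the one-word-per-step granularity are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CHANNELS 4  /* illustrative channel count */

/* Per-channel transfer state: this is what makes each channel a thread. */
typedef struct {
    const int *src;
    int *dst;
    size_t remaining;  /* words left to move; 0 means the channel is idle */
} dma_channel_t;

typedef struct {
    dma_channel_t ch[NUM_CHANNELS];
    size_t next;  /* channel to consider first on the next step */
} dma_engine_t;

/* One engine step: move one word for the next active channel in
 * round-robin order. Returns the channel served, or -1 if all are idle. */
int dma_engine_step(dma_engine_t *e)
{
    assert(e != NULL);
    for (size_t k = 0; k < NUM_CHANNELS; k++) {
        size_t c = (e->next + k) % NUM_CHANNELS;
        dma_channel_t *ch = &e->ch[c];
        if (ch->remaining > 0) {
            *ch->dst++ = *ch->src++;
            ch->remaining--;
            e->next = (c + 1) % NUM_CHANNELS;
            return (int)c;
        }
    }
    return -1;
}
```

Idle channels are skipped, so an active channel is not slowed by unused ones, while two active channels alternate steps.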




# Hardwired and Programmable

- Zynq has hardwired DMA engine
- Can also add data movement engines (Data Movers) in FPGA fabric


# Big Ideas

- Need to move data
- Shared interconnect to make physical connections
  - Can tune area/bandwidth
- Useful to
  - Move data as a separate thread of control
  - Have dedicated data-movement hardware: DMA


#### Admin

- Reading for Day 9 on web
- HW4 due Friday
