





# Coding Accelerators

- Want to exploit the FPGA logic on the Zynq to accelerate computations
- Traditionally, this has meant developing accelerators in
  - a Hardware Description Language (HDL), e.g. Verilog (undergrads see this in CIS371)
  - a generator language (which constructs the logic)

Penn ESE532 Fall 2018 -- DeHon

### Course "Hypothesis"

- C-to-gates synthesis is mature enough to use for specifying hardware
  - Leverage the fact that everyone knows C
  - (You must, at least, know C to develop embedded code)
  - Avoid taking time to teach Verilog or VHDL
  - ...or making Verilog a prerequisite
  - Focus on teaching how to craft hardware
    - Using the C you already know
    - ...which may require thinking about the C differently


# Discussion [open]

- Is it obvious that we can write C to describe hardware?
- What parts of C translate naturally to hardware?
- What parts of C might be problematic?
- What parts of hardware design might be hard to describe in C?











#### C Primitives: Comparison Operators

| Operation                | C expression |
|--------------------------|--------------|
| Less Than                | `a < b`      |
| Less Than or Equal To    | `a <= b`     |
| Greater Than             | `a > b`      |
| Greater Than or Equal To | `a >= b`     |
| Not Equal To             | `a != b`     |
| Equal To                 | `a == b`     |
| Logical Negation         | `!a`         |
| Logical AND              | `a && b`     |
| Logical OR               | `a \|\| b`   |

- These are all things we might have a hardware operator for























- Copy propagation: `a=b; c=a+d;` → `c=b+d;`
- Constant folding: `c[10*10+4];` → `c[104];`
- Identity simplification: `c=1*a+0;` → `c=a;`
- Strength reduction: `c=b*2;` → `c=b<<1;`
- Dead code elimination
- Common subexpression elimination:
  - `C[x*100+y]=A[x*100+y]+B[x*100+y];`
  - → `t=x*100+y; C[t]=A[t]+B[t];`
- Operator sizing: `for (i=0; i<100; i++) b[i]=(a&0xff)+i;` (`a&0xff` needs only 8 bits and `i` only 7, so a 9-bit adder suffices)


























# Loop Compact Expression

- What does this express? Sequential, fully unrolled, or partially unrolled hardware?

```
sum=0;
for (i=0;i<32;i++) {
  sum+=(0-(b%2)) & a;
  b=b>>1;
  a=a<<1;
}
```









#### Compact Expression: Arrays

- Useful to be able to refer to different values (a large number of values) with the same code.
- Arrays + Loops: give us a way to do that
- Useful for both general expression and hardware description


## Compact Expression: Arrays+Logic

- Vector sum:
  - Without arrays: `c3=a3+b3; c2=a2+b2; c1=a1+b1; c0=a0+b0;`
  - With arrays: `for(i=0;i<4;i++) c[i]=a[i]+b[i];`
- Chose a small length to fit the non-array version on the slide; with arrays, the length is just a parameter:
  - `#define K 16`
  - `for(i=0;i<K;i++) c[i]=a[i]+b[i];`

# Foreshadowing: C Array Challenge

- C programmers think of arrays as memory (or memory as arrays)
  - ...and sometimes we will want to
- Be careful understanding (and expressing) arrays that don't have to be memories
  - ...and don't have to be treated with memory semantics


# Loop Interpretations

- What does a loop describe?
  - Sequential behavior [when to execute]
  - Spatial construction [when to create hardware]
  - Data parallelism [sameness of compute]
- We will want to use loops for all 3
- Sometimes we need to help the compiler understand which one we want












































#### Dependence in Loops

```
for(i=1;i<K;i++)
  Y[i]=a[i]*Y[i-1];
```

If a value needed by one iteration of the loop is written by another iteration, this can create a cyclic dependence.

→ This limits parallelism (the pipeline initiation interval, II).













# Use of malloc()

- Used for:
  - Data-dependent object (array) size
  - Data-dependent number of objects
- Processing data-dependent sizes or numbers of objects is not consistent with real time
- For real time:
  - Statically allocate the maximum size you will need











### Big Ideas:

- C (like any programming language) specifies a computation
- Can describe spatial computation
  - Has some capabilities that don't make sense in hardware
    - Shared memory pool, malloc, recursion
  - Watch for unintended sequentialization
- C for spatial is coded differently from C for a processor
  - ...but can still run on a processor
- Good for leaf functions (operations)
- Limiting for describing a full task

### Admin

- Reading for Monday on the Web
  - Xilinx HLS documents
- No homework due Friday (10/5)
  - Enjoy Fall Break
- HW5 due next Friday (10/12)
- Return feedback
- Class in here at noon