#### ESE535: Electronic Design Automation

Day 16: March 25, 2015 C→RTL

Penn ESE535 Spring 2015 -- DeHon







### C Primitives: Arithmetic Operators

- Unary Minus (Negation) -a
- Addition (Sum) a + b
- Subtraction (Difference) a - b
- Multiplication (Product) a \* b
- Division (Quotient) a / b
- Modulus (Remainder) a % b

Things we might have a hardware operator for...


### C Primitives: Bitwise Operators

- Bitwise Left Shift a << b
- Bitwise Right Shift a >> b
- Bitwise One's Complement ~a
- Bitwise AND a & b
- Bitwise OR a | b
- Bitwise XOR a ^ b

Things we might have a hardware operator for...


### C Primitives: Comparison and Logical Operators

- Less Than a < b
- Less Than or Equal To a <= b
- Greater Than a > b
- Greater Than or Equal To a >= b
- Not Equal To a != b
- Equal To a == b
- Logical Negation !a
- Logical AND a && b
- Logical OR a || b

Things we might have a hardware operator for...



### Expressions: combine operators

- a\*x+b
- a\*x\*x+b\*x+c
- a\*(x+b)\*x+c
- ((a+10)\*b < 100)

A connected set of operators

→ Graph of operators


#### C Assignment

- Basic assignment statement is: Location = expression
- f=a\*x+b




#### Straight-line code

- A sequence of assignments
- What does this mean?

g=a\*x; h=b+g; i=h\*x; j=i+c;




#### Variable Reuse

- Variables (locations) define flow between computations
- Locations (variables) are reusable

t=a\*x;

r=t\*x;

t=b\*x;

r=r+t;

r=r+c;



- Sequential assignment semantics tell us which definition goes with which use.
  - Use gets most recent preceding definition.








#### C Memory Operations

| Read/Use | Write/Def |
|---|---|
| a=\*p; | \*p=2\*a+b; |
| a=p[0]; | p[0]=23; |
| a=p[c\*10+d]; | p[c\*10+d]=a\*x+b; |

#### Memory Operation Challenge

- Memory is just a set of locations
- But memory expressions can refer to variable locations
  - Do \*q and \*p refer to the same location?
  - p[0] and p[c\*10+d]?
  - \*p and q[c\*10+d]?
  - p[f(a)] and p[g(b)]?


#### Pitfall

- P[i]=23
- r=10+P[i]
- P[j]=17
- s=P[j]\*12
- Value of r and s?
- Could do: P[i]=23; P[j]=17; r=10+P[i]; s=P[j]\*12
  - ...unless i==j
  - Value of r and s?

#### C Pointer Pitfalls

- \*p=23
- r=10+\*p;
- \*q=17
- s=\*q\*12;
- Same problem if p==q


#### C Memory/Pointer Sequentialization

- Must preserve ordering of memory operations
  - A read cannot be moved before a write to memory that may redefine the location of the read
    - Conservative: any write to memory
    - Sophisticated analysis may allow us to prove independence of read and write
  - Writes that may redefine the same location cannot be reordered


#### Consequence

- Expressions and operations through variables (whose address is never taken) can be executed at any time
  - Just preserve the dataflow
- Memory assignments must execute in strict order
  - Ideally: partial order
  - Conservatively: strict sequential order of C

#### Forcing Sequencing

- Demands we introduce some discipline for deciding when operations occur
  - Could be a FSM
  - Could be an explicit dataflow token
  - Callahan uses control register
- Other uses for timing control
  - Control
  - Variable delay blocks
  - Looping


#### Scheduled Memory Operations

[Figure: schedule of the load and store operations for \*q = \*p + 1; with inputs p and q. Source: Callahan (etc.)]

# Control

#### Conditions

- if (cond)
  - DoA
- else
  - DoB
- while (cond)
  - DoBody
- No longer straight-line code
- Code selectively executed
- Data determines which computation to perform

#### Basic Blocks

- Sequence of operations with
  - Single entry point
  - Once entered, execute all operations in block
  - Set of exits at end
- Can dataflow schedule operations within a basic block
  - As long as we preserve memory ordering

[Figure: example code with assignments (x=y; y++; z=y; t=z>20;), branches (brfalse t, finish; br BB2; br(t,BB1,BB2)), and x=x\*y at label finish, partitioned into basic blocks BB0, BB1, BB2. Question: where are the basic blocks?]


#### Connecting Basic Blocks

- Connect up basic blocks by routing a control flow token
  - May enter from several places
  - May leave to one of several places








#### **Lecture Checkpoint**

- Happy with
  - Straight-line code
  - Variables
  - Memory
  - Control
- Q: Are we satisfied with the implementation this produces?


#### **Beyond Basic Blocks**

- Basic blocks tend to be limiting
- Runs of straight-line code are not long
- For good hardware implementation
  - Want more parallelism


#### Simple Control Flow

- if (cond) { ... } else { ... }
- Assignments become conditional
- In simplest cases (no memory ops), can treat as a dataflow node




#### Simple Conditionals

if (a>b)
c=b\*c;
else
c=a\*c;


[Figure: dataflow graph for the conditional above; the comparison a>b drives a mux that selects between b\*c and a\*c to produce c.]





#### Preclass G

- Finish drawing graph for preclass g







#### Height Reduction

- Mux-converted version has a shorter path (lower latency)
- Can execute condition in parallel with then and else clauses

#### Mux Conversion and Memory

- What might go wrong if we mux-converted the following:
- if (cond)
  - \*a=0
- else
  - \*b=0
- Don't want memory operations in the non-taken branch to occur.
- Conclude: cannot mux-convert blocks with memory operations (without additional care)

#### Hyperblocks

- Can convert if/then/else into dataflow
  - If/mux-conversion
- Hyperblock
  - Single entry point
  - No internal branches
  - Internal control flow provided by mux conversion
  - May exit at multiple points



#### Hyperblock Benefits

- More code → typically more parallelism
  - Shorter critical path
- Optimization opportunities
  - Reduce work in common flow path
  - Move logic for uncommon case out of path
    - Makes common path smaller and faster






#### **Optimizations**

- Constant propagation: a=10; b=c[a]; → a=10; b=c[10];
- Copy propagation: a=b; c=a+d; → c=b+d;
- Constant folding: c[10\*10+4]; → c[104];
- Identity Simplification: c=1\*a+0; → c=a;
- Strength Reduction: c=b\*2; → c=b<<1;
- Dead code elimination
- Common Subexpression Elimination:
  - C[x\*100+y]=A[x\*100+y]+B[x\*100+y]
  - → t=x\*100+y; C[t]=A[t]+B[t];
- Operator sizing: for (i=0; i<100; i++) b[i]=((a&0xff)+i);


#### Additional Concerns?

#### What are we still not satisfied with?

- Parallelism in hyperblock
  - Especially if memory sequentialized
    - Disambiguate memories?
    - Allow multiple memory banks?
- Only one hyperblock active at a time
  - Share hardware between blocks?
- Data only used from one side of mux
  - Share hardware between sides?
- Most logic in hyperblock idle?
  - Couldn't we pipeline execution?


#### Pipelining

for (i=0;i<MAX;i++) o[i]=(a\*x[i]+b)\*x[i]+c;

- Can pipeline if we know the memory operations are independent

#### Unrolling

- Put several (all?) executions of the loop into straight-line code in the body.

for (i=0;i<MAX;i++) o[i]=(a\*x[i]+b)\*x[i]+c;

- Unrolled by 2 (assuming MAX is even):

for (i=0;i<MAX;i+=2) { o[i]=(a\*x[i]+b)\*x[i]+c; o[i+1]=(a\*x[i+1]+b)\*x[i+1]+c; }

- If MAX=4, fully unrolled:

o[0]=(a\*x[0]+b)\*x[0]+c;

o[1]=(a\*x[1]+b)\*x[1]+c;

o[2]=(a\*x[2]+b)\*x[2]+c;

o[3]=(a\*x[3]+b)\*x[3]+c;

- Benefits?
  - Create larger basic block
  - More scheduling freedom
  - More parallelism

#### Flow Review

[Figure: flow review diagram with panels (A), (B), (D).]

#### Summary

- Language (here C) defines meaning of operations
- Dataflow connection of computations
- Sequential precedence constraints to preserve
- Create basic blocks
- Link together
- Optimize
  - Merge into hyperblocks with if-conversion
  - Pipeline, unroll
- Result is dataflow graph
  - (can schedule to RTL)


#### Big Ideas:

- Semantics
- Dataflow
- Mux-conversion
- Specialization
- Common-case optimization


#### Admin

- Project Assignment
- HW8
- Reading for Monday on web
