# ESE532: System-on-a-Chip Architecture

Day 25: December 1, 2021 Real-Time Scheduling

Penn ESE532 Fall 2021 -- DeHon



#### Today

#### Real Time

- Part 1: Synchronous Reactive Model
- Part 2: Interrupts and IO
  - Polling alternative
  - Timer?
- Part 3: Resource Scheduling Graphs


#### Message

- Scheduling is key to real time
  - Analysis
  - Guarantees


# Synchronous Circuit Model

- A simple synchronous circuit is a good "model" for a real-time task
  - Run at fixed clock rate
  - Take input every cycle
  - Produce output every cycle
  - Complete computation between input and output
  - Designed to run at a fixed frequency
    - Critical path meets frequency requirement


# Synchronous Reactive Model

- Discipline for Real-Time tasks
- Embodies "synchronous circuit model"


# Synchronous Reactive

- There is a rate for interaction with external world (like the clock)
- Computation scheduled around these clock ticks (or time-slices)
  - Continuously running threads
  - Each thread performs action per tick
- Inputs and outputs processed at this rate
- Computation can "react" to events
  - Reactions finite and processed before next tick


#### Thread Form

while (1) { tick(); }

- tick() -- yields after doing its work
  - Until next master cycle
  - May be state machine
    - May change state and have different behavior based on state
  - May trigger actions to respond to events (inputs)
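
A minimal sketch of this thread form in C, assuming a hypothetical two-state machine. The state names, `thread_ctx_t`, the `event` input, and `run_trace` are illustrative and not from the lecture. Each call to `tick()` does a bounded amount of work and returns, which stands in for yielding until the next master cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for illustration; not from the lecture. */
typedef enum { IDLE, ACTIVE } state_t;

typedef struct {
    state_t state;        /* current state of the machine */
    int     ticks_active; /* number of ticks spent in ACTIVE */
} thread_ctx_t;

/* One reaction per master tick: bounded work, then return (yield). */
void tick(thread_ctx_t *ctx, bool event)
{
    assert(ctx != NULL);
    switch (ctx->state) {
    case IDLE:
        if (event) ctx->state = ACTIVE;  /* react to an input event */
        break;
    case ACTIVE:
        ctx->ticks_active++;             /* behavior differs by state */
        if (!event) ctx->state = IDLE;
        break;
    }
}

/* Drive tick() over a finite input trace; a real thread loops forever. */
int run_trace(const bool *events, int n)
{
    thread_ctx_t ctx = { IDLE, 0 };
    for (int i = 0; i < n; i++)
        tick(&ctx, events[i]);
    return ctx.ticks_active;
}
```

Because every branch of `tick()` is short and loop-free, its worst-case time is easy to bound, which is what the synchronous reactive discipline relies on.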




#### Tick Rate

- Driven by application demands of external control
  - Control loop 100 Hz
    - Robot, airplane, car, manufacturing plant
  - Video at 33 fps
  - Game with 20ms response
  - Router with 1ms packet latency
    - 12µs


#### Tick Rate

- Multiple rates
  - May need master tick as least-common multiple of set of interaction rates
    - ...and lower freq. events scheduled less frequently
  - E.g. 100Hz control loop and 33Hz video
    - Master at 10ms
    - Schedule video over 3 10ms time-slots
      - May force decomposition into tasks that fit into smaller time windows, since must schedule events at highest frequency
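
The 100Hz/33Hz case can be sketched as a single 10ms master tick that runs the control reaction every tick and one third of the video work per tick. This is an illustrative model only: `sched_t`, `master_tick`, and `run_ticks` are assumed names, and the counters stand in for real work.

```c
#include <assert.h>
#include <stddef.h>

/* Counters standing in for real work; names are illustrative only. */
typedef struct {
    int control_runs;  /* 100 Hz control-loop reactions executed */
    int video_frames;  /* complete video frames produced */
    int video_phase;   /* which third of the frame is next: 0, 1, or 2 */
} sched_t;

/* One 10 ms master tick. */
void master_tick(sched_t *s)
{
    assert(s != NULL);
    s->control_runs++;          /* control runs on every tick */
    s->video_phase++;           /* video does one of its 3 slices */
    if (s->video_phase == 3) {  /* third slice completes the frame */
        s->video_phase = 0;
        s->video_frames++;
    }
}

/* Simulate n master ticks starting from reset. */
sched_t run_ticks(int n)
{
    sched_t s = { 0, 0, 0 };
    for (int i = 0; i < n; i++)
        master_tick(&s);
    return s;
}
```

Over 100 ticks (one second) this yields 100 control reactions and 33 complete frames, with one video slice of the next frame already done. Each video slice must fit in a 10ms slot alongside the control reaction, which is why the video task has to be decomposed.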

## Synchronous Reactive

- Ideal model
  - Per tick reaction (task processing) instantaneous
- Separate function from compute time
- Separate function from technology
  - Feature size, processor mapped to
- Like synchronous circuit
  - If logic correct, works when run clock slow enough
  - Works functionally when change technology
  - Then focus on reducing critical path, making timing work

# Timing and Function

- Why want to separate function from technology and timing?
- Move to slower processor(s):
  - What would happen if just moved?
  - What needs to happen?
- Move to faster processor(s):
  - What would happen if just moved?
  - What want to happen?


## Synchronous Reactive Timing

- Once functional,
  - need to guarantee all tasks (in all states)
    - Can complete in tick time-slot
    - On particular target architecture
- Identify WCET (worst-case execution time)
  - Like critical path in FSM circuit
  - Time of task on processor target


#### Preclass 1

- Time available to process objects?

```
tick() {
    for(i=0;i<MAX_OBJECTS;i++) {
        obj[i].inputs(); // see below
        obj[i].updatePositionState(); // 1,000 cycles
        obj[i].collide(); // 9,000 cycles
        obj[i].render(); // 1,000 cycles
    }
    updateScreen(); // takes 10 ms
}

// for object class
inputs() {
    int move=getMoveInput(); // 10
    int fire=getFireInput(); // 10
    switch (move) {
        case LEFT: moveleft(); break; // 10
        case FORWARD: thrustIncrease(); break; // 5,000
        case BACK: thrustDecrease(); break; // 4,000
        default: break;
    }
    if (fire) processFire(); // 10,000
}
```

#### Preclass 1

- Maximum number of objects on a single 1 GHz processor?
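
One way to approach this is to take the worst-case path through `inputs()`, add the other per-object costs, and divide the cycle budget left after `updateScreen()` by that total. The sketch below uses the cycle annotations from Preclass 1. The tick period is not stated here, so it is a parameter, and the 30 ms used in the note that follows is an assumption. Loop overhead is ignored, and the function and constant names are illustrative.

```c
#include <assert.h>

/* Cycle counts copied from the Preclass 1 annotations. */
enum {
    GET_MOVE = 10, GET_FIRE = 10,
    MOVE_LEFT = 10, THRUST_INC = 5000, THRUST_DEC = 4000,
    PROCESS_FIRE = 10000,
    UPDATE_POS = 1000, COLLIDE = 9000, RENDER = 1000
};

static long max3(long a, long b, long c)
{
    long m = a > b ? a : b;
    return m > c ? m : c;
}

/* Worst case for inputs(): both reads, slowest switch arm, and fire. */
long wcet_inputs(void)
{
    return GET_MOVE + GET_FIRE
         + max3(MOVE_LEFT, THRUST_INC, THRUST_DEC)
         + PROCESS_FIRE;
}

/* Worst-case cycles for one loop iteration of tick(). */
long wcet_per_object(void)
{
    return wcet_inputs() + UPDATE_POS + COLLIDE + RENDER;
}

/* Objects that fit in one tick after reserving updateScreen() time.
   tick_ms is an ASSUMED tick period; cycles_per_ms is 1,000,000 at 1 GHz. */
long max_objects(long tick_ms, long screen_ms, long cycles_per_ms)
{
    assert(tick_ms >= screen_ms && cycles_per_ms > 0);
    long budget = (tick_ms - screen_ms) * cycles_per_ms;
    return budget / wcet_per_object();
}
```

The worst case is 15,020 cycles for `inputs()` and 26,020 cycles per object. Under the assumed 30 ms tick, `max_objects(30, 10, 1000000)` leaves 20,000,000 cycles and returns 768 objects. A different tick period changes only the first argument.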


# Synchronous Reactive Timing

- Once functional,
  - need to guarantee all tasks (in all states) can complete in tick time-slot
  - On particular target architecture
- Identify WCET
  - Like critical path in FSM circuit
  - Time of task on processor target
- Schedule onto platform
  - Threads onto processor(s)





# Synchronous Reactive Model

- Discipline for Real-time tasks
- Embodies the "synchronous circuit model"
  - Master clock rate
  - Computation decomposed per clock
  - Functionality assuming instantaneous compute
  - On platform, guarantee runs fast enough to complete critical path at "clock" rate

#### Interrupts and IO

Part 2

#### Interrupt

- External event that redirects processor flow of control
- Typically forces a thread switch
- Common for I/O, Timers
  - Indicate a need for attention

## Interrupts

- Why would we use interrupts for I/O?

#### Interrupts: Good

- Allow processor to run some other work
- Service infrequent, irregular tasks with low response latency
  - Low latency
  - OK when inputs are low throughput
    - So infrequent interrupts...

#### Interrupts: Bad

- Time predictability
  - Real-time behavior of interrupted computing tasks
- Processor usage
  - Costs time to switch contexts
- Concurrency management
  - Must deal with tasks executing non-atomically
    - Interleave of interrupted service tasks
    - Perhaps interleave of any task



#### What can happen?

- Add to list
  - atmp=a
  - new->next=atmp
  - a=new
- Remove from list
  - removed=a->value
  - rtmp=a->next
  - a=rtmp
- Sequence
  - removed=a->value
  - rtmp=a->next
  - <interrupt>
  - atmp=a
  - new->next=atmp
  - a=new
  - <return>
  - a=rtmp
- What goes wrong?
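
The interleaved sequence on this slide can be replayed deterministically in ordinary C. This is a minimal sketch assuming a singly linked list with head pointer `a`. `node_t`, `interleaved_remove_and_add`, and `list_length` are illustrative names, and the interrupt is simulated by running the add steps inline at the marked point.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Replays the slide's interleaving on a list with head `a`.
   Returns the head after the interrupted remove resumes. */
node_t *interleaved_remove_and_add(node_t *a, node_t *new_node, int *removed)
{
    assert(a != NULL && new_node != NULL && removed != NULL);

    /* remove: first two steps */
    *removed = a->value;
    node_t *rtmp = a->next;

    /* <interrupt>: add runs to completion */
    node_t *atmp = a;
    new_node->next = atmp;
    a = new_node;

    /* <return>: remove finishes using its stale rtmp */
    a = rtmp;
    return a;
}

/* Count nodes reachable from head. */
int list_length(const node_t *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Starting from the list 1 -> 2 and adding a node during the interrupted remove, the final head is node 2 and the list has length 1. The remove overwrites `a` with the `rtmp` it read before the interrupt, so the newly added node is no longer reachable from the head.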










#### Polling Discipline

- Alternative to I/O interrupts
- Every I/O task is a thread
- Budget time and rate it needs to run
  - E.g. 10,000 cycles every 5ms
  - Likely tied to
    - Buffer sizes
    - Response latency
- Schedule I/O threads as real-time tasks
  - Some can be DMA channels

#### **IO Thread**

while (1) { process\_input(); }

- Like tick() -- yields after doing its work
  - Wait for next master cycle


#### Preclass 2

- Input at 100KB/s
- 30ms time-slot window
- Size of buffer?
- At 100 cycles/byte on a GHz processor, runtime of service routine?
  - Fraction of processor capacity?
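
The arithmetic for this exercise can be sketched as below, assuming 1 KB = 1000 bytes and a 1 GHz clock (1,000,000 cycles per ms). The function names are illustrative. The buffer must hold everything that arrives during one time-slot, and the service routine then drains that buffer once per slot.

```c
#include <assert.h>

/* Bytes that arrive during one time-slot; this is the minimum buffer size. */
long buffer_bytes(long bytes_per_sec, long slot_ms)
{
    assert(bytes_per_sec >= 0 && slot_ms >= 0);
    return bytes_per_sec * slot_ms / 1000;
}

/* Cycles the service routine needs to drain one full buffer. */
long service_cycles(long bytes, long cycles_per_byte)
{
    assert(bytes >= 0 && cycles_per_byte >= 0);
    return bytes * cycles_per_byte;
}

/* Processor share in percent: service cycles over cycles in one slot. */
double processor_percent(long svc_cycles, long slot_ms, long cycles_per_ms)
{
    assert(slot_ms > 0 && cycles_per_ms > 0);
    return 100.0 * (double)svc_cycles
         / ((double)slot_ms * (double)cycles_per_ms);
}
```

With these assumptions, `buffer_bytes(100000, 30)` gives 3000 bytes, `service_cycles(3000, 100)` gives 300,000 cycles (0.3 ms at 1 GHz), and `processor_percent(300000, 30, 1000000)` gives about 1% of the processor.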




# **Timer Interrupts**

- Why do we have timer interrupts in conventional operating systems?
  - E.g. in Linux?

#### **Timer Interrupts**

- Best effort tasks (i.e. non-real-time tasks)
  - Have no guarantee to finish in bounded time
  - Timer interrupts necessary
    - to allow other threads to run
    - fairness
    - to switch to real-time service tasks
- Need timer interrupts if need to share processor with best-effort and real-time threads
  - Alternative: Easier to segregate real-time and best-effort threads onto different processors

#### Timer Interrupts?

- Bounded-time tasks
  - E.g. reactive tasks in real-time
  - Task has guarantee to release processor within time window
  - No need for timer interrupts to regain control from task
  - (Maybe use deadline operations [Day24] for timer)

#### **Greedy Strategy**

- Schedule real-time tasks
  - Scheduled based on worst-case, so may not use all time allocated
- Run best-effort tasks at end of timeslice after complete real-time tasks
  - Timer-interrupt to recover processor in time for start of next scheduling time slot
- (adds complexity)


#### **Real-Time Tasks**

- Interrupts less attractive
  - More disruptive
- Scheduled polling has better predictability
- Fits with Synchronous Reactive Model

# Resource Scheduling Graphs

Part 3


# Scheduling

- Useful to think about scheduling a processor by task usage
- Useful to budget and co-schedule required resources
  - Bus
  - Memory port
  - DMA channel








# Resource Schedule Graph

- Extend as necessary to capture potentially limiting resources and usage
  - Regions in memories
  - Memory ports
  - I/O channels





#### Approach

- Ideal/initial look at processing requirements
  - Resource bound on processing
- Look for bottlenecks / limits with Resource Bounds independently
  - Add buses, memories, etc.
- Plan/schedule with Resource Schedule Graph


#### Preclass 3a

- Resource Bound
  - Data movement over bus?
  - Compute on 2 processors?
  - Compute on 2 processors when processor must wait while local memory is written?

| Data (bytes) | Compute (cycles) | Data + compute work (cycles) |
|--------------|------------------|------------------------------|
| 1600         | 1600             |                              |
| 200          | 600              |                              |
| 800          | 3200             |                              |
| 200          | 600              |                              |
| 400          | 400              |                              |

#### Resource Bound: Wait for Transfer

- Total processor cycles when processor must idle during transfer
  - $\text{Cycles}_{\text{proc}} = \sum_i (\text{Comp}[i] + \text{Bytes}[i])$
- $RB_{\text{proc}} = \text{Cycles}_{\text{proc}} / 2$
- $RB_{\text{bus}} = \sum_i \text{Bytes}[i]$
- $RB = \max(RB_{\text{bus}}, RB_{\text{proc}})$
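
These bounds are simple sums, so they are easy to check mechanically. The sketch below applies them to a task list. `task_t` and the function names are illustrative, and it assumes, as the $RB_{\text{bus}}$ formula implies, that the bus moves one byte per cycle.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long bytes;    /* data moved over the bus */
    long compute;  /* compute cycles */
} task_t;

static long max_long(long a, long b) { return a > b ? a : b; }

/* RB_bus: sum of bytes, assuming one byte per bus cycle. */
long rb_bus(const task_t *t, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += t[i].bytes;
    return sum;
}

/* RB_proc when processors idle during transfers:
   sum(Comp[i] + Bytes[i]) divided over nproc processors. */
long rb_proc_wait(const task_t *t, size_t n, long nproc)
{
    assert(nproc > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += t[i].compute + t[i].bytes;
    return (sum + nproc - 1) / nproc;  /* round up to whole cycles */
}

/* RB = max(RB_bus, RB_proc). */
long resource_bound(const task_t *t, size_t n, long nproc)
{
    return max_long(rb_bus(t, n), rb_proc_wait(t, n, nproc));
}
```

For the five Preclass 3a tasks, the bytes sum to 3200 and the compute cycles to 6400. So `rb_bus` is 3200, `rb_proc_wait` with 2 processors is (6400 + 3200)/2 = 4800, and the overall bound is 4800 cycles.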


#### Preclass 3b Schedule

- Processor waits for data load
- 200-cycle intervals



#### **Double Buffering**

- Common trick to overlap compute and communication
- Reserve two buffers for input (output)
- Alternate buffer use for input
- Producer fills one buffer while consumer works from the other
- Swap between tasks
- Trade off memory for concurrency
- Sub-buffers in Vitis, clEnqueueMigrateMemObjects
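
A minimal ping-pong sketch of the idea in plain C, not Vitis host code. `produce`, `consume`, `BUF_LEN`, and `double_buffered_sum` are hypothetical names for illustration. The producer and consumer calls run back to back here, but in each step they touch different buffers, which is what would let a DMA engine and a processor run them concurrently.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_LEN 4

/* Hypothetical producer: fills buf with consecutive values from start. */
static void produce(int *buf, int start)
{
    for (size_t i = 0; i < BUF_LEN; i++)
        buf[i] = start + (int)i;
}

/* Hypothetical consumer: sums one buffer. */
static long consume(const int *buf)
{
    long sum = 0;
    for (size_t i = 0; i < BUF_LEN; i++)
        sum += buf[i];
    return sum;
}

/* Process nblocks blocks using two buffers. In each loop step the
   producer fills buf[fill] while the consumer reads buf[1 - fill]. */
long double_buffered_sum(int nblocks)
{
    assert(nblocks >= 0);
    int buf[2][BUF_LEN];
    int fill = 0;
    long total = 0;
    if (nblocks == 0)
        return 0;

    produce(buf[fill], 0);                /* prologue: fill first buffer */
    for (int b = 1; b < nblocks; b++) {
        fill = 1 - fill;                  /* swap buffer roles */
        produce(buf[fill], b * BUF_LEN);  /* producer: next block */
        total += consume(buf[1 - fill]);  /* consumer: previous block */
    }
    total += consume(buf[fill]);          /* epilogue: drain last buffer */
    return total;
}
```

The cost is a second buffer. The benefit is that, once overlapped, each step takes roughly the longer of transfer and compute rather than their sum.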






#### Resource Schedule Graphs

- Useful to plan/visualize resource sharing and bottlenecks in SoC
- Supports scheduling
- Necessary for real-time scheduling

# Big Ideas:

- Scheduling is key to real time
  - Analysis, Guarantees
- Synchronous reactive
  - Schedule worst-case task "reactions" into master time-slice matching rate
- Schedule I/O with polling threads
  - Avoid interrupts
- Schedule dependent resources
  - Buses, memory ports, memory regions...

#### Admin

- Feedback
- Reading for Monday online
- P4 due Friday