Units: 1.0 CU
Terms: Fall 2020
When: MW 10:30am--12:00pm (First Lecture W 9/2/2020)
Instructor: DeHon (office hours
TA: Syed Ahmed
Office Hours: TBD (Zoom)
TBD (Zoom)
Prerequisites: ESE350 (CIS371 helpful)
Graduate: working knowledge of
Catalog Level Description:
Motivation, design, programming, optimization, and use of modern
System-on-a-Chip (SoC) architectures. Hands-on coverage of the breadth of
computer engineering within the context of SoC platforms from gates to
application software, including on-chip memories and communication
networks, I/O interfacing, RTL design of accelerators, processors,
concurrency, firmware and OS/infrastructure software. Formulating parallel
decompositions, hardware and software solutions, hardware/software
tradeoffs, and hardware/software codesign. Attention to real-time
Covid-19 Fall 2020
The Fall 2020 offering is planned to accommodate the ongoing Covid-19 challenge,
which includes students who cannot be physically present on campus at
Penn (due to travel restrictions, housing restrictions, and personal safety
concerns) and the need for social distancing even for those that are on
campus. Our plan will evolve with
and guidance from the University. See the Provost's Covid-19
Academic Information and Resources for further University-wide
information and guidance. Our current plan includes:
- Online (Zoom) lectures
  - Synchronous delivery that includes interactions with students who
    can attend; we recommend students attend the synchronous delivery.
  - Recordings of the synchronously delivered lectures available on Canvas
    for asynchronous viewing by those who cannot make the lecture or wish
    to get a refresher.
- Online office hours (Zoom, Google Meet)
- Teams as per usual (pairs for homework) that may collaborate
remotely -- we expect remote collaboration over Zoom or Google Meet.
- Remote and cloud-based homeworks -- we are making plans for the
exercise in the first half to use F1 on Amazon EC2, including both cloud
access to Xilinx tools and to a server-with-FPGA platform.
- Providing students with equipment for remote use:
  - about half-way through the course, we plan to
    transition to embedded SoC FPGA platforms for the lab (most likely
  - we will make plans to provide (ship) boards to remote students.
  - students will need to run the tools either:
    - remotely using Penn computers
    - on their own computers -- the experience will
      generally be better if you can run the software on your own computer,
      but it does demand a certain level of computing power, memory, and
      space from your computer, and the installation will take some time and
      possibly some fiddling given the variety of computers students usually have.
  - we expect no need for physical lab access. Given a suitable
    network link and computer, all lab work can be completed remotely.
- Open-book, flexible-time, honor-system exams within a fixed time-window.
By the end of the course, you will be able to:
- design, optimize, and program a modern System-on-a-Chip.
- (i) analyze a computational task, (ii) characterize its computational requirements, (iii) identify performance bottlenecks, (iv) identify, explore, and evaluate a rich design space of solutions, and (v) select and implement a design that meets engineering requirements.
- decompose the task into parallel components that cooperate to solve the problem.
- characterize and develop real-time solutions.
- implement both hardware and software solutions, formulate hardware/software tradeoffs, and perform hardware/software codesign.
- understand the system on a chip from gates to application software, including on-chip memories and communication networks, I/O interfacing, RTL design of accelerators, processors, firmware and OS/infrastructure software.
- understand and estimate key design metrics and requirements including area, latency, throughput, energy, power, predictability, and reliability.
Architectural building blocks and heterogeneous architecture,
Hardware-Software Codesign, Embedded Software, Interfacing, Computational
requirements and system analysis, Concurrency, Real Time, Design-space
formulation and exploration, Costs and metrics (energy, area, runtime,
reliability, predictability), Quantitative design and analysis.
Rough Syllabus Plan
- Overview, scope, methodology
- Metrics and bottlenecks
- Computational models
- Data parallel microarchitectures (SIMD, Vector, GPU)
- Thread-level Parallelism and virtualization
- Real-time, reactive
- Spatial computations, basic mapping from high-level
- Fine-grained parallelism microarchitectures (FSMD, VLIW)
- High-level synthesis (C-to-gates, resource selection and
- On-chip networking / Network-on-Chip
- VLSI technology and scaling
- Defect and fault tolerance
Detailed Fall 2020 schedule coming soon, but you can consult
Syllabus for reference until then.
This course includes a substantial project that runs throughout the term.
Students work in groups of 2. The platform will be an SoC-FPGA (e.g.,
Xilinx Zynq or Intel/Altera Arria), allowing the provisioning of soft-core
processors, accelerators, and memory in addition to the use of the
embedded SoC logic. The project starts with a significant task (like video
acquisition, processing, compression, networking). It begins by
running the task on a single processor and identifying resource requirements.
Then, it deals with I/O for the task.
It then migrates the task to multiple processors to accelerate it. After
that, it develops custom accelerators for the task and integrates them with
the networked processors. The second half of the course is an open-ended
optimization project using the techniques and design options introduced in the course.
Grading is based on:
- Design Project [50%]
- Weekly Assignments [20%]
- Midterm [10%]
- Final [20%]
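As a quick illustration of how these weights combine, here is a minimal Python sketch. The function name `course_score` and the assumption that each component is scored as a percentage from 0 to 100 are hypothetical and not taken from the course materials; only the weights come from the list above.

```python
# Component weights in percent, taken from the grading list above.
WEIGHTS = {"project": 50, "weekly": 20, "midterm": 10, "final": 20}


def course_score(scores):
    """Return the weighted course score (0-100).

    `scores` maps each component name to a percentage in [0, 100].
    This per-component percentage scale is an assumption made for
    illustration only.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {"project": 90, "weekly": 80, "midterm": 70, "final": 85}
    print(course_score(example))
```

Because the design project carries half the weight, a 10-point change in the project score moves the total by 5 points, while the same change on the midterm moves it by 1 point.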
Writeups must be done in electronic form and submitted through
Canvas (below). Use CAD or drawing
tools where appropriate. Handwritten assignments and
hand-drawn figures are not acceptable.
The specific homework assignments will specify what portion of the writeup
can be performed jointly and what part should be individual.
See the course Writeup
Guidelines for full details.
Portions of the project milestones and final will be per group. Look for
specific instructions associated with the project.
All assignments will be turned in electronically through the Penn Canvas
website. Log in to Canvas with your PennKey and password, then select ESE 532 from the Courses and Groups dropdown menu.
Select Assignments from the links on the left and select the assignment you
wish to submit for. Submission should be as a single file (preferably
Assignments must be turned in by the published due date to receive credit.
We grant each student 3 free late days over the entire
term (homework and project milestones) for individual turn-in assignments
or assignment components.
That means you could, for example, turn in three assignments one day late
each or one assignment 3 days late and still receive full credit. The
quantum for free late days is a day, so you cannot turn in every assignment
6 hours late and receive full credit. There are no free late days for
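
The late-day accounting described above can be sketched as follows. This is an illustrative Python sketch, not the course's actual bookkeeping: the function names are hypothetical, and the rule that any fraction of a day is charged as a whole day is an assumption drawn from the statement that the quantum for free late days is a day.

```python
import math

FREE_LATE_DAYS = 3  # per student, for the whole term (from the policy above)


def late_days_charged(hours_late):
    """Whole late days charged for one submission.

    Assumption: any fraction of a day rounds up to a full day,
    because the quantum for free late days is a day.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def free_days_remaining(hours_late_per_assignment):
    """Free late days left after the given submissions.

    A negative result means the free late days were exceeded.
    """
    used = sum(late_days_charged(h) for h in hours_late_per_assignment)
    return FREE_LATE_DAYS - used


if __name__ == "__main__":
    print(free_days_remaining([24, 24, 24]))  # three assignments, one day late each
    print(free_days_remaining([72]))          # one assignment, three days late
    print(free_days_remaining([6, 6, 6, 6]))  # four assignments, 6 hours late each
```

Under this assumption, the first two cases use exactly the three free days, while four submissions that are each 6 hours late are charged four days and exceed the allowance.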
Students are allowed and encouraged to help each other with the Xilinx
tools and platforms (SDSoC, SDK, Vivado, Vivado HLS, Windows, Linux) used for the course, but may not develop
design solutions (C code, pragmas, design and analysis) collaboratively outside of
identified project groups. Each team must develop its own design solution;
collaborating across teams is a violation of the collaboration policy.
Within a project group,
the assignment will specify what part should be done as a group and what
part should be done individually.
- Tools---We know the tools are complex and the documentation
often dense or inadequate, and we won't be surprised if they are buggy.
It will likely be necessary to collaborate as a class on figuring out how
to best use the tools for the term. We encourage students to help each
other and share what they learned. We will award bonus points for
student-developed instructions and tutorials on how to solve common
tasks that arise for the tools.
- Design Solutions---Each team (or individual where specified)
should develop their own solutions to the design problem and their own
implementations. You are taking this class to develop these skills, and
we believe you need to work out the solutions on your own to master the
skills. You cannot share code, diagrams, specific pragma settings,
plots, analysis, metrics, or other results. You cannot share problem
- HLS Pragmas---HLS Pragmas sit at the border between
where collaboration is allowed and not allowed. You are allowed to help
make each other aware of the existence of pragmas and the syntax for
pragmas. You are not allowed to tell each other what pragma values and
settings best solve the problem---you should be reasoning through what
the settings mean and how they impact the code mapping, and you should be
performing your own experiments in your project teams. You are
allowed to say where a pragma goes syntactically (e.g., relative to function
header, relative to loop header), but are not allowed to suggest which
function or loop would benefit from a specific pragma.
In general, you are expected to abide by Penn's
Code of Academic Integrity. If there is any uncertainty, please ask.
Use the Penn
Course Absence Report (CAR) in Penn-in-Touch to report absences.
Preclass worksheets will be available for a period of time before the
lecture and for at least 24 hours after the lecture. After that, we do not
promise that they will remain available.
You are responsible for keeping up with the course as it happens,
collecting the worksheets, and keeping them to use for review.
Comparison to ESE534
This course inherited less than 25% of the material from the
last offering of
ESE534. This course does not go deep into how to design a spatial
substrate (compute, interconnect), nor does it go deep into the processor--FPGA
continuum and instruction design. If offered again (no current plans), ESE534 would likely
evolve to take this course as a pre-requisite. Possibly ESE534 and 535
will merge into a single advanced, follow-on course. Note that ESE534 did
not have the kind of hands-on project that becomes a key component of this
Comparison to CIS501
This course is complementary to CIS501. This course is more focused on
custom, application-oriented design with real-time concerns, while CIS501
focuses on ISA compatibility and best-effort designs. This course assumes
you are willing to recompile and, typically, rewrite your application code; as
a result, it does not touch upon the ISA abstraction and compatibility and
will have almost nothing on dynamic ILP and pipelining of a general-purpose
processor. This course is driven by real-time concerns rather
than best-effort tasks, whereas CIS501 is more focused on best-effort.
This course will spend one day on the high-level benefits of memory
hierarchy, but will not dive deep into automatic, hardware-managed cache design and cache hierarchies,
which are a major component of CIS501. This course will mostly look at
non-shared memory models and architectures with, at most, a small nod to
the existence and challenges in shared memory, whereas CIS501 is mostly
focused on shared-memory models and architectures.
Last modified: Fri Jul 3 11:42:48 EDT 2020