Spring 2021: Data Science for Computer Systems
ESE 680, MW 3:00-4:30PM
This course covers advanced topics at the
interface between data science and computer
systems. It surveys recent advances in data
science and their use in modeling, designing, and
managing complex software systems and hardware
architectures, such as datacenters and cloud
computing platforms. Discussion-oriented classes focus on
in-depth analysis of readings. Students will learn
to operationalize methods in data science and
statistical machine learning to complete a
collaborative research project. Prerequisites
include introductory statistics or data science
(STAT 431, ESE 305, ESE 402) and computer systems
(CIS 380).
Faculty
Professor Benjamin Lee
leebcc@seas.upenn.edu
Office Hours: TBD
Syllabus
Grading
Participation/Discussion: 25%; Response Papers: 25%; Project/Paper: 50%
Academic Policy
Students are expected to follow the Code of Academic Integrity
of the University of Pennsylvania.
Participation / Discussion
This course uses a seminar, not a lecture, format. Each class covers particular
topics from assigned papers. Students are expected to read the assigned papers
and to prepare for course discussions. A student will be assigned to lead the
discussion for each paper.
Response Papers
Students should prepare an insightful critique of the assigned papers, due at the
beginning of class. These response papers should take the form of a constructive review,
including (1) a summary, (2) strengths, (3) weaknesses, and (4) directions for future work.
Response papers should be no longer than one page per class. Papers will be evaluated
on brevity and depth of insight.
Project / Paper
The course culminates in a research project. Deliverables
include a research statement, research plan, extended abstract,
final paper, and oral presentation.
Readings
Schedule
Students will read and discuss three research papers per week.
The complete reading list and schedule are pending. The draft syllabus includes
a few representative papers below.
| Accurate and efficient regression modeling for microarchitectural performance and power prediction (ASPLOS'06) [paper] |
| CherryPick: Adaptively unearthing the best cloud configurations for big data analytics (NSDI'17) [paper] |
| Learning scheduling algorithms for data processing clusters (SIGCOMM'19) [paper] |
| Hound: Causal learning for datacenter-scale straggler diagnosis (SIGMETRICS'18) [paper] |
| A survey of machine learning for big code and naturalness (ACM Computing Surveys'18) [paper] |