Spring 2021: Data Science for Computer Systems
ESE 680, MW 3:00-4:30PM

This course covers advanced topics at the interface between data science and computer systems. It surveys recent advances in data science and their use in modeling, designing, and managing complex software systems and hardware architectures, such as those found in datacenters and cloud computing. Classes are discussion-oriented and focus on in-depth analysis of the readings. Students will learn to apply methods from data science and statistical machine learning to complete a collaborative research project. Prerequisites include introductory statistics or data science (STAT 431, ESE 305, or ESE 402) and computer systems (CIS 380).


Faculty

Professor Benjamin Lee
leebcc@seas.upenn.edu
Office Hours: TBD





Syllabus

Grading
Participation/Discussion: 25%; Response Papers: 25%; Project/Paper: 50%
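As an illustration of how these weights combine, the sketch below computes a weighted course score. The weights come from the grading breakdown above; the assumption that each component is scored on a 0-100 scale, and the helper name `course_grade`, are hypothetical and chosen only for this example.

```python
# Weights taken from the grading breakdown; component names are hypothetical.
WEIGHTS = {
    "participation": 0.25,
    "response_papers": 0.25,
    "project": 0.50,
}


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted course score.

    Assumes (for illustration only) that each component score is on a 0-100 scale.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum: 0.25 * participation + 0.25 * response papers + 0.50 * project.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


print(course_grade({"participation": 90, "response_papers": 80, "project": 85}))
```

With these sample scores the result is 0.25(90) + 0.25(80) + 0.50(85) = 85.0, which shows that the project carries as much weight as participation and response papers combined.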


Academic Policy
Students are expected to follow the Code of Academic Integrity of the University of Pennsylvania.


Participation / Discussion
This course uses a seminar, not a lecture, format. Each class covers particular topics from assigned papers. Students are expected to read the assigned papers and to prepare for course discussions. A student will be assigned to lead the discussion for each paper.


Response Papers
Students should prepare an insightful critique of the assigned papers, due at the beginning of class. Each response paper should take the form of a constructive review that includes (1) a summary, (2) strengths, (3) weaknesses, and (4) directions for future work. Response papers should be no longer than one page per class and will be evaluated on brevity and depth of insight.


Project / Paper
The course ends with a research project. Intermediate deliverables include a research statement, a research plan, and an extended abstract; the project concludes with a final paper and an oral presentation.




Readings

Schedule
Students will read and discuss three research papers per week. The complete reading list and schedule are pending. A few representative papers from the draft syllabus are listed below.

Accurate and efficient regression modeling for microarchitectural performance and power prediction (ASPLOS'06) [paper]

CherryPick: Adaptively unearthing the best cloud configurations for big data analytics (NSDI'17) [paper]

Learning scheduling algorithms for data processing clusters (SIGCOMM'19) [paper]

Hound: Causal learning for datacenter-scale straggler diagnosis (SIGMETRICS'18) [paper]

A survey of machine learning for big code and naturalness (ACM Computing Surveys'18) [paper]