CIS 700: Software Analysis and Testing (Fall 2018)
[Administrivia] [Syllabus] [Schedule] [Reading] [Project]
Time: Mondays and Wednesdays, 1:30-3:00 pm
Location: LRSM 112B
Instructor: Mayur Naik
  Email: mhnaik@cis.upenn.edu
  Office: Levine 610
  Office hours: 2-3 pm Tuesdays (or by appointment)
Teaching Assistant: Xujie Si
  Email: xsi@cis.upenn.edu
  Office: Levine 613
  Office hours: 3-4 pm Thursdays (or by appointment)
All announcements will be made on Piazza.
Course Overview: Aspects of software development besides programming, such as diagnosing bugs, testing, and
debugging, comprise over 50% of development costs. Modern technology has come a long way in aiding programmers with
these aspects of development, and at the heart of this technology lies software analysis: a body of work that concerns
discovering facts about a given program. Many diverse software analysis approaches exist, each with its own strengths
and weaknesses. In this course, you will learn the principles underlying these approaches and gain hands-on experience
applying them to automatically test and find bugs in complex, real-world programs. Some common examples include:
- automated source code analysis to help discover entire classes of programming errors in a large application before
the code is even executed;
- exploiting the power of randomization to uncover insidious security and concurrency bugs in program runs;
- helping a library developer to automatically generate a test suite that exercises the library's API adequately
before publishing it;
- enabling an application developer to leverage the collective power of the deployed application's users in order to
uncover the bugs that affect the most users; and
- enabling a QA engineer to automatically reduce a complex crashing test case to help isolate the cause of the
crash (a sketch of this idea appears after this list).
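The last example above is the essence of delta debugging, covered later in the course (L12). To give a flavor of what such techniques look like, below is a minimal, illustrative Java sketch of the core reduction loop. It is a simplified variant of the ddmin algorithm that only tries deleting chunks of the input; the crashes oracle is a hypothetical stand-in for re-running the program under test on a candidate input.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Predicate;

    // Simplified delta-debugging-style reduction (cf. L12): repeatedly try to
    // delete chunks of a failing input while the crash persists.
    public class Reducer {
        static List<String> reduce(List<String> input, Predicate<List<String>> crashes) {
            int chunks = 2;
            while (input.size() >= 2) {
                int chunkSize = Math.max(1, input.size() / chunks);
                boolean shrunk = false;
                for (int start = 0; start < input.size(); start += chunkSize) {
                    // Candidate: the current input with one chunk removed.
                    int end = Math.min(input.size(), start + chunkSize);
                    List<String> candidate = new ArrayList<>(input.subList(0, start));
                    candidate.addAll(input.subList(end, input.size()));
                    if (!candidate.isEmpty() && crashes.test(candidate)) {
                        input = candidate;                 // keep the smaller failing input
                        chunks = Math.max(2, chunks - 1);
                        shrunk = true;
                        break;
                    }
                }
                if (!shrunk) {
                    if (chunkSize == 1) break;             // minimal w.r.t. single deletions
                    chunks = Math.min(chunks * 2, input.size());  // retry at finer granularity
                }
            }
            return input;
        }
    }

For instance, if crashes.test runs a compiler on a candidate list of source lines and checks for the same crash, this loop converges to a small set of lines that still triggers it.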
The course will involve three major concurrent activities:
- A series of hands-on labs (homeworks) that will introduce techniques and tools to improve software quality and
programmer productivity. These involve programming in a mix of Java and C/C++.
- A series of lectures that elucidate the concepts and principles of software analysis.
Additional reading material is provided; students will study selected topics further in teams of two and present them in class.
- A team project involving 2-3 students to build a real-world software analysis tool. The project will span problem
formulation, algorithm design, implementation, and empirical evaluation with realistic programs. The team will report
progress to the course staff at regular milestones, and the project will culminate with a final team presentation and
demo.
Course Objectives: Upon successfully completing the course, you will be able to:
- Describe the abstract properties of different techniques for analyzing and testing software.
- Compute the outcome of the techniques on concrete examples.
- Evaluate the suitability of different techniques for a given piece of software and set of constraints.
Evaluation: Evaluation will be based on three components: homeworks, student presentations, and the project.
Collaboration Policy: Discussing homework solutions is allowed, but each student's submission must be their own work.
Prerequisites: The course has two prerequisites:
- Significant programming experience. Some homeworks involve a modest amount of programming in Java or
C/C++. The course project involves substantial programming in a language of your choice.
- Basic knowledge of discrete mathematics (e.g., basics of set theory, logic, and probability).
Course Material: All relevant materials will be made available online. The course textbook is Static Program Analysis by Møller and Schwartzbach.
Date | Lesson | Slides/Video | Book Chapter(s) | Student Presentation | Homework Due

Introduction to software analysis/testing. Presents landscape of different techniques, basic concepts, and capabilities/limitations.

Aug 29 (Wed) | L1: Introduction to Software Analysis | [pdf] [pptx] [video] | Ch1, Ch2 | |
Sep 03 (Mon) | No lecture - Labor Day Holiday | | | |
Sep 05 (Wed) | L1 contd. | | | P1 (based on L1) [pdf] [pptx] |
Sep 10 (Mon) | L2: Introduction to Software Testing | [pdf] [pptx] [video] | | | HW1 (Lab: L1 - Racket)

Presents techniques for automated testing in dominant application domains and languages.

Sep 12 (Wed) | L3: Random Testing | [pdf] [pptx] [video] | | |
Sep 17 (Mon) | L4: Automated Test Generation | [pdf] [pptx] [video] | | P2 (based on L3) [pdf] [pptx] [code] | HW2 (Lab: L3 - AFL)
Sep 19 (Wed) | L4 contd. | | | |
Sep 24 (Mon) | L5: Differential Testing | [pdf] [pptx] | | | HW3 (Lab: L4 - Korat)
Sep 26 (Wed) | L5 contd. | | | |

Presents building blocks of dataflow analysis. Covers abstractions and algorithms for analyzing structured language constructs.

Oct 01 (Mon) | L6: Dataflow Analysis | [pdf] [pptx] [video] | Ch4, Ch5, Ch6 | P3 (based on L5) [pdf] [pptx] | HW4 (Lab: L4 - Randoop)
Oct 03 (Wed) | L6 contd. | | | |
Oct 08 (Mon) | L7: Pointer Analysis | [pdf] [pptx] [video] | Ch9 | P4 (based on L6) [pdf] [pptx] | Project Proposals Due
Oct 10 (Wed) | L7 contd. | | | |
Oct 15 (Mon) | L8: Inter-Procedural Analysis | [pdf] | Ch7 | P5 (based on L7) [pdf] [pptx] [demo files] [demo website] |
Oct 17 (Wed) | Guest Lecture by Mukund Raghothaman | [pdf] [pptx] | | |
Oct 22 (Mon) | L8 contd. | | | P6 (based on L8) [pdf] | HW5 (Lab: L5 - LLVM)
Oct 24 (Wed) | L8 contd. | | | |

Presents theoretical foundations of static analysis via topics of constraint-based analysis, type systems, and abstract interpretation.

Oct 26 (Fri) | | | | | Project Milestone 1
Oct 29 (Mon) | L9: Constraint-Based Analysis | [pdf] [pptx] [video] | Ch8 | |
Oct 31 (Wed) | L9 contd. | | | P7 (based on L9) [pdf] [pptx] |
Nov 05 (Mon) | L10: Type Systems | [pdf] [pptx] [video] | Ch3 | |
Nov 07 (Wed) | L11: Abstract Interpretation | [pdf part 1] [pdf part 2] | Ch10 | | HW6 (Lab: L9 - Datalog)

Presents techniques to automate debugging. Covers algorithms for test-input minimization and fault localization.

Nov 09 (Fri) | | | | | Project Milestone 2
Nov 12 (Mon) | L12: Delta Debugging | [pdf] [pptx] [video] | | P8 (based on L10) [demo files] [demo website] |
Nov 14 (Wed) | L12 contd. | | | P9 (based on L10) |

Presents path-sensitive analysis and its incarnation in dynamic symbolic execution and software model checking.

Nov 19 (Mon) | L12 contd. | | | |
Nov 21 (Wed) | No lecture - Thanksgiving Holiday | | | |
Nov 26 (Mon) | L13: Statistical Debugging | [pdf] [pptx] [video] | | | HW7 (Lab: L12 - Delta)
Nov 28 (Wed) | L14: Dynamic Symbolic Execution | [pdf] [pptx] [video] | | P10 (based on L12) | HW8 (Lab: L13 - CBI)
Nov 30 (Fri) | | | | | Project Milestone 3
Dec 03 (Mon) | L14 contd. | | | P11 (based on L14) | HW9 (Lab: L14 - KLEE)
Dec 05 (Wed) | Project presentations | | | |
Dec 10 (Mon) | Project presentations | | | |
L1: Introduction to Software Analysis
Reading material related to this lesson is organized in three categories:
Part 1: Introductory articles on program analysis concepts.
- What is soundness (in static analysis)? by Michael Hicks.
Relates soundness/completeness in program analysis to precision/recall and discusses "soundiness".
- What is static program analysis? [talk] by Matthew Might.
Explains why program analysis is undecidable and develops a static analysis to play with in Racket.
Part 2: Surveys of using program analysis tools in industry.
- From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis, 2018.
Describes experiences developing and deploying program analysis tools at Facebook.
- Lessons from Building Static Analysis Tools at Google, 2018.
Describes experiences developing and deploying program analysis tools at Google.
- What Developers Want and Need from Program Analysis: An Empirical Study, 2016.
One of the best empirical studies about program analysis.
- A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs, 2010.
Describes experiences applying a commercial static analysis tool by Coverity to large C/C++ programs.
- Righting Software, 2004.
Describes two generations of static analysis tools developed by Microsoft Research.
Part 3: Consequences and causes of famous software failures.
- The Worst Computer Bugs in History: The Ariane 5 Disaster
Describes famous software bugs including the Ariane Rocket Disaster from the lesson.
- The Coming Software Apocalypse
Overview of the state of software reliability problems and solutions to overcome them.
L2: Introduction to Software Testing
- Pex and Moles
Unit test generation tools in Visual Studio for .NET programs.
- Hints on Test Data Selection: Help for the Practicing Programmer
Original paper that introduced the idea of mutation testing (illustrated in the sketch after this list).
- A Theory of Predicate-Complete Test Coverage and Generation [slides]
Introduces a new code coverage metric based on predicates.
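To make the mutation-testing idea above concrete, here is a tiny, hand-written illustration in Java (a real tool such as PIT generates mutants automatically): the mutant flips a comparison operator in max, and a test suite "kills" the mutant if some test distinguishes it from the original.

    // Hand-written illustration of mutation testing: the mutant flips '>'
    // to '<' in max(). A test suite is judged by whether it kills mutants.
    public class MutationDemo {
        static int max(int a, int b)       { return a > b ? a : b; }  // original
        static int maxMutant(int a, int b) { return a < b ? a : b; }  // mutant

        public static void main(String[] args) {
            int[][] tests = {{3, 3}, {1, 2}};  // a tie alone would miss the mutant
            for (int[] t : tests) {
                boolean killed = max(t[0], t[1]) != maxMutant(t[0], t[1]);
                System.out.printf("test (%d,%d): %s%n", t[0], t[1],
                                  killed ? "kills the mutant" : "mutant survives");
            }
        }
    }

A suite containing only the tie (3,3) would let the mutant survive, signaling that it under-exercises the ordering behavior.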
L3: Random Testing
Reading material related to this lesson is organized in two categories:
Part 1: Technical articles on random testing.
- A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs, ASPLOS 2010
Describes fuzz testing in Microsoft's Cuzz tool to find concurrency bugs.
- QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs, ICFP 2000 [video]
Describes fuzz testing in the QuickCheck tool to test properties of Haskell programs.
- Evaluating Fuzz Testing, CCS 2018 [blog post]
Describes flaws in past evaluations of fuzz testing and gives guidelines going forward.
Part 2: Case studies involving random testing.
- A Report on Random Testing, ICSE 1981
Original paper that introduced the idea of random testing.
- Webpage describing fuzz testing case studies (1988-2008) by Bart Miller of Univ. of Wisconsin.
- A blog post by Google's Project Zero team on fuzz testing popular web browsers' DOM engines.
- A talk describing work of the OpenBSD team on using the syzkaller fuzzer to fuzz their kernel.
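As a minimal flavor of the technique surveyed above, the following Java sketch implements the naive random-testing loop: generate random byte strings, run the program under test, and report any input that crashes it. The parse method here is a hypothetical stand-in for real code under test; coverage-guided fuzzers such as AFL (used in HW2) improve on this blind loop by biasing generation toward inputs that reach new code.

    import java.util.Random;

    // Naive random-testing ("fuzzing") loop: feed random inputs to the program
    // under test and record any input that triggers a crash.
    public class Fuzzer {
        // Hypothetical program under test, with a planted bug.
        static void parse(byte[] input) {
            if (input.length > 0 && input[0] == 0)
                throw new IllegalStateException("planted bug reached");
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);                 // fixed seed for reproducibility
            for (int trial = 0; trial < 100_000; trial++) {
                byte[] input = new byte[rnd.nextInt(16)];
                rnd.nextBytes(input);
                try {
                    parse(input);
                } catch (Exception e) {                  // any uncaught exception is a finding
                    System.out.printf("crash on trial %d (input length %d): %s%n",
                                      trial, input.length, e.getMessage());
                    return;
                }
            }
            System.out.println("no crash found");
        }
    }

This one-byte bug falls quickly to blind search (roughly 1 in 256 inputs triggers it); a bug guarded by a three-byte magic number would take about 2^16 times longer to hit, which is precisely the gap that coverage guidance closes.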
L4: Automated Test Generation
- Korat: Automated Testing Based on Java Predicates, ISSTA 2002
Paper that introduced Korat.
- Feedback-Directed Random Test Generation, ICSE 2007 [Randoop webpage]
Paper that introduced Randoop.
- Korat vs. Grammar-Based Test Generation by Aalok Thakkar
Note justifying need for Korat-style search over grammar-based test generation.
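To give a flavor of the Korat paper above: the tool's input is a Java predicate repOk() describing valid data structures, and its output is all valid structures up to a bound. The sketch below shows only a naive generate-and-filter version of that idea on a toy invariant; Korat's actual contribution is pruning the enumeration by monitoring which fields repOk() reads, which this sketch omits.

    import java.util.ArrayList;
    import java.util.List;

    // Naive generate-and-filter sketch of predicate-based test generation:
    // enumerate candidate structures and keep those satisfying repOk().
    public class RepOkDemo {
        // Toy invariant: a fixed-size array that is strictly sorted (no duplicates).
        static boolean repOk(int[] elems) {
            for (int i = 0; i + 1 < elems.length; i++)
                if (elems[i] >= elems[i + 1]) return false;
            return true;
        }

        // Enumerate all arrays of the given length over values 0..max-1.
        static void generate(int[] elems, int pos, int max, List<int[]> valid) {
            if (pos == elems.length) {
                if (repOk(elems)) valid.add(elems.clone());
                return;
            }
            for (int v = 0; v < max; v++) {
                elems[pos] = v;
                generate(elems, pos + 1, max, valid);
            }
        }

        public static void main(String[] args) {
            List<int[]> valid = new ArrayList<>();
            generate(new int[3], 0, 5, valid);   // 5^3 = 125 candidates
            System.out.println(valid.size() + " valid inputs generated");  // C(5,3) = 10
        }
    }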
L5: Differential Testing
- NEZHA: Efficient Domain-Independent Differential Testing, IEEE S&P 2017 [code] [video]
A general differential testing approach to find bugs in binaries.
- Finding and Understanding Bugs in C Compilers, PLDI 2011 [CSmith]
Adapts differential testing to find bugs in various C compilers.
- DeepXplore: Automated Whitebox Testing of Deep Learning Systems, SOSP 2017 [code] [video]
Adapts differential testing to find bugs in deep neural networks.
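The three papers above all instantiate one simple recipe: run two implementations that should agree on a stream of random inputs and flag any input where they diverge. Here is a minimal Java sketch of that recipe, comparing two midpoint computations (the naive one harbors the classic binary-search overflow bug):

    import java.util.Random;

    // Minimal differential-testing loop: run two implementations on the same
    // random inputs and report any input where their outputs diverge.
    public class DiffTest {
        // Reference implementation: overflow-safe midpoint.
        static int midSafe(int lo, int hi)  { return lo + (hi - lo) / 2; }
        // Alternative implementation: overflows when lo + hi exceeds
        // Integer.MAX_VALUE (the well-known binary-search bug).
        static int midNaive(int lo, int hi) { return (lo + hi) / 2; }

        public static void main(String[] args) {
            Random rnd = new Random();
            for (int trial = 0; trial < 1_000_000; trial++) {
                int lo = rnd.nextInt(Integer.MAX_VALUE);
                int hi = lo + rnd.nextInt(Integer.MAX_VALUE - lo);  // ensure lo <= hi
                if (midSafe(lo, hi) != midNaive(lo, hi)) {
                    System.out.printf("divergence: lo=%d hi=%d safe=%d naive=%d%n",
                                      lo, hi, midSafe(lo, hi), midNaive(lo, hi));
                    return;
                }
            }
            System.out.println("no divergence found");
        }
    }

CSmith (above) scales this recipe up: random C programs serve as the inputs, and multiple C compilers are the implementations under comparison.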
L7: Pointer Analysis
- Pointer Analysis, Foundations and Trends in Programming Languages, 2015
Recent survey of pointer analysis.
L8: Inter-Procedural Analysis
- Undecidability of Context-Sensitive Data-Dependence Analysis, ACM TOPLAS, 2000
Proof of undecidability of tracking calls/returns and reads/writes simultaneously.
L10: Type Systems
- Type Systems, CRC Handbook, 2004
L15: Software Model Checking
- Software Model Checking, ACM Computing Surveys, 2009
Recent survey of software model checking.
Project Overview: A hands-on project will span the duration of the course. Students will form teams of up to two and pick a topic in consultation with the instructor. There will be three milestones to provide feedback on each team's progress. Each team will develop their implementation on GitHub and maintain a technical report to be submitted at each milestone. The project will conclude with an in-class presentation by each team at the end of the semester.
Each project must encompass a problem statement, a formal/conceptual formulation of your proposed solution, a realistic implementation, and an experimental evaluation.
The project will be evaluated based on creativity, correctness, comprehensiveness, and clarity.
Project Milestones: All project-related deliverables must be submitted through Canvas by midnight on the dates below.
When | What | Deliverable
Oct 08 | Propose project topic | Project topic description
Oct 15 | Finalize project topic |
Oct 26 | Project milestone 1 | Implementation and report
Nov 09 | Project milestone 2 | Implementation and report
Nov 30 | Project milestone 3 | Implementation and report
Dec 3, 5, 10 | In-class project presentations | Presentation slides
Sample Projects:
- Comparing different industrial static analyzers for finding security vulnerabilities [github link] [presentation]
- A fuzzer for the .NET compiler developed as a project for a 2018 Language-Based Security course at Aarhus University. [github link] [blog post]