CIS 7000: Trustworthy Machine Learning (Spring 2024)


  • Benchmarking Neural Network Robustness to Common Corruptions and Perturbations; 2019 [pdf]
    Dan Hendrycks, Thomas Dietterich
  • Unsupervised Domain Adaptation by Backpropagation; ICML 2015 [pdf]
    Yaroslav Ganin, Victor Lempitsky
  • Detecting and Correcting for Label Shift with Black Box Predictors; ICML 2018 [pdf]
    Zachary C. Lipton, Yu-Xiang Wang, Alex Smola
  • Intriguing properties of neural networks; 2014 [pdf]
    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus
  • Explaining and Harnessing Adversarial Examples; 2015 [pdf]
    Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
  • Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks; CAV 2017 [pdf]
    Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer
  • An abstract domain for certifying neural networks; POPL 2019 [pdf]
    Gagandeep Singh, Timon Gehr, Markus Püschel, Martin Vechev
  • Provable defenses against adversarial examples via the convex outer adversarial polytope; ICML 2018 [pdf]
    Eric Wong, J. Zico Kolter
  • Certified adversarial robustness via randomized smoothing; ICML 2019 [pdf]
    Jeremy M Cohen, Elan Rosenfeld, J. Zico Kolter
  • Jailbreaking Black Box Large Language Models in Twenty Queries; 2023 [pdf]
    Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
  • SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks; 2023 [pdf]
    Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
  • On Calibration of Modern Neural Networks; 2017 [pdf]
    Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger
  • Calibrated Prediction with Covariate Shift via Unsupervised Domain Adaptation; 2020 [pdf]
    Sangdon Park, Osbert Bastani, Jim Weimer, Insup Lee
  • A tutorial on conformal prediction; 2007 [pdf]
    Glenn Shafer, Vladimir Vovk
  • PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction; 2020 [pdf]
    Sangdon Park, Osbert Bastani, Nikolai Matni, Insup Lee
  • PAC Prediction Sets Under Label Shift; 2024 [pdf]
    Wenwen Si, Sangdon Park, Insup Lee, Edgar Dobriban, Osbert Bastani
  • PAC Prediction Sets Under Covariate Shift; 2022 [pdf]
    Sangdon Park, Edgar Dobriban, Insup Lee, Osbert Bastani
  • PAC Prediction Sets for Large Language Models of Code; 2023 [pdf]
    Adam Khakhar, Stephen Mell, Osbert Bastani
  • TRAC: Trustworthy Retrieval Augmented Chatbot; 2024 [pdf]
    Shuo Li, Sangdon Park, Insup Lee, Osbert Bastani
  • Distinguishing Two Dimensions of Uncertainty; 2011 [pdf]
    Craig Fox, Gülden Ülkümen
  • Deep Exploration via Bootstrapped DQN; 2016 [pdf]
    Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy
  • Simple and scalable predictive uncertainty estimation using deep ensembles; 2017 [pdf]
    Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
  • Fairness Through Awareness; 2012 [pdf]
    Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Rich Zemel
  • Equality of Opportunity in Supervised Learning; 2016 [pdf]
    Moritz Hardt, Eric Price, Nathan Srebro
  • Inherent Trade-Offs in the Fair Determination of Risk Scores; 2016 [pdf]
    Jon Kleinberg, Sendhil Mullainathan, Manish Raghavan
  • Counterfactual Fairness; 2017 [pdf]
    Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva
  • Calibration for the (Computationally-Identifiable) Masses; ICML 2018 [pdf]
    Úrsula Hébert-Johnson, Michael P. Kim, Omer Reingold, Guy N. Rothblum
  • FairSquare: probabilistic verification of program fairness; 2017 [pdf]
    Aws Albarghouthi, Loris D'Antoni, Samuel Drews, Aditya Nori
  • Verifying Fairness Properties via Concentration; 2019 [pdf]
    Osbert Bastani, Xin Zhang, Armando Solar-Lezama
  • Algorithms for Fairness in Sequential Decision Making; 2021 [pdf]
    Min Wen, Osbert Bastani, Ufuk Topcu
  • Rethinking Fairness for Human-AI Collaboration; 2024 [pdf]
    Haosen Ge, Hamsa Bastani, Osbert Bastani
  • SmoothGrad: removing noise by adding noise; Workshop on Visualization for Deep Learning, ICML 2017 [pdf]
    Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, Martin Wattenberg
  • A Unified Approach to Interpreting Model Predictions; NeurIPS 2017 [pdf]
    Scott Lundberg, Su-In Lee
  • "Why Should I Trust You?": Explaining the Predictions of Any Classifier; KDD 2016 [pdf]
    Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
  • Stability Guarantees for Feature Attributions with Multiplicative Smoothing; NeurIPS 2023 [pdf]
    Anton Xue, Rajeev Alur, Eric Wong
  • Counterfactual Visual Explanations; ICML 2019 [pdf]
    Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee
  • Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV); ICML 2018 [pdf]
    Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
  • Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR; Harvard Journal of Law and Technology 2018 [pdf]
    Sandra Wachter, Brent Mittelstadt, Chris Russell
  • Understanding Black-box Predictions via Influence Functions; ICML 2017 [pdf]
    Pang Wei Koh, Percy Liang
  • Datamodels: Predicting Predictions from Training Data; ICML 2022 [pdf]
    Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, Aleksander Madry
  • TRAK: Attributing Model Behavior at Scale; ICML 2023 [pdf]
    Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry
  • Scallop: A Language for Neurosymbolic Programming; PLDI 2023 [pdf]
    Ziyang Li, Jiani Huang, Mayur Naik
  • Relational Programming with Foundation Models; AAAI 2024 [pdf]
    Ziyang Li, Jiani Huang, Jason Liu, Felix Zhu, Eric Zhao, William Dodds, Neelay Velingker, Rajeev Alur, Mayur Naik