About Me

Welcome to my corner of the internet! I am a fourth-year PhD student at the University of Pennsylvania, advised by Prof. Mayur Naik. My research interests span Programming Languages and Machine Learning. Specifically, my research leverages techniques from program synthesis and analysis to build tools and frameworks that help machine learning practitioners understand where their models fail and how to fix them. My other research interests include developing program synthesis techniques to streamline software analysis, bug finding, and code generation.


News

  • I am very grateful to have been awarded the 2023 Google PhD Fellowship in Programming Technology and Software Engineering.
  • Our paper, "Relational Query Synthesis ⨝ Decision Tree Learning", has been accepted to VLDB 2024.

Research

While machine learning has seen several advances in recent years, with models achieving state-of-the-art performance on a variety of tasks, analyzing and understanding these models and their failures is an ad-hoc and often chaotic process. This is exacerbated by the lack of tools and frameworks that allow practitioners to interactively explore their models in a manner that is intuitive and easily accessible.

My research aims to bridge this gap by developing novel techniques and tools for the systematic analysis and debugging of machine learning models. To this end, my framework, SQRL (pronounced "squirrel"), uses data-driven program synthesis techniques to characterize the errors in machine learning models in terms of grounded concepts and relations that are intuitive to practitioners. You can read more about SQRL in our blog post here.

I am also working on a novel querying language that allows practitioners to query their models and datasets directly and uniformly, in a style akin to querying frameworks like MongoDB and SQL. Practitioners can use it to craft intricate queries that characterize the errors in their models and that generalize to identify similar errors in unseen data as well as in other models. These queries can identify a range of issues, from simple classification errors to violations of domain knowledge, biases and labeling errors in the training data, distribution shift, and more. While this is still a work in progress, you can read more about it in our manuscript here.

If any of this interests you, I am actively looking for collaborators and would love to chat! Feel free to reach out to me by email here.


Publications

Recent Manuscripts
MDB: Interactively Querying Datasets and Models
Aaditya Naik, Adam Stein, Yinjun Wu, Eric Wong, Mayur Naik
Interactive Code Generation via Test-Driven User-Intent Formalization
Shuvendu K. Lahiri*, Aaditya Naik*, Georgios Sakkas*, Piali Choudhury, Curtis von Veh, Madanlal Musuvathi, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao

Conference Papers
Relational Query Synthesis ⨝ Decision Tree Learning
Aaditya Naik, Aalok Thakkar, Adam Stein, Mayur Naik, Rajeev Alur
Do Machine Learning Models Learn Statistical Rules Inferred from Data?
Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong
CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation
Pardis Pashakhanloo, Aaditya Naik, Yuepeng Wang, Hanjun Dai, Petros Maniatis, Mayur Naik
Sporq: An Interactive Environment for Exploring Code Using Query-by-Example
Aaditya Naik, Jonathan Mendelson, Nathaniel Sands, Yuepeng Wang, Mayur Naik, Mukund Raghothaman
Example-Guided Synthesis of Relational Queries
Aalok Thakkar, Aaditya Naik, Nate Sands, Mukund Raghothaman, Mayur Naik, Rajeev Alur
GenSynth: Synthesizing Datalog Programs without Language Bias
Jonathan Mendelson*, Aaditya Naik*, Mukund Raghothaman, Mayur Naik
Code2Inv: A Deep Learning Framework for Program Verification
Xujie Si*, Aaditya Naik*, Hanjun Dai, Mayur Naik, Le Song

Workshop Papers
Learning to Walk over Relational Graphs of Source Code
Pardis Pashakhanloo, Aaditya Naik, Hanjun Dai, Petros Maniatis, Mayur Naik