Breaking New Ground

In the digital age, information is a double-edged sword. The ability to collect and study troves of user data helps companies to build better services and enables scientists to make important discoveries. But when mining large datasets, researchers risk exposing sensitive personal information.

Aaron Roth, the Raj and Neera Singh Assistant Professor in the Department of Computer and Information Science (CIS), believes that with smarter tools, we can do “big data” science while protecting individual privacy. “We’re used to thinking of privacy as this goal that’s inherently at odds with analyzing data,” notes Roth. “But privacy and data usefulness are surprisingly aligned with each other.” Through his foundational work on “differential privacy,” Roth is building algorithms that researchers can use to analyze large datasets without revealing the identities of individuals. He’s discovering that these algorithms don’t just ensure privacy; they also help researchers arrive at more robust conclusions.

GETTING STARTED

When Roth began graduate school at Carnegie Mellon University, the issue of online data privacy was just beginning to garner national attention. One of the first major privacy breaches occurred in 2006, when AOL released "de-identified" search logs of hundreds of thousands of users to benefit academic researchers. The well-intentioned move became a scandal when The New York Times tracked down one of the supposedly anonymous web searchers, showing how individual users could be identified from the data. As the number of large data breaches grew, security researchers began asking what could be done.

"People wanted a way of sharing data while having a guarantee that someone wasn't going to come along and re-identify the data later," Roth says.

Around that time, several research groups proposed the idea of differential privacy, which aimed to address exactly these problems.
Roth’s doctoral thesis, the first on the subject of differential privacy, attracted the attention of CIS faculty member Michael Kearns, who encouraged him to apply for a position at Penn.

"I likened our recruitment of Aaron to recruiting Kobe Bryant out of high school," says Kearns, National Center Professor of Management and Technology. “I knew it would be fantastic to have him here."

If anything, Roth has managed to exceed those high expectations since arriving at Penn in 2011, collaborating with researchers across academia and industry to lay the theoretical groundwork for the nascent field of differential privacy. Roth’s achievements have earned him numerous honors, including a Sloan Research Fellowship, an NSF CAREER Award and a Yahoo Career Advancement award.

"It’s so rare to find someone at such a young age who’s not just doing great research, but doing great research in a field he helped to create," notes Kearns. "I'm having the time of my life working with him."

SAFER AND SMARTER

When a researcher wants to answer a large number of questions about a dataset, the obvious thing to do is to query that dataset many times. But this strategy comes with a risk.

"If I’m an algorithm, and I compute an answer on a dataset exactly, every time you ask a question, you’d eventually be able to reconstruct the dataset and find every individual," Roth says.

Instead, a differentially private algorithm will try to predict accurate responses by looking at the questions that the researcher has already posed and the answers that have been given, and guessing based on a large number of other datasets consistent with the results so far.

"It turns out, from the answers I’ve given previously, information about the dataset is implied that might already determine the answer to the question you’ve asked next," Roth says. This strategy ends up giving a privacy guarantee, because the algorithm only has to look back at the dataset a small number of times.

While differentially private algorithms were conceived as a tool to protect individuals, Roth and his colleagues have made a surprising discovery during the course of their research: these algorithms can also protect researchers from the “false discoveries” that occur when a dataset is repeatedly mined for correlations.

One can imagine a medical researcher looking for correlations between smoking and lung cancer in a dataset of 250 individuals. Rather than learning idiosyncratic things about her study group, she’d like to use the data to make inferences about the general population. That goal turns out to be aligned with the goal of data privacy.

"In data privacy, what you want to do is learn facts about the underlying population while provably not learning very much about individual members of the population," Roth says.

Together with researchers at the University of Toronto, Microsoft Research, IBM and Google, Roth published a study in Science this year outlining a method for testing successive hypotheses on the same dataset using differentially private algorithms. Disciplines ranging from cancer research to economics might in the future use similar methods to protect the privacy of study participants while ensuring statistically robust conclusions.
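
The Science study revolves around a "reusable holdout": a held-out dataset that can be queried many times, even with adaptively chosen hypotheses, without being overfit. The sketch below follows the general spirit of that idea, in the style of the paper's Thresholdout mechanism: a noisy holdout estimate is revealed only when it disagrees with the training estimate by more than a noisy threshold, and otherwise the training estimate is returned. The class name, parameter values and choice of Gaussian noise here are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

class ReusableHoldout:
    """Thresholdout-style sketch for answering adaptive queries.

    A query is a function phi mapping a data point to a value in [0, 1];
    the analyst wants its mean. The holdout is consulted only when the
    training and holdout means disagree noticeably, which limits how much
    information about the holdout set leaks across many adaptive queries.
    """

    def __init__(self, train, holdout, threshold=0.04, sigma=0.01):
        self.train = list(train)
        self.holdout = list(holdout)
        self.threshold = threshold  # illustrative tolerance, not from the paper
        self.sigma = sigma          # illustrative noise scale

    def query(self, phi):
        train_mean = np.mean([phi(x) for x in self.train])
        holdout_mean = np.mean([phi(x) for x in self.holdout])
        # Only "look at" the holdout when it truly disagrees with training.
        if abs(train_mean - holdout_mean) > self.threshold + np.random.normal(0, self.sigma):
            return holdout_mean + np.random.normal(0, self.sigma)
        return train_mean

# Hypothetical usage with random binary features:
rng = np.random.default_rng(0)
train = rng.integers(0, 2, size=(100, 5))
holdout = rng.integers(0, 2, size=(100, 5))
validator = ReusableHoldout(train, holdout)
print(validator.query(lambda x: float(x[0] == x[1])))
```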

BUILDING NETWORKS

Roth's efforts to build a research field that bridges academia, technology and society are mirrored by his teaching and mentoring activities at Penn. During his short time at the University, Roth has built foundational courses for the Networked & Social Systems Engineering (NETS) program, an interdisciplinary major that teaches undergraduates to think scientifically about topics like the viral spread of content on Facebook and Search Engine Optimization.

"Normally, a junior faculty member would come into a traditional existing department and teach courses that were already designed," Kearns says. "Aaron came in and designed an algorithmic game theory class for undergraduates from scratch. There existed no such course in the world at that level."

Roth enjoys teaching undergraduates, and also finds it gratifying to watch the intellectual development of his doctoral students.

"All of my graduate students, by the time they are several years in, have become real colleagues and independent researchers," he remarks. “The transformation from when they started graduate school is really remarkable to see.

View the original article in Penn Engineering magazine "Breaking New Ground" by Madeleine Stone.
