List of Current Senior Projects - 2015/2016
1. Beryl: Motion Capture Analysis
Members: Cezar Babin, Fahim Abouelfadl
Advisors: Hyun Soo Park
Abstract: The goal of this project was to perform analysis on human motion capture data and provide useful insight for applications necessitating movement correction. We built a library of tools that could parse .bvh files and then visualize, filter, compress and structure the data so that it can be
analyzed with traditional data analysis techniques. Equipped with this custom set of tools, we sought to demonstrate the application of motion
capture to the field of sports analytics. We built several classifiers that could determine the
success of a free-throw shot in basketball by making use of input parameters that consisted of
key features related to the shooting motion. These tools allowed us to investigate the various
attributes that lead to a shot’s success and assess their importance based on the individual that
performs the shot. The work conducted here serves as a strong base for developing programs that
can provide customized coaching feedback for individuals attempting to strengthen their
shot, and can be used as a framework for conducting motion capture analysis in other
areas relating to sports.
2. Private Surveys for the Public Good
Members: Bianca Pham, Emmanuel Genene, Ethan Abramson, and Gaston P Montemayor Olaizola
Advisors: Aaron Roth, Andreas Haeberlen, Dennis Culhane
Abstract: Protecting the privacy of individuals' data collected while taking a survey has always been a challenging task. To date, survey data holders remove what is referred to as Personally Identifiable Information (PII). Despite these efforts, it has repeatedly been shown possible to uniquely re-identify people in the dataset and recover all of their data. The goal of our project is to create the first-ever consumer-facing application that uses a controlled amount of randomness to induce anonymity in online surveys. This means that any individual may vote on a particular poll, and their identity and associated responses will be probabilistically protected. To accomplish this, we built on the theoretical results associated with Differential Privacy. Differential Privacy is a statistical technique that allows data holders to guarantee the privacy of those in the dataset. It works by adding a calculated amount of random noise to the results of every query on the dataset. When the result of a query depends significantly on a very small subset of individuals in the dataset, the added noise masks their responses, keeping the survey results anonymous. When the result of a query does not depend on a small subset of responses, the added noise becomes negligible, allowing us to learn about the population as a whole. We created a free online application where survey respondents should feel comfortable answering truthfully, which in turn allows survey creators to ask personal and/or sensitive questions to gather important insights.
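As a rough illustration of the noise-adding idea described in this abstract, the sketch below applies the textbook Laplace mechanism to a counting query. It is not taken from the project; the function names, the `epsilon` parameter, and the sample poll data are assumptions made for illustration only.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(responses, predicate, epsilon, rng=None):
    """Return a noisy count of the responses that satisfy `predicate`.

    A counting query changes by at most 1 when one respondent is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in responses if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical poll: 30 "yes" answers out of 100 responses.
answers = ["yes"] * 30 + ["no"] * 70
noisy_yes = private_count(answers, lambda a: a == "yes", epsilon=0.5)
print(f"noisy count of 'yes': {noisy_yes:.1f}")
```

A smaller `epsilon` means more noise and stronger protection for each respondent; on a large poll the same noise is small relative to the count, which matches the behavior the abstract describes.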
3. P.D.A.T. (Piazza Data Analysis Tool)
Members: Aashish Lalani, Varun Agarwal
Advisors: Swapneel Sheth, Arvind Bhusnurmath, Benedict Brown
Abstract: Piazza is a platform used by university courses where students can ask questions and get answers. Our project created generalizable parsing tools that extract class-relevant data and format it into statistically usable datasets. It also provides a suite of statistical tools with which this data may be analyzed. The analysis uses several distributions, including Poisson, Beta-Geometric, and Negative Binomial. We modeled behaviors observed in the extracted data: for example, the answering patterns of teaching assistants versus students and of newly hired versus longer-serving teaching assistants, the effect of question length on answer time, redundant questions by specific students, the small number of students who answer most of the questions, and the spike in questions before milestones. We provide relevant information and tools to our project advisors for the betterment of their courses, in particular Introduction to Computer Programming. Results show that recruiting practices may be improved by using models trained on past semesters to predict TA involvement in later semesters.
4. Surgery Concierge: Improving Clarity and Delivery of Surgery Instructions
Members: Chris Akatsuka, Tadas Antanavicius, Rosmary George, Joyce Lee
Advisor: Ani Nenkova
Abstract: There are 53 million outpatient surgeries per year in
the United States. For each of these surgeries, patients
receive some form of instructions for what to do in the
perioperative stages of surgery as part of a holistic health care
plan. Oftentimes, these instructions are presented through disorganized,
generic, and vague documentation, or even verbally.
The lack of specificity and clarity in these instructions
leads to surgery cancellations and suboptimal care, often with
significant impact on health outcomes. We present a HIPAA
compliant service that aims to begin to tackle these issues by
providing targeted instructions via digital means such as SMS,
PDF, and Calendar events.
This service was created under the consultation of medical
experts, mock user testing, and field patient user testing. It
is an end-to-end solution offering interfaces for both doctors'
office schedulers and end-user patients. Creating an
effective solution involved aggregating data and templates to
ease the process of integrating new protocols and to ensure
only relevant data were given focus.
We launched a pilot version of our service that allowed
patients to download their customized PDF, subscribe to
receive SMS reminders, and integrate Calendar events into
their schedules. While the pilot helped us determine that all
three services were used, the participants did not reply to
feedback requests. In its place, we conducted a peer survey and
CrowdFlower survey. Of our 124 peers and 200 CrowdFlower
respondents, most indicated that they would prefer our services
over existing solutions.
5. WonderWall: A Machine-Learned Network Filtering Engine
Members: Kruesit Upatising, Christian Barcenas, Yesha Ouyang, Scott Collins
Advisor: Jonathan M. Smith
Abstract: Wonderwall is a proof-of-concept
network filtering engine utilizing machine learning to identify
malicious network packets. Trained using modern profile-generated
datasets, Wonderwall aims to
augment human reaction and rules-based
IDSs to respond to attacks in real time.
It is designed to be
integrated into a virtual network such as one built with OpenFlow to scalably handle malicious network
6. Roshi : Machine Learning Powered Job Recommendations
Members: Dhrupad Bhardwaj, Shreshth Khilani
Advisors: Andreas Haeberlen
Abstract: Recruiting portals today are exclusively focused on
job search and filtering. Roshi is a smart web application that
uses Machine Learning to learn a student's preferences and
accordingly recommend employment opportunities. Roshi uses
a combination of revealed preferences and implicit choices to
train an ML engine and iteratively update recommendations.
7. Minimizing Bias in Residency Matching: A Study in Non-Standard Random Walks
Members: Rebecca Baumher, Jeremy Bierema, Scott Buchanan, Meryem Essaidi
Advisor: Sampath Kannan
Abstract: For problems with multiple solutions, it is often desirable to select a solution uniformly at random.
This gives all possible solutions a chance of being selected, whereas deterministic algorithms might
always bias the outcome towards one solution or one family of solutions. In particular, we focus on the
problem of matching residents to hospitals, where deterministic algorithms are inherently biased to favor
one side over the other. An unbiased solution is to pick a matching at random. We can represent all
possible matchings as nodes in a graph and perform a random walk on this graph. We then output the
matching represented by the node that the random walk terminates on, after a sufficient number of steps.
However, a standard random walk can take very long to reach the stationary distribution. Thus, our
group investigated various non-standard random walks in order to find one that would be more rapidly
mixing, which would give a time improvement over standard random walks. As a result, we have been
able to reduce the maximum possible deviation from the standard distribution on certain input graphs.
Our results show a consistent improvement by a factor of 3 over that achieved using standard random walks.
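The baseline this abstract compares against, a standard random walk that outputs the node it stops on, can be sketched as follows. This is only a generic illustration on a small hand-made graph; in the project each node would represent a matching, and the graph, node names, and `random_walk` helper here are hypothetical. The non-standard walks the group studied are not shown.

```python
import random

def random_walk(graph, start, steps, rng=None):
    """Take `steps` uniform random moves from `start` and return the final node.

    `graph` maps each node to a list of its neighbours.
    """
    rng = rng or random.Random()
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])
    return node

# Hypothetical 4-node cycle standing in for a graph of matchings.
cycle = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "a"],
}
print(random_walk(cycle, "a", steps=25))
```

The number of steps needed before the output is close to the walk's long-run distribution is the mixing time, which is the quantity the project aims to reduce.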
8. A Tool for Visualizing Gun Violence Data
Members: Mike Browne, Nina Illeva, Lexi Selldorff, Fabian Wikstrom
Advisor: Chris Callison-Burch, H. Andrew Schwartz
Abstract: Guns were the cause of 12,942 deaths in America in 2015. Most Americans underestimate the number of gun-related incidents in their country. In the last year, guns have killed more people than drug and alcohol overdoses, Parkinson's Disease, and war. We built a website that allows users to look at specific incidents or graphs of gun violence, helping them grasp the breadth of America's gun addiction. Our solution uses data from news articles because the US government has no comprehensive record of gun violence incidents. These articles include details that are not covered in most analyses of gun violence. Our interactive website allows users to create and manipulate their own graphs enabling them to unlock insights that were previously unreachable.
9. Predicting Non-Seizure Regions to Expedite EEG Analysis
Members: Isobeye Daso, Eliana Mason, Grace Wu
Advisor: Sanjeev Khanna
Abstract: An electroencephalogram (EEG) is a test used to evaluate electrical activity in the brain. Neurologists use EEG data to detect abnormal brain activity that may be associated with certain brain disorders such as epilepsy, tumors, and stroke. Doing so requires reviewing all hours of a patient's EEG recording to identify and diagnose seizure activity. Since the length of the recordings can range from several hours to weeks, the demand for specialized expertise often exceeds the supply. Our project assists neurologists in monitoring epileptic patients for seizures. We focus on identifying regions of the EEG recording that contain no seizure activity and do not need to be reviewed. Doing so will drastically decrease diagnostic variance and the time neurologists spend reviewing data. To do this, we identified mathematical features in the EEG data (such as line length, energy, and correlation) and trained multiple classifiers with patient data. Our primary classifier identifies non-seizure regions and decreases the time a neurologist needs to spend reading EEG data by 90%. The classified EEG data is displayed in a web application, allowing users to focus only on possible seizure regions and providing a streamlined process for reviewing large volumes of EEG data.
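Two of the features named in this abstract, line length and energy, have simple common definitions that can be sketched as below. The windowing scheme, function names, and toy signal are assumptions for illustration; the abstract does not say how the project computed or windowed its features.

```python
def line_length(window):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(window, window[1:]))

def energy(window):
    """Sum of squared sample values."""
    return sum(x * x for x in window)

def windowed_features(signal, window_size):
    """Split `signal` into non-overlapping windows and compute both features.

    Trailing samples that do not fill a whole window are ignored.
    """
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        features.append((line_length(window), energy(window)))
    return features

# Toy single-channel signal: an oscillating stretch followed by a flat one.
toy_signal = [0, 1, 0, 1, 2, 2, 2, 2]
print(windowed_features(toy_signal, window_size=4))
```

Feature tuples like these would then be the inputs to a classifier that labels each window as seizure or non-seizure.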
10. Fugue: Musical Programming on the Web
Members: Philip Del Vecchio
Advisor: Steven Zdancewic
Abstract: Music is an inherently functional domain - it is built on abstractions
and functions of notes, themselves abstractions of sound waves.
Music is a regular language, i.e., one that can be expressed as a regular
expression (sheet music) and recognized by a finite automaton (the musician). This project presents a domain-specific language for ClojureScript,
for music composition, production, and performance on the web. By leveraging
the Web Audio and Web MIDI APIs, the portability of web applications,
and the flexibility and power of Lisp, this project gives programmers and
musicians the ability to write and perform music using written code, a
command-line interface, MIDI instruments, or real instruments natively
from a web browser.
11. The Social Shopping Platform
Members: John Earle
Advisors: Swapneel Sheth
Abstract: Social marketing is a powerful tool used by online storefronts in a variety of
industries. It is cost-efficient
and effective by relying on viral sharing for exposure. Social
and buzz marketing, however, is much more difficult to employ seamlessly in an in-store
shopping environment. The Social Shopping Platform aims to improve the shopping
experience for customers and provide a real-time
social marketing platform for businesses.
The system allows users to save products to a mobile app as they browse by
scanning QR codes registered with each product. By forming a connection between the
customer and individual products, the platform provides a comprehensive product
history and detailed product suggestions based on browsing patterns to the consumer.
In turn, business owners have access to targeted browsing analytics and integrated
social marketing on a product-by-product basis.
In particular, using the platform, consumers are able to share their browsing
history with friends simply by saving products as they browse. The storefront has
access to product views and custom demographic information. Finally, alternative
products are suggested based on learned association rules from prior browsing
sessions and RGB values scraped from uploaded product images.
12. Smart Prescription Handling: Clarifying Prescriptions To Prevent Medication Errors
Members: Brenden Guthrie, Guillermo Gutierrez, Andrew Shichman
Advisor: Mitchell P. Marcus
Abstract: Medication errors are a major cost to health care systems.
Two major causes of these errors are mis-written prescriptions
and inaccurate interpretation of prescriptions by patients.
We created a two-sided iOS application, which encourages
accurate prescription writing without the use of
forms, and provides explicit medication instructions to patients
through a customized medication schedule. At the
heart of this solution exists an algorithm which converts raw
prescription text into a structured form.
13. Identikey: A Distributed Social Network
Members: Drew Fisher, Jacob Henner, Vamsi Jandhayala
Advisor: Jonathan Smith
Abstract: Online social networks such as Facebook and Twitter
have become an essential tool for communication. Their
ease-of-use and the associated network effect have made them
ubiquitous. Unfortunately, they often do little to protect user
privacy. Instead, they frequently sell user information to
advertisers, and provide it to governments when requested.
They comply with government requests for censorship, and
in authoritarian states, they may be blocked entirely. This
is clearly unacceptable, especially since the communication
these networks facilitate often inspires needed social change.
We propose Identikey, an encrypted and distributed social
network. Instead of the conventional model where user
data is stored by a central entity (e.g. Facebook), Identikey
stores user data across the machines of network participants.
Because the data is distributed, it cannot be easily censored
or modified by a central entity. Since it is encrypted, only
users intended to receive content are able to access it. We
believe this system is much more censorship resistant than
other social networks, while preserving user privacy.
14. Determining If A Coffee Chat Has Been Scheduled
Members: Hong Kim, Minsu Kim
Advisor: Chris Callison-Burch
Abstract: We schedule meetings with people every day, whether it is to
pick up keys from an Airbnb host or to go on a date
with a match on OkCupid. Service providers such
as Airbnb or OkCupid have great incentive to know if
the two parties involved have scheduled to meet
because failure to do so often indicates a failure of
the service itself. However, there is no easy way to
automatically determine this because scheduling
often happens over free-form text messages. The goal
of our project was to build a classifier that would
determine if a meeting has been scheduled given a
text message exchange as input.
We ran a service called FreeForCoffee where users
were paired with other people to schedule coffee via
SMS. Messages that were exchanged on this platform
were used as the data source of our project. We built
an anonymized labeling web interface where we
labeled text messages with four labels: time, location,
agreement, and cancel. With the labeled data, we
worked on label prediction and ultimately trained a
decision tree classifier that could accurately predict
whether a coffee chat was scheduled given a text message exchange.
15. Counterfactuals in the Language of Social Media: A Natural Language Processing Project in Conjunction with the World Well Being Project
Members: Anthony Janocko, Allegra Larche, Joseph Raso, Kevin Zembroski
Advisor: Lyle Ungar
Abstract: Certain aspects of natural language can give clues to a person's personality and other traits.
One such aspect is the use of counterfactual expressions. Counterfactuals are statements that
examine how a hypothetical change in a past experience could have affected the outcome of
that experience. Counterfactuals have been shown to bring meaning to people's lives, alter
behavior related to planning, and achieve affect and emotional management. Most
counterfactuals are in the form of a conditional statement such as "if..then.." and contain
modal verbs. Hypothetical statements regarding future events are not considered
counterfactuals. After investigating patterns in counterfactual usage, we created a natural
language processing model that tags tweets as either counterfactual or not. We predict that in
United States counties, tweet counterfactual usage as a percentage of total tweets will be
directly related to life satisfaction in a given county. Our model achieves 90.24% accuracy in
identifying counterfactuals from our test set. The experiment determines a tweet's location,
updates a total tweet count, and updates a counterfactual count using the model for each
county. The percentage counterfactual usage is used in a linear regression analysis to
determine the validity of our hypothesis. Finally, a web application visualizes counterfactual
usage by county and uses the Twitter API to allow users to set a time period and view
counterfactuals in a given location. The application also allows users to view the
counterfactuals in their personal profile.
16. CPR Connect: Integrated Health System to Notify and Assist First Responders
Members: Richard Kitain, Kevin Lei, Vivek Panyam, Shichao Wang
Advisor: Chris Murphy
Abstract: When a cardiac arrest incident occurs, every minute counts. The average national response time for cardiac arrest incidents outside of the hospital is 9 minutes, but according to the American Heart Association, brain death starts to occur in 4 to 6 minutes. As a result, only 8% of cardiac arrest victims outside of the hospital survive. However, if bystanders give CPR and use an AED, survival rates can increase to 38%.
Our team created a mobile application that alerts CPR-certified volunteers to nearby cardiac arrest incidents. The application leverages nearby bystanders to reduce the wait time until help arrives, thus improving patient outcomes. It allows two types of users to sign up: (1) users who are at risk of cardiac arrest; (2) CPR-certified responders. If a patient undergoes cardiac arrest, the application notifies nearby responders. Once a responder agrees to help, the application guides that responder to the patient's location. Finally, the application also informs any emergency contacts that the patient has listed.
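One plausible way to decide which responders count as "nearby" is a great-circle distance check, sketched below with the haversine formula. The abstract does not describe how the application does this, so the formula choice, the radius, the function names, and the coordinates are all assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby_responders(incident, responders, radius_km):
    """Return responder names within `radius_km` of the incident, closest first.

    `incident` is a (lat, lon) pair; `responders` maps a name to a (lat, lon) pair.
    """
    distances = {
        name: haversine_km(incident[0], incident[1], lat, lon)
        for name, (lat, lon) in responders.items()
    }
    in_range = [name for name, d in distances.items() if d <= radius_km]
    return sorted(in_range, key=lambda name: distances[name])

# Hypothetical incident and responder positions.
incident = (39.950, -75.190)
responders = {
    "a": (39.951, -75.190),
    "b": (40.950, -75.190),
    "c": (39.953, -75.190),
}
print(nearby_responders(incident, responders, radius_km=1.0))
```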
17. POWerNAV: Map Data in Augmented Reality for Better Pedestrian Navigation
Members: Anthony Hsieh, Michael Li, Darren Yin
Advisor: Stephen H. Lane
Abstract: Interpreting a map in real time, whether
it be on paper or a smartphone, is often
confusing and time-consuming. POWerNAV
(Pedestrian Overlay onto World of Navigational
Augmented View) is designed to improve the
pedestrian navigation process by overlaying
data from a map, like building names and streets,
onto the user view via augmented reality (AR).
The overlay integrates map data with the real
world, removing the need to consult a separate map.
The Epson Moverio BT-200 smartglasses,
which run Android 4.0.4, serve as the main
development hardware. GPS and sensor data is
used to detect user location and head
orientation. Data is imported from Google Maps
to create a model of the world, which serves as
the basis for overlays on buildings, streets, and
points of interest. Virtual objects like
destination beacons and animated path arrows
are also supported. To demonstrate the viability
of POWerNAV as a platform for more complex,
content-rich use, a Penn campus tour is
included with the system.
It is evident that AR-based navigation
has many uses beyond a campus tour that can
be developed further, such as in the indoor and
automobile domains. However, current
hardware suffers from relatively poor
performance and needs to advance more. The
most prominent challenges of developing
POWerNAV involve grappling with imperfect
sensors. The user's location is approximated to
the nearest pathway or road; a complex sensor
fusion of the accelerometer, gyroscope, and
compass, as well as computer vision analysis, is
needed to ensure proper alignment of the
overlay. The barrier to entry for designing a seamless AR experience will decrease as location
and orientation sensors continue to improve.
18. SmartGrow - A Personalized Plant Monitor
Members: Alexander Little, Martin Greenberg
Advisor: Jorge Santiago-Aviles
Abstract: The robotic agriculture space has recently become a popular area for innovation. However, most existing systems in this space focus only on irrigation. Existing solutions for robotic agriculture are typically intended to be deployed on a large scale and are inappropriate for home-scale use. Finally, there are very few systems able to adapt to the particular plant being supported by the system. SmartGrow is an electronic plant ecosystem, designed for home use, to give the user an in-depth look at their plant's health.
The system uses an Arduino Uno and its I²C bus to accurately measure factors that determine plant health such as soil moisture, temperature, and lighting. The Arduino automates some parts of the plant care, such as watering and lighting. The processing power of the SmartGrow system is provided by a Raspberry Pi 2. The Raspberry Pi is responsible for reading in the raw data from the Arduino, and then making decisions on what needs to be done for the plant.
19. Generalized Recommendation Platform
Members: Alex Harelick, Corey Loman
Advisors: Zachary Ives
Abstract: Recommendations pervade our online experience. Whether
users are shopping for books on Amazon or watching movies
on Netflix, many websites rely on good recommendations as
a means to improve the user experience and increase user engagement.
Creating useful recommendations requires knowledge of recommendation algorithms and the ability to run
large chunks of data on a distributed system. While most
websites have a large amount of user behavior data, some
lack the technical expertise or time to provide the useful
recommendations needed by users.
A generalized recommendation platform changes that experience.
Developers can sign up for our site, upload a data
file, and receive recommendations for their users. This is
achieved through an item-based collaborative filtering algorithm.
The algorithm relies on finding similar items in the
user behavior data and is run through Apache Mahout on
top of Amazon's Elastic MapReduce. We've also created a
Penn course recommendation web app, a proof-of-concept
website for our platform, to demonstrate the end-to-end developer experience.
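The item-based collaborative filtering idea mentioned in this abstract can be sketched in a few lines of plain Python. This is a single-machine toy using cosine similarity, not the Apache Mahout job the project ran; the similarity measure, function names, and course data are assumptions for illustration.

```python
import math
from collections import defaultdict

def item_vectors(ratings):
    """Invert {user: {item: score}} into {item: {user: score}}."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, score in items.items():
            vectors[item][user] = score
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=3):
    """Rank items the user has not seen by similarity to the items they have."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in vectors.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(vector, vectors[item]) * score for item, score in seen.items()
        )
    # Sort by descending score, breaking ties by item name.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical course-enrolment data (1 = took the course).
ratings = {
    "ann": {"cis120": 1, "cis121": 1},
    "bob": {"cis120": 1, "cis121": 1, "cis160": 1},
    "cat": {"cis160": 1, "cis262": 1},
    "dan": {"cis120": 1},
}
print(recommend(ratings, "dan", top_n=2))
```

Items that tend to appear together in users' histories score as similar, so a user who has only one item still gets ranked suggestions.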
20. GOLD: GPS and Optic Landing of Drones - A Hybrid Approach
Members: Stefania Maiman, Matt Schulman, Josh Pearlstein
Advisors: Jonathan M. Smith
Abstract: The successful use of drones for package delivery
offers revolutionary cost savings for logistics providers. Today,
the main barrier to drone delivery is the accuracy of drones
during automated landings. GPS navigation accuracy is limited
to a four meter radius, so getting drones to accurately and
inexpensively navigate to the last few meters of precision is
a major obstacle. This project creates a new accurate and
cost-effective solution for hyper-accurate drone landing. GOLD
combines GPS with optical navigation to iteratively recognize
and descend towards a target landing spot.
The system works by using the GPS navigation to send the
drone to a high altitude above the target. The target landing
spot is denoted on the ground with a QR-code poster. Next,
the drone iteratively photographs the landscape beneath it,
processes the image onboard and gradually descends towards
the ground based on the sublocation of the QR-code in each
photo taken. This hybrid approach offers an inexpensive,
scalable, and accurate drone delivery system. This project also
creates a web interface to manage all drone flight, delivery
orders, and allocation of drones. For the proof of this concept,
the website enables order fulfillment and allocation with a
dynamic simulation interface.
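The iterative photograph-and-descend loop described in this abstract could look roughly like the decision step below: if the QR code is off-center in the photo, correct laterally; otherwise drop to a lower altitude. The thresholds, the halving rule, and the `descent_step` function are hypothetical, since the abstract does not give the project's actual control logic.

```python
def descent_step(qr_center, image_size, altitude_m,
                 tolerance=0.1, descent_fraction=0.5):
    """Decide the next action from the QR code's position in one photo.

    `qr_center` and `image_size` are (x, y) and (width, height) in pixels.
    Returns (action, dx, dy, next_altitude_m), where dx and dy are offsets
    from the image center normalized to the range -1..1.
    """
    half_w = image_size[0] / 2
    half_h = image_size[1] / 2
    dx = (qr_center[0] - half_w) / half_w
    dy = (qr_center[1] - half_h) / half_h
    if max(abs(dx), abs(dy)) > tolerance:
        # Target is off-center: hold altitude and correct laterally first.
        return ("move", dx, dy, altitude_m)
    # Target is centered: descend part of the way and photograph again.
    return ("descend", 0.0, 0.0, altitude_m * descent_fraction)

# Hypothetical 640x480 photo taken at 20 m with the QR code right of center.
print(descent_step((480, 240), (640, 480), altitude_m=20.0))
```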
21. Collaborative Web Page Change Detection
Members: Sebastian Messier, Isha Bajekal
Advisor: Zachary Ives
Abstract: To our surprise, with the growing responsiveness and dynamicity of the web,
there are few solutions to track content changes on a page. We sought to address this
by building a collaborative, non-intrusive tool which would rely on users, rather than
computers, to detect web page changes as close to real time as possible. We figured
that most people would be interested in tracking sites that others will be visiting later in
the day. Leveraging this thought into a solution was particularly interesting to us, so we
sought to conceptualize an implementation which eventually became this project.
Existing technology is more rigid and less creative. If you search for a
page-change detector, most sites offer cheap services which simply wget a page
periodically, and compare it with some preceding or initial result. Some, going a step
further, compare screenshots of specific areas of a website, which are taken periodically
as well. All of these sites limit checks to once or twice a day. When one
considers that many other individuals will visit this same page throughout the day, the
idea of using their visit to another's advantage gains traction. It was these conditions
which finally convinced us to pursue this idea as our senior design project.
Members: Zach Krasner, Kate Miller, Alex Whitaker
Advisor: Chris Callison-Burch
Abstract: We present a system for completing tasks on Mechanical Turk, a web-based
crowdsourcing platform, in an Android application. This addresses the gap between earned
hourly wage, which does not take into account time spent navigating poorly designed
interfaces or searching for completable tasks for which the worker is qualified, and the effective hourly
wage: what a worker actually earns per hour. Our proof-of-concept
system demonstrates that certain types of tasks are ideally completed on mobile, while others are merely possible or
entirely infeasible. We also draw conclusions about the scalability of our system and its
relationship to API or internal access to crowdsourcing platforms like Mechanical Turk and
23. HelpDesk - Optimal Shift Scheduling and Support Case Management
Member: Michael Molisani
Advisor: Benedict J Brown
Abstract: In today's economy, the efficient use of time and allocation of resources is
invaluable. For many supervisors, these goals are achieved by assigning employees to
scheduled shifts based on their availability and preferences. Generally, manual scheduling
solutions can range from prioritizing task allocations on a first-come,
assignments. These methods are prone to both uneven distribution of shifts
and even incomplete assignments. Manual scheduling can also quickly become a
process that increases rapidly in difficulty with a larger number of
I present HelpDesk, an application that addresses these issues and creates a
scalable system to optimally accommodate scheduling needs. The system allows
employees to specify which hours of the week they are available or would prefer to work.
Once a manager or supervisor specifies all of the scheduled shifts, including time, duration,
and even number of employees, HelpDesk produces a mathematically optimal assignment
of employees to shifts. This process can be repeated for different input variables, which
allows users to craft specific allocations to meet the needs of the employer.
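To make the idea of an optimal assignment of employees to shifts concrete, the sketch below finds the highest-scoring assignment by exhaustive search on a tiny example. The abstract does not say which optimization method HelpDesk uses, and brute force only works for very small inputs; the preference scores, names, and one-employee-per-shift simplification are all assumptions for illustration.

```python
from itertools import permutations

def best_assignment(preferences, shifts):
    """Assign one distinct employee to each shift, maximizing total preference.

    `preferences` maps employee -> {shift: score}; a missing shift means the
    employee is unavailable. Returns (assignment, score), or (None, None)
    when no valid assignment exists.
    """
    best, best_score = None, None
    for chosen in permutations(list(preferences), len(shifts)):
        pairs = list(zip(chosen, shifts))
        if any(shift not in preferences[emp] for emp, shift in pairs):
            continue  # someone would be placed on a shift they cannot work
        score = sum(preferences[emp][shift] for emp, shift in pairs)
        if best_score is None or score > best_score:
            best = {shift: emp for emp, shift in pairs}
            best_score = score
    return best, best_score

# Hypothetical availability and preference scores (higher is better).
preferences = {
    "ana": {"mon": 3, "tue": 1},
    "ben": {"mon": 2},
    "cy": {"tue": 2, "wed": 3},
}
print(best_assignment(preferences, ["mon", "tue", "wed"]))
```

In this example the individually best choice for Monday (ana) is not part of the optimal schedule, because it would leave another shift uncovered. That is the kind of conflict manual scheduling tends to miss.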
24. Active Learning for Image Classification
Members: Pratyusha Gupta, Ella Polo, Lauren Reeder, Alex Wissmann
Advisor: Mitch Marcus
Abstract: Personal financial management is a critical step
to help alleviate poverty in developing countries. Our goal is to allow anyone without technical skills to be able to harness the power of machine learning, specifically active learning. We
decided to focus our efforts on image classification. We built a user-friendly web application
that uses active learning to quickly classify any
set of images that a user uploads to the application.
Members: Terry Sun, Sam Rossi
Advisor: Steve Zdancewic
computer and browser, and interpreters such
server-side language. Unfortunately, modern
typically written in C/C++, languages reliant on manual memory management. This
results in countless memory leaks, bugs, and
security vulnerabilities related to memory
strong memory safety guarantees and speeds
comparable to C++. Our interpreter runs
code either from source files or an interactive REPL (read-evaluate-print loop), similar to the functionality of existing server-
demonstrate the viability of using Rust to
core subset of language features. To that
end, we've tested our coverage using Google's
Sputnik test suite, an ECMAScript 5 conformance test suite.
26. Middleware for Supporting “Big Data” Analytics across a Database Cluster
Members: Daniel Salowe, Shayan Patel, Sahil Shah
Advisor: Zachary Ives
Abstract: The lack of affordable and open-source support for efficient big data analytics across a
relational database cluster is a problem in the technology industry. Specifically, there exists a
need for an open-source distributed database middleware for off-the-shelf databases (MySQL).
We approached the project by first deciding on the most important needs of a middleware
addressing these issues. This project focuses on three main components, namely fault
tolerance, distributed JOIN operations, and support for computationally intensive operations. By
focusing on these three issues, we have developed an infrastructure that future developers can
build upon, while also maintaining usefulness as a standalone middleware. The system detects
node failures and automatically reroutes queries to replica nodes. Our distributed JOIN
operation leverages nodes as computation resources to achieve efficient JOINs. The
middleware works best in situations where computation-heavy operations are to be applied
on relevant data before or after the SQL operations. Our system distributes the computations as
well as the query to work around issues such as memory constraints that are encountered on a
single machine. Our work has led to a system that is accessible via a web user interface,
allowing users to define arbitrary computations and run queries on large amounts of data.
27. Daruma: Regaining Trust in Cloud Storage
Members: Doron Shapiro, Michelle Socher, Ray Lei, Sudarshan Muralidhar
Advisor: Boon Thau Loo, Nadia Heninger
Abstract: Currently, cloud storage services are used by consumers for a wide
variety of important documents, including family photos, healthcare
information and proprietary corporate data. These services
all make promises about their storage solutions, usually including
some guarantees of confidentiality, integrity, and availability. However,
downtime is a fact of life for cloud services and, for better
or worse, many providers openly admit to being able to access customer
files for purposes ranging from analytics to law enforcement.
Daruma solves this problem by eliminating the need to trust
any cloud provider. We run no servers ourselves - instead, we combine
and secure the space on cloud services already used by consumers
(like Dropbox and Google) with advanced cryptographic
and redundancy algorithms. Our system provides a simple guarantee:
no one cloud service provider can read, change, or delete your
files - ever. Daruma feels just like an existing service - there are
no extra passwords to remember or frustrating workflows to navigate.
Daruma handles the complexities of security and reliability
for users, allowing them to confidently utilize cloud storage without
worrying about its previously inherent risks.
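The abstract does not name the cryptographic and redundancy algorithms involved. As one possible illustration of the guarantee, the sketch below uses Shamir threshold secret sharing, which is an assumption for this example and not a description of Daruma itself. A secret (for instance an encryption key, represented as an integer) is split into one share per provider so that any 2 of 3 shares recover it. A single provider therefore learns nothing about the secret, and losing a single provider does not lose it. Detecting a modified share would need an additional integrity check, which is not shown.

```python
import secrets

# A Mersenne prime; every secret must be smaller than this (assumed field size).
PRIME = 2**127 - 1


def split(secret, n, k):
    """Split `secret` into n shares so that any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789
shares = split(key, n=3, k=2)  # one share per (hypothetical) provider
```

With this split, `recover` succeeds on any two of the three shares, so one provider going offline or deleting its share does not make the key unrecoverable.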
28. TorrentTrust: A Trust-Based, Decentralized Object Reputation Network
Members: Ian Sibner, Evelyn Yeung, David Xu, Quanze Chen
Advisor: Andreas Haeberlen
In this paper, we describe TorrentTrust, a decentralized object reputation system for peer-to-peer networks. Torrents are a popular target for spammers and hackers - an easy way to coax users into downloading and installing a profitable (for the hacker) piece of malware disguised as another file. Thus, determining the authenticity of a torrent has long been an issue. Many trackers use upvote/downvote systems, or allow users to verify a torrent, but bad actors can easily verify their own malicious content. Also, these systems are totally centralized, creating a single point of attack for adversaries.
We researched a system called Credence (Walsh and Sirer, 2005), which was used to rank objects on the Gnutella network, and extended it to provide stronger security. The resulting system, TorrentTrust, verifies torrents based on trust relationships between users in a totally decentralized way - making it much more difficult for bad actors to promote malicious content.
TorrentTrust is a layer on top of the BitTorrent filesharing network where users can determine authenticity of content through voting and establishing trust with other users. Although similar to Credence, we show through simulation and analysis that it is more resistant to certain network attacks.
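To make the idea of trust-weighted voting concrete, here is a minimal sketch loosely in the spirit of correlation-based schemes such as Credence. It is not TorrentTrust's actual algorithm: the function names, the +1/-1 vote encoding, and the simple agreement measure are assumptions made for the example. A peer's vote on a new torrent is weighted by how often that peer agreed with the local user on torrents both have already rated.

```python
def agreement(mine, theirs):
    """Agreement in [-1, 1] between two voting histories (dicts of object -> +1/-1).

    Peers with no objects in common get weight 0, so unknown accounts
    cannot influence the score.
    """
    shared = set(mine) & set(theirs)
    if not shared:
        return 0.0
    agree = sum(1 if mine[obj] == theirs[obj] else -1 for obj in shared)
    return agree / len(shared)


def score(obj, my_votes, peer_votes):
    """Sum peers' votes on `obj`, each weighted by agreement with my history."""
    total = 0.0
    for votes in peer_votes.values():
        if obj in votes:
            total += agreement(my_votes, votes) * votes[obj]
    return total


my_votes = {"a": 1, "b": -1}
peers = {
    "honest": {"a": 1, "b": -1, "new": 1},     # always agrees with me
    "spammer": {"a": -1, "b": 1, "new": -1},   # always disagrees with me
    "stranger": {"new": -1},                   # no shared history
}
new_score = score("new", my_votes, peers)
```

In this toy example the peer who consistently disagrees ends up with negative weight, so its downvote counts as evidence in favor of the object, while the account with no shared history contributes nothing.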
29. EZPark - Automated Parking Garage Payment System
Members: Constanza Figuerola, Samyukta Lanka, Utkarsh Shah, Max Tromanhauser
Advisor: Boon Thau Loo
Today, garage parking systems are annoying, expensive and needlessly complicated for both garages and drivers. Drivers are forced to carry around easy-to-lose tickets, wait in parking lines (especially at large events), and must manually use their credit cards. Garages have to purchase and maintain expensive payment machines, employ parking staff, and lack useful analytics.
The EZPark system was designed to solve these issues. Drivers sign up once in order to easily park in every garage in our system. When a driver enters a garage, cameras send images of the car to a central server, which recognizes the plate number and keeps track of how long the car stays in the garage. Upon exiting, the associated user's credit card is automatically charged and the parking gate is opened. All of this data is aggregated and available to the user and garage. This solution provides convenience and ease of use for drivers, while also simplifying the parking system for garages.
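The entry and exit flow described above can be sketched as a small session tracker. This is only an illustration under stated assumptions: the class name, the per-started-hour pricing rule, and the use of integer cents are hypothetical, and plate recognition and card charging are left out entirely.

```python
import math
from datetime import datetime


class GarageSessions:
    """Tracks open parking sessions keyed by recognized plate number."""

    def __init__(self, hourly_rate_cents):
        self.hourly_rate_cents = hourly_rate_cents  # assumed pricing model
        self.entries = {}  # plate -> entry time

    def enter(self, plate, when):
        """Record the time a plate was seen at the entrance."""
        self.entries[plate] = when

    def leave(self, plate, when):
        """Close the session and return the fee in cents (billed per started hour)."""
        started = self.entries.pop(plate)
        hours = math.ceil((when - started).total_seconds() / 3600)
        return max(hours, 1) * self.hourly_rate_cents


garage = GarageSessions(hourly_rate_cents=500)
garage.enter("ABC123", datetime(2016, 4, 1, 9, 0))
fee = garage.leave("ABC123", datetime(2016, 4, 1, 10, 30))
```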
30. adpt: A Differentially Private Tool for Adaptive Data Analysis
Members: Benson Chen, Lingbin Cai, Jake Hart, and Dylan Sun
Advisor: Aaron Roth
Statistical analysis forms the quantitative backbone for research across a variety of fields. Textbook statistics requires that hypotheses and methods be chosen prior to gathering data. Unfortunately, this is impractical; researchers instead choose statistical methods in response to the data to which those methods will be applied, a process known as adaptive data analysis. This allows researchers to iteratively design studies based upon intermediate results, but can lead to spurious conclusions because it invalidates assumptions underpinning statistics.
We approach this issue both practically and theoretically. We apply current research in differential privacy in an implementation that mitigates the negative consequences of adaptive data analysis. Next, we investigate conditions in which our tool is effective and compare its performance to other, similarly motivated approaches.
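For readers unfamiliar with differential privacy, the sketch below shows the basic Laplace mechanism applied to a mean query. It is a generic textbook building block and not necessarily what adpt implements; the function name, parameters, and clipping bounds are assumptions for the example. Each value is clipped to a known range, so changing one record moves the mean by at most (upper - lower) / n, and Laplace noise scaled to that sensitivity divided by epsilon is added to the answer.

```python
import random


def private_mean(values, lower, upper, epsilon, rng=random):
    """Return the mean of `values` with Laplace noise for epsilon-DP.

    Assumes the number of records is public and each value is clipped
    to [lower, upper] before averaging.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) sample.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clipped) / len(clipped) + noise


rng = random.Random(0)
data = [0.5] * 1000
estimate = private_mean(data, lower=0.0, upper=1.0, epsilon=1.0, rng=rng)
```

With 1000 records the noise scale is 0.001, so the noisy answer stays close to the true mean while limiting how much any single record can influence the result. That bound on individual influence is what connects differential privacy to valid adaptive analysis.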
31. Using Speech To Improve Software Development Productivity
Members: Brian Vander Schaaf, Neera Thavornvanit, Nillan Patel
Advisor: Chris Murphy
Software development involves a lot more than coding. Readability and organization are vital features of any software project and are necessary at every step of the development process. However, tasks such as refactoring and file navigation take time and are therefore often not prioritized and even sometimes ignored. As a programmer, it can be frustrating and time-consuming to switch between the keyboard and mouse, which happens most often during activities meant to improve the readability and organization of code.
We created a plugin for Sublime Text, a text editor for programmers we were familiar with, to help alleviate code management issues and make engineers more efficient by allowing them to use their voice to interact with the text editor. Users surveyed confirmed the popularity of Sublime and helped us determine the importance of various aspects of the plugin. Our plugin has twenty-eight commands that can be compounded and executed with a simple voice query, e.g. "select all, copy, comment, go to the next file, paste and indent, save all, and close the window." Users can also customize commands to their way of speaking or save a longer, compounded command with a shorter query. We performed A/B testing to evaluate our plugin's effectiveness in increasing productivity, and found that the majority of users were faster at code management tasks with our plugin than without.
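A compound voice query like the one quoted above can be turned into a sequence of editor commands with a simple parser. The sketch below is an assumption about how such parsing might work, not the plugin's actual code: the phrase table, the command identifiers, and the comma-based splitting rule are all illustrative. Splitting on commas and not on the word "and" keeps single commands such as "paste and indent" intact.

```python
import re

# Hypothetical mapping from spoken phrases to editor command identifiers.
COMMANDS = {
    "select all": "select_all",
    "copy": "copy",
    "comment": "toggle_comment",
    "go to the next file": "next_view",
    "paste and indent": "paste_and_indent",
    "save all": "save_all",
    "close the window": "close_window",
}


def parse_compound(transcript):
    """Split a transcribed query on commas and map each phrase to a command."""
    phrases = transcript.lower().rstrip(".").split(",")
    commands = []
    for phrase in phrases:
        # Drop a leading "and" such as in ", and close the window".
        phrase = re.sub(r"^and\s+", "", phrase.strip())
        if not phrase:
            continue
        if phrase not in COMMANDS:
            raise ValueError(f"unrecognized command: {phrase!r}")
        commands.append(COMMANDS[phrase])
    return commands


query = ("select all, copy, comment, go to the next file, "
         "paste and indent, save all, and close the window.")
parsed = parse_compound(query)
```

User-defined shortcuts, as described above, could be supported in this design by adding entries to the phrase table that expand to a stored list of commands.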
32. BARK: BOINC Volunteer Cluster Manager For Apache Spark
Members: Thomas Delacour, Sedem Fialor, John Weir
Advisor: Boon Thau Loo
Fortune magazine estimates that Apple sold an average of more than 5 million Macs per quarter in 2015. The PC market is even larger. Most of these devices boast impressive hardware specifications. However, the use cases of an average computer owner's machine (browsing the web, text editing, etc.) suggest that the vast majority of this capacity goes unused. Given the abundance of untapped computational resources, it seems counterintuitive that developers depend so heavily on cloud giants to run their servers and perform their distributed computations.
BARK presents a tailored application of BOINC (Berkeley Open Infrastructure for Network Computing), a framework for volunteer computing, to Apache Spark, the increasingly popular big data processing tool. Given a Spark job and a pool of volunteer computers (anything from dedicated Linux workhorses to personal laptops), BARK will set up a Spark cluster composed exclusively of volunteer nodes and execute the job across it. In this way, BARK offers a free alternative to paid cloud services for end users interested in performing MapReduce computations. BARK is an example of a style of application that may become increasingly prevalent as the more fluid flow of computational resources between participants in peer-to-peer networks becomes normalized.
33. SmokeSignals: A Distributed Key-Value Store on a Mobile Network
Members: Charles Cobb, Meyer Kizner, Xiuruo Zhang
Advisor: Boon Thau Loo
Internet-enabled smartphones are a ubiquitous part of modern life, but they depend critically on centralized infrastructure to access the network. This infrastructure is generally reliable, but has several shortcomings. First, large gatherings like sporting events or concerts often overwhelm the limited network capabilities in an area. Second, during a disaster such as a hurricane or terrorist attack, networks often fail completely when communication needs are most critical. Lastly, users may prefer a surveillance-resistant, decentralized network for sensitive communications.
Our proposed solution is a decentralized peer-to-peer key-value store replicated across mobile devices. It should: synchronize with nearby devices when they are in range on a best-effort basis; propagate changes across the network even as nodes move and sever existing links, or
leave the network entirely; and provide a general API allowing developers to easily build a variety of peer-to-peer applications using our software