List of Current Senior Projects - 2015/2016

1. Beryl: Motion Capture Analysis
Members: Cezar Babin, Fahim Abouelfadl
Advisor: Hyun Soo Park

Abstract: The goal of this project was to perform analysis on human motion capture data and provide useful insight for applications necessitating movement correction. We built a library of tools that could parse .bvh files and then visualize, filter, compress, and structure the data so that it can be analyzed with traditional data analysis techniques. Equipped with this custom set of tools, we sought to demonstrate the application of motion capture to the field of sports analytics. We built several classifiers that could determine the success of a free-throw shot in basketball from input parameters consisting of key features of the shooting motion. These tools allowed us to investigate the various attributes that lead to a shot's success and assess their importance based on the individual who performs the shot. The work conducted here serves as a strong base for developing programs that can provide customized coaching feedback for individuals attempting to strengthen their free-throw shot, and can be used as a framework for conducting motion capture analysis in other areas relating to sports.

Report and Poster

2. Private Surveys for the Public Good
Members: Bianca Pham, Emmanuel Genene, Ethan Abramson, and Gaston P Montemayor Olaizola
Advisors: Aaron Roth, Andreas Haeberlen, Dennis Culhane

Abstract: Protecting the privacy of individuals' data collected while taking a survey has always been a challenging task. To date, survey data holders remove what is referred to as Personally Identifiable Information (PII). Despite these efforts, it has been repeatedly shown possible to uniquely reidentify people in the dataset and recover all of their data. The goal of our project is to create the first-ever consumer-facing application that uses a controlled amount of randomness to induce anonymity in online surveys. This means that any individual may vote on a particular poll and their identity and associated responses will be probabilistically protected. To accomplish this, we built on the theoretical results associated with Differential Privacy. Differential Privacy is a statistical technique that allows data holders to guarantee the privacy of those in the dataset. It works by adding a calculated amount of random noise to the results of every query on the dataset. When the result of a query on the dataset depends significantly on a very small subset of individuals in the dataset, the amount of noise added masks their responses, keeping the survey results anonymous. When the result of a query on the dataset does not depend on a small subset of responses, the added noise becomes negligible, allowing us to learn about the population as a whole. We created a free online application where survey respondents should feel comfortable answering truthfully, which in turn allows survey creators to ask personal and/or sensitive questions to gather important insights.
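
The noise-adding step described above can be illustrated with the Laplace mechanism, one textbook way to achieve differential privacy for counting queries. This is a minimal sketch, not the project's implementation: the function names, the `epsilon` parameter, and the choice of the Laplace mechanism are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(responses, predicate, epsilon, rng=None):
    """Return a noisy count of the responses that satisfy `predicate`.

    Adding or removing one respondent changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon. Smaller epsilon
    means more noise and stronger privacy. (Hypothetical helper.)
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in responses if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

With many respondents, noise of scale 1/epsilon is small relative to the count, so aggregate results stay useful. For a count that hinges on one or two people, the same noise is large enough to mask them.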

Report and Poster

3. P.D.A.T. (Piazza Data Analysis Tool)
Members: Aashish Lalani, Varun Agarwal
Advisors: Swapneel Sheth, Arvind Bhusnurmath, Benedict Brown

Abstract: Piazza is a platform used by university courses where students can ask questions and get answers. Our project has created generalizable parsing tools that extract class-relevant data and format it into statistically usable datasets. It also provides a suite of statistical tools with which this data can be analyzed. The analysis uses several distributions, including Poisson, Beta-Geometric, and Negative Binomial. We modeled behaviors observed in the extracted data: for example, the answering patterns of teaching assistants versus students, newly hired teaching assistants versus older teaching assistants, the effect of question length on answer time, redundant questions by specific students, the small number of students who answer most of the questions, and the spike of questions before milestones. We provide relevant information and tools to our project advisors for the betterment of their courses, in particular Introduction to Computer Programming. Results show that recruiting practices may be improved by using models trained on past semesters to model TA involvement in later semesters.

Report and Poster

4. Surgery Concierge: Improving Clarity and Delivery of Surgery Instructions
Members: Chris Akatsuka, Tadas Antanavicius, Rosmary George, Joyce Lee
Advisor: Ani Nenkova

Abstract: There are 53 million outpatient surgeries per year in the United States. For each of these surgeries, patients receive some form of instructions for what to do in the perioperative stages of surgery as part of a holistic health care plan. Oftentimes, these instructions are presented through disorganized, generic, and vague documentation, or even verbal communication. The lack of specificity and clarity in these instructions leads to surgery cancellations and suboptimal care, often with significant impact on health outcomes. We present a HIPAA-compliant service that aims to begin to tackle these issues by providing targeted instructions via digital means such as SMS, PDF, and Calendar events.

This service was created in consultation with medical experts and refined through mock user testing and field patient user testing. It is an end-to-end solution offering interfaces for both schedulers at doctors' offices and end-user patients. Creating an effective solution involved aggregating data and templates to ease the process of integrating new protocols and to ensure only relevant data were given focus. We launched a pilot version of our service that allowed patients to download their customized PDF, subscribe to receive SMS reminders, and integrate Calendar events into their schedules. While the pilot helped us determine that all three services were used, the participants did not reply to feedback requests. Instead, we conducted a peer survey and a CrowdFlower survey. Of our 124 peers and 200 CrowdFlower respondents, most indicated that they would prefer our services over existing solutions.

Report and Poster

5. WonderWall: A Machine-Learned Network Filtering Engine
Members: Kruesit Upatising, Christian Barcenas, Yesha Ouyang, Scott Collins
Advisor: Jonathan M. Smith

Abstract: Wonderwall is a proof-of-concept network filtering engine utilizing machine learning to identify malicious network packets. Trained using modern profile-generated datasets, Wonderwall aims to augment human reaction and rules-based IDSs to respond to attacks in real time. It is designed to be integrated into a virtual network, such as one built with OpenFlow, to scalably handle malicious network attacks.

Report and Poster

6. Roshi: Machine Learning Powered Job Recommendations
Members: Dhrupad Bhardwaj, Shreshth Khilani
Advisor: Andreas Haeberlen

Abstract: Recruiting portals today are exclusively focused on job search and filtering. Roshi is a smart web application that uses Machine Learning to learn a student's preferences and accordingly recommend employment opportunities. Roshi uses a combination of revealed preferences and implicit choices to train an ML engine and iteratively update recommendations.

Report and Poster

7. Minimizing Bias in Residency Matching: A Study in Non-Standard Random Walks
Members: Rebecca Baumher, Jeremy Bierema, Scott Buchanan, Meryem Essaidi
Advisor: Sampath Kannan

Abstract: For problems with multiple solutions, it is often desirable to select a solution uniformly at random. This gives all possible solutions a chance of being selected, whereas deterministic algorithms might always bias the outcome towards one solution or one family of solutions. In particular, we focus on the problem of matching residents to hospitals, where deterministic algorithms are inherently biased to favor one side over the other. An unbiased solution is to pick a matching at random. We can represent all possible matchings as nodes in a graph and perform a random walk on this graph. We then output the matching represented by the node that the random walk terminates on, after a sufficient number of steps. However, a standard random walk can take very long to reach the stationary distribution. Thus, our group investigated various non-standard random walks in order to find one that would be more rapidly mixing, which would give a time improvement over standard random walks. As a result, we have been able to reduce the maximum possible deviation from the standard distribution on certain input graphs. Our results show a consistent improvement by a factor of 3 over that achieved using standard random walks.

Report and Poster

8. A Tool for Visualizing Gun Violence Data
Members: Mike Browne, Nina Illeva, Lexi Selldorff, Fabian Wikstrom
Advisors: Chris Callison-Burch, H. Andrew Schwartz

Abstract: Guns were the cause of 12,942 deaths in America in 2015. Most Americans underestimate the number of gun-related incidents in their country. In the last year, guns have killed more people than drug and alcohol overdoses, Parkinson's Disease, and war. We built a website that allows users to look at specific incidents or graphs of gun violence, helping them grasp the breadth of America's gun addiction. Our solution uses data from news articles because the US government has no comprehensive record of gun violence incidents. These articles include details that are not covered in most analyses of gun violence. Our interactive website allows users to create and manipulate their own graphs enabling them to unlock insights that were previously unreachable.

Report and Poster

9. Predicting Non-Seizure Regions to Expedite EEG Analysis
Members: Isobeye Daso, Eliana Mason, Grace Wu
Advisor: Sanjeev Khanna

Abstract: An electroencephalogram (EEG) is a test used to evaluate electrical activity in the brain. Neurologists use EEG data to detect abnormal brain activity that may be associated with certain brain disorders such as epilepsy, tumor, and stroke. Doing so requires reviewing all hours of a patient's EEG recording to identify and diagnose seizure activities. Since the length of the recordings can range from several hours to weeks, the demand for specialized expertise often exceeds the supply. Our project assists neurologists in monitoring epileptic patients for seizures. We focus on identifying regions of the EEG recording that contain no seizure activity and do not need to be reviewed. Doing so will drastically decrease diagnostic variance and the time neurologists spend reviewing data. To do this, we identified mathematical features in the EEG data (such as line length, energy, and correlation) and trained multiple classifiers with patient data. Our primary classifier identifies non-seizure regions and decreases the time a neurologist needs to spend reading EEG data by 90%. The classified EEG data is displayed on a web application, allowing users to focus only on possible seizure regions, providing a streamlined process for reviewing large volumes of EEG data.
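
The three features named above can be computed per window roughly as follows. This sketch assumes common signal-processing definitions (line length as the sum of absolute sample-to-sample differences, energy as the sum of squared samples, Pearson correlation between two channels). The abstract does not give the project's exact formulas or window sizes, so the function names and parameters here are hypothetical.

```python
import math


def line_length(window):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(window, window[1:]))


def energy(window):
    """Sum of squared sample values."""
    return sum(x * x for x in window)


def correlation(x, y):
    """Pearson correlation of two equal-length channels (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def window_features(ch_a, ch_b, size):
    """Split two channels into non-overlapping windows of `size` samples
    and return one (line_length, energy, correlation) tuple per window."""
    usable = min(len(ch_a), len(ch_b))
    features = []
    for start in range(0, usable - size + 1, size):
        a = ch_a[start:start + size]
        b = ch_b[start:start + size]
        features.append((line_length(a), energy(a), correlation(a, b)))
    return features
```

Each tuple would then serve as one input row for a classifier that labels the window as seizure or non-seizure.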

Report and Poster

10. Fugue: Musical Programming on the Web
Members: Philip Del Vecchio
Advisor: Steven Zdancewic

Abstract: Music is an inherently functional domain - it is built on abstractions and functions of notes, themselves abstractions of sound waves. Music is a regular language, i.e., one that can be expressed as a regular expression (sheet music) and recognized by a finite automaton (the musician). This project presents a domain-specific language for ClojureScript, a dialect of Lisp that compiles to JavaScript, that allows real-time music composition, production, and performance on the web. By leveraging the Web Audio and Web MIDI APIs, the portability of web applications, and the flexibility and power of Lisp, this project gives programmers and musicians the ability to write and perform music using written code, a command-line interface, MIDI instruments, or real instruments natively from a web browser.

Report and Poster

11. The Social Shopping Platform
Members: John Earle
Advisor: Swapneel Sheth

Abstract: Social marketing is a powerful tool used by online storefronts in a variety of industries. It is cost-efficient and effective because it relies on viral sharing for exposure. Social and buzz marketing, however, is much more difficult to employ seamlessly in an in-store shopping environment. The Social Shopping Platform aims to improve the shopping experience for customers and provide a real-time social marketing platform for business owners. The system allows users to save products to a mobile app as they browse by scanning QR codes registered with each product. By forming a connection between the customer and individual products, the platform provides the consumer with a comprehensive product history and detailed product suggestions based on browsing patterns. In turn, business owners have access to targeted browsing analytics and integrated social marketing on a product-by-product basis. In particular, using the platform, consumers are able to share their browsing history with friends simply by saving products as they browse. The storefront has access to product views and custom demographic information. Finally, alternative products are suggested based on association rules learned from prior browsing sessions and RGB values scraped from uploaded product images.

Report and Poster

12. Smart Prescription Handling: Clarifying Prescriptions to Prevent Medication Errors
Members: Brenden Guthrie, Guillermo Gutierrez, Andrew Shichman
Advisor: Mitchell P. Marcus

Abstract: Medication errors are a major cost to health care systems. Two major causes of these errors are mis-written prescriptions and inaccurate interpretation of prescriptions by patients. We created a two-sided iOS application that encourages accurate prescription writing without the use of forms and provides explicit medication instructions to patients through a customized medication schedule. At the heart of this solution is an algorithm that converts raw prescription text into a structured form.

Report and Poster

13. Identikey: A Distributed Social Network
Members: Drew Fisher, Jacob Henner, Vamsi Jandhayala
Advisor: Jonathan Smith

Abstract: Online social networks such as Facebook and Twitter have become an essential tool for communication. Their ease of use and the associated network effect have made them ubiquitous. Unfortunately, they often do little to protect user privacy. Instead, they frequently sell user information to advertisers, and provide it to governments when requested. They comply with government requests for censorship, and in authoritarian states, they may be blocked entirely. This is clearly unacceptable, especially since the communication these networks facilitate often inspires needed social change.

We propose Identikey, an encrypted and distributed social network. Instead of the conventional model where user data is stored by a central entity (e.g. Facebook), Identikey stores user data across the machines of network participants. Because the data is distributed, it cannot be easily censored or modified by the central entity. Since it is encrypted, only users intended to receive content are able to access it. We believe this system is much more censorship resistant than other social networks, while preserving user privacy.

Report and Poster

14. Determining If A Coffee Chat Has Been Scheduled
Members: Hong Kim, Minsu Kim
Advisor: Chris Callison-Burch

Abstract: We schedule to meet people every day, whether it is to pick up keys from an Airbnb host or to go on a date with a match on OkCupid. Service providers such as Airbnb or OkCupid have great incentive to know whether the two parties involved have scheduled to meet, because failure to do so often indicates a failure of the service itself. However, there is no easy way to determine this automatically because scheduling often happens over free-form text messages. The goal of our project was to build a classifier that would determine if a meeting has been scheduled given a text message exchange as input.

We ran a service called FreeForCoffee where users were paired with other people to schedule coffee via SMS. Messages that were exchanged on this platform were used as the data source of our project. We built an anonymized labeling web interface where we labeled text messages with four labels: time, location, agreement, and cancel. With the labeled data, we worked on label prediction and ultimately trained a decision tree classifier that could accurately predict whether a coffee chat was scheduled given a text conversation.

Report and Poster

15. Counterfactuals in the Language of Social Media: A Natural Language Processing Project in Conjunction with the World Well Being Project
Members: Anthony Janocko, Allegra Larche, Joseph Raso, Kevin Zembroski
Advisor: Lyle Ungar

Abstract: Certain aspects of natural language can give clues to a person's personality and other traits. One such aspect is the use of counterfactual expressions. Counterfactuals are statements that examine how a hypothetical change in a past experience could have affected the outcome of that experience. Counterfactuals have been shown to bring meaning to people's lives, alter behavior related to planning, and achieve affect and emotional management. Most counterfactuals are in the form of a conditional statement such as "if..then.." and contain modal verbs. Hypothetical statements regarding future events are not considered counterfactuals. After investigating patterns in counterfactual usage, we created a natural language processing model that tags tweets as either counterfactual or not. We predict that in United States counties, tweet counterfactual usage as a percentage of total tweets will be directly related to life satisfaction in a given county. Our model achieves 90.24% accuracy in identifying counterfactuals from our test set. The experiment determines a tweet's location, updates a total tweet count, and updates a counterfactual count using the model for each county. The percentage counterfactual usage is used in a linear regression analysis to determine the validity of our hypothesis. Finally, a web application visualizes counterfactual usage by county and uses the Twitter API to allow users to set a time period and view counterfactuals in a given location. The application also allows users to view the counterfactuals in their personal profile.

Report and Poster

16. CPR Connect: Integrated Health System to Notify and Assist First Responders
Members: Richard Kitain, Kevin Lei, Vivek Panyam, Shichao Wang
Advisor: Chris Murphy

Abstract: When a cardiac arrest incident occurs, every minute counts. The average national response time for cardiac arrest incidents outside of the hospital is 9 minutes, but according to the American Heart Association, brain death starts to occur in 4 to 6 minutes. As a result, only 8% of cardiac arrest victims outside of the hospital survive. However, if bystanders give CPR and use an AED, survival rates can increase to 38%.

Our team created a mobile application that alerts CPR certified volunteers to nearby cardiac arrest incidents. The application leverages nearby bystanders to reduce the wait time until help arrives, thus improving patient outcomes. It allows two types of users to sign up: (1) users who are at risk of cardiac arrest; (2) CPR-certified responders. If a patient undergoes cardiac arrest, the application notifies nearby responders. Once a responder agrees to help, the application guides that responder to the patient's location. Finally, the application also informs any emergency contacts that the patient has listed.
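
As an illustration of the "notify nearby responders" step, the sketch below filters responders by great-circle distance from the incident. It is an assumption-laden example, not the application's actual logic: the 500 m radius, the data layout, and the function names are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def responders_to_notify(incident, responders, radius_m=500):
    """Return the names of responders within `radius_m`, nearest first.

    incident:   (lat, lon) of the patient
    responders: {name: (lat, lon)} of CPR-certified volunteers
    """
    lat, lon = incident
    nearby = []
    for name, (rlat, rlon) in responders.items():
        distance = haversine_m(lat, lon, rlat, rlon)
        if distance <= radius_m:
            nearby.append((distance, name))
    return [name for _, name in sorted(nearby)]
```

Sorting nearest first lets the system guide whichever responder accepts along the shortest route. A production system would also have to account for stale location fixes and walking time, neither of which is modeled here.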

Report and Poster

17. POWerNAV: Map Data in Augmented Reality for Better Pedestrian Navigation
Members: Anthony Hsieh, Michael Li, Darren Yin
Advisor: Stephen H. Lane

Abstract: Interpreting a map in real time, whether it be on paper or a smartphone, is often confusing and time-consuming. POWerNAV (Pedestrian Overlay onto World of Navigational Augmented View) is designed to improve the pedestrian navigation process by overlaying data from a map, like building names and streets, onto the user view via augmented reality (AR). The overlay integrates map data with the real world, removing the need to consult a separate map.

The Epson Moverio BT-200 smartglasses, which run Android 4.0.4, serve as the main development hardware. GPS and sensor data is used to detect user location and head orientation. Data is imported from Google Maps to create a model of the world, which serves as the basis for overlays on buildings, streets, and points of interest. Virtual objects like destination beacons and animated path arrows are also supported. To demonstrate the viability of POWerNAV as a platform for more complex, content-rich use, a Penn campus tour is included with the system.

It is evident that AR-based navigation has many uses beyond a campus tour that can be developed further, like in the indoor and automobile domains. However, current hardware suffers from relatively poor performance and needs to advance more. The most prominent challenges of developing POWerNAV involve grappling with imperfect sensors. The user's location is approximated to the nearest pathway or road; a complex sensor fusion of the accelerometer, gyroscope, and compass, as well as computer vision analysis, is needed to ensure proper alignment of the overlay. The barrier to entry for designing a seamless AR experience will decrease as location and orientation sensors continue to improve.

Report and Poster

18. SmartGrow: A Personalized Plant Monitor
Members: Alexander Little, Martin Greenberg
Advisor: Jorge Santiago-Aviles

Abstract: The robotic agriculture space has recently become a popular area for innovation. However, most existing systems in this space focus only on irrigation. Existing solutions for robotic agriculture are typically intended to be deployed on a large scale and are inappropriate for home-scale use. Finally, there are very few systems able to adapt to the particular plant being supported by the system. SmartGrow is an electronic plant ecosystem, designed for home use, to give the user an in-depth look at their plant's health.

The system uses an Arduino Uno and its I²C bus to accurately measure factors that determine plant health such as soil moisture, temperature, and lighting. The Arduino automates some parts of the plant care, such as watering and lighting. The processing power of the SmartGrow system is provided by a Raspberry Pi 2. The Raspberry Pi is responsible for reading in the raw data from the Arduino, and then making decisions on what needs to be done for the plant.

Report and Poster

19. Generalized Recommendation Platform
Members: Alex Harelick, Corey Loman
Advisor: Zachary Ives

Abstract: Recommendations pervade our online experience. Whether users are shopping for books on Amazon or watching movies on Netflix, many websites rely on good recommendations as a means to improve the user experience and increase user engagement. Creating useful recommendations requires knowledge of recommendation algorithms and the ability to run large chunks of data on a distributed system. While most websites have a large amount of user behavior data, some lack the technical expertise or time to provide the useful recommendations needed by users.

A generalized recommendation platform changes that experience. Developers can sign up for our site, upload a data file, and receive recommendations for their users. This is achieved through an item-based collaborative filtering algorithm. The algorithm relies on finding similar items in the user behavior data and is run through Apache Mahout on top of Amazon's Elastic MapReduce. We've also created a Penn course recommendation web app, a proof-of-concept website for our platform, to demonstrate the end-to-end developer experience.
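
The idea behind item-based collaborative filtering can be shown on a single machine in a few lines. The sketch below is not how Apache Mahout implements it, and the abstract does not say which similarity measure the platform uses. Cosine similarity, the data layout, and all function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Invert {user: {item: score}} into {item: {user: score}}."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, score in items.items():
            vectors[item][user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=3):
    """Rank items the user has not seen by similarity to items they have."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        # Weight each similarity by how strongly the user rated the seen item.
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * score
            for item, score in seen.items()
        )
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

Because similarities are computed between items rather than between users, they can be precomputed in a batch job, which is what makes the approach a natural fit for a MapReduce-style system.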

Report and Poster

20. GOLD: GPS and Optic Landing of Drones - A Hybrid Approach
Members: Stefania Maiman, Matt Schulman, Josh Pearlstein
Advisor: Jonathan M. Smith

Abstract: The successful use of drones for package delivery offers revolutionary cost savings for logistics providers. Today, the main barrier to drone delivery is the accuracy of drones during automated landings. GPS navigation accuracy is limited to a four meter radius, so getting drones to accurately and inexpensively navigate to the last few meters of precision is a major obstacle. This project creates a new accurate and cost-effective solution for hyper-accurate drone landing. GOLD combines GPS with optical navigation to iteratively recognize and descend towards a target landing spot.

The system works by using GPS navigation to send the drone to a high altitude above the target. The target landing spot is marked on the ground with a QR-code poster. Next, the drone iteratively photographs the landscape beneath it, processes the image onboard, and gradually descends towards the ground based on the location of the QR code within each photo. This hybrid approach offers an inexpensive, scalable, and accurate drone delivery system. This project also creates a web interface to manage all drone flights, delivery orders, and allocation of drones. As a proof of concept, the website enables order fulfillment and allocation with a dynamic simulation interface.
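
The photograph, re-center, descend loop can be simulated in a few lines. This is a toy model under stated assumptions, not the GOLD flight code: `locate_marker` stands in for the onboard image processing, and the `gain`, `descent`, and `touchdown_alt` parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Drone:
    x: float         # horizontal position in meters
    y: float
    altitude: float  # meters above ground


def locate_marker(drone, target):
    """Stand-in for photographing the ground and decoding the QR code.

    Returns the marker's horizontal offset (meters) from the point directly
    below the drone. A real system would derive this from the marker's pixel
    position, the camera's field of view, and the current altitude.
    """
    return target[0] - drone.x, target[1] - drone.y


def land(drone, target, gain=0.8, descent=0.7, touchdown_alt=0.3, max_steps=100):
    """Repeatedly re-center over the marker and descend; return steps taken."""
    for step in range(1, max_steps + 1):
        dx, dy = locate_marker(drone, target)
        drone.x += gain * dx          # correct part of the horizontal error
        drone.y += gain * dy
        drone.altitude *= descent     # drop to a fraction of current altitude
        if drone.altitude <= touchdown_alt:
            return step
    raise RuntimeError("did not reach touchdown altitude")
```

Starting from a GPS fix a few meters off target, each iteration shrinks the remaining horizontal error, so the drone is nearly centered by the time it reaches the ground.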

Report and Poster

21. Collaborative Web Page Change Detection
Members: Sebastian Messier, Isha Bajekal
Advisor: Zachary Ives

Abstract: To our surprise, despite the growing responsiveness and dynamicity of the web, there are few solutions for tracking content changes on a page. We sought to address this by building a collaborative, non-intrusive tool that relies on users, rather than computers, to detect web page changes as close to real time as possible. We figured that most people would be interested in tracking sites that others will be visiting later in the day. Leveraging this thought into a solution was particularly interesting to us, so we sought to conceptualize an implementation, which eventually became this project.

Existing technology is more rigid and less creative. If you search for a page-change detector, most sites offer cheap services that simply wget a page periodically and compare it with some preceding or initial result. Some, going a step further, compare screenshots of specific areas of a website, which are also taken periodically. All of these sites limit checks to once or, at most, twice a day. When one considers that many other individuals will visit this same page throughout the day, the idea of using their visits to another's advantage gains traction. It was these conditions that finally convinced us to pursue this idea as our senior design project.
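
The core of the collaborative idea, where each visitor's page load doubles as a change check for everyone else, can be sketched as a fingerprint comparison. The class and method names below are hypothetical, and hashing the whole page with SHA-256 is an assumption. The abstract does not describe how the tool actually compares versions.

```python
import hashlib


class ChangeTracker:
    """Remembers a fingerprint of each page as reported by visiting users."""

    def __init__(self):
        self._fingerprints = {}  # url -> hex digest of last reported content

    def report_visit(self, url, content):
        """Record what a visitor saw; return True if the page has changed
        since the previous visitor's report (False on the first report)."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._fingerprints.get(url)
        self._fingerprints[url] = digest
        return previous is not None and previous != digest
```

A short usage example:

```python
tracker = ChangeTracker()
tracker.report_visit("http://example.com", "<p>v1</p>")  # first report: no change
tracker.report_visit("http://example.com", "<p>v2</p>")  # content differs: change detected
```

Pages with per-user content (ads, greetings, timestamps) would need to be normalized before hashing, which this sketch does not attempt.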

Report and Poster

22. MobileTurk
Members: Zach Krasner, Kate Miller, Alex Whitaker
Advisor: Chris Callison-Burch

Abstract: We present a system for completing tasks on Mechanical Turk, a web-based crowdsourcing platform, in an Android application. This addresses the gap between earned hourly wage, which does not take into account time spent navigating poorly designed interfaces or searching for completable tasks for which the worker is qualified, and the effective hourly wage: what a worker actually earns per hour. Our proof-of-concept system demonstrates that certain types of tasks are ideally completed on mobile, while others are merely possible or entirely infeasible. We also draw conclusions about the scalability of our system and its relationship to API or internal access to crowdsourcing platforms like Mechanical Turk and CrowdFlower.

Report and Poster

23. HelpDesk: Optimal Shift Scheduling and Support Case Management
Member: Michael Molisani
Advisor: Benedict J Brown

Abstract: In today's economy, the efficient use of time and allocation of resources is invaluable. For many supervisors, these goals are achieved by assigning employees to scheduled shifts based on their availability and preferences. Generally, manual scheduling solutions can range from prioritizing task allocations on a first-come, first-served basis to trial-and-error assignments. These methods are prone to both uneven distribution of shifts and even incomplete assignments. Manual scheduling can also quickly become a labor-intensive process that increases rapidly in difficulty with a larger number of employees.

I present HelpDesk, an application that addresses these issues and creates a scalable system to optimally accommodate scheduling needs. The system allows employees to specify which hours of the week they are available or would prefer to work. Once a manager or supervisor specifies all of the scheduled shifts, including time, duration, and even number of employees, HelpDesk produces a mathematically optimal assignment of employees to shifts. This process can be repeated for different input variables, which allows users to craft specific allocations to meet the needs of the employer.
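
To make "mathematically optimal assignment" concrete, the sketch below finds the highest-preference schedule on a tiny instance by exhaustive search. The abstract does not say which optimization method HelpDesk uses. Brute force is used here only because it is easy to verify, and the data layout and function name are hypothetical.

```python
from itertools import combinations, product


def best_schedule(shifts, prefs, max_shifts_per_employee=1):
    """Exhaustively search assignments; return (score, {shift: crew}) or None.

    shifts: {shift_name: number_of_employees_needed}
    prefs:  {employee: {shift_name: preference_score}}; a shift missing
            from an employee's dict means they are unavailable for it.
    """
    names = sorted(shifts)
    options = []
    for shift in names:
        available = sorted(e for e, p in prefs.items() if shift in p)
        options.append(list(combinations(available, shifts[shift])))

    best = None
    for choice in product(*options):
        # Reject schedules that overload any employee.
        load = {}
        for crew in choice:
            for employee in crew:
                load[employee] = load.get(employee, 0) + 1
        if any(n > max_shifts_per_employee for n in load.values()):
            continue
        score = sum(prefs[e][s] for s, crew in zip(names, choice) for e in crew)
        if best is None or score > best[0]:
            best = (score, dict(zip(names, choice)))
    return best
```

A first-come, first-served pass can leave a later shift unfilled by using up the only employee available for it, whereas the search finds a complete schedule whenever one exists. The search is exponential in the number of shifts, so a real deployment would need a matching or integer-programming formulation instead.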

Report and Poster

24. Active Learning for Image Classification
Members: Pratyusha Gupta, Ella Polo, Lauren Reeder, Alex Wissmann
Advisor: Mitch Marcus

Abstract: Personal financial management is a critical step to help alleviate poverty in developing countries. Our goal is to allow anyone without technical skills to be able to harness the power of machine learning, specifically active learning. We decided to focus our efforts on image classification. We built a user-friendly web application that uses active learning to quickly classify any set of images that a user uploads to the application.

Report and Poster

25. js.rs - A Rustic JavaScript Interpreter
Members: Terry Sun, Sam Rossi
Advisor: Steve Zdancewic

Abstract: JavaScript is an incredibly widespread language, running on virtually every modern computer and browser, and interpreters such as NodeJS allow JavaScript to be used as a server-side language. Unfortunately, modern implementations of JavaScript engines are typically written in C/C++, languages reliant on manual memory management. This results in countless memory leaks, bugs, and security vulnerabilities related to memory mis-management.

Js.rs is a prototype server-side JavaScript interpreter in Rust, a new systems programming language for building programs with strong memory safety guarantees and speeds comparable to C++. Our interpreter runs code either from source files or an interactive REPL (read-evaluate-print loop), similar to the functionality of existing server-side JavaScript interpreters. We intend to demonstrate the viability of using Rust to implement JavaScript by implementing a core subset of language features. To that end, we've tested our coverage using Google's Sputnik test suite, an ECMAScript 5 conformance test suite.

Report and Poster

26. Middleware for Supporting “Big Data” Analytics across a Database Cluster
Members: Daniel Salowe, Shayan Patel, Sahil Shah
Advisor: Zachary Ives

Abstract: The lack of affordable and open-source support for efficient, big data analytics across a relational database cluster is a problem in the technology industry. Specifically, there exists a need for an open-source distributed database middleware for off-the-shelf databases (MySQL). We approached the project by first deciding on the most important needs of a middleware addressing these issues. This project focuses on three main components, namely fault tolerance, distributed JOIN operations, and support for computationally intensive operations. By focusing on these three issues, we have developed an infrastructure that future developers can build upon, while also maintaining usefulness as a standalone middleware. The system detects node failures and automatically reroutes queries to replica nodes. Our distributed JOIN operation leverages nodes as computation resources to achieve efficient JOINs. The middleware works the best in situations where computation-heavy operations are to be applied on relevant data before or after the SQL operations. Our system distributes the computations as well as the query to work around issues such as memory constraints that are encountered on a single machine. Our work has led to a system that is accessible via a web user interface, allowing users to define arbitrary computations and run queries on large amounts of data.

Report and Poster

27. Daruma: Regaining Trust in Cloud Storage
Members: Doron Shapiro, Michelle Socher, Ray Lei, Sudarshan Muralidhar
Advisors: Boon Thau Loo, Nadia Heninger

Abstract: Currently, cloud storage services are used by consumers for a wide variety of important documents, including family photos, healthcare information and proprietary corporate data. These services all make promises about their storage solutions, usually including some guarantees of confidentiality, integrity, and availability. However, downtime is a fact of life for cloud services and, for better or worse, many providers openly admit to being able to access customer files for purposes ranging from analytics to law enforcement. Daruma solves this problem by eliminating the need to trust any cloud provider. We run no servers ourselves - instead, we combine and secure the space on cloud services already used by consumers (like Dropbox and Google) with advanced cryptographic and redundancy algorithms. Our system provides a simple guarantee: no one cloud service provider can read, change, or delete your files - ever. Daruma feels just like an existing service - there are no extra passwords to remember or frustrating workflows to navigate. Daruma handles the complexities of security and reliability for users, allowing them to confidently utilize cloud storage without worrying about its inherent risks.
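
The abstract does not name the cryptographic and redundancy algorithms Daruma uses. As an illustration of how a "no single provider can read or delete" guarantee can be obtained, the sketch below uses Shamir secret sharing with a 2-of-3 threshold. That choice is an assumption for illustration, not a description of Daruma itself. The names `split` and `recover` and the field modulus are hypothetical.

```python
import secrets
from typing import List, Tuple

# Assumed field modulus: the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1


def split(secret: int, n: int = 3, k: int = 2) -> List[Tuple[int, int]]:
    """Return n shares (x, y); any k of them recover the secret."""
    assert 0 <= secret < PRIME and 1 < k <= n
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        return y

    return [(x, f(x)) for x in range(1, n + 1)]


def recover(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

If each provider stores one share of a file-encryption key, one share alone reveals nothing about the key, and losing any one share still leaves enough to recover it.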

Report and Poster

28. TorrentTrust: A Trust-Based, Decentralized Object Reputation Network
Members: Ian Sibner, Evelyn Yeung, David Xu, Quanze Chen
Advisor: Andreas Haeberlen

Abstract: In this paper, we describe TorrentTrust, a decentralized object reputation system for peer-to-peer networks. Torrents are a popular target for spammers and hackers - an easy way to coax users into downloading and installing a profitable (for the hacker) piece of malware disguised as another file. Thus, determining the authenticity of a torrent has long been an issue. Many trackers use upvote/downvote systems, or allow users to verify a torrent, but bad actors can easily verify their own malicious content. Also, these systems are totally centralized, creating a single point of attack for adversaries.

We researched a system called Credence (Walsh and Sirer, 2005), which was used to rank objects on the Gnutella network, and extended it to provide stronger security. The resulting system, TorrentTrust, verifies torrents based on trust relationships between users in a totally decentralized way - making it much more difficult for bad actors to promote malicious content.

TorrentTrust is a layer on top of the BitTorrent filesharing network where users can determine authenticity of content through voting and establishing trust with other users. Although TorrentTrust is similar to Credence, we show through simulation and analysis that it is more resistant to certain network attacks.
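
A simplified sketch of trust-weighted voting, to make the idea concrete. It weights each peer's vote by how often that peer has agreed with the local user in the past. This is an assumed stand-in for the correlation-based weighting in Credence-style systems, not TorrentTrust's actual algorithm, and the function names are hypothetical.

```python
from typing import Dict

Votes = Dict[str, int]  # object id -> +1 (authentic) or -1 (malicious)


def agreement_weight(mine: Votes, theirs: Votes) -> float:
    """Agreement in [-1, 1] over the objects both users have voted on."""
    shared = mine.keys() & theirs.keys()
    if not shared:
        return 0.0  # no shared history, so the peer's vote carries no weight
    return sum(mine[o] * theirs[o] for o in shared) / len(shared)


def reputation(obj: str, my_votes: Votes, peer_votes: Dict[str, Votes]) -> float:
    """Sum of peer votes on obj, each scaled by that peer's agreement weight."""
    score = 0.0
    for votes in peer_votes.values():
        if obj in votes:
            score += agreement_weight(my_votes, votes) * votes[obj]
    return score
```

A peer who consistently votes against the local user gets a negative weight, so that peer's upvote on a new object lowers its score. A bad actor therefore cannot promote content simply by voting for it.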

Report and Poster

29. EZPark - Automated Parking Garage Payment System
Members: Constanza Figuerola, Samyukta Lanka, Utkarsh Shah, Max Tromanhauser
Advisor: Boon Thau Loo

Abstract: Today, garage parking systems are annoying, expensive and needlessly complicated for both garages and drivers. Drivers are forced to carry around easy-to-lose tickets, wait in parking lines (especially at large events), and must manually use their credit cards. Garages have to purchase and maintain expensive payment machines, employ parking staff, and lack useful analytics.

The EZPark system was designed to solve these issues. Drivers sign up once in order to easily park in every garage in our system. When a driver enters a garage, cameras send images of the car to a central server, which recognizes the plate number and keeps track of how long the car stays in the garage. Upon exiting, the associated user's credit card is automatically charged and the parking gate is opened. All of this data is aggregated and available to the user and garage. This solution provides convenience and ease of use for drivers, while also simplifying the parking system for garages.
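
The entry-and-exit flow can be sketched as a small session tracker keyed by plate number. Plate recognition and card charging are omitted. The `Garage` class, the rate in cents, and the round-up-to-the-hour billing rule are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta
from typing import Dict


class Garage:
    """Tracks open parking sessions by recognized plate number."""

    def __init__(self, hourly_rate_cents: int) -> None:
        self.hourly_rate_cents = hourly_rate_cents
        self.entries: Dict[str, datetime] = {}  # plate -> entry time

    def enter(self, plate: str, when: datetime) -> None:
        """Called when the entry camera recognizes a plate."""
        self.entries[plate] = when

    def exit(self, plate: str, when: datetime) -> int:
        """Close the session and return the amount to charge, in cents."""
        started = self.entries.pop(plate)
        # Assumed policy: bill every started hour, with a one-hour minimum.
        hours = math.ceil((when - started) / timedelta(hours=1))
        return max(hours, 1) * self.hourly_rate_cents
```

For example, a car that enters at 9:00 and leaves at 10:30 in a garage charging 500 cents per hour would be billed for two hours under this assumed policy.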

Report and Poster

30. adpt: A Differentially Private Tool for Adaptive Data Analysis
Members: Benson Chen, Lingbin Cai, Jake Hart, and Dylan Sun
Advisor: Aaron Roth

Abstract: Statistical analysis forms the quantitative backbone for research across a variety of fields. Textbook statistics requires that hypotheses and methods be chosen prior to gathering data. Unfortunately, this is impractical; researchers instead choose statistical methods in response to the data to which those methods will be applied, a process known as adaptive data analysis. This allows researchers to iteratively design studies based upon intermediate results, but can lead to spurious conclusions because it invalidates assumptions underpinning statistics. We approach this issue both practically and theoretically. We apply current research in differential privacy in an implementation that mitigates the negative consequences of adaptive data analysis. Next, we investigate conditions in which our tool is effective and compare its performance to other, similarly-motivated approaches.
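
As a rough illustration of the kind of building block a differentially private analysis tool relies on, the sketch below answers a mean query with Laplace noise. It is not taken from adpt. The function names, the assumption that values lie in [0, 1], and the choice of the Laplace mechanism are all illustrative.

```python
import random
from typing import Sequence


def laplace_noise(scale: float, rng=random) -> float:
    """Laplace(0, scale) sample: the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values: Sequence[float], epsilon: float, rng=random) -> float:
    """Epsilon-differentially-private mean of values assumed to lie in [0, 1]."""
    n = len(values)
    # Replacing one record changes the mean by at most 1/n (the sensitivity).
    sensitivity = 1 / n
    return sum(values) / n + laplace_noise(sensitivity / epsilon, rng)
```

Because each noisy answer depends only weakly on any single record, an analyst who picks the next query based on earlier answers is less likely to overfit to the sample. With 1,000 records and epsilon = 1, the noise scale is 0.001, so the answer remains useful.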



Report and Poster

31. Using Speech To Improve Software Development Productivity
Members: Brian Vander Schaaf, Neera Thavornvanit, Nillan Patel
Advisor: Chris Murphy

Abstract: Software development involves a lot more than coding. Readability and organization are vital to any software project and are necessary at every step of the development process. However, tasks such as refactoring and file navigation take time and are therefore often not prioritized and sometimes even ignored. As a programmer, it can be frustrating and time-consuming to switch between the keyboard and mouse, a switch that is most common in activities that enhance the readability and organization of code.

We created a plugin for Sublime Text, a text editor for programmers we were familiar with, to help alleviate code management issues and make engineers more efficient by allowing them to use their voice to interact with the text editor. Users surveyed confirmed the popularity of Sublime and helped us determine the importance of various aspects of the plugin. Our plugin has twenty-eight commands that can be compounded and executed with a simple voice query, e.g. "select all, copy, comment, go to the next file, paste and indent, save all, and close the window." Users can also customize commands to their way of speaking or save a longer, compounded command with a shorter query. We performed A/B testing to evaluate our plugin's effectiveness in increasing productivity, and found that the majority of users were faster at code management tasks with our plugin than without.

Report and Poster

32. BARK: BOINC Volunteer Cluster Manager For Apache Spark
Members: Thomas Delacour, Sedem Fialor, John Weir
Advisor: Boon Thau Loo

Abstract: Fortune magazine estimates that Apple sold an average of more than 5 million Macs per quarter in 2015. The PC market is even larger. Most of these devices boast impressive hardware specifications. However, the use cases of an average computer owner's machine (browsing the web, text editing, etc.) suggest that the vast majority of these resources go unused. Given the abundance of untapped computational resources, it seems counterintuitive that developers depend so heavily on cloud giants to run their servers and perform their distributed computations.

BARK presents a tailored application of BOINC (Berkeley Open Infrastructure for Network Computing), a framework for volunteer computing, to Apache Spark, the increasingly popular big data processing tool. Given a Spark job and a pool of volunteer computers (anything from dedicated Linux workhorses to personal laptops), BARK will set up a Spark cluster composed exclusively of volunteer nodes and execute the job across it. In this way, BARK offers a free alternative to paid cloud services for end users interested in performing MapReduce computations. BARK is an example of a style of application that may become increasingly prevalent as the more fluid flow of computational resources between participants in peer-to-peer networks becomes normalized.

Report and Poster

33. SmokeSignals: A Distributed Key-Value Store on a Mobile Network
Members: Charles Cobb, Meyer Kizner, Xiuruo Zhang
Advisor: Boon Thau Loo

Abstract: Internet-enabled smartphones are a ubiquitous part of modern life, but they depend critically on centralized infrastructure to access the network. This infrastructure is generally reliable, but has several shortcomings. First, large gatherings like sporting events or concerts often overwhelm the limited network capabilities in an area. Second, during a disaster such as a hurricane or terrorist attack, networks often fail completely when communication needs are most critical. Lastly, users may prefer a surveillance-resistant, decentralized network for sensitive communications. Our proposed solution is a decentralized peer-to-peer key-value store replicated across mobile devices. It should (1) synchronize with nearby devices when they are in range, on a best-effort basis; (2) propagate changes across the network even as nodes move and sever existing links or leave the network entirely; and (3) provide a general API allowing developers to easily build a variety of peer-to-peer applications using our software.
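
A minimal sketch of best-effort synchronization between devices, assuming last-writer-wins conflict resolution. The abstract does not state how SmokeSignals resolves conflicting writes, so the `Replica` class, its methods, and the (timestamp, node id) ordering are hypothetical.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[int, str, str]  # (timestamp, node_id, value)


class Replica:
    """One device's copy of the key-value store."""

    def __init__(self) -> None:
        self.store: Dict[str, Entry] = {}

    def put(self, key: str, value: str, timestamp: int, node_id: str) -> None:
        self._apply(key, (timestamp, node_id, value))

    def get(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[2] if entry else None

    def _apply(self, key: str, entry: Entry) -> None:
        # Keep the entry with the larger (timestamp, node_id) pair.
        current = self.store.get(key)
        if current is None or entry[:2] > current[:2]:
            self.store[key] = entry

    def sync(self, other: "Replica") -> None:
        """Exchange entries with a device that has come into range."""
        for key, entry in list(other.store.items()):
            self._apply(key, entry)
        for key, entry in list(self.store.items()):
            other._apply(key, entry)
```

Because merging is order-independent, a change can hop from device to device as they meet and still converge to the same value everywhere, even if the original writer has left the network.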

Report and Poster