List of Current Senior Projects - 2014/2015

1. CloudSubmit: A Distributed Code Execution Service
Members: Lewis Ellis, Ashu Goel, Jeff Grimes, Max Scheiber
Advisor: Stephanie Weirich

Abstract: CloudSubmit is a distributed system that compiles, runs, tests, and grades arbitrary code segments against a given set of inputs and expected outputs. CloudSubmit uses multiple workers to handle large numbers of simultaneous submissions during major events such as Google Code Jam, TopCoder contests, and homework submission deadlines for large classes.


2. Popping the Filter Bubble
Members: Ali Altaf, Dhruv Maheshwari, Dennis Sell, Hamza Qaiser
Advisor: Chris Callison-Burch

Abstract: The increasing personalization of the web, particularly social media, has placed users in a "filter bubble". Within this bubble users only see political content that reinforces their existing biases, while content that presents alternative views is filtered out. This proposal outlines a strategy for popping the filter bubble by modifying various existing user clustering, classification and Natural Language Processing (NLP) techniques. This project builds upon previous social network and NLP research to devise methods to cluster and classify not just users, but also news articles, which is a task not yet explored. Our project goal is to use these methods to collect articles on five specific and polarizing topics as examples and to show how they can be classified as belonging to different filter bubbles. Our project consists of using what we know about Twitter users (mainly their political leanings) to draw conclusions about the articles they post. The information we generate will allow users to both visualize their filter bubble and view popular opinions that fall outside of it, as well as provide informative visualizations of the Twitter political landscape.


3. Influence Maximization in Time Varying Networks
Member: Ishaan Nerurkar
Advisor: Victor Preciado

Abstract: The purpose of this project is to study the problem of targeting a few nodes in time-varying social networks to maximize the spread of an idea. With the advent of social media and Big Data, this problem is of particular interest to political campaigns, non-profits, marketing agencies, and Internet advertisers, among many others. We seek to design an algorithm that considers network structure and the dynamics of information spread for influence maximization. In practice, our understanding of the data is constantly being updated; as a result, we consider environments in which the network topology varies over time, and ensure our solution is robust to these dynamics. Our algorithm approximates the problem of influence maximization under probabilistic models of information spread for time-varying networks.
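
As an illustration of the underlying problem, the sketch below applies the well-known greedy hill-climbing heuristic to a static graph under the independent cascade model, one common probabilistic model of information spread. It is not the project's algorithm: it ignores time-varying topology, and the function names, the uniform activation probability p, and the Monte Carlo trial count are all assumptions made for this example.

```python
import random

def simulate_cascade(graph, seeds, p, rng):
    """Run one independent-cascade trial; return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                # Each newly active node gets one chance to activate each neighbor.
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active

def expected_spread(graph, seeds, p, trials, rng):
    """Estimate the expected number of activated nodes by Monte Carlo simulation."""
    total = sum(len(simulate_cascade(graph, seeds, p, rng)) for _ in range(trials))
    return total / trials

def greedy_seeds(graph, k, p=0.1, trials=200, seed=0):
    """Pick k seed nodes, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    chosen = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))
        best = max(
            candidates,
            key=lambda n: expected_spread(graph, chosen + [n], p, trials, rng),
        )
        chosen.append(best)
    return chosen

# Hypothetical directed graph: one hub that can reach three leaves.
star = {"hub": ["a", "b", "c"], "a": [], "b": [], "c": []}
print(greedy_seeds(star, k=1, p=1.0, trials=5))
```

A time-varying setting would require re-estimating spread as edges appear and disappear, which this static sketch does not attempt.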


4. Therapeutic Application for Dementia Patients and Caregivers
Members: Jonathan Chen, Jinesh Desai, Vishwa Patel
Advisor: Chris Murphy

Abstract: We are developing an application that will help dementia patients improve their cognitive function and will improve the quality of life of their caretakers in a variety of ways. We are exploring techniques that help dementia patients reinforce their memory, as well as features and solutions that leverage modern technology to aid caretakers in their work. We hope to test our application with actual patients at the Penn Hospital and measure results from their usage.


5. LAKITU: Long-range Aerial Kinetic Immersive Telepresence Unit
Members: Vincent Do, Ian Longshore, David Mally, Sam Raper
Advisor: CJ Taylor

Abstract: At present, most telepresence systems available commercially are largely the same: they rely on large, high-definition displays that are immobile, cumbersome, and cost-prohibitive. Furthermore, they require dedicated real estate such as a desk or a conference room that is often under-utilized in most scenarios. Additionally, these telepresence systems are naturally stationary and static, and therefore provide a poor or limited sense of immersion for the operator, especially on systems that use lower-resolution displays. Coupled with a control scheme that is unintuitive, these telepresence systems are generally highly difficult to use and provide a frustrating experience for the user.

We therefore propose a novel telepresence system that has the potential to be more cost-effective, significantly more portable and mobile, and much more immersive and intuitive for the end-user. By coupling an Oculus Rift VR Headset with a quadrotor equipped with a high-definition, motion-stabilized camera, we will create a more effective, more intuitive telepresence system that provides a highly-immersive, realistic experience.


6. Visual Programming Language for iOS
Members: Mawunyo Akabua, Shadia Al-Shafei, Joshua Rojas
Advisor: Chris Murphy

Abstract: The goal of this project is to build a Visual Programming Language (VPL) that is compatible with iOS devices in order to raise and stimulate an interest in computer science and programming in children. This goal will be accomplished through a web-based graphical user interface (GUI) where users will interact with the VPL, and an associated iOS application where users will be able to run their code. The application will have two parts: a web component and a phone component. The web component is where users will be able to use our VPL with a GUI-based integrated development environment (IDE) to create their projects and upload them to their user accounts. The phone component is where users will log in and download their projects from their user accounts. The iOS application will then be able to construct the user-created application, and the user will be able to use it on their device. It is hoped that the finished product will replicate the success of other VPLs, such as Scratch or App Inventor, and that our target users will come away with a positive view of computer science and programming. Related research studies have shown the effectiveness of VPLs with children, both in generating interest in programming and in general programming performance. Unlike existing VPLs, which are either entirely web-based or Android-based, this project aims to be compatible with iOS-specific features such as the accelerometer and camera.


7. A Decentralized Reputation System
Members: Matthew Buechler, Manosai Eerabathini, Chris Hockenbrocht, Defu Wan
Advisor: Jonathan Smith

Abstract: Cryptocurrencies such as Bitcoin have gained a lot of traction in the past few years as they are touted as an anonymous, safe, and secure way to pay electronically with low transaction fees. Our goal in this project is to make these technologies better by improving the transaction market for cryptocurrencies. After analyzing the current landscape, we believe that a reputation system is needed. A reputation system will facilitate honest trade in the network by looking at the transaction history of both parties. Our implementation of reputation will facilitate transactions by enabling reputation to be tracked and shared across any cryptocurrency network. It will also enable extensible and decentralized marketplace development by allowing developers to integrate this critical element in any application with our preexisting methods and infrastructure.

In order to accomplish our goal, we will use publicly available transaction information in the Bitcoin network and leverage the security of the Bitcoin network, together with Counterparty's smart-contract system, to develop and implement reputation. Our reputation system will mainly draw from the wealth of data already available in the Bitcoin blockchain. Because every single transaction is publicly viewable, we want to analyze these transactions for behavior that signifies either a positive or a negative reputation. First we introduce net flow and net flow convergence as local-global graph properties. We want to develop local models of the rate of net flow convergence to form a control for comparison. Combining this local property with other information, we use a scoring function to generate a score for a party. While we do not predict that scoring will be perfectly consistent, we attempt to show that it is a reliable measure of reputation. By the same token, we do not think that a reputation system needs to be perfect in order to be beneficial. Our most important goals are to design a system that is fast, easy to use, and accurate enough for any transaction network online.


8. College Depression Rankings from Twitter Data
Members: Ashwin Baweja, Jason Kong, Tommy Pan Fang, Yaou Wang
Advisors: Chris Callison-Burch, H. Andrew Schwartz

Abstract: College rankings are playing an increasingly influential role through the rise of social media and viral sharing. Simultaneously, mental health has risen to the forefront of university discussions in light of calls for increased mental illness awareness. Previous attempts at formulating rankings of schools' happiness and mental illness have centered around paper or electronic surveys taken by only a small fraction of the student body. We posit a new methodology for constructing college rankings through analysis of the language used by university students on social media platforms. We leverage existing research into depression language analysis and couple it with a novel problem domain and approach to constructing a dataset of college student tweets in order to identify depression scores for students attending each university. We then aggregate these scores on a per-university level in order to derive a set of meaningful rankings comparing depression among schools. The end product includes a set of rankings with accompanying quantitative analysis, visual representations of the analysis with word clouds, and a Twitter bot that will report on users' predicted depression through language analysis of their tweets.
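
The per-university aggregation step described above can be sketched as follows. This is only an illustration: it assumes that a per-student score has already been produced by some language model, that the aggregate is a plain mean, and that the school names and score values are hypothetical. The abstract does not specify the actual aggregation used.

```python
from collections import defaultdict
from statistics import mean

def rank_universities(student_scores):
    """Average per-student scores by school and sort schools from highest to lowest."""
    by_school = defaultdict(list)
    for school, score in student_scores:
        by_school[school].append(score)
    averages = {school: mean(scores) for school, scores in by_school.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical (school, score) pairs; real scores would come from tweet analysis.
sample = [("School A", 0.25), ("School A", 0.75), ("School B", 0.75)]
for school, average in rank_universities(sample):
    print(school, average)
```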


9. Route Planning For Distribution of Goods
Members: Nick Meyer, Geoff Vedernikoff, Clara Wu
Advisor: Sanjeev Khanna

Abstract: As drones become more ubiquitous, there is a need for algorithms that autonomously and efficiently determine the paths they take. We propose to develop novel extensions to the existing literature in order to determine, in real time, flight paths for multiple drones that need to deliver a variety of products in response to stochastic demand across some area. To this end, we will extend solutions to the k-DTRP to account for this variety in demand, while considering theory from related problems, including the k-server problem.


10. Interpolating Depth Panoramas Captured With Kinect for the Oculus Rift
Members: Mayank Gupta, Rafe Kettler, Kai Ninomiya, Boyang Niu
Advisor: CJ Taylor

Abstract: We are in the process of developing and implementing a system by which a series of color/depth (RGB-D) panoramas, captured at discrete points in an environment, can be used to render in real-time continuous perspectives of that environment from new positions that were not originally captured. RGB-D panoramas are being obtained via an automated system utilizing Microsoft Kinect. The rendering is then displayed using Oculus Rift, allowing the user to move through the virtual space and see an accurate and convincing representation of the originally-captured world.

Our contributions include a real-time algorithm for the continuous interpolation of perspective representations from discrete depth and color panoramas, together with display and interface via Oculus Rift. In addition, we design and implement a robotically automated data collection process. The robot is built entirely from inexpensive, consumer-grade hardware.


11. Verification of System FC in Coq
Members: Tiernan Garsys, Tayler Mandel, Lucas Pena, Noam Zilberstein
Advisors: Stephanie Weirich, Richard Eisenberg

Abstract: Haskell's compiler, the Glasgow Haskell Compiler (GHC), generates code in GHC Core. The Coq proof assistant will be used to verify a formalized version of System FC, the basis for GHC Core. A translation from the formal language to GHC Core, the concrete implementation of System FC that is used in GHC, will then be proven. The goal of verification is to prove that the evaluation semantics of System FC are sound.

There are two main benefits to this project. First, the verification will provide assurance regarding the safety and accuracy of GHC. Second, and perhaps more importantly, it will provide a foundation for verifying other properties of GHC, such as compiler optimizations.


12. Cash Tracking and Budgeting Application
Members: Shaun Ayrton, Raul Zablah
Advisor: Chris Murphy

Abstract: The purpose of this document is three-fold. First, it exposes the current market need for personal finance management in emerging economies. Second, it describes the existing technologies, analyzing their core functionality and identifying their market positioning. Third, it presents a solution to the existing market need through a description of the system and its foreseen implementation. It explores the technical functionality of the proposed product, breaking down the innovation process through which multiple existing technologies were combined in order to better meet the existing market need. The performance of the system is evaluated and then used to assess the efficacy of its implementation.


13. Automatic Harmonization Viewed as a Machine Translation Problem
Members: Nicole Limtiaco, Rigel Swavely
Advisor: Chris Callison-Burch

Abstract: In the past, approaches to automatic harmonization of a melody have taken two forms: applying rules provided by music theory or predicting the chord under the melody. This attempt approaches the problem from a machine translation perspective, modeling the melody of a song as the source language and each harmony as a target language. By generating the harmony lines explicitly rather than generating them as a consequence of the chord on each beat, we hope that the algorithm will be able to create more cohesive, creative works.

In order to accomplish this, we use a translation model to represent the probability of one note harmonizing with another note. Additionally, we use a language model to represent the probability of a note following a phrase of notes in the same line of music. A perplexity metric is used to evaluate the generated models. If there is time, we hope to incorporate human evaluation and data-driven rhythmic variation.
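
To make the language model and perplexity metric concrete, here is a minimal sketch that assumes a bigram model over note names with add-one smoothing. The model order, the smoothing scheme, the note vocabulary, and the function names are assumptions made for this example; the abstract does not specify them.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab):
    """Count note bigrams; return an add-one-smoothed P(current | previous)."""
    bigrams = Counter()
    contexts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    size = len(vocab)

    def prob(prev, cur):
        return (bigrams[(prev, cur)] + 1) / (contexts[prev] + size)

    return prob

def perplexity(prob, seq):
    """Perplexity of a note sequence: 2 to the power of the average negative log2 probability."""
    transitions = list(zip(seq, seq[1:]))
    log_sum = sum(math.log2(prob(prev, cur)) for prev, cur in transitions)
    return 2 ** (-log_sum / len(transitions))

# Hypothetical four-note vocabulary and a single training line.
vocab = ["C", "D", "E", "G"]
model = train_bigram([["C", "E", "G", "C", "E", "G"]], vocab)
print(perplexity(model, ["C", "E", "G"]))  # pattern seen in training
print(perplexity(model, ["C", "D", "E"]))  # pattern not seen in training
```

Lower perplexity means the model finds the sequence less surprising, so a line resembling the training data should score lower than an unfamiliar one.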


14. Using Bluetooth Monitoring to Create Physical Networks
Members: Chris Beyer, Mark Davis, Daniel Langer
Advisor: Victor Preciado

Abstract: As software becomes more and more integrated into our daily lives, there is a higher demand to turn digital information into physical discoveries. One of the areas where this is becoming increasingly prevalent is in social networks and graphs. Our lives are connected to others' through social media outlets and applications, and we have made significant advances in network and graph theory. We want to apply our knowledge of social networks in order to understand how humans connect at a physical level.

Humanity has reached a point where nearly everyone has a connected device on them: a phone, tablet, or smartwatch. This, paired with advances in Bluetooth technology, provides a perfect opportunity to make advances in physical networks. This project aims to explore these physical networks and to provide a way to intuitively track and analyze how humans interact with one another. This work can be applied in a plethora of practical settings, including hospitals and other medical institutions. Institutions of this kind would use the technology to track interactions between users and use the findings to increase efficiency and ensure safety.


15. Programming Language Support for Autoassociative Memory
Members: Kevin Lu, Fan Yin, Yukuan Zhang
Advisor: Steve Zdancewic

Abstract: Our goal is to create a programming language library that provides an interface for programming with an autoassociative memory. Common data structures such as arrays or hashmaps use memory addresses or exact key lookups for retrieving entries. This is very different from human memory, which essentially functions as a less exact, heuristic-based memory. Analogously, an autoassociative memory receives an input and returns a value similar to the input. As with human memory, the input does not have to exactly match existing entries for values to be returned. This project provides an intuitive interface to this autoassociative memory to facilitate programming efficiency for the end user. It will primarily accomplish this by providing an easier way for the user to handle structured input data, as well as by automatically managing user-defined types in memory.
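
The contrast with exact key lookup can be illustrated with a small sketch. It assumes fixed-length binary patterns and uses nearest-neighbor search by Hamming distance as a stand-in for the recall step. The class name, method names, and distance measure are hypothetical and are not the project's interface.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

class AutoassociativeMemory:
    """Toy memory that returns the stored pattern closest to a possibly noisy cue."""

    def __init__(self):
        self._patterns = []

    def store(self, pattern):
        self._patterns.append(tuple(pattern))

    def recall(self, cue):
        if not self._patterns:
            raise LookupError("memory is empty")
        cue = tuple(cue)
        # Unlike a hash map, the cue does not need to match any entry exactly.
        return min(self._patterns, key=lambda stored: hamming(stored, cue))

memory = AutoassociativeMemory()
memory.store([1, 0, 1, 1])
memory.store([0, 0, 0, 0])
print(memory.recall([1, 0, 1, 0]))  # cue differs from the first pattern in one position
```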


16. Domain Adaptation of Word Embeddings for Cross-Lingual Word Sense Disambiguation
Member: Mitchell Stern
Advisor: Lyle Ungar

Abstract: The resolution of ambiguities in human language is essential for many tasks in the field of natural language processing. While conventional approaches typically aim to select the sense of a polysemous word from a manually collated ontology, widespread interest in multilingual applications such as machine translation has motivated a new formulation of the problem. In cross-lingual word sense disambiguation, words of interest in a source language are instead disambiguated via their translation into one or more target languages.

Current work relies primarily on the use of lexical features derived from a word's immediate context, an approach which necessarily discards global information. We investigate here the use of dense, real-valued vector word embeddings together with simpler lexical features, with the hope that the high-level syntactic and semantic behavior captured by such vector representations provides additional useful information for disambiguation. Moreover, as present techniques for the induction of vector word embeddings typically require large corpora, containing hundreds of millions of tokens or more, we also investigate here the adaptation of generic embeddings to more specific domains for which the quantity of text is more limited.


17. Anomaly Visualization
Members: Daniel Blank, Kelsey Duncombe-Smith, Daniel Reife, Lily Wang
Advisor: Chris Murphy

Abstract: The proliferation of technology in the late 20th and early 21st centuries has led to the collection of massive amounts of data, which brings both benefits and costs. The data can be in the form of network packets, error logs, or usage information, any of which may indicate trends that will inform business decisions or reveal problem areas that must be addressed. However, the typical quantity of data collected renders visual analysis infeasible, a problem that Comcast operators experience regularly when trying to maintain their systems. Not only must they sift through all of the information collected, they must also infer meaning from it.


18. Platform for Evaluating Real-Time Resource Management Algorithms for Network Function Virtualization
Members: Razzi Abuissa, Alex Brashear, David Kim, Alex Lyons
Advisor: Linh Phan

Abstract: Network Functions Virtualization (NFV) is a novel, software-based approach to network infrastructure. The virtualization of network functions improves upon the inflexibility of classical network hardware in terms of design, implementation, and upgradeability. NFV leverages the advantages of virtualization technology to enable consolidation of network equipment onto industry-standard servers and to distribute physical computing resources between multiple networked machines. One real-world application of NFV is the configuration of real-time network services that meet end-to-end latency requirements. The integration of real-time constraints, however, raises the significant challenge of maintaining performance in spite of the overhead of virtualization. Recently, several reports have indicated that the combination of NFV and real-time-aware algorithms can still achieve promising results in real-world deployments.[2] In order to configure virtualized network functions while satisfying real-time constraints, Professor Linh Phan has developed new algorithms for managing computing and networking resources. These algorithms, however, have not yet been empirically evaluated.

Our research seeks to design and implement a new testing platform which simulates a service provider responding to customer requests for network services. The testing platform provides a framework for evaluating performance and relative metrics of different resource management algorithms. The service provider will use a resource management algorithm under test to provide services within real-time constraints. The service provider architecture consists of an orchestration layer and a cluster of machines that provide computing and networking resources. Our system serves as a validation tool for current resource management algorithms and the development of future algorithms for implementing network functions subject to time constraints. The final phase of our research is to collect data from running experimental trials, which will be used to evaluate real-time resource management algorithms.


19. Distributed Optimization
Members: Hamidhasan Ahmed, Justin Chiu
Advisors: Alejandro Ribeiro, Lyle Ungar

Abstract: Distributed optimization has become very popular recently, with the rise of 'big data' and large-scale machine learning. Most of the current literature has focused on coordinate descent due to its simplicity and effectiveness.

Distributed Dual Descent and ADMM takes two algorithms in optimization - dual descent and the alternating direction method of multipliers (ADMM) - and attempts to implement distributed versions of both for the following three families of functions: quadratic functions, linear class SVMs, and logistic regression. Dual descent and ADMM are algorithms that use multiple nodes and communicate between them; we experiment with utilizing asynchronous communication, so that the algorithm does not have to progress at the pace of the slowest node. The main focus of this project will be to provide a package that easily allows for optimization problems to be solved using dual descent or ADMM.
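
As a small worked instance of the quadratic case, the sketch below runs consensus ADMM on scalar quadratics f_i(x) = 0.5 * a_i * (x - b_i)^2, one per node, simulated synchronously in a single process. The asynchronous communication the project experiments with is not modeled, and the function name, the penalty parameter rho, and the iteration count are assumptions made for this example, not the package's interface.

```python
def consensus_admm(a, b, rho=1.0, iterations=500):
    """Minimize sum_i 0.5*a[i]*(x - b[i])**2 by consensus ADMM; return the consensus value."""
    n = len(a)
    x = [0.0] * n  # local copies of the variable, one per node
    y = [0.0] * n  # dual variables, one per node
    z = 0.0        # shared consensus variable
    for _ in range(iterations):
        # Local step: each node minimizes its own augmented Lagrangian in closed form.
        x = [(a[i] * b[i] - y[i] + rho * z) / (a[i] + rho) for i in range(n)]
        # Consensus step: average the local estimates, corrected by the duals.
        z = sum(x[i] + y[i] / rho for i in range(n)) / n
        # Dual step: penalize remaining disagreement with the consensus value.
        y = [y[i] + rho * (x[i] - z) for i in range(n)]
    return z

# Hypothetical problem data for three nodes.
weights = [1.0, 2.0, 3.0]
targets = [1.0, 2.0, 3.0]
print(consensus_admm(weights, targets))
```

For this problem the exact minimizer is the weighted average sum(a_i * b_i) / sum(a_i), which gives a simple check on the iterates.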


20. Detect Cancerous Skin Moles Through Computer Vision
Members: Abhishek Gadiraju, Sneha Keshwani, Elise Minkin
Advisor: Zachary Ives

Abstract: Early detection of skin cancer is crucial for patient survival. People often take no initiative to have moles inspected by doctors because doing so is time- and cost-intensive. We propose creating a mobile application that uses image analysis to tell users if a mole is likely to be cancerous. This will incentivize at-risk users to seek medical attention before it is too late.

Our iOS application will be designed to have an intuitive interface on the mobile front end and a robust machine learning model on the back end that classifies new images as they are sent in and returns a response to the application in an acceptable amount of time. Our goal is to empower people by allowing them to perform self-tests after they notice a possibly suspicious mole anywhere on their body, as early detection is the key to a high survival rate in melanoma.


21. Penn Course Recommender Progress Report Specification
Members: Joseph Hong, Benjamin Gitles, Susan Greenberg, Paul Le Ster
Advisor: Zachary Ives

Abstract: Penn students must choose a small subset of the hundreds of courses offered each semester. The process to do so is often ambiguous, fragmented, and difficult. Penn Course Recommender will aggregate a wide range of relevant variables and will generate a personalized list of recommended courses for each student. It will do so in a robust manner while being as simple and user-friendly as possible, requiring minimal input from the user.

This document outlines our progress so far on Penn Course Recommender and how we plan to proceed in the coming months. More specifically, it reflects the knowledge we have gained so far and the challenges we have encountered in development to date, with an in-depth look at the three main components of our system: the Requirements Graph, the Recommender System, and the Web Application.


22. Institutional Analytics for Retail Investors
Members: Fahim Abouelfadl, Christopher Holt, Eugene Yarovoi
Advisor: Aaron Roth

Abstract: Investment institutions have sophisticated methods and software for the systematic analysis of stocks, but ordinary individuals investing on their own behalf do not have access to the same tools. We first describe the field of stock analysis and some of the methods used to predict stock movements, in particular three areas: fundamental analysis, which seeks to establish the value of a stock based on the attributes of the underlying company; technical analysis, which values a stock based on price and volume movements; and sentiment analysis, which attempts to infer stock price movements based on the systematic examination of attitudes surrounding a stock. While investment institutions have spent considerable resources on developing software for all of these purposes, the complexity of these analyses means that retail (non-institutional) investors typically lack the knowledge, time, and capital to do the same. This creates an unfair situation in the stock market, where institutions are able to more quickly identify and exploit profit opportunities, while retail investors lag behind and never realize the same levels of profit.

We then propose a system to address the gap between institutional and retail investors. A system that performs a hybrid combination of technical, fundamental, and sentiment analysis, exposed to users through a convenient web interface, will allow individuals access to some portion of the same types of tools that investment institutions have at their disposal. The system works by allowing users to choose, in simple and understandable terms, the combination of factors to consider in the analysis, and produces reports summarizing the findings and recommendations to the user. In addition, while the currently proposed system is not explicitly designed for placing trades automatically, it consists of decision-making modules that could be incorporated into an automatic trading system.


23. Privacy Detection and Management with Facebook Photos
Members: Tae Kim, Jasmine Lee, Crystal Qin
Advisor: Chris Murphy

Abstract: As more and more people around the world gain access to the Internet, the user base of online social networks such as Facebook, Twitter, Google Plus, LinkedIn, and Instagram will continue to grow. As a result, personal data sharing and online content, particularly in the form of photographs, are becoming even more prominent. This influx of content has raised significant privacy concerns, which are commonly discussed in the mainstream media. Many users of social networks are unaware of which of their data is accessible and to whom.

Our goal, therefore, is to improve the management of online privacy on social networking platforms, specifically on Facebook. To do so, we propose to design a web application that helps users discover photographs on Facebook in which they appear but are not currently tagged. The application will allow Facebook users to become more aware of their photo privacy and to actively take action to reduce their privacy concerns.


24. The Learning Game
Members: Lakshaya Goel, Kevin Mu
Advisor: Stephen Lane

Abstract: The purpose of this project is to release a polished and flexible educational game that enhances students' willingness to learn and memorize material, specifically vocabulary. The main underlying concept is to make learning part of the game mechanics and a tool with which to progress through the game, rather than focusing solely on an overarching 'game completion objective'; this makes the game stand out from its predecessors and competitors. The potential impact is to greatly increase the time students spend learning while providing an experience that is fun and engaging rather than a chore, which should result in increased academic performance on standardized examinations and, in turn, higher college acceptance rates.


25. Operating Room Rescheduler
Members: Albert Shu, Indu Subbaraj
Advisors: Linh Phan, Ari Brooks, MD

Abstract: Numerous hospitals, including the Hospital of the University of Pennsylvania, suffer heavily from operating room scheduling inefficiencies. In many cases, the inefficiencies arise from failure to properly adjust operating room schedules in response to unanticipated events. Our goal is to develop a web application that adjusts operating room schedules systematically and efficiently in response to three specific events: surgeries exceeding their allotted time slot, surgeries completing earlier than their allotted time slot, and surgeries being cancelled. Given an initial operating room schedule, this web application will automatically suggest schedule adjustments based on the current status of a hospital's operating room system. The web application, along with the scheduling algorithm on which it is based, will ultimately contribute to the reduction of operating room inefficiencies and costs.
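
The three events named above (an overrun, an early finish, and a cancellation) can be illustrated with a deliberately simple single-room sketch that shifts every later surgery by the time gained or lost. This is not the project's scheduling algorithm. The Slot type, the adjust function, and the rule that later surgeries simply slide are assumptions made for this example, and real constraints such as staff availability or multiple rooms are ignored.

```python
from typing import NamedTuple

class Slot(NamedTuple):
    name: str
    start: int     # minutes after the room opens
    duration: int  # allotted minutes

def adjust(schedule, name, actual_duration):
    """Return a new schedule after one surgery runs long, ends early, or is cancelled.

    actual_duration is the real length in minutes, or None if the surgery is cancelled.
    """
    adjusted = []
    shift = 0  # minutes by which later surgeries move (negative means earlier)
    for slot in schedule:
        if slot.name == name:
            if actual_duration is None:
                shift -= slot.duration  # cancelled: free the whole slot
                continue
            adjusted.append(Slot(slot.name, slot.start + shift, actual_duration))
            shift += actual_duration - slot.duration
        else:
            adjusted.append(Slot(slot.name, slot.start + shift, slot.duration))
    return adjusted

# Hypothetical back-to-back schedule for one operating room.
day = [Slot("A", 0, 60), Slot("B", 60, 90), Slot("C", 150, 30)]
print(adjust(day, "A", 75))    # A overruns by 15 minutes
print(adjust(day, "A", 45))    # A finishes 15 minutes early
print(adjust(day, "B", None))  # B is cancelled
```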


26. Fully Immersive Virtual Reality Golf Simulator (DMD Project)
Member: Joseph Tong
Advisor: Stephen Lane

Abstract: Virtual golf simulators are not a new idea. With both casual and professional golfers constantly looking to improve their game, interactive golf simulators have existed for quite a while. However, despite the introduction of augmented reality gaming through the Microsoft Kinect, PlayStation Move, and Wii and the advent of Oculus Rift, no fully immersive virtual reality golf simulator has been created to date.

As a relatively stationary sport, golf is well suited to a fully immersive virtual experience using the gaming technology available today. This project intends to use the Microsoft Kinect and Oculus Rift to create this immersive experience. The Microsoft Kinect will be the source of input that detects the user's swing and position, while the Oculus Rift will complete the visual immersion. This project intends to create a unique and novel virtual reality user experience that will be a significant improvement on existing golf simulators.


27. Integration of Real Lighting Conditions into Interactive Augmented Reality Environments (DMD Project)
Members: Denys Bastov, Anton Bastov
Advisor: Scott White

Abstract: We are developing a system that will create a visually seamless augmented reality experience by mimicking the illumination of the real environment inside the virtual environment, initialized at approximately the same spatial position.

With the growing interest in augmented reality and the introduction of new technologies that allow real-time integration of virtual objects into real environments, the problem arises of making this integration seamless by minimizing the visual difference between the rendering of the virtual scenes and the real video or image footage. One of the key aspects that makes virtual objects stand out is the mismatch in lighting conditions between the real and virtual environments.

There have been many attempts to detect the light sources in real environments. Most of them require the use of calibration objects (light probes), usually spheres, with known light interaction properties (reflectivity, refraction, absorption, color, etc.) or some user interaction, e.g. manual selection of certain objects in the image to act as light probes. These approaches tend to estimate the positions of the light sources in the scene very accurately, but are difficult to use in real-life scenarios, given that regular users do not have light probes with known physical properties. Our approach requires no user interaction and relies solely on the sensory information provided by the Google Tango tablet.