Project Ideas - CIS 400/401

First, some thoughts on selecting a good project (see also the introductory slides that Insup will present, or has presented, on the first day of class):
  • Projects fall into two major categories: (1) Research: these projects involve technical novelty; and (2) Implementations: these apply existing techniques to achieve novel ends. Research projects typically have strong academic foundations, address something novel and non-trivial, and are expected to produce a demo/proof-of-concept realization of their ideas. Implementations should involve technical challenges, but because they are more straightforward, we expect to see highly functional, professional, and perhaps public-facing services/tools by the end of the course.
  • The homepage has links to previous years' completed projects. Browse prior projects for inspiration and to gauge the appropriate scope.
  • Consider addressing a problem that has real societal impact (i.e., "Computing for Good"). Such projects are well received by judges and likely to be personally fulfilling.
  • Be conscious of the proposal's scope. Students tend to overpromise. We would rather see a small problem solved elegantly than a large one left incomplete in the final report.
  • If you plan to use machine learning in your project and do not feel comfortable enough with ML, please talk to one of the TAs first. An increasing number of projects use, or promise to use, machine learning at some stage, and many encounter unforeseen difficulties due to poorly chosen features or insufficient time.
  • Try to select a project that reuses, or slightly extends, your core set of computer science skills. Trying to learn a handful of new languages/technologies will hurt project output (the basis of evaluation). Distinguish between personal and technical challenges; emphasize the latter.
  • Pick a topic you enjoy; this will make a year of work far more pleasant. Do not feel compelled to select one of the projects below. Approach professors with your own ideas; failing that, Insup may be willing to advise your project directly.
  • Talk to the TAs and instructor about your potential topics. That way, potential issues can be raised before you have written a formal proposal. In prior years, it was not uncommon for project groups to have to rewrite their initial submission. Avoid this!
  • Disclaimer: Just because a project is listed below does not mean it is a perfect or even a "good" project. We have tried to solicit projects from a breadth of sources. Whoever submitted these ideas may not fully understand the technical challenges involved, or what is required of a Senior Design team.
The project ideas are listed below.

IDEA: Web-based High-School Genomics and Bioinformatics Curriculum for Teachers

CONTACT: Junhyong Kim and Kristin Field (Junhyong [at] sas [dot] upenn [dot] edu; kfield [at] seas [dot] upenn [dot] edu)

DESCRIPTION: Currently, there are many online teaching resources for self-learning at various academic levels. Here, our goal is to create an online curriculum resource for teaching advanced biology (i.e., genomics and bioinformatics) that addresses the needs of high-school educators, enabling them to more easily incorporate non-traditional topics into their teaching. We envision a website composed of teaching modules that teachers can use to introduce a topic and to generate discussion or more in-depth teaching. Each module will have video introductions, text, web-based exercises, and links. The challenge is to develop a platform that enables the creation of such high-school teaching tools. This project is funded by the Arthur Davis Vinding Foundation.
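A platform like this will likely center on a data model for modules. The sketch below is one minimal, hypothetical way to represent a module with the parts listed above (video introduction, text, exercises, links); the class names, fields, and the `is_complete` publishing check are illustrative assumptions, not a specification from the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Exercise:
    """A web-based exercise attached to a module (hypothetical fields)."""
    prompt: str
    url: str


@dataclass
class Module:
    """One teaching module: video introduction, text, exercises, and links."""
    title: str
    video_url: str
    text: str
    exercises: List[Exercise] = field(default_factory=list)
    links: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Hypothetical authoring rule: a module is ready to publish once it
        # has a video, non-empty text, and at least one exercise.
        return bool(self.video_url and self.text.strip() and self.exercises)


module = Module(
    title="Comparing DNA sequences",
    video_url="https://example.org/videos/alignment-intro",
    text="Sequence alignment lets us compare DNA from different organisms.",
)
print(module.is_complete())  # False: no exercises yet
module.exercises.append(Exercise("Align two short sequences.", "https://example.org/ex/1"))
print(module.is_complete())  # True
```

Keeping modules as plain structured records like this would let teachers author content through forms while the site renders it uniformly.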

PRE-REQS: Experience with web programming, user interface design and implementation, media

IDEA: Systematic Evaluation of LLVM Optimizations

CONTACT: Steve Zdancewic (stevez [at] cis)

DESCRIPTION: Modern compilers like Clang implement a large collection of optimizations. However, it is unclear how these optimizations interact and how much they actually improve performance on modern computer architectures. This project's goal is to find out by developing techniques to experimentally validate the effectiveness of LLVM's compiler optimizations. Doing so requires solving many challenging systems problems: developing a modified LLVM compiler pipeline to automate testing, finding benchmark software, collecting performance data, and measuring and analyzing the results.
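One simple starting point for automated testing is a leave-one-out ablation: run the benchmark with a full pass pipeline, then with each pass removed, and compare runtimes. The sketch below only builds the pipelines and `opt` command lines and computes a runtime ratio; the particular pass list is an illustrative assumption (the names are real LLVM function passes), and actually compiling and timing binaries is left out because it needs an LLVM installation.

```python
import statistics
from typing import Dict, List

# Example function-level passes. The names are real LLVM passes, but this
# selection is only an illustration, not a recommended pipeline.
BASELINE_PASSES = ["sroa", "instcombine", "simplifycfg", "gvn"]


def leave_one_out(passes: List[str]) -> Dict[str, str]:
    """Map each pass to a comma-separated pipeline that omits only that pass."""
    return {p: ",".join(q for q in passes if q != p) for p in passes}


def opt_command(pipeline: str, src: str, out: str) -> List[str]:
    """Build an `opt` invocation using the new pass manager's -passes= syntax.

    src is assumed to be unoptimized bitcode, e.g. produced with
    clang -O0 -Xclang -disable-O0-optnone -emit-llvm -c prog.c -o prog.bc
    """
    return ["opt", f"-passes={pipeline}", src, "-o", out]


def contribution(full_times: List[float], ablated_times: List[float]) -> float:
    """Median runtime without a pass divided by median runtime with it.

    Values above 1 suggest the pass helped on this benchmark; taking medians
    over repeated runs damps timing noise.
    """
    return statistics.median(ablated_times) / statistics.median(full_times)


pipelines = leave_one_out(BASELINE_PASSES)
print(pipelines["gvn"])  # sroa,instcombine,simplifycfg
print(opt_command(pipelines["gvn"], "prog.bc", "prog.nogvn.bc"))
print(contribution([1.0, 1.1, 0.9], [1.3, 1.2, 1.25]))  # 1.25
```

Removing one pass at a time measures only individual effects; probing how optimizations interact would mean ablating pairs or larger subsets (e.g., via `itertools.combinations`), which grows the number of builds quickly.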

PRE-REQS: CIS 341 and/or CIS 371 and/or CIS 380; strong background in C; familiarity with make, build systems, and shell scripts; LLVM/Clang experience a plus.

IDEA: Language Support for Probabilistic Associative Memories

CONTACT: Steve Zdancewic (stevez [at] cis)

DESCRIPTION: What would a programming language that natively supports "similarity" queries over structured data look like? Associative memories are addressable by _content_, not by _address_, providing an alternative paradigm for data storage that is more akin to the representations found in some machine learning algorithms than to the usual von Neumann architectures we use today.
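To make "addressable by content" concrete, here is a toy content-addressable store in which a lookup returns a probability distribution over stored values, weighted by how similar each stored key is to the probe. This is only an illustrative sketch: the choice of numeric vector keys, negative squared distance as the similarity score, and a softmax with a `temperature` parameter are all assumptions, not the project's design.

```python
import math
from typing import Dict, List, Sequence, Tuple


class AssociativeMemory:
    """Toy content-addressable store; lookups are by similarity of content.

    A query returns a probability distribution over stored values (a softmax
    of negative squared distance), rather than the single cell an address
    would name.
    """

    def __init__(self, temperature: float = 1.0) -> None:
        self.temperature = temperature
        self.items: List[Tuple[Tuple[float, ...], str]] = []

    def store(self, key: Sequence[float], value: str) -> None:
        self.items.append((tuple(key), value))

    def query(self, probe: Sequence[float]) -> Dict[str, float]:
        if not self.items:
            return {}
        # Similarity score: negative squared Euclidean distance, scaled.
        scores = [-sum((a - b) ** 2 for a, b in zip(key, probe)) / self.temperature
                  for key, _ in self.items]
        top = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dist: Dict[str, float] = {}
        for (_, value), w in zip(self.items, weights):
            dist[value] = dist.get(value, 0.0) + w / total
        return dist


mem = AssociativeMemory(temperature=0.5)
mem.store([1.0, 0.0], "cat")
mem.store([0.0, 1.0], "dog")
dist = mem.query([0.9, 0.2])
print(max(dist, key=dist.get))  # cat
```

A language design for this setting would ask, for example, how types should describe such distribution-valued lookups; the sketch only shows the runtime behavior.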

This project seeks to develop programming language support (type system and language features) that facilitates programming with probabilistic associative memories. Applications of such a language might include statistical modeling, approximate computing, and simulations.

Want to help design a new programming language with an interesting, novel computational model? Join this project!

PRE-REQS: CIS 341 and/or CIS 552, STAT 430 or CIS 261; interest in functional programming, interpreters, and type systems; background in machine learning a plus.

IDEA: Citation System for Scientific Data

CONTACT: Susan Davidson (susan [at] cis) and Val Tannen (val [at] cis)

DESCRIPTION: Citation is an essential part of scientific publishing and, more generally, of scholarship. Now that so much scientific publishing involves data and takes place through a database rather than conventional journals, how is some part of a database to be cited? More generally, how should data stored in a repository that has complex internal structure and that is subject to change be cited?

The goal of this research is to develop a framework for data citation which takes into account the increasingly large number of possible citations; the need for citations to be both human and machine readable; and the need for citations to conform to various specifications and standards. A basic assumption is that citations must be generated, on the fly, from the database. The framework will be validated by a prototype system over a database of pharmacological data, IUPHAR. The system will automatically generate citations conforming to standards specified in a rule-based language.
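As a rough illustration of generating citations on the fly from rules, the sketch below maps a kind of database record to the fields a citation must include and emits both a machine-readable record and a human-readable string. The rule set, record kinds, field names, and version string are hypothetical; they are not the IUPHAR schema, the project's rule language, or any real citation standard. Including the database version is one simple way to keep a citation meaningful after the data changes.

```python
from typing import Dict, Tuple

# Hypothetical rules: for each kind of record, the fields a citation must
# include. Illustrative only; not the IUPHAR schema or a real standard.
RULES: Dict[str, Tuple[str, ...]] = {
    "family": ("contributors", "family_name"),
    "receptor": ("contributors", "receptor_name", "family_name"),
}


def cite(kind: str, record: dict, db_version: str) -> Tuple[dict, str]:
    """Generate a (machine-readable, human-readable) citation for a record."""
    fields = RULES[kind]
    machine = {f: record[f] for f in fields}
    # Record the database version so the citation still identifies the
    # cited content after the database is updated.
    machine["version"] = db_version
    authors = ", ".join(record["contributors"])
    titles = ". ".join(str(record[f]) for f in fields if f != "contributors")
    text = f"{authors}. {titles}. IUPHAR database, version {db_version}."
    return machine, text


rec = {"contributors": ["A. Smith", "B. Jones"], "family_name": "Adrenoceptors"}
machine, text = cite("family", rec, "2013.1")
print(text)  # A. Smith, B. Jones. Adrenoceptors. IUPHAR database, version 2013.1.
```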

The project will involve 1) creating XML views of IUPHAR (using existing software); 2) archiving the XML views (using existing software); 3) implementing a rule-based citation system; and 4) testing on IUPHAR.

PRE-REQS: Database experience (CIS 450/550, could be taken concurrently), Java; Python a plus.

IDEA: MEAM team -- "MoveIT"

CONTACT: Dr. Susan Duff (, Dr. Katherine Kuchenbecker (kuchenbe [at] seas)

DESCRIPTION: This project aims to create a system that captures available arm motion or muscle activation via accelerometers/Kinect or surface electromyography (EMG), and then provides visual-auditory feedback for infants and children with limited motion or weakness. These children may have sustained an injury to the brachial plexus at birth, have congenital weakness from another diagnosis, or have had tendon transfers to increase muscle power. The visual/auditory feedback can take the form of a musical mobile, a toy, a video, animated images on a tablet, or just music. The sponsor (Dr. Duff) has access to two different types of accelerometers, a Kinect system, an iPad, musical mobiles, switch toys, and surface EMG systems. A device or support system may be needed to encourage specific arm motions in participants, such as shoulder abduction or elbow flexion. The support system could potentially be designed using a 3D printer or another system. This project stems from the experience of the sponsor, a pediatric physical and occupational therapist with a doctorate in Movement Science. The sponsor has an appointment at Children's Hospital of Philadelphia and colleagues at Shriners Hospitals for Children in Philadelphia. The project's final deliverable will be a functional prototype of a system that enhances motivation and movement in infants and young children by providing visual and/or auditory feedback.

PRE-REQS: Speak with contact points above.

IDEA: Enhanced 3D Printed Conductors

CONTACT: Andre DeHon (andre [at] seas)

DESCRIPTION: Inexpensive 3D printers make desktop fabrication accessible to consumers and hobbyists. By adding the ability to print conductive material alongside mechanical support material, we can print objects with integrated electromechanical functionality. Among other things, this allows us to print circuit boards and to integrate circuitry into printed mechanical objects. Last year's 3DPI project identified key components for integrating colloidal silver conductor printing into a MakerBot Replicator 2. While this was an important step forward, there is considerable room for improvement beyond this initial proof of concept. In particular, additional engineering is needed to print viable circuits and to support wiring in all three printed dimensions. This project looks to address these needs.

Project Design Objectives: reduce achievable trace width; print vias and Z-axis traces; develop design strategies and software for 3D-printed circuits; enable easy reloading of conductive material.

PRE-REQS: This project will benefit from a team with complementary skills and could include CSCI, CMPE, EE, SSE, and MEAM contributors. Specifically: embedded system programming and interfacing; mechanical design and fabrication; software development for processing, planning, and optimization of 3D prints; experimentation and process optimization. Prior 3D and 3D printing experience is not required; interest and willingness to learn will suffice.

IDEA: Immersive Virtual Environments

CONTACT: C.J. Taylor (cjtaylor [at] cis)

DESCRIPTION: The goal of this project would be to leverage the capabilities of the recently released Oculus Rift virtual reality system to build a system that would allow a user to virtually explore a remote environment. This project would involve building models of remote spaces using RGB-D panoramas acquired with a Microsoft Kinect (next generation). The system would have to render photorealistic views of an environment (say, the Furness library) in real time, based on the available data. Ideally, we would like to allow the user to virtually walk around the scene and to provide a compelling sense of presence.
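A first step in building a model from RGB-D data is usually back-projecting each depth pixel into a 3D point with the standard pinhole camera model. The sketch below shows that step only; the intrinsics `fx`, `fy`, `cx`, `cy` are placeholders that would in practice come from calibrating the Kinect, and depth is assumed to already be in metric units with 0 meaning "no reading".

```python
from typing import List, Tuple


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Convert a depth image into 3D points in the camera frame.

    Pinhole model: pixel (u, v) with depth Z maps to
    X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy.
    Pixels with no depth reading (Z <= 0) are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Tiny 2x2 depth image with placeholder intrinsics.
pts = backproject([[0.0, 2.0], [1.0, 1.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts)  # [(1.0, -1.0, 2.0), (-0.5, 0.5, 1.0), (0.5, 0.5, 1.0)]
```

Points from each view of a panorama would then be transformed by that capture's estimated camera pose into a common frame before meshing or rendering.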

PRE-REQS: Knowledge of OpenGL and graphics programming very helpful; experience with computer vision very helpful; knowledge of C and/or C++.

IDEA: Digital Sculpting

CONTACT: C.J. Taylor (cjtaylor [at] cis)

DESCRIPTION: This project seeks to leverage the capabilities of the recently released Oculus Rift immersive virtual reality system to build a system that could be used to design complex 3D objects. The idea would be to combine the virtual reality system with a handheld haptic feedback device, such as a WiiMote, that would also be actively tracked in 3D. Given this input/output system, the software would be responsible for constructing a virtual workspace and responding appropriately to the user's input, providing a compelling, intuitive interface that would allow the user to rapidly create complex 3D forms that could be fabricated on a 3D printing system. The goal would be to improve on the current generation of CAD software packages, which can be cumbersome and difficult to use, particularly when constructing complex free-form shapes.

PRE-REQS: Knowledge of OpenGL and graphics programming very helpful, knowledge of C and/or C++

IDEA: Deciding on the "Scope" of Columns in Tables from the Web

CONTACT: Zack Ives (zives [at] cis)

DESCRIPTION: An increasing number of places (in industry, e.g., Google, and in research, e.g., at CMU) are interested in crawling the Web to find not just documents -- but facts. For instance, one might build a database of statistics on baseball players, a list of mountains by height and geo coordinate, etc.

A great source of such facts is, in fact, HTML tables from the Web. But a difficult question is how to take an arbitrary table from the Web and interpret what's in each column. For instance, "location" could be a geo code for a point, the nearest city, etc. Moreover, if we look at all of the locations in a table, we might decide that they are locations in Pennsylvania, or major international cities. We term this description the "semantic scope" of the column.

This research project seeks to evaluate and improve some of our proposed techniques for automatically detecting "semantic scope" of tables on the Web.

We want to automatically enrich structured data (tables) on the Web by detecting "semantic scopes" of table columns. Consider a real table in Wikipedia in which one column lists all county seats in Texas, including ambiguous place names such as "Athens" and "Paris". We wish to infer that the column is about Texas, using only the entries in the table. This is useful in a variety of ways. For example, a visualization tool can then correctly map "Paris" to a place in Texas rather than to the capital of France. A detailed problem definition and the proposed techniques can be found in our draft paper.
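To give a feel for the problem, here is a deliberately simple coverage heuristic: look up every entry in a gazetteer that lists the regions containing a place with that name, and pick the region covering the most entries. This is an illustrative baseline only, not the heuristic or probabilistic techniques in the draft paper; the toy gazetteer and the `min_coverage` threshold are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def infer_scope(column: List[str], gazetteer: Dict[str, Set[str]],
                min_coverage: float = 0.8) -> Optional[str]:
    """Guess the region that best explains a column of place names.

    gazetteer maps each place name to the set of regions containing some
    place with that name (ambiguous names such as "Paris" map to several).
    The region covering the most entries is returned only if it covers at
    least min_coverage of them; otherwise None.
    """
    counts: Counter = Counter()
    for name in column:
        counts.update(gazetteer.get(name, set()))
    if not counts:
        return None
    region, hits = counts.most_common(1)[0]
    return region if hits / len(column) >= min_coverage else None


# Toy gazetteer (illustrative, far from complete).
gaz = {
    "Athens": {"Texas", "Georgia", "Greece"},
    "Paris": {"Texas", "France"},
    "Austin": {"Texas"},
    "Dallas": {"Texas"},
}
print(infer_scope(["Athens", "Paris", "Austin", "Dallas"], gaz))  # Texas
```

A real scope may be a class in an ontology (e.g., "county seats in Texas") rather than a bare region, which is one reason an ontology is needed when moving to new domains.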

Current status: We have a draft manuscript describing our proposed algorithms in detail, as well as some experimental results. We have developed both heuristic-based and probabilistic model-based approaches and have validated them on tables in the geographic data domain.

We would like to extend evaluation of our techniques to another domain such as sports. This involves three main components:

1) Construct an ontology (class hierarchy) describing relationships between players, teams, kinds of sports, and so on, using publicly available data sources such as Freebase.

2) Extract from the Web a set of tables in this domain for evaluation.

3) Evaluate proposed techniques using the ontology and tables constructed.

PRE-REQS: This project requires C++ or Java coding skills and knowledge of probabilistic reasoning.

IDEA: Fault Tolerant Control via Transfer Learning

CONTACT: Eric Eaton (eeaton [at] cis). In addition to the faculty advisor, students will be able to collaborate with a larger team working on a related sponsored research project, which includes two postdocs at Penn and teams at Olin College and Washington State University.

DESCRIPTION: It is critical that deployed robotic systems continue to operate properly when one or more of their components fail. In particular, behavior should degrade in proportion to the severity of the fault. In conventional fault-tolerant control, the ability to compensate for faults is often integrated into the control laws by design, supported by supervisory control and system monitoring.

This project would investigate a radically different approach to fault-tolerant control based on transfer learning. When faced with a novel failure (such as a damaged joint or a broken sensor), the system would automatically and rapidly learn a revised optimal control policy by building on its prior experience with other types of failures. Transfer learning could also be used to diagnose the specific failure, or even to predict when failure is likely to occur. The resulting transfer-learning approach to fault-tolerant control could be applied to a number of dynamical systems, such as ground robots, mechanical arms, or aerial vehicles (quadrotors, etc.).

PRE-REQS: Students working on this project should have taken, and done well in, at least one course in Artificial Intelligence, Machine Learning, Data Mining, Robotics, or advanced statistics. It would be helpful if at least one team member had a background in control theory. We intend for this project to yield academic publications and open-source code, which would be especially beneficial for students considering graduate school. The implementation will likely be in a combination of Java, C++, and MATLAB. This project would be appropriate for a team of 3-4 students.

IDEA: Lifelong Machine Learning

CONTACT: Eric Eaton (eeaton [at] cis). In addition to the faculty advisor, students will be able to collaborate with a larger team working on a related sponsored research project, which includes two postdocs at Penn and teams at Olin College and Washington State University.

DESCRIPTION: Typical machine learning methods learn a model for a single problem and then forget that model completely when they are applied to another data set. Consequently, these methods learn to solve each problem in isolation and require substantial data to learn each new model. In contrast, humans learn by continuously building upon and refining their knowledge over a lifetime of experience. This process of lifelong learning is a key characteristic of human intelligence, and it allows us to develop a wide variety of complex abilities across many domains and to adapt effectively to changes in our environment.

This research project will focus on developing lifelong learning methods for intelligent systems, building upon our successful Efficient Lifelong Learning Algorithm (Ruvolo & Eaton, ICML'13; Bou Ammar, Eaton et al., ICML'14). The development of lifelong machine learning has the potential to enable a new class of learning systems capable of learning a diverse set of skills over time, adapting those skills as needed to new challenges. The overall goal of this work is to enable lifelong learning systems that can (1) rapidly acquire new models by building upon previously learned knowledge, (2) scale effectively to thousands of tasks, exhibiting versatility, (3) self-direct their learning by choosing what to learn next, and (4) collaborate effectively with humans and other agents. There are a variety of open problems toward achieving each of these goals, ranging from core issues in statistical machine learning and optimization to mechanisms for interaction between the learner and human users. This project would apply lifelong learning methods to problems in robotics, multi-agent search and rescue, object recognition, or environmental sustainability.
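For orientation, the ELLA work cited above models each task's parameters as a sparse combination of a shared latent basis, so knowledge transfers between tasks through that basis. The objective below is our summary of the formulation in Ruvolo & Eaton (ICML'13) and should be checked against the paper before relying on it. Here $T$ is the number of tasks seen so far, $n_t$ the number of training examples for task $t$, $\mathcal{L}$ a loss function, $f$ the model, and $\mu$, $\lambda$ regularization weights.

```latex
\theta^{(t)} = L\, s^{(t)}, \qquad L \in \mathbb{R}^{d \times k}, \quad s^{(t)} \in \mathbb{R}^{k}

e_T(L) = \frac{1}{T} \sum_{t=1}^{T} \min_{s^{(t)}} \left\{
  \frac{1}{n_t} \sum_{i=1}^{n_t}
  \mathcal{L}\!\left( f\!\left( x_i^{(t)};\, L s^{(t)} \right),\, y_i^{(t)} \right)
  + \mu \left\| s^{(t)} \right\|_1 \right\}
  + \lambda \left\| L \right\|_F^2
```

The $\ell_1$ penalty keeps each task's code $s^{(t)}$ sparse, and the Frobenius penalty regularizes the shared basis $L$.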

PRE-REQS: Students working on this project should have taken, and done well in, at least one course in Artificial Intelligence, Machine Learning, Data Mining, Robotics, Computer Vision, or advanced statistics. We intend for this project to yield academic publications and open-source code, which would be especially beneficial for students considering graduate school. The implementation will likely be in a combination of Java, C++, and MATLAB. This project would be appropriate for a team of 3-4 students.

IDEA: AutomataTutor: A tutoring system for students learning automata concepts

CONTACT: Graduate Advisor: Loris D'Antoni (lorisdan [at] cis); Faculty Advisor: Rajeev Alur (alur [at] cis)

DESCRIPTION: Automatatutor ( is an online tutoring system that helps students learn basic concepts in the theory of computation, such as finite automata and regular expressions.

The tool currently supports grading DFAs (deterministic finite automata) and providing students with personalized feedback. The grading engine has been shown to be highly effective (published at IJCAI'13, a major AI conference), and this fall AutomataTutor will be used at Penn in CIS 262 (and also at UIUC)!
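The published grader does more than check correctness, so the sketch below is not its method; it shows only a basic building block one might start from when extending grading: deciding whether a student's DFA is equivalent to a reference DFA and, if not, returning a shortest string on which they disagree, which can double as feedback. The DFA encoding (a total transition table keyed by state and symbol) is an assumption. NFAs and regular expressions could first be converted to DFAs (e.g., by subset construction) and then checked the same way.

```python
from collections import deque
from typing import Dict, FrozenSet, Optional, Tuple

# A DFA as (start state, accepting states, transition table). The table is
# assumed to be total: every state has a move on every alphabet symbol.
DFA = Tuple[str, FrozenSet[str], Dict[Tuple[str, str], str]]


def counterexample(a: DFA, b: DFA, alphabet: str) -> Optional[str]:
    """Return a shortest string accepted by exactly one DFA, or None.

    Breadth-first search over the product automaton: a reachable pair of
    states where one accepts and the other does not witnesses a difference.
    """
    start = (a[0], b[0])
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (p, q), word = queue.popleft()
        if (p in a[1]) != (q in b[1]):
            return word
        for c in alphabet:
            nxt = (a[2][(p, c)], b[2][(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + c))
    return None


# Reference: strings over {a, b} with an even number of a's.
ref: DFA = ("e", frozenset({"e"}),
            {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"})
# Buggy student answer: accepts every string.
student: DFA = ("s", frozenset({"s"}), {("s", "a"): "s", ("s", "b"): "s"})
print(counterexample(ref, student, "ab"))  # a
```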

We would like to add NFAs (nondeterministic finite automata) and regular expressions to AutomataTutor, and to make the site more mobile-friendly (perhaps by creating an app).

The following extensions are some examples of plausible senior-design projects:

- Identifying techniques for automatically grading NFAs and regular expressions,
- Identifying techniques for automatically generating feedback for such problems, and
- Evaluating the effectiveness of these techniques.

- Adding a way to input NFAs and regular expressions to the tool (building on the code that already exists), and
- Implementing the backend for grading and feedback generation (the backend is written in C#).

- Migrating the tool to a mobile platform.

This software could potentially be used by students all around the world taking this course.

PRE-REQS: CIS 262 or an equivalent course. Object-oriented programming (preferably C#), web programming, and web design (good JavaScript, perhaps a bit of Scala). Extra skills: basic knowledge of Android programming would be ideal.

IDEA: Side-Channel Resistant Cryptography on FPGAs

CONTACT: Andre DeHon (andre [at] seas) and Nadia Heninger (nadiah [at] cis)

DESCRIPTION: Attackers have demonstrated a remarkable ability to use side-channel information (e.g., power consumption profiles, RF emissions) to extract information, including secret keys. How much can we reduce the rate of data leakage with careful hardware design? And at what cost?

The project goal is to implement cryptographic routines with one or more protection schemes, and to quantify their resource and implementation costs and their effectiveness at preventing information leakage. Protection against side-channel attacks is especially important for smart-card applications.
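One widely studied protection scheme is Boolean masking: a secret is split into random shares so that no single intermediate value depends on the secret alone. The software sketch below shows the idea for one byte and a linear (XOR) operation; it is only a conceptual illustration, since the project targets FPGA hardware, where nonlinear steps such as S-boxes need dedicated masked circuits and glitches can still leak, which is exactly why the costs and effectiveness need to be measured.

```python
import secrets
from typing import Tuple

Shares = Tuple[int, int]


def mask(secret: int) -> Shares:
    """Split a secret byte into two shares whose XOR equals the secret.

    Each share on its own is uniformly random, so observing either share
    alone reveals nothing about the secret (first-order Boolean masking).
    """
    m = secrets.randbelow(256)
    return secret ^ m, m


def masked_xor(shares: Shares, data: int) -> Shares:
    # XOR is linear, so it can be applied to one share without ever
    # recombining the secret.
    s0, s1 = shares
    return s0 ^ data, s1


def unmask(shares: Shares) -> int:
    s0, s1 = shares
    return s0 ^ s1


key, plaintext = 0x2B, 0x3C
print(hex(unmask(masked_xor(mask(key), plaintext))))  # 0x17
```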

PRE-REQS: Digital Design (ESE170, 171), Discrete Math (CIS160), familiarity with electrical power issues and design styles (ESE215, ESE216, or ESE370), exposure to modern public-key cryptography a plus, but we figure you can pick it up as part of the project.

IDEA: Building plug-and-play data loggers for the medical device coordination framework (MDCF)

CONTACT: Shaohui Wang (shaohui [at] seas), Insup Lee (lee [at] cis)

DESCRIPTION: A medical device data logger is any equipment that is directly or indirectly connected to one or more medical devices and collects run-time information about the operation of those devices. Collection of medical device data via data loggers is currently unregulated and is under active research and development in both academia and practice.

To improve the current practice of managing medical device data collection, it would be ideal to obtain medical device data in a more timely fashion and in much greater detail, and to perform effective analysis based on that information. The medical device data logger project aims to provide a prototype design as well as a reference implementation.

The expected result of the project is a well-functioning data logging and replay system that integrates with the existing MDCF framework. In the data logging phase, medical device data made available through the MDCF framework are encrypted and stored in secure external storage. In the data replay phase, recorded data are analyzed by established algorithms, and appropriate data animations are shown to adverse-event analysts for expedited analysis. Real-time plotting of recorded data would also be a useful feature, depending on project team size and available time.

PRE-REQS: Java programming; data visualization; one scripting language (Python, shell, etc.). Familiarity with Android development is a plus.

IDEA: Advancing the type system of the Glasgow Haskell Compiler

CONTACT: Stephanie Weirich ( and Richard Eisenberg (

DESCRIPTION: The Glasgow Haskell Compiler is the world's playground for type system research. This is your opportunity to join in the fun. Richard and I have several suggestions for type system extensions for you to try, and we would also love to hear what you think is missing from statically typed programming languages.

Although this project falls into the category of "pure research" (i.e., you will be doing something that no one has done before), it will give you experience with modifying and extending an open-source, industrial-strength compiler ( Along the way, you will become a type system guru and a black-belt functional programmer.

PRE-REQS: Haskell experience (CIS 194 or CIS 552) required. CIS 341 and CIS 500 are helpful.