Project Ideas - CIS 400/401

IDEA: Lossless Floating Point Distillation or Multiprecision Summation

PROFESSOR: Andre DeHon

DESCRIPTION: Floating point works with limited precision (e.g. 55b mantissa in IEEE double-precision floats). As a result, floating-point calculations discard information and only approximate the true value of a calculation. Standard floating-point operations discard information every primitive operation (e.g. addition of 2 numbers). What would it take to sum a large number of floating-point values and obtain:

  • The perfect value (multiprecision)
  • The correctly rounded value as if we had performed a perfect summation then rounded the result? (distillation)
Project Design Objectives:
  • Determine the best (least area achieving highest performance) organization for an FPGA-based, lossless floating-point accumulator
  • Assess the resulting hardware costs
  • Determine if lossless floating-point accumulation is viable and economical with today's technology
Project Prerequisites:
  • Familiarity with digital design
  • HDL experience (VHDL or Verilog) a plus
  • Comfortable with software simulation
  • Will need to work out high-level organization issues with simulation in software (e.g. Java)
  • Familiarity with Floating Point (familiarity with computer arithmetic and willingness to learn details of floating point adequate)

IDEA: Fault-Secure Detector Capable Error-Correcting Codes

PROFESSOR: Andre DeHon

DESCRIPTION: Traditional Error-Correcting Codes (ECCs) are designed assuming the encoding and decoder logic is error free. As we scale to nanoscale logic, encoding and decoding logic may also experience transient upsets. We have identified properties of codes that allow them to tolerate encoder/decoder errors as well. This project helps identify codes that have these properties and/or implement ECC logic based on these codes.[Depending on student/team interest and expertise there are potentially separate projects here]:

  • Identifying codes
  • Implementing encoding/decoding logic based on these codes
Project Design Objectives:
  • Develop tools to assess the FSD property for a given code.
  • (option) Develop tools to find codes with specified FSD properties.
  • (option) Implement ECC logic for a particular FSD code on an FPGA
  • Quantify area costs
  • Validate robustness using fault injection
Project Prerequisites:
  • Programming expertise (able to write 2000-5000 lines of Java code)
  • Familiarity with Coding theory a plus (but not essential…willingness to learn adequate)
  • Familiarity with digital design including HDL experience (VHDL or Verilog)
  • Familiarity with FPGAs

IDEA: Extremely Fast Finite State Machines using Parallel Prefix

PROFESSOR: Andre DeHon

DESCRIPTION: While Finite State Machines (FSM) are normally described by serial operation, they can, in theory, be evaluated in parallel using a form of prefix computation. This project develops the tools and strategies to support highly parallel, extremely fast FSM evaluation for modern FPGAs. Using these implementations, we hope to understand what kinds of FSMs can practically be accelerated in this manner.

Project Design Objectives:

  • Develop practical strategies for parallel-prefix FSM implementation
  • Develop tools to map simple FSMs to FPGA-based parallel-prefix FSMs
  • Determine how size of FSMs (input size, number of states, next-state complexity) and acceleration drive implementation area
  • Determine the limits of practical FSM acceleration using these tools and models.
  • Demonstrate and quantify costs and speedups obtainable for sample designs implemented on a modern FPGA
Project Prerequisites:
  • Familiarity with digital design
  • HDL experience (VHDL or Verilog) a plus
  • Programming expertise (able to write 2000-5000 lines of Java code)
  • Familiarity with FPGAs

IDEA: Tagged Security Processor Prototype

PROFESSOR: Andre DeHon

DESCRIPTION: By adding rich metadata to the hardware representation for primitive data, we can improve the security and safety of computations. This can remove vulnerability to common security breaches (e.g. buffer overflow, stack bashing) and guarantee common errors (misusing an integer as a pointer) are cleanly caught and reported.This project will develop a VHDL-level elaboration of a simple tagged processor architecture and synthesize and optimize an efficient FPGA implementation.

Project Design Objectives:

  • Optimize key structures for efficiently computing on metadata tags in parallel with computation; quantify costs of structures and costs on runtime
  • Demonstrate safety and security advantages (how metadata prevents common security and safety errors)
Project Prerequisites:
  • Familiarity with digital design
  • HDL experience (VHDL or Verilog) a plus
  • Familiarity with conventional RISC processor architectures (e.g. CIS240, CIS371/372)
  • OS familiarity good (CIS 380/381)

IDEA: OCR-like Beetle ID Extraction

PROFESSOR: M. Ani Hsieh

DESCRIPTION: Animal behaviorists often need to detect and track small organisms across their habitat. This project would work within a larger initiative to monitor forked fungus beetles as they move and behave on their natural habitat (logs). Each beetle has been labeled with small tags. These tags each have a unique three letter code. Specifically this project would work on creating OCR-type software to identify/extract the codes on these tags from macro photos of these beetles.


IDEA: Geographic Projections of Non-Uniform 3D Objects

PROFESSOR: M. Ani Hsieh

DESCRIPTION: This project would consist of developing a system to create a Cartesian coordinate system for non-uniform three-dimensional logs. This is part of a larger initiative to track beetles across these logs. The logs have been gridded with nails every 10cm across their entire surface - specifically this project would work with photographs of these gridded logs and attempt to develop algorithms that would convert this network of labels into a 2D coordinate system so that observations of these beetles could be mapped in Geographic Information System software.


IDEA: Tree Rings

PROFESSOR: M. Ani Hsieh

DESCRIPTION: In the past decade a few computer software packages have been developed to count and catalog tree rings, however most of these packages focus on long term climatic patterns, many dendrochronologists and ecologist want to extract information about single year events. This project would be to use images of tree 'cookies' (horizontal cross sections) and cores to count, categorize, and measure tree rings and areas that are between each ring to help ecologists reconstruct yearly climate patterns. The challenges of this project would be considering multiple areas of cross sections and multiple cores sections to generate accurate and precise measurements of tree rings down to the tenth of the cm.


IDEA: Camouflaged Snakes

PROFESSOR: M. Ani Hsieh

DESCRIPTION: This project would develop a way to measure the crypsis of a snake (how camouflaged it is). Garter snakes (genus: Thamnophi) have unique color patterns that help them hide from predators. This has been an interest to ecologists for over half a century and yet no one has developed a way to precisely quantify such differences. This project would develop a system that could use images of pregnant and non-pregnant garter snakes and their respective backgrounds to test if pregnant females have scale patterns that would help them better evade visual predators.


IDEA: Automated Newt Analysis

PROFESSOR: Camillo J. Taylor

DESCRIPTION: The goal of this project would be to develop computer programs to automatically analyze pictures of red-spotted newts which can, surprisingly, be distinguished from one another by the unique coloration patterns on their backs. Many aspects of the study of Biology involve painstaking manual analysis to process data sets for further study. The goal here would be to provide automated tools to help speed up a process which takes far too long currently. This project will involve collaborating with actual scientists working in the field of evolutionary biology.


IDEA: The Fly

PROFESSOR: Camillo J. Taylor

DESCRIPTION: The goal of this project will be to develop new techniques for processing video imagery to enable a simulated flying platform to navigate successfully through an urban canyon. The project will make use of a simulated urban environment which can be previewed at the following URL (http://www.3drt.com/ - the Megacity model). This simulator allows us to produce the video imagery that a flying vehicle would see as it travels through the scene and to test out various control laws. Our hope is to develop new techniques based on ideas from insect vision systems and from statistical machine learning.


IDEA: Universal Data Synchronizer

PROFESSOR: Benjamin Pierce

DESCRIPTION: The goal of this is to build a data synchronizer for one or several real-world data format using some existing tools developed here at Penn:

  • Boomerang, a bidirectional programming language
  • Unison, a popular, open-source file synchronizer
Many of the details of the project -- e.g., what kinds of data the tool should handle -- are totally open. Here is one interesting possibile goal: a universal synchronizer for electronic calendar data capable of reconciling changes made to files in different formats (iCalendar and CSV) on different machines (laptop and PC).

This is primarily a programming project, although there will also be some interesting research questions once the intial tool is built. We imagine that the application would work by first converting the differently-formatted input files into a common format (using one direction of a Boomerang program), then reconciling the changes in these files using a generic synchronization algorithm we have already designed, and finally propagating the changes back to the original formats (using the other direction of the Boomerang program).

Much of the work would consist in writing the programs in Boomerang to convert between calendar data in the original calendar formats and a common "synchronization" format. However, there is also interesting work to be done on other parts of the system including:

  • A user interface for auditing the changes made by the synchronizer
  • Support for information hiding---e.g., in the calendar application, the ability to tag particular calendar entries as private and have them not be synchronized with the data on the other side
  • Extensions to the generic synchronization algorithm
  • Extensions to the framework to support synchronizing against data published via web services
  • Possibly extensions to interface with existing sync protocols, such as SyncML or Apple Sync Services.
The main prerequisite is mathematical maturity. This project is a part of a larger research project, and students can expect to work in close collaboration with Professor Pierce and PhD candidate Nate Foster. Additionally, it has the potential to have significant impact: Unison has many thousands of users and if the final result were incorporated into the codebase, it would be immediately available to a large number of users.
IDEA: Compiling Bidirectional Languages

PROFESSOR: Benjamin Pierce

DESCRIPTION: Most programs work in one direction: given an input, they compute an output. In a bidirectional language, however, programs can be operated in two modes: in the forward direction they map inputs to outputs, and in the reverse direction they map outputs back to inputs. These languages are useful for solving problems in a variety of applications where relationships between data needs to be maintained (e.g. data converters and synchronizers, databases, graphical user interfaces, compilers, system administration, and software engineering.)

The goal of this project is to design new syntax for describing bidirectional transformations based on the syntax of standard functional programming languages (e.g., F#, OCaml, and Haskell), and to compile it down to Boomerang (http://www.seas.upenn.edu/~harmony), a general-purpose bidirectional language developed here at Penn. Concretely, the project will entail building a compiler (parser, type checker, code generator) from scratch, as well as several end-to-end applications validating the approach. There are several ways that the project could be extended if time allows, including exploring optimizations, fancier data model (e.g., trees), a more efficient run-time system, and integration with existing tools (e.g., the Unison file synchronizer).

The main prerequisite is mathematical maturity. Prior familiarity with functional programming is not necessary (though it may be helpful). There is enough meat here that the project could productively be taken on by a team of two. The project will be co-advised by PhD candidate Nate Foster.


IDEA: Planning for a Human-Like Household Robot

PROFESSOR: Maxim Likhachev

DESCRIPTION: Research on human-like household robots has become an increasingly popular theme in robotics. Many companies and government agencies are highly interested in this research because they believe there is a vast market for household robots and a great potential to help older and disabled people in their everyday life. Unfortunately, household robots are still immature, and one of the main reasons for this is their inability to navigate in cluttered and dynamic indoor environments as well as to manipulate objects in a robust manner. In this project, a student will work on the design and implementation of planning algorithms that address these issues. Depending on the student's interests, the project will either be to develop algorithms for navigation or to develop algorithms for the manipulation of objects.

The project will require good knowledge of data structures, understanding and learning new methods for finding least-cost paths in graphs, and developing new ways to construct graphs. The implementation will require good working knowledge of C/C++ and Linux. The developed approach will be tested in a simulator as well as on a real human-like household robot developed and maintained by one of the very well-known and fast-growing companies developing edge-cutting robots.


IDEA: Autonomous Helicopter Landing

PROFESSOR: Maxim Likhachev

DESCRIPTION: Autonomous unmanned helicopters could be employed for a variety of tasks such as surveillance and mapping of urban environments, search-and-rescue operations in hazardous environments (e.g., radioactive sites), border control, scouting hostile areas, surveying national forests for emergencies such as fire, and gathering massive statistical data for various studies. One of the challenging problems in the research on autonomous unmanned helicopters is the problem of autonomous landing: the helicopter needs to autonomously choose a landing site that is free of obstacles, is free of people, is flat and is as close as possible to the desired location. The problem of selecting a landing site is a challenging and principal planning problem: an unmanned helicopter constructs an elevation map using its on-board sensors and then needs to analyze this elevation map in order to either select a good landing site or to fly to obtain more data to confirm the safety of the possible landing sites. The planning problem is where the helicopter should fly next and what landing site it should sense using its on-board sensors so as to land as quickly and safely as possible.

In this project, the student will work on one of the aspects of this problem. The student will closely collaborate with a graduate student working on some other aspect of the same problem. The project will require understanding of data structures and graph algorithms. The implementation will require good working knowledge of C/C++ and Linux. The developed approach will be tested in a simulator as well as on an actual unmanned helicopter that uses lasers and vision cameras for sensing.


IDEA: De novo discovery of regulatory triplets from gene expression data

PROFESSOR: Dr. Sridhar Hannenhalli

DESCRIPTION: Consider gene expression data - a matrix with N rows (genes) and M columns (samples). If two row vectors g(x) and g(y) are highly "similar" (measured by Pearson correlation, mutual information, partial correlation etc.) then they are considered "related". The notion of relatedness is represented as an edge in a graph where nodes are genes. These are undirected edges. For instance, if x is transcription factor and y is a gene then the direction of the edge can be deduced to be x -> y (although this may not be true always, anyways..). Now there are modifer enzymes, such as kinases that modify the x->y relation by modifying x. One idea to detect such modifier z for x->y was proposed recently. For a given z, partition the samples (columns) into low-z-expression and hi-z-expression. Estimate similarity-lo-z(x,y) between x and y using only the samples in low-z-expression and similarity-hi-z(x,y) between x and y using only the samples in hi-z-expression. If the z affects the x->y relation then exactly one of the two similarities should be high and the other should be low. By querying each candidate modifier z, this procedure can detect candidate potential modifier(s) z for x->y. The above procedure requires querying each z, and for a given z, requires arbitrary partitioning of the samples into high and lo.

The goal of this project is to develop statistical/computation method to partition the samples without prior knowledge of z. In other words, to find out a partition of samples, say S and S' such that similarity(x,y)|S is very different from similarity(x,y)|S'. We can then perhaps decide whether this partition is even significant before we bother to look for z that best differentiates S and S'.

An effective solution to this problem can be used to mine numerous gene expression data sets to discover regulatory triplet - (x, y, z), where the effect of x on y is mediated by a third gene z.


IDEA: Gene networks alignment

PROFESSOR: Dr. Sridhar Hannenhalli

DESCRIPTION: Consider a gene network with genes as nodes and edges as functional relationships between genes. If you look at the orthologous genes (gene related by speciation events) in another species, two problems arise: (1) The orthology mapping between the two speices may be ambiguous. Eg. gene x in one species may map to paralogs x1 and x2 (very similar gene copies) in the second species. (2) the edges may not exist because of lack of experimental data in the second species. A network alignment involves node-alignment and edge-alignment across species, where the nodes and edges have evolved through gene duplication and speciation.

The goal is to develop a model-based approach to network alignment problem that uses a probabilistic dependence structure between gene alignment and edge alignment and makes explicit use of phylogenetic relationships between the species.

An effective method to solve this problem has important application in studying how various biological networks have evolved through time.