|
Previous
Project
Ideas - CIS 400/401
IDEA: Lossless Floating Point
Distillation or Multiprecision Summation
PROFESSOR: Andre DeHon
DESCRIPTION: Floating point
works with limited precision (e.g. 55b mantissa in IEEE
double-precision floats). As a result, floating-point calculations
discard information and only approximate the true value of a
calculation. Standard floating-point operations discard information
every primitive operation (e.g. addition of 2 numbers). What would it
take to sum a large number of floating-point values and obtain:
- The perfect value (multiprecision)
- The correctly rounded value as if we had
performed a perfect summation then rounded the result? (distillation)
Project Design Objectives:
- Determine the best (least area achieving
highest performance) organization for an FPGA-based, lossless
floating-point accumulator
- Assess the resulting hardware costs
- Determine if lossless floating-point
accumulation is viable and economical with today's technology
Project Prerequisites:
- Familiarity with digital design
- HDL experience (VHDL or Verilog) a plus
- Comfortable with software simulation
- Will need to work out high-level organization
issues with simulation in software (e.g. Java)
- Familiarity with Floating Point (familiarity
with computer arithmetic and willingness to learn details of floating
point adequate)
IDEA: Fault-Secure Detector
Capable
Error-Correcting Codes
PROFESSOR: Andre DeHon
DESCRIPTION: Traditional
Error-Correcting Codes (ECCs) are designed assuming the encoding and
decoder logic is error free. As we scale to nanoscale logic, encoding
and decoding logic may also experience transient upsets. We have
identified properties of codes that allow them to tolerate
encoder/decoder errors as well. This project helps identify codes that
have these properties and/or implement ECC logic based on these
codes.[Depending on student/team interest and expertise there are
potentially separate projects here]:
- Identifying codes
- Implementing encoding/decoding logic based on
these codes
Project Design Objectives:
- Develop tools to assess the FSD property for a
given code.
- (option) Develop tools to find codes with
specified FSD properties.
- (option) Implement ECC logic for a particular
FSD code on an FPGA
- Quantify area costs
- Validate robustness using fault injection
Project Prerequisites:
- Programming expertise (able to write 2000-5000
lines of Java code)
- Familiarity with Coding theory a plus (but not
essential…willingness to learn
adequate)
- Familiarity with digital design including HDL
experience (VHDL or Verilog)
- Familiarity with FPGAs
IDEA: Extremely Fast Finite
State
Machines using Parallel Prefix
PROFESSOR: Andre DeHon
DESCRIPTION: While Finite
State Machines (FSM) are normally described by serial operation, they
can, in theory, be evaluated in parallel using a form of prefix
computation. This project develops the tools and strategies to support
highly parallel, extremely fast FSM evaluation for modern FPGAs. Using
these implementations, we hope to understand what kinds of FSMs can
practically be accelerated in this manner.
Project Design Objectives:
- Develop practical strategies for
parallel-prefix FSM implementation
- Develop tools to map simple FSMs to FPGA-based
parallel-prefix FSMs
- Determine how size of FSMs (input size, number
of states, next-state complexity) and acceleration drive implementation
area
- Determine the limits of practical FSM
acceleration using these tools and models.
- Demonstrate and quantify costs and speedups
obtainable for sample designs implemented on a modern FPGA
Project Prerequisites:
- Familiarity with digital design
- HDL experience (VHDL or Verilog) a plus
- Programming expertise (able to write 2000-5000
lines of Java code)
- Familiarity with FPGAs
IDEA: Tagged Security
Processor
Prototype
PROFESSOR: Andre DeHon
DESCRIPTION: By adding rich
metadata to the hardware representation for primitive data, we can
improve the security and safety of computations. This can remove
vulnerability to common security breaches (e.g. buffer overflow, stack
bashing) and guarantee common errors (misusing an integer as a pointer)
are cleanly caught and reported.This project will develop a VHDL-level
elaboration of a simple tagged processor architecture and synthesize
and optimize an efficient FPGA implementation.
Project Design Objectives:
- Optimize key structures for efficiently
computing on metadata tags in parallel with computation; quantify costs
of structures and costs on runtime
- Demonstrate safety and security advantages (how
metadata prevents common security and safety errors)
Project Prerequisites:
- Familiarity with digital design
- HDL experience (VHDL or Verilog) a plus
- Familiarity with conventional RISC processor
architectures (e.g. CIS240, CIS371/372)
- OS familiarity good (CIS 380/381)
IDEA: OCR-like Beetle ID
Extraction
PROFESSOR: M. Ani Hsieh
DESCRIPTION: Animal
behaviorists often need to detect and track small organisms across
their habitat. This project would work within a larger
initiative to monitor forked fungus beetles as they move and behave on
their natural habitat (logs). Each beetle has been labeled with small
tags. These tags each have a unique three letter code. Specifically
this project would work on creating OCR-type software to
identify/extract the codes on these tags from macro photos of these
beetles.
IDEA: Geographic Projections
of Non-Uniform 3D Objects
PROFESSOR: M. Ani Hsieh
DESCRIPTION: This project
would consist of developing a system to create a Cartesian coordinate
system for non-uniform three-dimensional logs.
This is part of a larger initiative to track beetles across these
logs. The logs have been gridded with nails every 10cm across their
entire surface - specifically this project would work with photographs
of these gridded logs and attempt to develop algorithms that would
convert this network of labels into a 2D coordinate system so that
observations of these beetles could be mapped in Geographic
Information System software.
IDEA: Tree Rings
PROFESSOR: M. Ani Hsieh
DESCRIPTION: In the past
decade a few computer software packages have been developed to count
and catalog tree rings, however most of these
packages focus on long term climatic patterns, many
dendrochronologists and ecologist want to extract information about
single year events. This project would be to use images of tree
'cookies' (horizontal cross sections) and cores to count, categorize,
and measure tree rings and areas that are between each ring to help
ecologists reconstruct yearly climate patterns. The challenges of
this project would be considering multiple areas of cross sections and
multiple cores sections to generate accurate and precise measurements
of tree rings down to the tenth of the cm.
IDEA: Camouflaged Snakes
PROFESSOR: M. Ani Hsieh
DESCRIPTION: This project
would develop a way to measure the crypsis of a snake (how camouflaged
it is). Garter snakes (genus: Thamnophi) have unique
color patterns that help them hide from predators. This has
been an interest to ecologists for over half a century and yet no one
has developed a way to precisely quantify such differences. This
project would develop a system that could use images of pregnant and
non-pregnant garter
snakes and their respective backgrounds to test if pregnant females
have scale patterns that would help them better evade visual predators.
IDEA: Automated Newt Analysis
PROFESSOR: Camillo J. Taylor
DESCRIPTION: The goal of this
project would be to develop computer programs to automatically analyze
pictures of red-spotted newts which can, surprisingly, be distinguished
from one another by the unique coloration patterns on their backs. Many
aspects of the study of Biology involve painstaking manual analysis to
process data sets for further study. The goal here would be to provide
automated tools to help speed up a process which takes far too long
currently. This project will involve collaborating with actual
scientists working in the field of evolutionary biology.
IDEA: The Fly
PROFESSOR: Camillo J. Taylor
DESCRIPTION: The goal of this
project will be to develop new techniques for processing video imagery
to enable a simulated flying platform to navigate successfully through
an urban canyon. The project will make use of a simulated urban
environment which can be previewed at the following URL (http://www.3drt.com/ -
the Megacity model). This simulator allows us to produce the video
imagery that a flying vehicle would see as it travels through the scene
and to test out various control laws. Our hope is to develop new
techniques based on ideas from insect vision systems and from
statistical machine learning.
IDEA: Universal Data
Synchronizer
PROFESSOR: Benjamin Pierce
DESCRIPTION: The goal of this
is to build a data synchronizer for one
or several real-world data format using some existing tools developed
here at Penn:
- Boomerang,
a bidirectional programming language
- Unison,
a popular, open-source file synchronizer
Many of the details of the project -- e.g., what kinds of data the tool
should handle -- are totally open. Here is one interesting possibile
goal: a universal synchronizer for electronic calendar data capable of
reconciling changes made to files in different formats (iCalendar and
CSV) on different machines (laptop and PC).
This is primarily a programming project, although
there will also be
some interesting research questions once the intial tool is built. We
imagine that the application would work by first converting the
differently-formatted input files into a common format (using one
direction of a Boomerang program), then reconciling the changes in
these files using a generic synchronization algorithm we have already
designed, and finally propagating the changes back to the original
formats (using the other direction of the Boomerang program).
Much of the work would consist in writing the
programs in Boomerang to
convert between calendar data in the original calendar formats and a
common "synchronization" format. However, there is also interesting
work to be done on other parts of the system including:
- A user interface for auditing the changes made
by the synchronizer
- Support for information hiding---e.g., in the
calendar application, the ability to tag particular calendar entries as
private and have them not be synchronized with the data on the other
side
- Extensions to the generic synchronization
algorithm
- Extensions to the framework to support
synchronizing against data published via web services
- Possibly extensions to interface with existing
sync protocols, such as SyncML or Apple Sync Services.
The main prerequisite is mathematical maturity. This project is a part
of a
larger research project, and students can expect to work in close
collaboration with Professor Pierce and PhD candidate Nate Foster.
Additionally, it has the potential to have significant impact: Unison
has many thousands of users and if the final result were incorporated
into the codebase, it would be immediately available to a large number
of
users.
IDEA: Compiling Bidirectional
Languages
PROFESSOR: Benjamin Pierce
DESCRIPTION: Most programs
work in one direction: given an input, they
compute an output. In a bidirectional language, however, programs can
be operated in two modes: in the forward direction they map inputs to
outputs, and in the reverse direction they map outputs back to
inputs. These languages are useful for solving problems in a variety
of applications where relationships between data needs to be
maintained (e.g. data converters and synchronizers, databases,
graphical user interfaces, compilers, system administration, and
software engineering.)
The goal of this project is to design new syntax
for describing
bidirectional transformations based on the syntax of standard
functional programming languages (e.g., F#, OCaml, and Haskell), and
to compile it down to Boomerang (http://www.seas.upenn.edu/~harmony),
a general-purpose bidirectional language developed here at
Penn. Concretely, the project will entail building a compiler (parser,
type checker, code generator) from scratch, as well as several
end-to-end applications validating the approach. There are several
ways that the project could be extended if time allows, including
exploring optimizations, fancier data model (e.g., trees), a more
efficient run-time system, and integration with existing tools (e.g.,
the Unison file synchronizer).
The main prerequisite is mathematical maturity.
Prior familiarity
with functional programming is not necessary (though it may be
helpful).
There is enough meat here that the project could productively be taken
on by a team of two. The project will be co-advised by PhD candidate
Nate Foster.
IDEA: Planning for a
Human-Like Household Robot
PROFESSOR: Maxim Likhachev
DESCRIPTION: Research on
human-like household robots has become an increasingly popular theme in
robotics. Many companies and government agencies are highly interested
in this research because they believe there is a vast market for
household robots and a great potential to help older and disabled
people in their everyday life. Unfortunately, household robots are
still immature, and one of the main reasons for this is their inability
to navigate in cluttered and dynamic indoor environments as well as to
manipulate objects in a robust manner. In this project, a student will
work on the design and implementation of planning algorithms that
address these issues. Depending on the student's interests, the project
will either be to develop algorithms for navigation or to develop
algorithms for the manipulation of objects.
The project will require good knowledge of data
structures,
understanding and learning new methods for finding least-cost paths in
graphs, and developing new ways to construct graphs. The implementation
will require good working knowledge of C/C++ and Linux. The developed
approach will be tested in a simulator as well as on a real human-like
household robot developed and maintained by one of the very well-known
and fast-growing companies developing edge-cutting robots.
IDEA: Autonomous Helicopter
Landing
PROFESSOR: Maxim Likhachev
DESCRIPTION: Autonomous
unmanned helicopters could be employed for a variety of tasks such as
surveillance and mapping of urban environments,
search-and-rescue operations in hazardous environments (e.g.,
radioactive sites), border control, scouting hostile areas, surveying
national forests for emergencies such as fire, and gathering massive
statistical data for various studies. One of the challenging problems
in the research on autonomous unmanned helicopters is the problem of
autonomous landing: the helicopter needs to autonomously choose a
landing site that is free of obstacles, is free of people, is flat and
is as close as possible to the desired location. The problem of
selecting a landing site is a challenging and principal planning
problem: an unmanned helicopter constructs an elevation map using its
on-board sensors and then needs to analyze this elevation map in order
to either select a good landing site or to fly to obtain more data to
confirm the safety of the possible landing sites. The planning problem
is where the helicopter should fly next and what landing site it should
sense using its on-board sensors so as to land as quickly and safely as
possible.
In this project, the student will work on one of
the aspects of this
problem. The student will closely collaborate with a graduate student
working on some other aspect of the same problem. The project will
require understanding of data structures and graph algorithms. The
implementation will require good working knowledge of C/C++ and Linux.
The developed approach will be tested in a simulator as well as on an
actual unmanned helicopter that uses lasers and vision cameras for
sensing.
IDEA: De novo discovery of
regulatory triplets from gene expression data
PROFESSOR: Dr. Sridhar
Hannenhalli
DESCRIPTION: Consider gene
expression data - a matrix with N rows (genes) and M columns (samples).
If two row vectors g(x) and g(y) are highly "similar" (measured by
Pearson correlation, mutual information, partial correlation etc.) then
they are considered "related". The notion of relatedness is represented
as an edge in a graph where nodes are genes. These are undirected
edges. For instance, if x is transcription factor and y is a gene then
the direction of the edge can be deduced to be x -> y (although
this may not be true always, anyways..). Now there are modifer enzymes,
such as kinases that modify the x->y relation by modifying x.
One idea to detect such modifier z for x->y was proposed
recently. For a given z, partition the samples (columns) into
low-z-expression and hi-z-expression. Estimate similarity-lo-z(x,y)
between x and y using only the samples in low-z-expression and
similarity-hi-z(x,y) between x and y using only the samples in
hi-z-expression. If the z affects the x->y relation then exactly
one of the two similarities should be high and the other should be low.
By querying each candidate modifier z, this procedure can detect
candidate potential modifier(s) z for x->y. The above procedure
requires querying each z, and for a given z, requires arbitrary
partitioning of the samples into high and lo.
The goal of this project is to develop
statistical/computation method to partition the samples without prior
knowledge of z. In other words, to find out a partition of samples, say
S and S' such that similarity(x,y)|S is very different from
similarity(x,y)|S'. We can then perhaps decide whether this partition
is even significant before we bother to look for z that best
differentiates S and S'.
An effective solution to this problem can be used
to mine numerous gene expression data sets to discover regulatory
triplet - (x, y, z), where the effect of x on y is mediated by a third
gene z.
IDEA: Gene networks alignment
PROFESSOR: Dr. Sridhar
Hannenhalli
DESCRIPTION: Consider a gene
network with genes as nodes and edges as functional relationships
between genes. If you look at the orthologous genes (gene related by
speciation events) in another species, two problems arise: (1) The
orthology mapping between the two speices may be ambiguous. Eg. gene x
in one species may map to paralogs x1 and x2 (very similar gene copies)
in the second species. (2) the edges may not exist because of lack of
experimental data in the second species. A network alignment involves
node-alignment and edge-alignment across species, where the nodes and
edges have evolved through gene duplication and speciation.
The goal is to develop a model-based approach to
network alignment problem that uses a probabilistic dependence
structure between gene alignment and edge alignment and makes explicit
use of phylogenetic relationships between the species.
An effective method to solve this problem has
important application in studying how various biological networks have
evolved through time.
|