Ideas - CIS 400/401
IDEA: Spam Detection for Wikipedia
PROFESSOR: Oleg Sokolsky
(sokolsky@cis) (also contact Andrew West, firstname.lastname@example.org)
DESCRIPTION: Student(s) will
work closely with a graduate student in developing an automatic
classifier for the detection of link-spam edits to Wikipedia. Work will
begin by examining a spam corpus -- from which student(s) will develop
a taxonomy of spam behaviors. From this, feature extraction will
identify predictive measures. These features will then be implemented
into a real-time edit processing infrastructure backed by a
Wikipedia spam differs from other forms of spam (e.g., email-based)
because it is common to see poor links that show no commercial
intent but are posted for vanity, subject-skewing, etc. Further,
features will likely pull not just from the URL destination (e.g.,
text-processing over the HTML), but also from the presentation of the
edit on the Wiki (where on the page? what is the description text?).
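To make the two feature sources above concrete, here is a minimal sketch in Python of features drawn from the URL itself and from how the link is presented in the edit. The function name, its parameters, and the particular features are assumptions chosen for illustration; they are not the project's actual feature set, which is to be derived from the spam corpus.

```python
from urllib.parse import urlparse


def extract_link_features(url, description_text, char_offset, page_length):
    """Return illustrative (hypothetical) features for one link added by an edit.

    url              -- the link target added by the edit
    description_text -- the visible text attached to the link
    char_offset      -- where in the page text the link was inserted
    page_length      -- total length of the page text
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Features of the URL destination
        "host_label_count": len(host.split(".")) if host else 0,
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "has_query": bool(parsed.query),
        # Features of the presentation on the wiki page
        "description_word_count": len(description_text.split()),
        "relative_position": char_offset / page_length if page_length else 0.0,
    }


features = extract_link_features(
    "http://www.example.com/a/b?x=1", "Buy cheap widgets", 900, 1000
)
```

A real classifier would combine many such measures, including ones computed from the HTML of the destination page, which this sketch does not attempt.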
Student(s) should be comfortable with Java programming, and have at
least an elementary knowledge of machine learning. It is expected this
work will lead to publication -- and therefore may be most appropriate
for those intending to attend graduate school in CS or a related field.
IDEA: Senior Design Projects at
PROFESSOR: Boon Thau Loo
NetDB@Penn (http://netdb.cis.upenn.edu) has a
number of senior design projects suitable for undergraduates.
In previous years, student projects have resulted in
conference papers (in collaboration with doctoral students) at top
conferences such as CIDR'09, NDSS'10, and SIGMOD'10. This year, we are
particularly looking for students to develop various components of the
following three projects:
If interested, please contact Prof. Boon Thau
Loo for more details. In your email, please include your C.V.
IDEA: Advanced Telepresence using Virtual Reality and a Humanoid Robot
PROFESSOR: Camillo J. Taylor
DESCRIPTION: The goal of this work is to explore new ways for humans to
operate advanced humanoid robotic systems. More specifically, the aim is
to develop a system that will allow a human user to virtually inhabit
our newly acquired PR2 from Willow Garage. This system is sufficiently
anthropomorphic to allow us to consider mapping the motions of a human
operator directly onto the motions of the head, base and arms of the
robot. The concept is to outfit the operator with a virtual reality
headset, monitor the operator's movements with a Vicon motion capture
system and then map those motions onto the robot while relaying the
video feeds from the robot's head camera back to the head-mounted
display to create an immersive teleoperation
IDEA: Provenance Aware Scientific Workflow Systems
PROFESSOR: Susan Davidson
This is a perfect project for students interested in bioinformatics or computational biology.
The project involves developing technology for next-generation scientific workflow systems, which are "provenance-aware". Currently, scientific workflow systems maintain repositories of specifications (think of these as programs) that are searchable by keywords to enable component reuse. However, many systems are starting to maintain information about workflow executions as well, e.g., through provenance logs. By maintaining information about the sequence of module executions (processing steps) used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, the validity and reliability of data can be better understood and results can be made reproducible.
Provenance-aware workflow systems will yield repositories of both workflow specifications and the provenance graphs that represent their executions, and will enable a new paradigm for creating and correcting scientific analyses. Scientists who wish to perform new analyses may search workflow repositories to find specifications of interest to reuse or modify. They may also search provenance information to understand the meaning of a workflow, or to correct/debug an erroneous specification. On finding erroneous or suspect data, a user may then pose provenance queries to determine what downstream data might have been affected, or to understand how the process that created the data failed.
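The "what downstream data might have been affected" query can be pictured as a reachability search over a provenance graph. The sketch below assumes a simplified graph in which each data item maps to the items directly derived from it; the function name, the dictionary representation, and the item names are hypothetical and are not taken from Taverna or any existing system.

```python
from collections import deque


def downstream_items(derived, suspect):
    """Return every data item reachable from `suspect` in the provenance graph.

    `derived` maps a data item to the items directly produced from it
    (an assumed, simplified representation of a provenance graph).
    """
    affected = set()
    queue = deque([suspect])
    while queue:
        item = queue.popleft()
        for child in derived.get(item, ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


# Hypothetical provenance graph for one workflow execution.
graph = {
    "raw_sequences": ["alignment"],
    "alignment": ["tree", "statistics"],
    "unrelated_input": ["report"],
}
affected = downstream_items(graph, "raw_sequences")
```

Here `affected` contains the alignment, the tree, and the statistics, but not the report, since the report does not depend on the suspect input.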
The project will involve:
- Gaining a working knowledge of a scientific workflow system (Taverna);
- Understanding how provenance information is captured in the workflow system;
- Creating a database schema for managing workflow specifications and their associated executions (from which provenance is obtained);
- Populating the database to create a repository; and
- Creating a front-end to search the repository.
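As a rough picture of the schema, repository, and search steps listed above, the following sketch uses Python's built-in sqlite3 module. Every table and column name here is an assumption made for illustration; the real schema would follow from how Taverna actually records specifications and provenance.

```python
import sqlite3

# Hypothetical schema: specifications, their executions, and the data
# items passed between module executions.
SCHEMA = """
CREATE TABLE workflow (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT
);
CREATE TABLE execution (
    id          INTEGER PRIMARY KEY,
    workflow_id INTEGER NOT NULL REFERENCES workflow(id),
    started_at  TEXT
);
CREATE TABLE module_run (
    id           INTEGER PRIMARY KEY,
    execution_id INTEGER NOT NULL REFERENCES execution(id),
    module_name  TEXT NOT NULL,
    parameters   TEXT
);
CREATE TABLE data_item (
    id          INTEGER PRIMARY KEY,
    produced_by INTEGER REFERENCES module_run(id),
    value_ref   TEXT
);
CREATE TABLE module_input (
    module_run_id INTEGER NOT NULL REFERENCES module_run(id),
    data_item_id  INTEGER NOT NULL REFERENCES data_item(id)
);
"""


def create_repository(path=":memory:"):
    """Create an empty repository database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search_workflows(conn, keyword):
    """Return names of workflows whose name or description contains keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT name FROM workflow "
        "WHERE name LIKE ? OR description LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


conn = create_repository()
conn.execute(
    "INSERT INTO workflow (name, description) VALUES (?, ?)",
    ("blast_search", "Run BLAST over protein sequences"),
)
conn.execute(
    "INSERT INTO workflow (name, description) VALUES (?, ?)",
    ("tree_builder", "Build a phylogenetic tree"),
)
```

The `produced_by` and `module_input` links are what would let a front-end reconstruct a provenance graph for any execution; a keyword search such as `search_workflows(conn, "blast")` covers only the specification side.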
IDEA: Enhance the data exploration functionality of myExperiment.org
PROFESSOR: Susan Davidson
(susan@cis) (with Julia Stoyanovich (jstoy@cis))
This project is ideal for a student who is interested in data management, social systems, and/or bioinformatics. The goal of the project is to enhance the data exploration functionality of myExperiment.org, an open-source on-line collaborative platform for the sharing of scientific workflows and experimental plans. Scientific workflows are emerging as a state-of-the-art technology for in-silico experimentation in bioinformatics, and repositories such as that maintained by myExperiment.org play a crucial role in the wide-spread adoption of this technology.
Specifically, the aim is to develop new and effective data exploration techniques for myExperiment. The project will involve the following:
- Gaining a working knowledge of the myExperiment platform and its implementation (in Ruby)
- Understanding the myExperiment dataset -- characteristics of users, their interactions, and the workflows they create
- Understanding data exploration approaches such as frequent itemset mining, clustering, and topic modeling, and implementing them within the myExperiment framework
- Participating in the design and implementation of a user study that would test the effectiveness of the methods implemented in the previous step
More information about myExperiment is available at myExperiment.org and http://rubyforge.org/projects/myexperiment.
For more information on proposed data exploration approaches, see our recent publication at http://www.cis.upenn.edu/~jstoy/documents/wands.pdf.
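As one illustration of the frequent itemset mining mentioned in the list above, here is a small level-wise (Apriori-style) sketch in Python. It assumes each workflow is described by a set of tags; that representation and the tag names are hypothetical and are not drawn from the myExperiment dataset. The actual project work would be done in Ruby within the myExperiment framework.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        size += 1
        # Join pairs of frequent itemsets to form candidates one item larger.
        candidates = {
            a | b for a, b in combinations(frequent, 2) if len(a | b) == size
        }
    return result


# Hypothetical tag sets for four workflows.
workflows = [
    {"blast", "alignment"},
    {"blast", "alignment", "tree"},
    {"blast", "tree"},
    {"alignment"},
]
itemsets = frequent_itemsets(workflows, min_support=2)
```

With a minimum support of 2, the tags "blast" and "alignment" are found to co-occur frequently, while "alignment" and "tree" are not, since they appear together in only one workflow.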