I am a postdoc with the Penn Database Research Group,
where I work with Prof. S. Davidson.
Before coming to Penn, I was a PhD student in the LRI Bioinformatics group at Paris-South
University, advised by Prof. C. Froidevaux.
I defended my PhD on December 9th, 2005.
Research interests
SHARQ (Sharing Heterogeneous and Autonomous Resources and Queries) aims to
develop generic tools and technologies for creating and maintaining confederations whose purpose is distributed data sharing, that is, data cooperatives.
SHARQ is a collaboration with two biological partners: the Computational Biology and Informatics Laboratory (CBIL), led by Chris Stoeckert, and the Pew project group, led by Pete
White, at the Children's Hospital of Philadelphia. We propose to develop a specific data cooperative as a biological testbed for evaluating the proposed technologies.
In this project, I am working on the SHARQ Guide, which is designed to help biologists find
relevant information within a peer data management system. It assists not only users who ask queries, but also
owners of peers who wish to register with the Guide. This work is closely related to my work on BioGuide (see below).
More information
is available here.
BioGuide is a collaboration between the LRI Bioinformatics group at Paris-South
University and the database group at UPenn. Have a look at bioguide-project.net!
BioGuide extends DSS (see below) so that it can be adapted to many user profiles.
I designed a questionnaire and interviewed
20 scientists from various domains (cancer studies, annotation projects, ...)
to assess their needs when querying.
In collaboration with C. Froidevaux and S. Davidson,
I designed BioGuide, a generic framework that guides users
in selecting the relevant sources to query and the tools to use,
according to their preferences (e.g., the reliability level of
the sources) and their querying strategies.
The biological significance of the results obtained with BioGuide has been shown
in the context of Comparative Genomic Hybridization (CGH) analysis performed at
the Curie Institute.
I developed the BioGuide system in Java (as an applet) with the help of Olivier Biton.
BioGuide is available for use.
The system is very flexible and can be adapted to any biological domain.
I have recently developed a module to use BioGuide on top of the SRS
system. BioGuideSRS provides access to data instances!
Provenance for scientific workflow systems
This project aims to provide a formal model of provenance for scientific workflows that is
both simple and general (i.e., it can be used with existing workflow systems such as Ptolemy/Kepler and myGrid)
and sufficiently expressive to answer the provenance queries encountered in case studies.
The proposed model not only takes into account the chained and complex structure of scientific workflows,
but also supports reasoning about provenance at different levels of abstraction through user views.
In the context of this project, I participated, with other members of the UPenn DB group, in the "Provenance Challenge". More information can be found here.
HKIS is a
European research and development project involving five partners:
ISoft Company (Gif-Sur-Yvette, France), Curie Institute (Paris, France),
University of Medicine of Ulm (Ulm, Germany), European Institute of Oncology (Milano, Italy)
and University of Paris-South (Orsay, France).
HKIS was a central component of my dissertation.
This project aims to develop an integrative software platform for biological and biomedical data processing in oncology. I contributed to HKIS as follows:
I participated in collecting the user requirements
(sources and tools accessed, bioinformatics tasks performed by the partners). I designed
a framework to represent the HKIS analysis scenarios; each scenario reflects
the way HKIS users manage their data.
I developed the integration schema of the HKIS platform
by making explicit the biological entities contained in the sources selected by the partners and by
capturing important metadata from the sources (e.g., the reliability of an entity in a source).
I designed the DSS (Data Source Selection) algorithm, which guides HKIS users
in selecting data sources. DSS offers
users alternative ways of finding data in the sources: it lets
them exploit complementary information and helps them
deal with divergent data.
This algorithm was designed to follow the querying process of HKIS users.
Collaborative work between members of the LRI Bioinformatics group.
This data integration project is based on
the Picsel project, an innovative mediator that is about to become an industrial product.
In Picsel, the language used to express queries and describe
the sources (a description logic) is simple and can easily be understood by end users.
Our aim is to exploit the capabilities of Picsel in the context of biological data and
to propose an extended mediator system that supports both transparent
querying (as in standard mediators) and a cooperative answering process (which meets specific requirements of biologists).
We address the problem of expressing and answering cooperative queries
while keeping the logical framework tractable.
We allow users to specify properties (metadata) of the sources they
would like to access, and our approach makes it possible to trace the origins of the answers obtained.
Development of WInGS, a local data warehouse dedicated to yeast.
Development of Genopage, a database of protein modules encoded by completely sequenced genomes.
Developed with PostgreSQL, ProC and PHP.
Selected Publications
* indicates that I gave the corresponding conference presentation.
International peer-reviewed journals
[1] Sarah Cohen-Boulakia, Olivier Biton, Susan Davidson, Christine Froidevaux.
BioGuideSRS: Querying Multiple Sources with a user-centric perspective. To appear in Bioinformatics, Application Notes, 2007.
[2] Sarah Cohen-Boulakia, Susan Davidson, Christine Froidevaux, Zoe Lacroix, and Maria-Esther Vidal.
Path-based systems to guide scientists in the maze of biological data sources. In Journal of Bioinformatics and Computational Biology (JBCB), Oct. 2006, 4(5), pp. 1069-95.
[3] Frederique Lisacek, Sarah Cohen-Boulakia, and Ron D. Appel.
Proteome informatics II. Bioinformatics for comparative proteomics. In Proteomics, 2006 (To appear).
[4] Sarah Cohen-Boulakia, Séverine Lair, Nicolas Stransky, Stéphane Graziani, François Radvanyi, Emmanuel Barillot and Christine Froidevaux.
Selecting biomedical data sources according to user preferences.*
In Bioinformatics, 20(1):i86-i93, Special issue, Proceedings of ISMB/ECCB 2004, Glasgow, UK, 2004.
International peer-reviewed conferences
[5] Shirley Cohen, Sarah Cohen-Boulakia and Susan Davidson.
Towards a Model of Scientific workflows and User Views.* Proceedings of DILS'06, Data Integration for the Life Sciences, Springer-Verlag,
Lecture Notes in Bioinformatics (LNBI), Cambridge, UK, 2006.
[6] Sarah Cohen-Boulakia, Olivier Biton, Shirley Cohen, Zachary Ives, Val Tannen
and Susan Davidson.
SHARQ Guide: Finding relevant biological data and queries in a
peer data management system.* Poster (Selected for oral presentation), DILS'06, Data Integration for the Life Sciences, Cambridge, UK, 2006.
[7] Sarah Cohen-Boulakia, Christine Froidevaux and Emmanuel Pietriga.
Selecting Biological Data Sources and Tools with XPR, a Path Language for RDF. Proceedings of PSB'06, Pacific Symposium on Biocomputing,
2006. BioGuide Site.
[8] Sarah Cohen-Boulakia, Susan Davidson and Christine Froidevaux. A User-centric Framework for Accessing Biological Sources and Tools.* Proceedings of DILS'05, Data Integration for the Life Sciences, Springer-Verlag,
Lecture Notes in Bioinformatics (LNBI), Num. 3615, pp. 3-18,
San Diego, USA, 2005.
BioGuide Site.
[9] Alain Bidault, Sarah Cohen-Boulakia and Christine Froidevaux.
Preferences for Queries in a Mediator Approach. In Proceedings of ECAI'2004, European Conference on Artificial Intelligence, pp. 963-964.
National peer-reviewed conferences
[10] Sarah Cohen-Boulakia, Christine Froidevaux and Séverine Lair.
Interrogation de sources biomédicales : gestion des préférences de l'utilisateur (Querying biomedical sources: managing user preferences).* (In French)
In Proceedings of EGC'2004, Extraction et Gestion des Connaissances, pp. 53-64.
[11] David Abergel, Sarah Cohen-Boulakia, Frédéric Lemoine, Christine Froidevaux and Michel Termier.
WInGS: A reliability controlled data warehouse for yeast. In Proceedings of JOBIM'2004, Journées Ouvertes, Biologie, Informatique et Mathématiques (CD-ROM).
[12] Sarah Cohen-Boulakia, Christine Froidevaux, Emmanuel Waller and Bernard Labedan. Genopage: a Database of all proteins modules encoded by completely sequenced genomes.*
In Proceedings of JOBIM'2002, Journées Ouvertes, Biologie, Informatique et Mathématiques, pp. 187-191.
Talks
Provenance in Scientific Workflows: ZOOM with user views.
Invited talk, University of Maryland, USA. (December 12th, 2006)
Modeling Provenance through User views.
Provenance Challenge, Washington DC, USA. (September 13th, 2006)
Querying multiple biological sources with BioGuideSRS.
Invited talk, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus,
Hinxton, Cambridge, UK. (July 19th, 2006)
A user-centric approach to query alternative biological data sources: New features of BioGuideSRS.
Bioinformatics group lunch meeting, CBIL, Computational Biology and Informatics Laboratory, University of Pennsylvania, Philadelphia, USA. (May 31st, 2006)
BioGuide: Supporting the scientist during the selection of sources and tools.
Invited talk, CBIL, Computational Biology and Informatics Laboratory, University of Pennsylvania, Philadelphia, USA. (February 2nd, 2006)
BioGuide: Supporting the scientist during the selection of sources and tools.
Invited talk, Group of Oncology, Pediatrics, Children Hospital of Philadelphia, USA. (January 30th, 2006)
Supporting the scientist during the selection of sources and tools.
Penn Database Research Group, University of Pennsylvania, Philadelphia, USA. (January 19th, 2006)
Guiding the user in the querying process.
Invited talk, Gemo Research Group (head: Serge Abiteboul), INRIA Futurs, Orsay, France. (June 24th, 2005)
Selecting biomedical data sources according to user preferences.
Invited talk, Penn Database Research Group, University of Pennsylvania, Philadelphia, USA. (October 27th, 2004)
Integrating data and processes in the biomedical domain.
Invited talk, GeneBio, Geneva Bioinformatics Institute S.A., Geneva, Switzerland. (June 23rd, 2003)
Selected Data Sources.
HKIS meeting, Milano, Italy. (March 3rd, 2003)
Features of Web Databanks.
HKIS meeting, ISoft, Gif-Sur-Yvette, France. (January 15th, 2003)
Reviewing
I am a reviewer for the following conferences and journals:
ISMB'06,
the 14th Annual International Conference on Intelligent Systems for Molecular Biology.
DB/IR Day,
an American workshop that brings together database and information retrieval researchers and
students from academic and research institutions across the tri-state area and beyond.
ISIBio, a French interdisciplinary working group interested in various
aspects of "Information Systems Integration in Biology".
This group brings together researchers from seven computer science laboratories
and ten biology laboratories (2004-2006).
AS127, the national CNRS working group on the integration and interoperability of genomic data sources
(2003-2004).
PPF, the multidisciplinary program "Programme PluriFormation" on Bioinformatics and
Genomics. This PPF brings together the bioinformatics groups
of three biology laboratories, two computer science laboratories
and the mathematics laboratory on the Orsay campus.