Penn Discourse Treebank (PDTB)


The Penn Discourse Treebank Project is an NSF-funded project at the Institute for Research in Cognitive Science, University of Pennsylvania.

**************
PDTB 2.0 is now available from the Linguistic Data Consortium.

The corpus is described in the PDTB 2.0 Annotation Manual.

Please visit the PDTB API page for technical support.

**************

 

Faculty at Penn

Aravind Joshi, Ani Nenkova

 

Faculty at Edinburgh

Bonnie Webber

 

Researchers

Rashmi Prasad, Alan Lee, Eleni Miltsakaki

 

Previous and Current Penn Students

Nikhil Dinesh, Geraud Campion, Lukasz Abramowicz, John Bell, Katherine Forbes, Cassandre Creswell, Jason Teeple, Tom Morton

 

Previous and Current Staff

Jeremy Lacivita

 

Annotators

Lukasz Abramowicz, Dan Afergan, Soobia Afroz, Driya Amandita, Alex Channer, Sara Clopton, George Cooper, Cassandre Creswell, Sarah Johnstone, John Laury, Alan Lee, Marielle Lerner, Sophia Malamud, Chris Moulton, Viraj Narayanan, Emily Pawley, Steven Pettington, Sami Saba, Adi Shifir, Sandhya Sundaresan, Nianwen Xue

 

Abstract: The goal of the PDTB project is to develop a large-scale corpus annotated with information related to discourse structure. While many aspects of discourse are crucial to a complete understanding of natural language, the Penn Discourse Treebank (PDTB) focuses on encoding coherence relations associated with discourse connectives. The annotations include the argument structure of the connectives, exposing a clearly defined level of discourse structure that supports the extraction of a range of inferences associated with discourse connectives. Other annotated features include sense distinctions for discourse connectives and attribution-related features for both connectives and their arguments.

The annotations in the PDTB are linked to the Penn Treebank.
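
As a rough illustration of the kind of record the abstract describes (a connective, its two arguments, a sense, and attribution), here is a minimal Python sketch. It is not the PDTB file format or the PDTB API: the class name, field names, character-offset spans, label strings, and the example sentence are all hypothetical, and the sketch does not model the linking to the Penn Treebank.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical representation: a span is a (start, end) pair of character
# offsets into the raw text, with end exclusive.
Span = Tuple[int, int]


@dataclass
class DiscourseRelation:
    """One connective with its two arguments (illustrative only)."""
    connective: Span
    arg1: Span
    arg2: Span
    sense: Optional[str] = None        # placeholder label, not an official sense tag
    attribution: Optional[str] = None  # placeholder label for who the relation is attributed to

    def text_of(self, raw: str, span: Span) -> str:
        """Return the substring of raw covered by span."""
        start, end = span
        return raw[start:end]


def find_span(raw: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase in raw and return its span."""
    start = raw.index(phrase)
    return (start, start + len(phrase))


# Invented example sentence, not taken from the corpus.
raw = "Profits fell because costs rose."
relation = DiscourseRelation(
    connective=find_span(raw, "because"),
    arg1=find_span(raw, "Profits fell"),
    arg2=find_span(raw, "costs rose"),  # the clause introduced by the connective
    sense="cause",
    attribution="writer",
)
```

Storing spans rather than copied strings keeps each annotation tied to a position in the source text, which is one simple way to keep annotations aligned with an underlying corpus.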

The PDTB is intended to extend the use of large-scale resources such as the Penn Treebank (PTB) to a wide range of applications, including parsing, information extraction, question answering, summarization, machine translation, and generation systems, as well as corpus-based studies in linguistics and psycholinguistics. Because the PDTB will provide a substantial level of discourse structure information, the PDTB and the PTB together are expected to substantially improve the quality and coverage achieved in these applications.



Related Papers and Reports:
pdtb-request@linc.cis.upenn.edu
Last modified: April 3, 2008