The Penn Discourse Treebank (PDTB) is
a large-scale corpus annotated
with information related to discourse structure and discourse semantics. While
there are many aspects of discourse that are crucial to a
complete understanding of natural language, the PDTB focuses on encoding
discourse relations. The annotation
methodology follows a lexically-grounded
approach. The PDTB has strived to maintain a
theory-neutral approach with respect to the
nature of high-level representation of discourse
structure, in order to allow the corpus to be
usable within different theoretical
frameworks. Theory-neutrality is achieved by
keeping annotations of discourse relations
"low-level": each discourse relation is
annotated independently of other relations, that
is, dependencies across relations are not
marked.
The PDTB aims to support the extraction of a range
of inferences associated with discourse
relations, for a wide range of NLP applications,
such as parsing, information extraction,
question-answering, summarization, machine
translation, and generation, as well as
corpus-based studies in linguistics and
psycholinguistics.
Discourse relations in the current version of the
PDTB are taken to be triggered by explicit
phrases or by structural adjacency. Each
relation is further annotated for its two
abstract object arguments, the sense of the
relation, and the attributions associated with
the relation and each of its two arguments. The annotations in
the PDTB are aligned with the syntactic
constituency annotations of the Penn Treebank.
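As a rough illustration of the annotation layers just described, the sketch below models one relation as a small record: how it is triggered, its two arguments, its sense, and the three attributions. This is not the PDTB file format or any PDTB tool's API. Every class name, field name, the character-offset span representation, the example sentence, and the "cause" and "writer" labels are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical representation: (start, end) character offsets into the raw text.
Span = Tuple[int, int]


@dataclass(frozen=True)
class Attribution:
    """Who the annotated content is attributed to (illustrative field only)."""
    source: str


@dataclass(frozen=True)
class DiscourseRelation:
    """One relation, stored independently of every other relation."""
    trigger: str                # "explicit" (a phrase) or "adjacency" (structural)
    connective: Optional[Span]  # span of the triggering phrase, if there is one
    arg1: Span                  # first abstract object argument
    arg2: Span                  # second abstract object argument
    sense: str                  # illustrative label, not the PDTB sense inventory
    relation_attribution: Attribution
    arg1_attribution: Attribution
    arg2_attribution: Attribution

    def __post_init__(self) -> None:
        if self.trigger not in ("explicit", "adjacency"):
            raise ValueError(f"unknown trigger: {self.trigger!r}")
        if self.trigger == "explicit" and self.connective is None:
            raise ValueError("an explicit relation needs a connective span")


def text_of(text: str, span: Span) -> str:
    """Return the substring of text covered by a (start, end) span."""
    start, end = span
    return text[start:end]


# Invented example sentence; the offsets were counted by hand for this string.
TEXT = "The flight was cancelled because a storm closed the airport."
WRITER = Attribution(source="writer")

relation = DiscourseRelation(
    trigger="explicit",
    connective=(25, 32),
    arg1=(0, 24),
    arg2=(33, 59),
    sense="cause",
    relation_attribution=WRITER,
    arg1_attribution=WRITER,
    arg2_attribution=WRITER,
)

print(text_of(TEXT, relation.connective))
print(text_of(TEXT, relation.arg1))
print(text_of(TEXT, relation.arg2))
```

Because each record carries no pointer to any other relation, a collection of them is simply a flat list, which mirrors the "low-level" design in which dependencies across relations are not marked.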
The following publication describes the PDTB-2.0 corpus:
Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi and Bonnie Webber. 2008. The Penn Discourse Treebank 2.0. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC). Marrakech, Morocco.
PDTB annotation guidelines, annotation format, and summary distributions are provided in the manual:
The PDTB Research Group. 2008. The PDTB 2.0 Annotation Manual. Technical Report IRCS-08-01. Institute for Research in Cognitive Science, University of Pennsylvania.
The PDTB project also aims to conduct empirical
research with the PDTB corpus, for NLP as well as
theoretical linguistics. See the publications
for PDTB-related research supported by the project.