University of Pennsylvania

Institute For Research in Cognitive Science

THE PDTB ANNOTATOR

The PDTB Annotator is a tool for annotating and adjudicating discourse relations using the PDTB Annotation Framework. It enables the annotation of text files with minimal preprocessing - the user simply needs to organize their text files into the correct directory structures before loading them into the Annotator.

Prerequisites and Download

The latest version of the PDTB Annotator is Annotator version 4. The Annotator runs locally on the user's computer. To run the tool, users should have at least version 6 of the Java Runtime Environment installed on their computer.

Check if you have Java installed here.

Once Java is running on your computer, download the Annotator using the following links:

Annotator version 4 for Java 8.

Users who prefer to use an older JRE (minimum Java 6) can download this version of the tool:

Annotator version 4 for Java 6.

A config file named Options.cfg is also necessary to run the tool:

Download Options.cfg.

Setup

Step 1 - Gather your text files and convert to UTF-8

Gather the text files that you wish to annotate. Ensure that these files are saved in UTF-8 (Unicode). There are several ways to convert a file to UTF-8. The following sites offer several suggestions:

Saving a text file on a Mac or PC in UTF-8
Converting files to UTF-8
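
If you prefer to script the conversion, the following is a minimal Python sketch rather than part of the Annotator itself. The function names are hypothetical, and the source encoding (for example "latin-1") is an assumption that you must supply for your own files. The sketch reads and writes raw bytes so that line endings are left untouched.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(path, source_encoding):
    """Re-encode one text file to UTF-8 in place.

    source_encoding is an assumption supplied by the caller; a wrong
    value raises UnicodeDecodeError or yields wrong characters.
    """
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))
```

A typical use would be to call is_valid_utf8 on each file first and to convert only the files that fail the check, keeping a backup copy of the originals.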

Step 2 - Organize your text files into a directory structure

Create a directory that will contain your text files. You should create a base directory, and then create at least one subdirectory within the base directory. The text files are contained within these subdirectories.

For example, you may have six text files to annotate. You wish to organize these six files into three different subdirectories. You could create a directory structure as follows:

  • MyTexts
    • Dir1
      • Dir1_Text1
      • Dir1_Text2
      • Dir1_Text3
    • Dir2
      • Dir2_Text1
    • Dir3
      • Dir3_Text1
      • Dir3_Text2

In the above example, MyTexts is the name of the base directory of your text files, Dir1, Dir2 and Dir3 are the subdirectories, and the files within each subdirectory are your actual text files (Dir1_Text1, Dir1_Text2, etc.).

(Note: a minimum of one subdirectory is needed. So even if you have just one text file to annotate, you need to create a subdirectory that will contain this text file.)

You may name your directories and files as you wish. It is suggested that you give all your text files unique names according to some convention, like the numbering used in the above example.
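
The layout rules above can be checked before loading the files into the Annotator. The following Python sketch is an illustration only: check_raw_root is a hypothetical helper, not part of the tool, and it treats the unique-name suggestion as a warning alongside the one hard requirement (at least one subdirectory). Hidden files are skipped.

```python
from pathlib import Path


def check_raw_root(base_dir):
    """Return a list of problems found in a text base directory."""
    base = Path(base_dir)
    entries = sorted(p for p in base.iterdir() if not p.name.startswith("."))
    subdirs = [p for p in entries if p.is_dir()]
    problems = []

    # At least one subdirectory is needed, even for a single text file.
    if not subdirs:
        problems.append("no subdirectory found; at least one is needed")

    # Text files belong inside subdirectories, not in the base directory.
    for p in entries:
        if p.is_file():
            problems.append(f"{p.name}: file sits directly in the base directory")

    # Suggested convention: every text file has a unique name.
    seen = {}
    for sub in subdirs:
        for f in sorted(sub.iterdir()):
            if f.is_file() and not f.name.startswith("."):
                if f.name in seen:
                    problems.append(
                        f"{f.name}: name used in both {seen[f.name]} and {sub.name}"
                    )
                else:
                    seen[f.name] = sub.name
    return problems
```

Calling check_raw_root("MyTexts") on the example structure above would be expected to return an empty list.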

Step 3 - Create a directory for your annotation files

The next step is to create a directory where you would like to save the annotation files that the tool will generate as you annotate. You simply need to create an empty directory in this step and you can name the directory as you wish so long as it differs from the name of the base directory of your text files created in step 2.

As you annotate, the tool will dynamically create subdirectories and annotation files corresponding to the structure and naming conventions set up in Step 2 for your text files.

Step 4 (Optional) - Create a directory for your comment files

The tool allows you to add comments for each token that is annotated. When you add a comment for a token, a separate comment file is created corresponding to the annotation file containing that token.

For this to work, simply create an empty directory where your comment files will be saved. As before, this comments base directory can be named as you wish so long as it is different from the directories created in Steps 2 and 3.
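
Steps 3 and 4 amount to creating two empty directories whose names differ from each other and from the text base directory. A minimal Python sketch of this follows; make_work_dirs and the default names MyAnnotations and MyComments are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def make_work_dirs(parent, raw_name="MyTexts",
                   work_name="MyAnnotations", comment_name="MyComments"):
    """Create the annotation and comment base directories under parent.

    All three names are illustrative assumptions; the only rule taken
    from the steps above is that they must differ from one another.
    """
    names = [raw_name, work_name, comment_name]
    if len(set(names)) != len(names):
        raise ValueError("the three base directories need different names")
    work = Path(parent) / work_name
    comments = Path(parent) / comment_name
    work.mkdir(parents=True, exist_ok=True)
    comments.mkdir(parents=True, exist_ok=True)
    return work, comments
```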

Download the following sample setup for an example set of directories that can be loaded into the Annotator.

Running the Tool

To load the tool, first place the Options.cfg file that you downloaded into the same directory as your Annotator jar file. Then launch the tool either by double-clicking the jar file (Windows/Mac) or by running the following command (UNIX/Linux/Mac) from that directory:

java -jar Annotator_v4.0_java8.jar

A file chooser window will appear. Only the top three rows are relevant. For "RawRoot", select the path to the base directory containing your text files. For "WorkRoot", select the path to the annotation base directory. For "WorkComment", select the path to the comments base directory.

Click on OK and the tool should launch.

A screenshot of the Annotator in action. The working directory is named "alan" here and appears on the Relation List panel.

Adjudication

The Annotator can also be used as an adjudication tool. It allows an adjudicator to view the work of up to two annotators and to select and/or edit the annotations into gold standard files.

To use the tool as an adjudicator, first gather the completed annotation files from your annotator(s), who will have set up their annotation directories as described above. Then, as before, set up your text directories, an empty working directory (for your gold files) and an empty comments directory (for any comments on your gold tokens).

Launch the tool and fill in the necessary fields as in the example below:

In this example, the WorkRoot points to the empty directory that will contain your gold files. Ann1Root points to the directory containing the completed annotations of the first annotator and Ann2Root the completed annotations of the second annotator. Corresponding comment directories for each of these three are also specified.

Once the relevant directories are selected, launch the tool as before.

An example of the Annotator used for adjudication. Two annotators (Alan and Rashmi) are shown. The adjudicator selects and/or edits annotator tokens into gold files.

File Format

The Annotator uses a simple file format - each token is stored as a pipe-delimited row of text. These fields are listed below (for those familiar with the original version of the tool, new fields are marked in blue):

Description of fields within ann files

Field           Description
Relation Type   Explicit, Implicit, AltLex, EntRel, NoRel
Conn Span       Text Span of the Connective
Conn Src        Connective's Source
Conn Type       Connective's Type
Conn Pol        Connective's Polarity
Conn Det        Connective's Determinacy
Conn Feat Span  Connective's Feature Span
Conn1           Explicit Connective / First Implicit Connective
SClass1A        First Semantic Class of the First Connective
SClass1B        Second Semantic Class of the First Connective
Conn2           Second Implicit Connective
SClass2A        First Semantic Class of the Second Connective
SClass2B        Second Semantic Class of the Second Connective
Sup1 Span       Text Span of the First Argument's Supplement
Arg1 Span       Text Span of the First Argument
Arg1 Src        First Argument's Source
Arg1 Type       First Argument's Type
Arg1 Pol        First Argument's Polarity
Arg1 Det        First Argument's Determinacy
Arg1 Feat Span  Text Span of the First Argument's Feature
Arg2 Span       Text Span of the Second Argument
Arg2 Src        Second Argument's Source
Arg2 Type       Second Argument's Type
Arg2 Pol        Second Argument's Polarity
Arg2 Det        Second Argument's Determinacy
Arg2 Feat Span  Text Span of the Second Argument's Feature
Sup2 Span       Text Span of the Second Argument's Supplement
Adju Reason     The Adjudication Reason
Adju Disagr     The type of the Adjudication disagreement
PB Role         The PropBank role of the PropBank verb
PB Verb         The PropBank verb of the main clause of this relation
Identifier      The unique identifier of this token within the annotation file
Task            The name of the task for which the token was annotated
Link            The link id of the token
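
Because each token is a single pipe-delimited row, an annotation file can be read with a few lines of code. The Python sketch below is an illustration only: it assumes that the fields appear in each row in the order listed above and that every row carries all 34 fields. FIELDS and parse_row are hypothetical names, not part of the Annotator.

```python
# Field names in the order listed above (assumed to match the row order).
FIELDS = [
    "Relation Type", "Conn Span", "Conn Src", "Conn Type", "Conn Pol",
    "Conn Det", "Conn Feat Span", "Conn1", "SClass1A", "SClass1B",
    "Conn2", "SClass2A", "SClass2B", "Sup1 Span",
    "Arg1 Span", "Arg1 Src", "Arg1 Type", "Arg1 Pol", "Arg1 Det",
    "Arg1 Feat Span",
    "Arg2 Span", "Arg2 Src", "Arg2 Type", "Arg2 Pol", "Arg2 Det",
    "Arg2 Feat Span",
    "Sup2 Span", "Adju Reason", "Adju Disagr", "PB Role", "PB Verb",
    "Identifier", "Task", "Link",
]


def parse_row(line):
    """Split one annotation row into a dict keyed by field name."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(FIELDS):
        # Rows with a different field count do not fit this sketch's assumption.
        raise ValueError(f"expected {len(FIELDS)} fields, found {len(values)}")
    return dict(zip(FIELDS, values))
```

Empty fields are kept as empty strings, so a lookup such as parse_row(line)["Conn1"] returns "" when no connective is recorded.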

The Link field is provided to capture an association between two tokens. It should be populated with a string made up of the keyword LINK followed by a numerical index. Linked tokens share the same Link value. For example:

Explicit|5947..5950|Wr|Comm|Null|Null|||Expansion.Conjunction| .... |5947..5950|VP-CONJ|LINK3
Implicit||Wr|Comm|Null|Null||instead|Expansion.Substitution.Arg2-as-subst| .... |5947|VP-CONJ|LINK3

Note that this linking can only be done as a post-processing step. There is no current facility within the tool to add linking during annotation or adjudication.
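
One possible form of such a post-processing step is sketched below in Python. It is an assumption-laden illustration, not a facility of the tool: add_link is a hypothetical helper, and it assumes that each row has 34 pipe-delimited fields with Link as the last one.

```python
EXPECTED_FIELDS = 34  # assumed field count, with Link as the final field


def add_link(rows, index):
    """Return copies of the rows with the Link field set to LINK<index>."""
    linked = []
    for row in rows:
        values = row.rstrip("\n").split("|")
        if len(values) != EXPECTED_FIELDS:
            # Refuse to guess where the Link field is in an unexpected row.
            raise ValueError(f"expected {EXPECTED_FIELDS} fields, found {len(values)}")
        values[-1] = f"LINK{index}"
        linked.append("|".join(values))
    return linked
```

Passing the two rows to be associated together with a single index gives both rows the same Link value, as in the LINK3 example above.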

For questions/comments, please contact Alan Lee at aleewk AT seas.upenn.edu