edu.upenn.cis.pdtb
Interface RelationLoader

All Known Implementing Classes:
RelationLoaderImpl

public interface RelationLoader

Implementations of this interface assist in loading annotations into memory. The following conventions assume a Linux/Unix filesystem (use the appropriate file separators on other platforms):

  1. RAW refers to the Wall Street Journal raw text. RAW is assumed to be divided into 25 sections, each with at most 100 files. textRoot refers to the directory such that textRoot/00/wsj_0003 is the RAW file for section 00, file 03.
  2. PDTB refers to the Penn Discourse Treebank. pdtbRoot refers to the directory such that pdtbRoot/00/wsj_0003.pdtb contains the PDTB annotations for textRoot/00/wsj_0003.
  3. PTB refers to the Penn Treebank. PTB files are assumed to be in symbolic expression form, and ptbRoot/00/wsj_0003.mrg contains the parse trees for textRoot/00/wsj_0003.
  4. Given a PDTB file pdtbRoot/ij/wsj_ijkl.pdtb, the associated RAW file is textRoot/ij/wsj_ijkl, and the associated PTB file is ptbRoot/ij/wsj_ijkl.mrg.
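
The naming conventions above can be sketched as a small path-building helper. This is only an illustration: the class CorpusPaths and its methods are hypothetical and are not part of the edu.upenn.cis.pdtb package. Following item 3, the PTB file is given the .mrg extension.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the edu.upenn.cis.pdtb API.
final class CorpusPaths {
    private CorpusPaths() {}

    // Base name shared by the three files, e.g. "wsj_0003" for section 00, file 03.
    public static String baseName(String secNo, String fileNo) {
        return "wsj_" + secNo + fileNo;
    }

    // RAW file: textRoot/ij/wsj_ijkl (no extension).
    public static Path rawFile(String textRoot, String secNo, String fileNo) {
        return Paths.get(textRoot, secNo, baseName(secNo, fileNo));
    }

    // PTB file: ptbRoot/ij/wsj_ijkl.mrg (extension taken from item 3).
    public static Path ptbFile(String ptbRoot, String secNo, String fileNo) {
        return Paths.get(ptbRoot, secNo, baseName(secNo, fileNo) + ".mrg");
    }

    // PDTB file: pdtbRoot/ij/wsj_ijkl.pdtb.
    public static Path pdtbFile(String pdtbRoot, String secNo, String fileNo) {
        return Paths.get(pdtbRoot, secNo, baseName(secNo, fileNo) + ".pdtb");
    }

    public static void main(String[] args) {
        // Paths.get uses the platform's own separator when the path is printed.
        System.out.println(rawFile("textRoot", "00", "03"));
        System.out.println(ptbFile("ptbRoot", "00", "03"));
        System.out.println(pdtbFile("pdtbRoot", "00", "03"));
    }
}
```

Building the paths with java.nio.file.Paths rather than by concatenating "/" keeps the sketch in line with the note about using the appropriate separators on other platforms.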

Author:
nikhild, geraud
See Also:
CorpusFileIterator

Method Summary
 PDTBRelationList loadRelations(CorpusFileIterator cfi)
          Loads a list of relations from a CorpusFileIterator.
 PDTBRelationList loadRelations(java.io.Reader r, java.lang.String rawString, PTBTreeNode root)
          Loads a PDTB file given the RAW text and a tree node whose children are the parse trees for each sentence.
 PDTBRelationList loadRelations(java.lang.String textFile, java.lang.String ptbFile, java.lang.String pdtbFile)
          Loads a list of relations from a PDTB file, and its associated PTB and RAW files.
 PDTBRelationList loadRelations(java.lang.String textRoot, java.lang.String ptbRoot, java.lang.String pdtbRoot, java.lang.String secNo, java.lang.String fileNo)
          Loads the PDTB file, and its associated RAW and PTB files given the section and file numbers.
 

Method Detail

loadRelations

public PDTBRelationList loadRelations(java.lang.String textFile,
                                      java.lang.String ptbFile,
                                      java.lang.String pdtbFile)
                               throws java.io.IOException
Loads a list of relations from a PDTB file, and its associated PTB and RAW files.

Parameters:
textFile - The name of the RAW file.
ptbFile - The name of the PTB file.
pdtbFile - The name of the PDTB file.
Throws:
java.io.IOException

loadRelations

public PDTBRelationList loadRelations(CorpusFileIterator cfi)
                               throws java.io.IOException
Loads a list of relations from a CorpusFileIterator.

Parameters:
cfi - The Corpus File Iterator object.
Throws:
java.io.IOException

loadRelations

public PDTBRelationList loadRelations(java.lang.String textRoot,
                                      java.lang.String ptbRoot,
                                      java.lang.String pdtbRoot,
                                      java.lang.String secNo,
                                      java.lang.String fileNo)
                               throws java.io.IOException
Loads the PDTB file and its associated RAW and PTB files, given the section and file numbers. The file must exist before this method is invoked; otherwise an exception is thrown.

Parameters:
textRoot - The root dir for RAW files.
ptbRoot - The root dir for PTB files.
pdtbRoot - The root dir for PDTB files.
secNo - The section number as a string. Section numbers are 00, 01, ..., 09, 10, 11, ..., 24.
fileNo - The file number as a string. Possible values are 00, 01, ..., 09, 10, 11, ..., 99.
Throws:
java.io.IOException
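
Because a missing file results in an exception, a caller may want to validate the arguments and check for the files first. The following is a minimal sketch under stated assumptions: the class LoadPreconditions and its methods are hypothetical (not part of this API), and it checks all three files (RAW, PTB with the .mrg extension, and PDTB), which is an assumption about what needs to exist.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight checks; not part of the edu.upenn.cis.pdtb API.
final class LoadPreconditions {
    private LoadPreconditions() {}

    // Section numbers are the two-digit strings 00 through 24.
    public static boolean isValidSection(String secNo) {
        return secNo != null && secNo.matches("[01][0-9]|2[0-4]");
    }

    // File numbers are the two-digit strings 00 through 99.
    public static boolean isValidFileNo(String fileNo) {
        return fileNo != null && fileNo.matches("[0-9]{2}");
    }

    // True only if the arguments are well formed and all three files exist.
    public static boolean allExist(String textRoot, String ptbRoot, String pdtbRoot,
                                   String secNo, String fileNo) {
        if (!isValidSection(secNo) || !isValidFileNo(fileNo)) {
            return false;
        }
        String base = "wsj_" + secNo + fileNo;
        Path raw = Paths.get(textRoot, secNo, base);
        Path ptb = Paths.get(ptbRoot, secNo, base + ".mrg");
        Path pdtb = Paths.get(pdtbRoot, secNo, base + ".pdtb");
        return Files.isRegularFile(raw)
                && Files.isRegularFile(ptb)
                && Files.isRegularFile(pdtb);
    }
}
```

A caller could invoke allExist with the same five arguments it intends to pass to loadRelations and skip the call when the check returns false. The IOException still needs to be handled, since a file can disappear between the check and the load.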

loadRelations

public PDTBRelationList loadRelations(java.io.Reader r,
                                      java.lang.String rawString,
                                      PTBTreeNode root)
                               throws java.io.IOException
Loads a PDTB file given the RAW text and a tree node whose children are the parse trees for each sentence. Convenience method for callers who prefer to supply these inputs directly rather than have entity resolution imposed on them.

Throws:
java.io.IOException
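
For the Reader-based overload, the caller prepares the inputs itself. The sketch below shows one way to obtain the first two arguments with the standard library only; it assumes that r reads the PDTB annotation file and that the corpus files can be decoded as ISO-8859-1, neither of which is stated on this page. The class PdtbInputs is hypothetical, and building the PTBTreeNode argument is not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical input preparation; not part of the edu.upenn.cis.pdtb API.
final class PdtbInputs {
    private PdtbInputs() {}

    // Reads the whole RAW file into a string (candidate for the rawString argument).
    // ASSUMPTION: ISO-8859-1, a single-byte charset that maps each byte to one char.
    public static String readRaw(Path rawFile) throws IOException {
        return new String(Files.readAllBytes(rawFile), StandardCharsets.ISO_8859_1);
    }

    // Opens a buffered reader on the PDTB file (candidate for the r argument).
    // The caller is responsible for closing it, e.g. with try-with-resources.
    public static Reader openPdtb(Path pdtbFile) throws IOException {
        return Files.newBufferedReader(pdtbFile, StandardCharsets.ISO_8859_1);
    }
}
```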