The following abbreviated axis forms are supported in v0.2.3:
| Standard XPath | Short Form |
| --- | --- |
| self::* | =::* |
| child::* | >::* |
| parent::* | <::* |
| descendant::* | >>::* |
| descendant-or-self::* | =>>::* |
| ancestor-or-self::* | <<=::* |
| following::* | ->::* |
| preceding::* | <-::* |
| following-sibling::* | $>::* |
| preceding-sibling::* | <$::* |
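
To make the correspondence concrete, here is a minimal sketch that rewrites the short forms into their standard XPath axis names. It is not the library's own parser: the class name AxisExpander and the method expand are hypothetical, the mapping is copied from the table above, and the sketch assumes Java 9 or later. It matches the whole run of axis symbols in front of "::" so that, for example, =>>:: is not mistaken for >>:: or >::.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the library: expands short axis forms.
final class AxisExpander {
    // Short form (without the trailing "::") mapped to the standard axis name.
    private static final Map<String, String> AXES = Map.ofEntries(
        Map.entry("=", "self"),
        Map.entry(">", "child"),
        Map.entry("<", "parent"),
        Map.entry(">>", "descendant"),
        Map.entry("=>>", "descendant-or-self"),
        Map.entry("<<=", "ancestor-or-self"),
        Map.entry("->", "following"),
        Map.entry("<-", "preceding"),
        Map.entry("$>", "following-sibling"),
        Map.entry("<$", "preceding-sibling"));

    // A run of axis symbols immediately followed by "::".
    private static final Pattern SHORT_AXIS = Pattern.compile("([=<>$-]+)::");

    static String expand(String expr) {
        Matcher m = SHORT_AXIS.matcher(expr);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String axis = AXES.get(m.group(1));
            // Symbol runs that are not in the table are left unchanged.
            String replacement = (axis == null) ? m.group() : axis + "::";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Under these assumptions, AxisExpander.expand("=>>::John/->::Mary") yields descendant-or-self::John/following::Mary, and names that merely contain a hyphen, such as i-foll, pass through untouched.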
=>>::NP
=>>::S[contains(@roles,'TPC')]
=>>::John[->::Mary]
=>>::John/->::Mary
import java.util.Enumeration;
import java.util.List;
// The PTB classes come from the library packages listed below.

public int getCount(String ptbRoot, String xpathExpr) throws Exception {
    // The navigator provides iterators that traverse the tree.
    PTBNavigator nav = new PTBNavigator(null);
    // Compile the XPath expression.
    PTBXPath xp = (PTBXPath) nav.parseXPath(xpathExpr);
    int count = 0;
    // For each treebank file
    for (PTBTask task = new PTBTask(ptbRoot); task.hasNext();) {
        // Get the abstract root node whose children are the sentences.
        PTBTreeNode root = task.next();
        // For each sentence
        for (Enumeration children = root.children(); children.hasMoreElements();) {
            // Set a sentence as the root of the tree against
            // which the expression is run.
            nav.setRoot((PTBTreeNode) children.nextElement());
            // Run the expression and add the number of matches.
            count += ((List) xp.evaluate(nav.getRoot())).size();
        }
    }
    return count;
}
Relevant Packages: edu.upenn.cis.ptb.xpath and edu.upenn.cis.ptb.util
The following are queries from the Bird et al. paper
in our notation. Please refer to the paper for details on the semantics.
>>::S/>>::saw
i-foll(>>::VB,{=::NP})
>>::VP/>::VB/->::NN
subtree(>>::VP,{>::VB/->::NN})
subtree(>>::VP,{>::NP[r-edge()]})
subtree(>>::VP,{>>::NP[r-edge()]})
>>::VP[
    subtree(.,
        {i-foll(=>>::VB[l-edge()],
            {i-foll(=::NP,
                {=::PP[r-edge()]})
            })
        })
]
The functions used above have the following signatures:
node-set subtree(node-set, string)
node-set i-foll(node-set, string)
The second argument is a string that must be a valid XPath expression; it is applied to each node in the node-set supplied as the first argument. Because these functions need to be nested inside one another, the expressions above delimit that argument with curly braces instead of quotes.
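
The reason braces nest where quotes cannot is that an opening and a closing brace are distinct characters, so a reader can count depth. The following minimal sketch illustrates that idea only; it is not the library's parser, and the class name BraceArgs and the method firstBraced are hypothetical.

```java
// Hypothetical helper, not part of the library: extracts a braced argument.
final class BraceArgs {
    // Returns the text inside the first top-level {...} pair in expr,
    // or null if expr contains no opening brace.
    static String firstBraced(String expr) {
        int start = expr.indexOf('{');
        if (start < 0) {
            return null;
        }
        int depth = 0;
        for (int i = start; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                // Depth is back to zero: this brace closes the first one.
                return expr.substring(start + 1, i);
            }
        }
        throw new IllegalArgumentException("unbalanced braces in: " + expr);
    }
}
```

Applied to subtree(>>::VP,{>::VB/->::NN}) this returns >::VB/->::NN. Applied to a nested call, it returns the whole inner expression including its own braced argument, which could then be processed the same way.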