The future of MSTParser
A SourceForge project has been started by Jason Baldridge and Ryan McDonald to make
it easier to add new features to the parser. A new version of the parser will
be available soon from that site.
MSTParser (v0.2)
This is the parser described in the following papers:
- R. McDonald, K. Lerman, and F. Pereira. Multilingual Dependency Parsing with a Two-Stage Discriminative Parser.
  Tenth Conference on Computational Natural Language Learning (CoNLL-X), 2006.
  http://www.seas.upenn.edu/~strctlrn/bib/PDF/conll2006.pdf
- R. McDonald and F. Pereira. Online Learning of Approximate Dependency Parsing Algorithms.
  11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), pages 81-88, 2006.
  http://www.seas.upenn.edu/~strctlrn/bib/PDF/approxDepEACL2006.pdf
- R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. Non-projective Dependency Parsing using Spanning Tree Algorithms.
  Proceedings of HLT/EMNLP 2005, 2005.
  http://www.seas.upenn.edu/~strctlrn/bib/PDF/nonprojectiveHLT-EMNLP2005.pdf
- R. McDonald, K. Crammer, and F. Pereira. Online Large-Margin Training of Dependency Parsers.
  43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), 2005.
  http://www.seas.upenn.edu/~strctlrn/bib/PDF/dependencyACL2005.pdf
The parser is implemented in Java. The README file describes usage and the input
and output formats.
Version history
- Version 0.2
  - uses second-order edge features (described in McDonald and Pereira, EACL 2006, above).
- Version 0.1
  - can produce typed (labeled) dependency trees.
Downloads
FAQ
-
What character encoding does the parser use?
It is hard-coded to Unicode (UTF-8), in line with the CoNLL-X shared task. You can
grep for "UTF8" in the source and replace all occurrences with whatever encoding you
want (see the first sketch after this FAQ).
-
Can the parser use CoNLL-X input format?
Not yet. However, I have included some easy-to-use Python scripts to convert
between the CoNLL and MSTParser formats. They are in the scripts directory
(the second sketch after this FAQ illustrates the conversion).
-
Can the parser produce non-tree dependency graphs?
Not yet. This will be part of the next release.
-
Is the edge labeler any good?
This is somewhat complicated. The parser currently predicts dependencies and labels
jointly. This is nice since it allows the information from both decisions to be used
simultaneously. However, the labeler is forced to obey the locality constraints of
the dependency parser (single edges or pairs of edges). I have found that it is often
better to have a post-processing edge labeler that can use a larger scope of features.
It is not difficult to create one, and any classifier can be used; I suggest MALLET
(the third sketch after this FAQ shows the general idea).
I will make a post-processing labeler available in the next version.
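
On the encoding question above: the following is only a minimal Java sketch, not code
from the parser itself (the actual class and variable names in the MSTParser source
differ); it just illustrates the kind of change that replacing the hard-coded "UTF8"
charset name amounts to.

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    // Minimal sketch: where a reader is built with the hard-coded charset name
    // "UTF8" (Java's alias for UTF-8), the same constructor accepts any other
    // supported encoding name instead.
    public class EncodingSketch {
        public static void main(String[] args) throws IOException {
            String file = args[0];

            // As released: the charset name is fixed to "UTF8".
            BufferedReader utf8In = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), "UTF8"));

            // After the grep-and-replace: e.g. ISO-8859-1 for Latin-1 data.
            BufferedReader latin1In = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), "ISO-8859-1"));

            System.out.println(utf8In.readLine());
            System.out.println(latin1In.readLine());
            utf8In.close();
            latin1In.close();
        }
    }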
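
On the CoNLL-X question: the bundled converters are Python scripts in the scripts
directory. The sketch below is not one of them; it is a hypothetical Java rendering of
the same conversion, assuming the MST input format described in the README (per
sentence, one tab-separated line each of word forms, POS tags, dependency labels, and
head indices, with a blank line between sentences) and well-formed ten-column CoNLL-X
token lines.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical converter, not the bundled script: reads CoNLL-X (one token per
    // line; columns FORM=2, POSTAG=5, HEAD=7, DEPREL=8, counting from 1) and prints
    // the corresponding MST-format sentence blocks.
    public class ConllToMstSketch {
        public static void main(String[] args) throws IOException {
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            List<String[]> tokens = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {     // a blank line ends a sentence
                    printSentence(tokens);
                    tokens.clear();
                } else {
                    tokens.add(line.split("\t"));
                }
            }
            if (!tokens.isEmpty()) printSentence(tokens);
            in.close();
        }

        private static void printSentence(List<String[]> tokens) {
            if (tokens.isEmpty()) return;
            StringBuilder forms = new StringBuilder();
            StringBuilder pos = new StringBuilder();
            StringBuilder labels = new StringBuilder();
            StringBuilder heads = new StringBuilder();
            for (int i = 0; i < tokens.size(); i++) {
                String[] cols = tokens.get(i);
                String sep = (i == 0) ? "" : "\t";
                forms.append(sep).append(cols[1]);   // FORM
                pos.append(sep).append(cols[4]);     // POSTAG
                labels.append(sep).append(cols[7]);  // DEPREL
                heads.append(sep).append(cols[6]);   // HEAD
            }
            System.out.println(forms);
            System.out.println(pos);
            System.out.println(labels);
            System.out.println(heads);
            System.out.println();                    // blank line between sentences
        }
    }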
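
On the edge labeler question: nothing below ships with the parser. It is a hedged
sketch of the post-processing idea only, with a made-up EdgeClassifier interface
standing in for whatever classifier you train (MALLET, for instance); the point is
that, once parsing is done, label features can look at the whole tree rather than a
single edge or pair of edges.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical post-processing edge labeler, not part of the MSTParser release.
    // Labels are predicted after parsing, so features may inspect the entire tree
    // (e.g. the head's own head) instead of being limited to the parser's
    // single-edge or edge-pair factorization.
    public class PostLabelerSketch {

        // Stand-in for any trained classifier (e.g. one built with MALLET).
        interface EdgeClassifier {
            String predictLabel(List<String> features);
        }

        static List<String> edgeFeatures(String[] forms, String[] pos, int[] heads, int child) {
            int head = heads[child];                 // 0 denotes the artificial root
            List<String> feats = new ArrayList<>();
            feats.add("childForm=" + forms[child]);
            feats.add("childPos=" + pos[child]);
            feats.add("headPos=" + (head == 0 ? "ROOT" : pos[head]));
            feats.add("direction=" + (head < child ? "R" : "L"));
            // Tree-scoped feature unavailable to the joint model: the grandparent's POS.
            feats.add("grandPos=" + (head == 0 ? "NONE"
                    : (heads[head] == 0 ? "ROOT" : pos[heads[head]])));
            return feats;
        }

        static String[] labelTree(String[] forms, String[] pos, int[] heads, EdgeClassifier clf) {
            String[] labels = new String[forms.length];
            for (int child = 1; child < forms.length; child++) {  // index 0 is the root symbol
                labels[child] = clf.predictLabel(edgeFeatures(forms, pos, heads, child));
            }
            return labels;
        }

        public static void main(String[] args) {
            // Toy parsed sentence; index 0 is reserved for the root.
            String[] forms = {"<root>", "John", "saw", "Mary"};
            String[] pos   = {"<root>", "NNP", "VBD", "NNP"};
            int[]    heads = {-1, 2, 0, 2};
            EdgeClassifier dummy = feats -> feats.get(1).equals("childPos=VBD") ? "ROOT" : "DEP";
            String[] labels = labelTree(forms, pos, heads, dummy);
            for (int i = 1; i < forms.length; i++) {
                System.out.println(forms[i] + " <-" + labels[i] + "- "
                        + (heads[i] == 0 ? "<root>" : forms[heads[i]]));
            }
        }
    }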
Credits
Research supported by the National Science Foundation under grants
EIA-0205448 (Mining the Bibliome) and IIS-0428193
(Machine Learning for Sequences and Structured Data).