Penn CIS
Machine Learning at Penn
Genomics and Computational Biology

Minimum-Spanning Tree Parser

The future of MSTParser

A SourceForge project has been started by Jason Baldrige and Ryan McDonald to make it easier to add new features to the parser. A new version of the parser will be available soon from that site.

MSTParser (v0.2)

This is the parser described in the following papers:

  1. Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
    R. McDonald, K. Lerman, and F. Pereira
    Tenth Conference on Computational Natural Language Learning (CoNLL-X)    (2006)
  2. Online Learning of Approximate Dependency Parsing Algorithms
    R. McDonald and F. Pereira
    11th Conference of the European Chapter of the Association for Computational Linguistics: EACL 2006  81-88  (2006)
  3. Non-projective Dependency Parsing using Spanning Tree Algorithms
    R. McDonald, F. Pereira, K. Ribarov, and J. Haji\v{c}
    Proceedings of HLT/EMNLP 2005    (2005)
  4. Online Large-Margin Training of Dependency Parsers
    R. McDonald, K. Crammer, and F. Pereira
    43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)    (2005)

The parser is implemented in Java. The README file describes usage, input and output formats.

Version history

Version 0.2
uses second-order edge features (see reference 4 above).
Version 0.1
has the ability to produce typed (or labeled) trees.



  • What character encoding does the parser use?
    It is hard coded for Unicode (UTF8) in correspondence with the CoNLL-X shared task. You can grep "UTF8" and replace all occurances with whatever encoding you want.
  • Can the parser use CoNLL-X input format?
    Not yet. However, I have include some easy to use python scripts to convert between CoNLL and MSTParser formats. They are in the scripts directory.
  • Can the parser produce non-tree dependency graphs?
    Not yet. This will be part of the next release.
  • Is the edge labeler any good?
    This is somewhat complicated. The parser currently jointly predicts dependencies and labels at once. This is nice since it allows the information from both decisions to simultaneously be used. However, the labeler is forced to obey any locality constraints of the dependency parser (single edge or pairs of edges). I have found that it is often better to have a post-processing edge labeler that can have a larger scope for features. It is not difficult to create this and any classifier can be used. I suggest MALLET. I will make a post-processing labeler available in the next version.


Research supported by the National Science Foundation under grants EIA-0205448 (Mining the Bibliome) and IIS 0428193 (Machine Learning for Sequences and Structured Data).