This tool evaluates summary content by comparing a generated summary with the source document for which it was produced. Since summaries are expected to be surrogates of the input, high similarity with the source is taken to indicate a good-quality summary, and low similarity a poor one.
This package contains code to compute the various input-summary similarity metrics that were compared in our work described in the following papers.
An information-theoretic measure, the Jensen-Shannon divergence between the vocabulary distributions of the input and summary texts, was found to produce the best predictions of summary quality. System scores produced by this metric obtain correlations with pyramid scores of 0.89 (TAC 2008 data) and 0.74 (TAC 2009 data). More details about the performance of various features can be found in the papers above.
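
To make the metric concrete, here is a minimal Python sketch of Jensen-Shannon divergence between the word distributions of an input and a summary. It is an illustration only, not the code shipped in this package: the function names word_distribution and js_divergence are hypothetical, tokenization is a plain lowercase whitespace split, and no stemming, stopword removal, or smoothing is applied, all of which the actual tool may handle differently.

```python
import math
from collections import Counter


def word_distribution(text):
    """Return a unigram probability distribution over the words in text.

    Assumption: lowercase + whitespace tokenization, no smoothing.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    p and q map words to probabilities. The result lies in [0, 1]:
    0 for identical distributions, 1 for disjoint vocabularies.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution M = (P + Q) / 2
        # Terms with zero probability contribute nothing (0 * log 0 = 0).
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


if __name__ == "__main__":
    source = "the cat sat on the mat while the dog slept"
    summary = "the cat sat on the mat"
    score = js_divergence(word_distribution(source), word_distribution(summary))
    print(f"JS divergence: {score:.4f}")
```

Because this is a divergence, lower values mean the summary's vocabulary distribution is closer to that of the input.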