Sense-Annotated Judgements of Similarity


    ConceptSim contains sense-annotated versions of three standard similarity datasets:
    • MC
    • RG
    • WordSim-Sim
    Each word pair was annotated with WordNet 3.0 senses by two human annotators. Inter-annotator agreement ranged from 86% to 93%. The similarity scores are carried over unchanged from the original datasets; this choice is motivated by past research showing that the greatest correlations with human judgments come from taking the maximum similarity over all pairs of senses. The final version of each sense-annotated dataset contains the senses the annotators agreed on after resolving their disagreements.
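
The contrast between word-level and sense-level scoring can be sketched as follows. This is a minimal illustration, not code from ConceptSim: the sense labels, the `SENSES` inventory, the `TOY_SCORES` values, and both function names are hypothetical placeholders standing in for WordNet 3.0 senses and a real semantic metric.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical sense inventory; real labels would be WordNet 3.0 senses.
SENSES: Dict[str, List[str]] = {
    "bank": ["bank#financial", "bank#river"],
    "money": ["money#currency"],
}

# Hypothetical sense-to-sense scores standing in for a real semantic metric.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("bank#financial", "money#currency"): 0.8,
    ("bank#river", "money#currency"): 0.1,
}


def toy_sim(s1: str, s2: str) -> float:
    """Symmetric lookup in the toy table; unknown pairs score 0.0."""
    return TOY_SCORES.get((s1, s2), TOY_SCORES.get((s2, s1), 0.0))


def max_sense_similarity(
    w1: str,
    w2: str,
    senses: Dict[str, List[str]],
    sim: Callable[[str, str], float],
) -> float:
    """Word-level score: maximum similarity over all pairs of senses."""
    return max(sim(a, b) for a, b in product(senses[w1], senses[w2]))


def annotated_similarity(
    s1: str, s2: str, sim: Callable[[str, str], float]
) -> float:
    """Concept-level score: similarity of the two annotated senses only."""
    return sim(s1, s2)


word_level = max_sense_similarity("bank", "money", SENSES, toy_sim)
concept_level = annotated_similarity("bank#river", "money#currency", toy_sim)
```

With word-only data a metric has to search all sense pairs, as `max_sense_similarity` does; with sense annotations it can be evaluated directly on the annotated pair, as in `annotated_similarity`.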

Related Publications:

  • Hansen A. Schwartz and Fernando Gomez. 2011. Evaluating Semantic Metrics on Tasks of Concept Similarity. In FLAIRS-24. Palm Beach, Florida.

Original Data References:

  • Rubenstein, H., and Goodenough, J. 1965. Contextual correlates of synonymy. Communications of the ACM 8:627-633.
  • Miller, G., and Charles, W. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes 6(1):1-28.
  • Finkelstein, L.; Gabrilovich, E.; Matias, Y.; Rivlin, E.; Solan, Z.; Wolfman, G.; and Ruppin, E. 2001. Placing search in context: The concept revisited. In ACM Trans. on Information Systems.
  • Agirre, E.; Alfonseca, E.; Hall, K.; Kravalova, J.; Pasca, M.; and Soroa, A. 2009. A study on similarity and relatedness using distributional and WordNet-based approaches. In The Annual Conference of the NAACL, 19-27.