Detection of Errors and Correction in Corpus Annotation

An Investigation into Improving Part-of-Speech Tagging

Markus Dickinson

Proceedings of MCLC 06.

We develop a method to improve POS tagging that attempts to account for problematic ambiguities by redefining the tagset. Hand-evaluating the disagreements between the tagger and the benchmark shows the profound effect that errors have on reported accuracies, and we also explore the effect of correcting errors in the training data. Our results emphasize the need to focus on particular tagging problems in evaluation.



BibTeX entry:

@InProceedings{dickinson:mclc:06,
  author =       {Markus Dickinson},
  title =        {An Investigation into Improving Part-of-Speech Tagging},
  booktitle =    {Proceedings of the Third Midwest Computational Linguistics 
                  Colloquium (MCLC-06)},
  address =      {Urbana-Champaign, IL},
  year =         {2006},
  url =          {http://www9.georgetown.edu/faculty/mad87/papers/dickinson-mclc-06.html}
}