Detection of Errors and Correction in Corpus Annotation

Determining Ambiguity Classes for Part-of-Speech Tagging

Markus Dickinson

Proceedings of RANLP-07.

We examine how words group together in the lexicon, in terms of ambiguity classes, and use this information in a redefined tagset to improve POS tagging. In light of errors in the training data and a limited amount of annotated data, we investigate ways to define ambiguity classes for words which consider the lexicon as a whole and predict unknown uses of words. Fitting words to typical ambiguity classes is shown to provide more accurate ambiguity classes for words and to significantly improve tagging performance.
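To make the notion of an ambiguity class concrete, here is a minimal sketch. It assumes the common definition: a word's ambiguity class is the set of tags it occurs with in a tagged corpus. The sketch builds these classes and counts how many words share each one, which is one simple way to see how words group together in the lexicon. The function names and the toy corpus are hypothetical illustrations. This is not the paper's method for fitting words to typical classes.

```python
from collections import Counter, defaultdict


def ambiguity_classes(tagged_corpus):
    """Map each word to the frozenset of tags it is observed with.

    Hypothetical helper: assumes ambiguity class = set of observed tags.
    """
    tags_by_word = defaultdict(set)
    for word, tag in tagged_corpus:
        tags_by_word[word].add(tag)
    return {word: frozenset(tags) for word, tags in tags_by_word.items()}


def class_sizes(classes):
    """Count how many distinct words share each ambiguity class."""
    return Counter(classes.values())


# Toy (word, tag) corpus, invented for illustration only.
corpus = [
    ("the", "DT"), ("can", "MD"), ("can", "NN"), ("run", "VB"),
    ("run", "NN"), ("walk", "VB"), ("walk", "NN"), ("dog", "NN"),
]

classes = ambiguity_classes(corpus)
counts = class_sizes(classes)
```

In this toy data, "run" and "walk" share the class {NN, VB}, and every other class has a single member. Counts like these show which classes are typical of the lexicon. They also show which classes are rare, and a rare class may reflect an annotation error or a word whose other uses simply do not appear in a small training set.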


BibTeX entry:

@InProceedings{dickinson:07, 
  author =       {Markus Dickinson}, 
  title =        {Determining Ambiguity Classes for Part-of-Speech
                  Tagging},
  booktitle =    {Proceedings of the International Conference on Recent 
                  Advances in Natural Language Processing 2007 (RANLP-07)}, 
  address =      {Borovets, Bulgaria},
  pages =        {167--172},
  year =         {2007},
  url =          {http://jones.ling.indiana.edu/mdickinson/papers/dickinson-07.html}
}