Detection of Errors and Correction in Corpus Annotation
Determining Ambiguity Classes for Part-of-Speech Tagging
Markus Dickinson
Proceedings of RANLP-07.
We examine how words group together in the lexicon, in terms of ambiguity classes, and use this information in a redefined tagset to improve POS tagging. In light of errors in the training data and a limited amount of annotated data, we investigate ways to define ambiguity classes for words which consider the lexicon as a whole and predict unknown uses of words. Fitting words to typical ambiguity classes is shown to provide more accurate ambiguity classes for words and to significantly improve tagging performance.
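As a rough illustration of the terminology in the abstract, the sketch below derives ambiguity classes from a tagged corpus, counts how many word types share each class, and builds a refined tag that pairs a tag with its ambiguity class. It is a minimal sketch, not the paper's implementation: the function names, the toy corpus, and the `TAG<CLASS>` label format are all assumptions made for the example, and the paper's method for fitting words to typical classes is not reproduced here.

```python
from collections import Counter, defaultdict


def ambiguity_classes(tagged_tokens):
    """Map each word to the sorted tuple of tags it was observed with."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[word].add(tag)
    return {word: tuple(sorted(tags)) for word, tags in tags_by_word.items()}


def class_frequencies(classes):
    """Count how many word types share each ambiguity class."""
    return Counter(classes.values())


def refine_tag(tag, amb_class):
    """Combine a tag with an ambiguity class (hypothetical label format)."""
    return f"{tag}<{'/'.join(amb_class)}>"


# Toy (word, tag) pairs, invented for illustration only.
toy_corpus = [
    ("the", "DT"),
    ("can", "MD"), ("can", "NN"),
    ("will", "MD"), ("will", "NN"),
    ("run", "VB"), ("run", "NN"),
    ("dog", "NN"),
]

classes = ambiguity_classes(toy_corpus)
frequencies = class_frequencies(classes)
refined = refine_tag("MD", classes["can"])
```

In this toy data, "can" and "will" share one ambiguity class, so that class counts as more typical than the class that only "run" belongs to. A class attested by a single word in real training data could just as easily reflect an annotation error or sparse data as a genuine lexical pattern.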
Electronically available file formats:
- .pdf (126K)
Bibtex entry:
@InProceedings{dickinson:07,
  author    = {Markus Dickinson},
  title     = {Determining Ambiguity Classes for Part-of-Speech Tagging},
  booktitle = {Proceedings of the International Conference on Recent Advances in Natural Language Processing 2007 (RANLP-07)},
  address   = {Borovets, Bulgaria},
  pages     = {167--172},
  year      = {2007},
  url       = {http://jones.ling.indiana.edu/mdickinson/papers/dickinson-07.html}
}