Detection of Errors and Correction
in Corpus Annotation

An Investigation into Improving Part-of-Speech Tagging

Markus Dickinson

Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories (TLT 2006). Prague, Czech Republic.

We show how to compact rules for error detection by grouping their daughters lists into equivalence classes while maintaining the essential elements of each rule. Using a few simple and precise properties, we demonstrate that the rule growth of the equivalence classes is much less dramatic than the overall rule growth. After determining rule equivalence classes, we are able to eliminate potentially erroneous rules by running an endocentricity check, which is shown to be successful on the Wall Street Journal corpus.
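
As a rough illustration of the two steps named in the abstract, the Python sketch below groups treebank rules into equivalence classes and then runs an endocentricity check on each class. It is not the paper's method: the abstract does not list the equivalence properties, so the normalization used here (collapsing runs of identical adjacent daughters) is an assumption, and the HEADS table, the function names, and the toy rules are all hypothetical.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical head table (an assumption, not taken from the paper):
# daughter labels that may serve as the head of each mother category.
HEADS = {
    "NP": {"NN", "NNS", "NNP", "NP"},
    "VP": {"VB", "VBD", "VBZ", "VP"},
    "PP": {"IN", "TO"},
}


def normalize(daughters):
    """Collapse runs of identical adjacent labels (an assumed equivalence property)."""
    return tuple(label for label, _ in groupby(daughters))


def equivalence_classes(rules):
    """Group (mother, daughters) rules by mother and normalized daughters list."""
    classes = defaultdict(list)
    for mother, daughters in rules:
        classes[(mother, normalize(daughters))].append(tuple(daughters))
    return dict(classes)


def is_endocentric(mother, daughters):
    """Return True if some daughter can head the mother category.

    Mothers missing from HEADS are not checked and pass by default.
    """
    if mother not in HEADS:
        return True
    return any(d in HEADS[mother] for d in daughters)


# Toy rules, invented for illustration only.
rules = [
    ("NP", ["DT", "JJ", "NN"]),
    ("NP", ["DT", "JJ", "JJ", "NN"]),  # same class as the rule above
    ("NP", ["DT", "JJ"]),              # no possible head daughter
    ("VP", ["VBD", "NP"]),
]

classes = equivalence_classes(rules)
suspicious = [key for key in classes if not is_endocentric(*key)]

print(len(rules), "rules in", len(classes), "classes")
print("flagged as potentially erroneous:", suspicious)
```

Because the check runs once per class rather than once per rule, a single head test covers every rule variant that the normalization treats as equivalent.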


BibTeX entry:

@InProceedings{dickinson:mclc:06,
  author =       {Markus Dickinson},
  title =        {Rule Equivalence for Error Detection},
  booktitle =    {Proceedings of the Fifth Workshop on Treebanks and 
                  Linguistic Theories (TLT 2006)},
  address =      {Prague, Czech Republic},
  year =         {2006},
  url = {http://www9.georgetown.edu/faculty/mad87/papers/dickinson-tlt06.html}
}