Detection of Errors and Correction
in Corpus Annotation

On Detecting Errors in Dependency Treebanks

Adriane Boyd, Markus Dickinson, and Detmar Meurers

Research on Language and Computation. 6(2).

Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close to the data and at the same time to the semantic functor-argument structure as a target of syntactic processing. Correspondingly, dependency structures play an important role in parser evaluation and for the training and evaluation of tools based on dependency treebanks. At the same time, general techniques for detecting errors in dependency annotation have not yet been developed.

We address this gap by exploring how a technique developed for detecting errors in constituency-based syntactic annotation can be adapted to systematically detect errors in dependency annotation. Building on an analysis of key properties and differences between constituency and dependency annotation, we discuss results for dependency treebanks for Swedish, Czech, and German. Dealing with different languages and annotation schemes also raises questions of standardization for some aspects of dependency annotation, in particular regarding the locality of annotation, phenomena such as coordination, and the unique-head constraint.


BibTeX entry:

@article{boyd-et-al:08,
  author =   {Adriane Boyd and Markus Dickinson and Detmar Meurers},
  title =    {On Detecting Errors in Dependency Treebanks},
  journal =  {Research on Language and Computation},
  volume =   {6},
  number =   {2},
  pages =    {113--137},
  year =     {2008},
  url =      {http://decca.osu.edu/publications/boyd-et-al-08.html}
}