Using Derivation Trees for Treebank Error Detection

Seth Kulick, Ann Bies, Justin Mott
Linguistic Data Consortium, University of Pennsylvania


Abstract

This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches, which used strings of words rather than tree structure to identify the appropriate contexts for comparison. We report the results of applying this approach to the Penn Arabic Treebank and show that it leads to high precision in error detection.
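
The general idea of comparing annotations only within structurally similar contexts can be illustrated with a minimal sketch. This is not the system described in the paper: the function name find_inconsistencies, the (words, context, structure) tuple layout, and the bracketed-string encoding of structures are all assumptions made for illustration. Here "context" stands in for whatever structural label (for example, a derivation-tree fragment type) is used to decide that two occurrences are comparable.

```python
from collections import defaultdict


def find_inconsistencies(instances):
    """Flag word sequences annotated in more than one way within one context.

    instances: iterable of (words, context, structure) tuples, where
      words     is a sequence of tokens,
      context   is a hypothetical label for the structural context,
      structure is any hashable encoding of the annotation.
    Returns a dict mapping (words, context) to the set of differing structures.
    """
    groups = defaultdict(set)
    for words, context, structure in instances:
        # Occurrences are comparable only if both words and context match.
        groups[(tuple(words), context)].add(structure)
    # More than one distinct structure in a group is a candidate error.
    return {key: structs for key, structs in groups.items() if len(structs) > 1}


# Toy data (invented for illustration, not drawn from any treebank).
instances = [
    (("a", "b"), "NP", "(NP a b)"),
    (("a", "b"), "NP", "(NP a (NP b))"),
    (("a", "b"), "VP", "(VP a b)"),
    (("c",), "NP", "(NP c)"),
]
candidates = find_inconsistencies(instances)
```

Keying on the context as well as the words is what keeps the same string under a different structural context (the "VP" occurrence above) from being reported as a conflict; a purely string-based key would merge those cases.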




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2122.pdf