Exploiting Morphology in Turkish Named Entity Recognition System

Reyyan Yeniterzi
Carnegie Mellon University


Abstract

Turkish is an agglutinative language with complex morphological structures, so word forms alone are not enough for many computational tasks. In this paper we analyze the effect of morphology in a Named Entity Recognition system for Turkish. We start with the standard word-level representation and incrementally explore the effect of capturing syntactic and contextual properties of tokens. We also explore a new representation in which roots and morphological features are represented as separate tokens, rather than treating each word as a single token. Using syntactic and contextual properties with the new representation provides a 7.6% relative improvement over the baseline.
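
As a rough illustration of the root-and-feature representation, the sketch below splits a morphologically analyzed word into a root token followed by one token per morphological feature, and copies the word's entity label onto each resulting token. The analysis strings, the "+" delimiter, the label names, and the function names are all assumptions made for this example; the abstract does not specify the analyzer output format or how labels are assigned to the split tokens.

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> List[str]:
    """Split 'root+Feat1+Feat2' into ['root', '+Feat1', '+Feat2'].

    Assumes (hypothetically) that '+' separates the root from its features.
    """
    root, *features = analysis.split("+")
    return [root] + ["+" + feat for feat in features]


def to_morpheme_tokens(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (analysis, label) pairs into (token, label) pairs.

    Each root and each feature becomes its own token; every token inherits
    the entity label of the word it came from (an assumption of this sketch).
    """
    expanded = []
    for analysis, label in tagged:
        for token in split_analysis(analysis):
            expanded.append((token, label))
    return expanded


# Illustrative input only: made-up analyses and labels, not data from the paper.
sentence = [
    ("Ankara+Noun+Prop+A3sg+Pnon+Loc", "LOCATION"),
    ("otur+Verb+Pos+Prog1+A1sg", "O"),
]

for token, label in to_morpheme_tokens(sentence):
    print(token, label)
```

In the word-level representation each analyzed word would be a single token; here a sequence labeler would instead see the root and each feature as separate positions in the sequence.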




Full paper: http://www.aclweb.org/anthology/P/P11/.pdf