Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes

Thomas Mueller and Hinrich Schuetze
IMS Universität Stuttgart


Abstract

We present a class-based language model that clusters rare words of similar morphology together. The model improves the prediction of words after histories containing out-of-vocabulary words. The morphological features are obtained without labeled data. Compared to a state-of-the-art Kneser-Ney model, perplexity improves by 4% overall and by 81% on unknown histories.

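The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: rare words are grouped by a fixed-length suffix plus capitalization (a crude stand-in for the paper's unsupervised morphological features), the model is an unsmoothed class bigram P(w | h) = P(c(w) | c(h)) * P(w | c(w)), and there is no interpolation with a Kneser-Ney model. The names `word_class`, `train`, `prob`, `rare_threshold`, and `suffix_len` are hypothetical.

```python
from collections import Counter, defaultdict


def word_class(word, counts, rare_threshold=1, suffix_len=2):
    """Frequent words are their own class; rare or unseen words share a
    class keyed by capitalization and suffix (an illustrative assumption)."""
    if counts[word] > rare_threshold:
        return word
    shape = "Cap" if word[:1].isupper() else "low"
    return f"<RARE:{shape}:{word[-suffix_len:].lower()}>"


def train(tokens, rare_threshold=1, suffix_len=2):
    """Collect word counts, class counts, and class-to-class bigram counts."""
    counts = Counter(tokens)
    classes = [word_class(w, counts, rare_threshold, suffix_len) for w in tokens]
    transitions = defaultdict(Counter)
    for prev, nxt in zip(classes, classes[1:]):
        transitions[prev][nxt] += 1
    return {
        "counts": counts,
        "class_counts": Counter(classes),
        "transitions": transitions,
        "rare_threshold": rare_threshold,
        "suffix_len": suffix_len,
    }


def prob(word, history, model):
    """Unsmoothed estimate of P(word | history) via the class bigram."""
    counts = model["counts"]
    args = (counts, model["rare_threshold"], model["suffix_len"])
    c_hist = word_class(history, *args)  # an unseen history maps to a rare class
    c_word = word_class(word, *args)
    followers = model["transitions"].get(c_hist)
    if not followers or counts[word] == 0:
        return 0.0  # no smoothing in this sketch
    p_class = followers[c_word] / sum(followers.values())
    p_word = counts[word] / model["class_counts"][c_word]
    return p_class * p_word


tokens = "the walking dog saw the running cat and the jumping fox saw the dog".split()
model = train(tokens)
# "swimming" never occurs in training, but it shares the "ng" class with
# "walking", "running", and "jumping", so it still yields a usable history.
print(prob("dog", "swimming", model))
```

A plain word bigram model would have no counts at all for the history "swimming"; here the shared rare-word class supplies them, which is the kind of effect on unknown histories that the abstract reports.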

Full paper: http://www.aclweb.org/anthology/P/P11/P11-2092.pdf