Latent Class Transliteration based on Source Language Origin

Masato Hagiwara and Satoshi Sekine
Rakuten Institute of Technology, New York


Abstract

Transliteration, a rich source of proper noun spelling variations, is usually recognized by phonetic- or spelling-based models. However, a single model cannot handle words of different language origins, e.g., "get" in "piaget" (French) and "target" (English), which are transliterated differently. Li et al. (2007) propose a method that explicitly models and classifies the source language origins and switches transliteration models accordingly. This model, however, requires a training set explicitly tagged with language origins. We propose a novel method that models language origins as latent classes. The parameters are learned from a set of transliterated word pairs via the EM algorithm. Experimental results on the task of transliterating Western names into Japanese show that the proposed model can achieve higher accuracy than conventional models without latent classes.
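The latent-class idea can be illustrated with a small mixture model: each latent class z stands in for an unobserved language origin and has its own rewrite probabilities, so that P(pair) = sum over z of P(z) * product of P(target substring | source substring, z), with EM alternating between class posteriors (E-step) and re-estimated parameters (M-step). This is a minimal sketch under stated assumptions, not the authors' model: it assumes each word pair is already segmented and aligned into substring rewrites (the paper's transliteration model and alignment are not reproduced), uses plain categorical rewrite distributions with additive smoothing, and the function `em_latent_classes`, its parameters, and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def em_latent_classes(pairs, num_classes=2, iterations=100, seed=0, smoothing=1e-3):
    """EM for a hypothetical mixture: P(pair) = sum_z P(z) * prod P(tgt | src, z).

    Each pair is assumed to be a pre-aligned list of (source, target) substring rewrites.
    """
    rng = random.Random(seed)
    # Candidate target substrings observed for each source substring.
    targets_for = defaultdict(set)
    for pair in pairs:
        for src, tgt in pair:
            targets_for[src].add(tgt)

    # Random soft class assignments break the symmetry between classes.
    posteriors = []
    for _ in pairs:
        w = [rng.random() + 0.5 for _ in range(num_classes)]
        s = sum(w)
        posteriors.append([x / s for x in w])

    for _ in range(iterations):
        # M-step: class priors and per-class rewrite probabilities from expected counts.
        prior = [sum(p[z] for p in posteriors) / len(pairs) for z in range(num_classes)]
        counts = [defaultdict(float) for _ in range(num_classes)]
        for pair, post in zip(pairs, posteriors):
            for rw in pair:
                for z in range(num_classes):
                    counts[z][rw] += post[z]
        rewrite = []
        for z in range(num_classes):
            table = {}
            for src, tgts in targets_for.items():
                total = sum(counts[z][(src, t)] + smoothing for t in tgts)
                for t in tgts:
                    table[(src, t)] = (counts[z][(src, t)] + smoothing) / total
            rewrite.append(table)

        # E-step: posterior P(z | pair), computed in log space for numerical stability.
        posteriors = []
        for pair in pairs:
            logs = [math.log(prior[z]) + sum(math.log(rewrite[z][rw]) for rw in pair)
                    for z in range(num_classes)]
            top = max(logs)
            w = [math.exp(l - top) for l in logs]
            s = sum(w)
            posteriors.append([x / s for x in w])
    return prior, rewrite, posteriors


# Hypothetical toy data: English-like and French-like words rewrite the same
# source substrings differently (romanized Japanese targets). No origin labels are given.
english_like = [("get", "getto"), ("ch", "chi"), ("er", "aa")]
french_like = [("get", "je"), ("ch", "sh"), ("er", "ee")]
pairs = [english_like] * 3 + [french_like] * 3

prior, rewrite, posteriors = em_latent_classes(pairs)
for post in posteriors:
    print([round(p, 3) for p in post])
```

On this toy data EM should place the English-like and French-like pairs in different classes, so that one class rewrites "get" as "getto" and the other as "je", without any origin tags in the training data. Which class index ends up English-like depends on the random initialization.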




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2010.pdf