Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

Daniel Emilio Beck
Federal University of São Carlos


Abstract

In this paper I present a Master's thesis proposal on syntax-based Statistical Machine Translation (SMT). I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly with Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well suited to syntax modeling. I also present an experiment plan to evaluate these models on a parallel corpus of English and Brazilian Portuguese.




Full paper: http://www.aclweb.org/anthology/P/P11/.pdf