Lexicographic Semirings for Exact Automata Encoding of Sequence Models

Brian Roark,  Richard Sproat,  Izhak Shafran
Center for Spoken Language Understanding


Abstract

In this paper we introduce a novel use of the lexicographic semiring and motivate its use for speech and language processing tasks. We prove that the semiring allows for exact encoding of backoff models with epsilon transitions. This allows for off-line optimization of exact models represented as large weighted finite-state transducers in contrast to implicit (on-line) failure transition representations. We present preliminary empirical results demonstrating that, even in simple intersection scenarios amenable to the use of failure transitions, the use of the more powerful lexicographic semiring is competitive in terms of time of intersection.




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2001.pdf