An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques

Donald Metzler, Eduard Hovy, Chunliang Zhang
University of Southern California


Abstract

Paraphrase generation is an important task that has received a great deal of interest recently. Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. Despite this attention, very few direct empirical evaluations have compared the merits of the different approaches. This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to shed light on their strengths and weaknesses. Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy.

Full paper: http://www.aclweb.org/anthology/P/P11/P11-2096.pdf