Chinese sentence segmentation as comma classification

Nianwen Xue and Yaqin Yang
Brandeis University


Abstract

We describe a method for disambiguating Chinese commas, a task that is central to Chinese sentence segmentation. We view Chinese sentence segmentation as the detection of loosely coordinated clauses separated by commas. Trained and tested on data derived from the Chinese Treebank, our model achieves an overall classification accuracy of close to 90%, which translates to an F1 score of 70% for detecting commas that signal a sentence boundary.
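
To make the framing concrete, the following is a minimal sketch of how per-comma decisions could be turned into sentence segments. It is not the model described in the paper: the function name `segment`, the `is_boundary` list (standing in for the output of some comma classifier), and the example text are all hypothetical, and treating 。！？ as always sentence-final is an assumption made for illustration.

```python
# Full-width Chinese comma, plus punctuation assumed here to always end a sentence.
COMMA = "\uff0c"                                 # ，
SENTENCE_FINAL = {"\u3002", "\uff01", "\uff1f"}  # 。 ！ ？


def segment(text, is_boundary):
    """Split text at sentence-final punctuation and at flagged commas.

    is_boundary holds one boolean per comma in text, in order of
    appearance. It stands in for the output of a comma classifier
    (hypothetical here); True means the comma signals a sentence boundary.
    """
    segments = []
    current = []
    comma_index = 0
    for ch in text:
        current.append(ch)
        if ch == COMMA:
            # Consult the classifier decision for this particular comma.
            split_here = is_boundary[comma_index]
            comma_index += 1
        else:
            split_here = ch in SENTENCE_FINAL
        if split_here:
            segments.append("".join(current))
            current = []
    if current:
        # Keep any trailing text that lacks closing punctuation.
        segments.append("".join(current))
    return segments


# Invented example: the first comma is labeled a boundary, the second is not.
print(segment("甲，乙，丙。", [True, False]))
```

Under this view, segmentation quality depends entirely on the per-comma labels, which is why the abstract reports both overall classification accuracy and F1 on the boundary class.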




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2111.pdf