Partial Parsing from Bitext Projections

Prashanth Mannem and Aswarth Dara
IIIT-Hyderabad


Abstract

Recent work has shown how a parallel corpus can be leveraged to build syntactic parser for a target language by projecting automatic source parse onto the target sentence using word alignments. The projected target dependency parses are not always fully connected to be useful for training traditional dependency parsers. In this paper, we present a greedy non-directional parsing algorithm which doesn't need a fully connected parse and can learn from parses by utilizing available structural and syntactic information in them. Our parser achieved statistically significant improvements over a baseline system that trains on only fully connected parses for Bulgarian, Spanish and Hindi. It also gave a significant improvement over previously reported results for Bulgarian and set a benchmark for Hindi.




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1160.pdf