Unsupervised Discovery of Domain-Specific Knowledge from Text

Dirk Hovy¹, Chunliang Zhang¹, Eduard Hovy¹, Anselmo Peñas²
¹ Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292
² UNED NLP and IR Group, Juan del Rosal 16, 28040 Madrid, Spain


Abstract

Learning by Reading (LbR) aims to enable machines to acquire knowledge from and reason about textual input. This requires knowledge about the domain structure (such as entities, classes, and actions) in order to perform inference. We present a method to infer this implicit knowledge from unlabeled text. Unlike previous approaches, we use automatically extracted classes with a probability distribution over entities to allow for context-sensitive labeling. From a corpus of 1.4 million sentences, we learn roughly 250,000 simple propositions about American football in the form of predicate-argument structures like "quarterbacks throw passes to receivers". Using several statistical measures, we show that our model generalizes and explains the data statistically significantly better than various baseline approaches. Human subjects judged up to 96.6% of the resulting propositions to be sensible. The classes and probabilistic model can be used in textual enrichment to improve the performance of LbR end-to-end systems.
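To make the abstract's idea concrete, here is a minimal sketch of what "classes with a probability distribution over entities" could look like for context-sensitive labeling. All names, probabilities, and the helper function are hypothetical illustrations, not the paper's actual model or data.

```python
# Hypothetical sketch: each class maps entities to P(entity | class),
# as might be learned from a corpus. Probabilities are illustrative only.
classes = {
    "quarterback": {"Brady": 0.40, "Manning": 0.35, "Rivers": 0.25},
    "receiver":    {"Moss": 0.50, "Owens": 0.30, "Rivers": 0.20},
}

def most_likely_class(entity, candidate_classes):
    """Label an entity with the candidate class that assigns it the
    highest probability; the candidates come from the argument slot
    of a proposition like 'quarterbacks throw passes to receivers'."""
    return max(candidate_classes,
               key=lambda c: classes[c].get(entity, 0.0))

# "Rivers" appears under both classes; the proposition's argument slot
# supplies the candidate set, and the distribution resolves the label.
print(most_likely_class("Rivers", ["quarterback", "receiver"]))  # quarterback
```

The point of the distributions is exactly this kind of disambiguation: the same surface entity can be labeled differently depending on which argument slot of the proposition it fills.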

Full paper: http://www.aclweb.org/anthology/P/P11/P11-1147.pdf