Improving word sense disambiguation using topic features

Cai JF, Lee WS, Teh YW

This paper presents a novel approach to exploiting global context for the task of word sense disambiguation (WSD). The approach uses topic features constructed by applying the latent Dirichlet allocation (LDA) algorithm to unlabeled data. These features are incorporated into a modified naïve Bayes network alongside other features such as the parts of speech of neighboring words, single words in the surrounding context, local collocations, and syntactic patterns. In both the English all-words task and the English lexical sample task, the method achieved a significant improvement over the simple naïve Bayes classifier and higher accuracy than the best official scores on Senseval-3 for both tasks. © 2007 Association for Computational Linguistics.
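
As a rough illustration of how a topic feature can sit alongside local features in a naïve Bayes classifier, the following minimal sketch may help. It is not the paper's modified network: it assumes each instance already carries a single hard topic id (imagined here as the output of an LDA model trained separately on unlabeled text), and it treats that id as one more categorical feature with add-one smoothing. The class name `NaiveBayesWSD`, the feature names, the sense labels, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Categorical naive Bayes with add-one smoothing over named features."""

    def __init__(self):
        self.sense_counts = Counter()
        # (sense, feature name) -> counts of the values seen with that sense
        self.feature_counts = defaultdict(Counter)
        # feature name -> set of all values seen in training
        self.feature_values = defaultdict(set)

    def train(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            for name, value in features.items():
                self.feature_counts[(sense, name)][value] += 1
                self.feature_values[name].add(value)

    def log_score(self, features, sense):
        total = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total)
        for name, value in features.items():
            counts = self.feature_counts[(sense, name)]
            # One extra slot so unseen values keep non-zero probability.
            vocab = len(self.feature_values[name]) + 1
            score += math.log((counts[value] + 1) / (sum(counts.values()) + vocab))
        return score

    def predict(self, features):
        return max(self.sense_counts, key=lambda s: self.log_score(features, s))


# Toy training data for the word "bank". The "topic" value is ASSUMED to be
# the dominant topic id assigned to the surrounding document by a separately
# trained LDA model; it is hard-coded here for illustration only.
train = [
    ({"pos_prev": "DT", "word_context": "river", "topic": 0}, "bank/shore"),
    ({"pos_prev": "DT", "word_context": "water", "topic": 0}, "bank/shore"),
    ({"pos_prev": "DT", "word_context": "loan", "topic": 1}, "bank/finance"),
    ({"pos_prev": "JJ", "word_context": "money", "topic": 1}, "bank/finance"),
]

clf = NaiveBayesWSD()
clf.train(train)

# The local context ("visit") was never seen in training; only the global
# topic feature separates the two senses.
local_only = {"pos_prev": "DT", "word_context": "visit"}
with_topic = {"pos_prev": "DT", "word_context": "visit", "topic": 1}

pred_local = clf.predict(local_only)
pred_topic = clf.predict(with_topic)
```

On this toy data, `pred_local` is `"bank/shore"` (the preceding determiner slightly favors that sense), while `pred_topic` is `"bank/finance"`: the topic id supplies the global evidence that the local features lack. A network that models the topic as a latent variable, rather than as a fixed observed id, would replace the single `topic` term with a sum over topics.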