A Novel Word Sense Disambiguation Algorithm Based on Semi-Supervised Statistical Learning

Zhehuang Huang, Yidong Chen, Xiaodong Shi


Statistical learning theory is a framework drawing on the fields of statistics and functional analysis. It provides a strong theoretical foundation for machine learning problems in the finite-sample case. Word sense disambiguation (WSD) is a fundamental task in natural language processing: identifying which sense of a word is used in a sentence when the word has multiple meanings. At present, mainstream research on word sense disambiguation focuses on a variety of statistical machine learning techniques. However, these techniques depend on labeled data, and high-quality labeled data is difficult to obtain. To address this problem, we propose a novel word sense disambiguation algorithm based on semi-supervised statistical learning. First, an initial classifier with a certain accuracy is constructed from small-scale labeled data. Then the training data is extended using a variety of thresholds. Experimental results show that the proposed method achieves higher performance for word sense disambiguation.
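The two steps described above (train an initial classifier on a small labeled set, then extend the training data with confidently labeled examples under a series of thresholds) can be illustrated with a minimal self-training sketch. It assumes a maximum entropy base classifier (suggested by the keywords), a bag-of-context-words feature set, and a decreasing threshold schedule `(0.9, 0.8, 0.7)`; the names `features`, `MaxEnt`, `predict` and `self_train`, as well as all parameter values, are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def features(context):
    # Hypothetical feature extractor (assumption): the set of words around the ambiguous word.
    return frozenset(context)


class MaxEnt:
    """Multinomial logistic regression (a maximum entropy classifier) over
    binary features, trained by batch gradient ascent with L2 regularization."""

    def __init__(self, senses, lr=0.5, epochs=200, l2=0.01):
        self.senses = list(senses)
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.w = defaultdict(float)  # (feature, sense) -> weight

    def probs(self, feats):
        # p(sense | context) is proportional to exp(sum of active feature weights)
        scores = [sum(self.w.get((f, s), 0.0) for f in feats) for s in self.senses]
        top = max(scores)
        exps = [math.exp(x - top) for x in scores]  # shift by the max for numerical stability
        z = sum(exps)
        return {s: e / z for s, e in zip(self.senses, exps)}

    def fit(self, data):
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    for s in self.senses:
                        # Gradient of the log-likelihood: observed minus expected feature count
                        grad[(f, s)] += (1.0 if s == gold else 0.0) - p[s]
            for key in set(grad) | set(self.w):
                self.w[key] += self.lr * (grad[key] / len(data) - self.l2 * self.w[key])
        return self


def predict(model, context):
    p = model.probs(features(context))
    return max(p, key=p.get)


def self_train(labeled, unlabeled, senses, thresholds=(0.9, 0.8, 0.7)):
    """Extend the training set with confidently labeled examples, one threshold per round.
    The threshold schedule is an assumption, not the paper's setting."""
    train = [(features(c), s) for c, s in labeled]
    pool = [features(c) for c in unlabeled]
    model = MaxEnt(senses).fit(train)  # initial classifier from the small labeled set
    for t in thresholds:
        confident, rest = [], []
        for feats in pool:
            p = model.probs(feats)
            best = max(p, key=p.get)
            if p[best] >= t:
                confident.append((feats, best))  # accept the predicted sense as a pseudo-label
            else:
                rest.append(feats)
        if not confident:
            continue  # nothing passes this threshold; try the next, looser one
        train = train + confident
        pool = rest
        model = MaxEnt(senses).fit(train)  # retrain on the extended training data
    return model, len(train)
```

For instance, seeding `self_train` with one "finance" context and one "river" context for the word *bank*, plus a few unlabeled contexts, returns the retrained model and the size of the extended training set. Retraining from scratch after each round keeps the sketch simple; an incremental update would be cheaper but is not needed to show the idea.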


Keywords: Semi-supervised, Statistical learning, Word sense, Maximum entropy.

