
Query-based semantic feature selection model

P. Shayegh Boroujeni, Y. Li, Q. Zhang

Abstract



The rapid development of information technology and the widespread use of the internet in organizations have resulted in the production of a large number of digital documents, most of which are plain unstructured text. Text classification is one way of structuring such plain text and preparing it for further analysis. Feature selection is a core part of text classification, aiming to select a subset of features that allows a classifier to reach higher accuracy and optimal performance. However, existing feature selection methods either ignore the semantics of text, require external knowledge resources, or do not provide explainable semantic features. This study aims to address this gap by proposing a new feature selection method for text classification that can be used for small and large datasets, does not require specific knowledge resources, and provides explainable semantic features. The proposed feature selection method is applied to three datasets, DementiaBank, RCV1, and ELEC, in order to empirically test its classification performance. Experimental results show that a text classification algorithm with the proposed feature selection method achieves substantially higher accuracy and F1-score than the same algorithm without it. These findings highlight the contribution of our study: a new text classification framework in which the base classifier is improved by integrating the probabilities of the features in the base classifier with their relevance to the topics or questions that appear in the context.

Keywords


text classification, semantic feature selection, topic modelling, question answering, natural language processing.


