
Improved Relative Discriminative Criterion Feature Ranking Technique for Text Classification

Wareesa Sharif, Noor Azah Samsudin, Mustafa Mat Deris, Muhammad Aamir

Abstract



Feature ranking techniques are used to improve classification performance in text labeling problems. Most feature selection techniques use document frequency and term frequency to rank terms. In contrast to document frequency, which records only whether a term is present in a document, term frequency captures the actual count of the term. Recent feature ranking techniques use term frequencies of frequently occurring terms but ignore rarely occurring terms, which can be as meaningful and important as frequent ones. Moreover, the F-measure of existing techniques decreases as the number of selected features increases. In this paper, the Improved Relative Discriminative Criterion (IRDC) technique is proposed to obtain more informative and meaningful rarely occurring terms. IRDC scales up rarely occurring terms that are present in one class and absent in the other classes. Additionally, IRDC creates a trade-off between frequently and rarely occurring terms. Experimental results on the Reuters-21578 and 20 Newsgroups datasets, using well-known classifiers such as multinomial naïve Bayes (MNB), support vector machine (SVM), and decision tree (DT), indicate that the proposed technique performs better in terms of F-measure.
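
The distinction between document frequency and term frequency, and the notion of a term that is present in one class and absent in the others, can be illustrated with a minimal sketch. This is not the IRDC scoring formula, which the abstract does not state. The toy corpus, the class labels, and the helper names (document_frequency, term_frequency, class_exclusive_terms) are all assumptions made for illustration.

```python
from collections import Counter

# Toy corpus (hypothetical data): (class label, tokenized document).
docs = [
    ("grain", ["wheat", "export", "wheat", "price"]),
    ("grain", ["wheat", "harvest", "price"]),
    ("trade", ["export", "tariff", "price", "price"]),
    ("trade", ["export", "price"]),
]


def document_frequency(docs, label):
    """Count, per term, the documents of class `label` that contain it."""
    df = Counter()
    for cls, tokens in docs:
        if cls == label:
            df.update(set(tokens))  # each document counts once per term
    return df


def term_frequency(docs, label):
    """Count, per term, its total occurrences in documents of class `label`."""
    tf = Counter()
    for cls, tokens in docs:
        if cls == label:
            tf.update(tokens)  # repeated occurrences are all counted
    return tf


def class_exclusive_terms(docs):
    """Map each term seen in exactly one class to that class."""
    classes_by_term = {}
    for cls, tokens in docs:
        for term in tokens:
            classes_by_term.setdefault(term, set()).add(cls)
    return {
        term: next(iter(classes))
        for term, classes in classes_by_term.items()
        if len(classes) == 1
    }


df_grain = document_frequency(docs, "grain")
tf_grain = term_frequency(docs, "grain")
exclusive = class_exclusive_terms(docs)

print(df_grain["wheat"], tf_grain["wheat"])
print(sorted(exclusive.items()))
```

In this toy corpus, "harvest" occurs only once overall but appears in a single class, which is the kind of rare yet discriminative term the abstract argues should not be ignored. How such terms are weighted is specific to IRDC and is not reproduced here.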

Keywords


Text classification, High dimensional data, Feature ranking, Document frequency, Term count, Rare terms, True positive rate, False positive rate.


