
Sentiment Learning from Imbalanced Dataset: An Ensemble Based Method

Vinodhini Gopalakrishnan, Chandrasekaran Ramaswamy


More people are buying products online and expressing their opinions about them in online reviews. Sentiment analysis extracts opinion-related information from these reviews, and the extracted results can benefit both consumers and manufacturers. Much of the work on machine learning based sentiment classification has been carried out on balanced datasets. However, real-time sentiment analysis is a challenging machine learning task because positive and negative sentiments are imbalanced in practice. Learning from imbalanced datasets is hard because the few minority class instances do not carry sufficient information, and classification performance degrades significantly as a result. The traditional approaches to the class imbalance problem modify either the data distribution or the classifier. In this work, we propose to combine both approaches: we modify the ensemble-based bagging algorithm and the sampling method used to adjust the data distribution, so as to address class imbalance and improve classification performance. We found that the modified bagged ensemble improves predictive performance as measured by the receiver operating characteristic (ROC) curve. The results also show that the modified bagging model achieves a better area under the ROC curve (AUC) on the imbalanced dataset.
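As a rough illustration of the general idea of pairing resampling with bagging, the sketch below trains each ensemble member on a class-balanced bag and evaluates the averaged scores with AUC. It is not the authors' modified algorithm, whose details the abstract does not give. The nearest-centroid base learner, the undersampling scheme, and all function names (`auc`, `fit_centroid`, `balanced_bagging`, `ensemble_score`) are assumptions chosen to keep the example self-contained.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(X, y):
    """Toy base learner (assumption): store the mean vector of each class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return (mean([x for x, t in zip(X, y) if t == 0]),
            mean([x for x, t in zip(X, y) if t == 1]))


def score_centroid(model, x):
    """Higher score means x is closer to the class-1 centroid."""
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return d0 - d1


def balanced_bagging(X, y, n_estimators=25, seed=0):
    """Train each member on a bag with equal numbers of both classes."""
    rng = random.Random(seed)
    minority = 1 if 2 * sum(y) <= len(y) else 0
    min_idx = [i for i, t in enumerate(y) if t == minority]
    maj_idx = [i for i, t in enumerate(y) if t != minority]
    models = []
    for _ in range(n_estimators):
        # Bootstrap the minority class, then draw an equally sized
        # random sample (with replacement) from the majority class.
        bag = [rng.choice(min_idx) for _ in min_idx]
        bag += [rng.choice(maj_idx) for _ in min_idx]
        models.append(fit_centroid([X[i] for i in bag], [y[i] for i in bag]))
    return models


def ensemble_score(models, x):
    """Average the base learners' scores for one instance."""
    return sum(score_centroid(m, x) for m in models) / len(models)


if __name__ == "__main__":
    # Synthetic, imbalanced two-feature data (hypothetical, not review data).
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
    y = [0] * 200 + [1] * 20
    models = balanced_bagging(X, y)
    print("AUC:", auc([ensemble_score(models, x) for x in X], y))
```

Balancing inside each bag keeps the minority class from being swamped in any single base learner, while averaging over many bags reuses majority class instances that a single undersampled training set would discard.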


Keywords: sentiment, classifier, opinion, learning, reviews


