
Comparative Study of Combination of Preprocessing, N-Gram Feature Extraction, Feature Selection, and Classification Method in Indonesian Sentiment Analysis with Imbalanced Data

Rezkya Putri Septiani, Margaretha Ari Anggorowati


Social media has shifted people's lifestyles. People tend to use microblogging services such as Twitter to criticize controversial issues. The most controversial Indonesian economic policy in recent years is the tax amnesty. A model that predicts positive and negative sentiment toward the tax amnesty policy can be developed with supervised machine learning. Classification performance can be improved by using the right combination of preprocessing technique, feature extraction, and feature selection. We conduct experiments to compare the performance of combinations of preprocessing technique, N-Gram feature extraction, feature selection, and classification method, and to find the best one. Data were collected by crawling with the Twitter API. Imbalanced data is one of the challenges in machine learning and can produce unsatisfactory classifiers; normalizing Indonesian slang adds a further challenge. This research uses an imbalanced dataset to assess how well the combinations of algorithms handle imbalanced data, with performance measured by nested cross-validation. The experimental results show that the best combination of algorithms in this research performs well on imbalanced data, and that model performance can be improved and depends strongly on the combination of preprocessing, N-Gram feature extraction, feature selection, and classification method.
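
As an illustration of the first two stages named in the abstract, the following is a minimal sketch of tweet preprocessing with slang normalization followed by word N-Gram extraction. It is not the authors' implementation: the slang dictionary, the cleaning rules, the function names, and the unigram-plus-bigram range are all assumptions chosen for the example, since the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical slang dictionary (assumption): the lexicon actually used
# for normalization is not given in the abstract.
SLANG = {"gak": "tidak", "yg": "yang", "bgt": "sangat"}


def preprocess(tweet, slang=SLANG):
    """Lowercase, strip URLs and mentions, keep letters only, normalize slang."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and user mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # drop digits, punctuation, '#'
    return [slang.get(token, token) for token in text.split()]


def ngrams(tokens, n_min=1, n_max=2):
    """Return word n-grams for every n in [n_min, n_max] as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def featurize(tweet, n_min=1, n_max=2):
    """Map a raw tweet to a bag of n-gram counts."""
    return Counter(ngrams(preprocess(tweet), n_min, n_max))


# Made-up example tweet, not taken from the study's dataset.
features = featurize("Tax amnesty gak adil bgt! http://t.co/x @user")
```

The resulting counts would then feed the feature selection and classification stages, which this sketch does not cover.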


Text mining, sentiment analysis, preprocessing, feature selection, classification.


