
Sentiment Classification using The Sentiment Scores of Lexicons Based on A Kuhns-II Coefficient in English

Vo Ngoc Phu, Vo Thi Ngoc Tran

Abstract



Many approaches based on sentiment lexicons have been proposed for sentiment classification. Lexicons offer many advantages for this task, and these approaches have made significant contributions to everyday life, political activities, commodity production, and commercial activities. In this study, we propose a novel model that uses the sentiment scores of lexicons computed with a Kuhns-II Coefficient (KIIC) to classify the sentiment of the 8,500,000 English documents in our testing data set, which comprises 4,250,000 positive and 4,250,000 negative documents. We use no training data set, no vector space modelling (VSM), and no vectors of any kind, neither one-dimensional nor multi-dimensional; the proposed model relies only on the sentiment values of the lexicons. For each term of a document, we calculate its frequency_valence, defined as the frequency of the term in that document multiplied by the valence of the term. Within a document, we sum the frequency_valence values of all terms whose valence has positive polarity, giving ATotalOfPositive_Frequency_Valences, and we likewise sum the frequency_valence values of all terms whose valence has negative polarity, giving ATotalOfNegative_Frequency_Valences. The sentiment polarity of the document is then determined by comparing ATotalOfPositive_Frequency_Valences with ATotalOfNegative_Frequency_Valences. The model was first implemented in a sequential environment and then in a parallel network environment, and the parallel system offers more advantages than the sequential one. The accuracy on the testing data set is 88.64%.
The results of our model can be applied widely in many commercial applications and sentiment classification studies across different fields.
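The per-document scoring described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the lexicon `EXAMPLE_VALENCES` and its values are hypothetical (the paper derives valences with the Kuhns-II coefficient, which is not reproduced here), tokenization is a naive lowercase whitespace split, and the comparison uses the magnitude of the negative total, with ties labeled "neutral". The functions `map_phase` and `reduce_phase` are hypothetical names that only mimic the shape of a Hadoop Map/Reduce job in plain Python; they do not run on Hadoop or Cloudera.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical lexicon (term -> valence). Illustrative values only; the paper
# derives its valences with the Kuhns-II coefficient, which is not reproduced here.
EXAMPLE_VALENCES: Dict[str, float] = {
    "good": 0.8,
    "great": 0.9,
    "bad": -0.7,
    "awful": -0.9,
}


def frequency_valences(tokens: Iterable[str], valences: Dict[str, float]) -> Dict[str, float]:
    """Return frequency_valence = (term frequency in the document) * (term valence)
    for every document term that appears in the lexicon."""
    counts = Counter(tokens)
    return {term: freq * valences[term] for term, freq in counts.items() if term in valences}


def classify(tokens: Iterable[str], valences: Dict[str, float]) -> str:
    fv = frequency_valences(tokens, valences)
    # ATotalOfPositive_Frequency_Valences: sum over terms with positive valence.
    total_positive = sum(v for v in fv.values() if v > 0)
    # ATotalOfNegative_Frequency_Valences: sum over terms with negative valence.
    total_negative = sum(v for v in fv.values() if v < 0)
    # Assumption: compare magnitudes; a tie is labeled "neutral".
    if total_positive > abs(total_negative):
        return "positive"
    if total_positive < abs(total_negative):
        return "negative"
    return "neutral"


def map_phase(text: str, valences: Dict[str, float]) -> List[Tuple[str, int]]:
    # Hypothetical mapper: naive lowercase whitespace tokenization, then emit (label, 1).
    return [(classify(text.lower().split(), valences), 1)]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # Hypothetical reducer: sum the counts emitted for each label.
    totals: Counter = Counter()
    for label, count in pairs:
        totals[label] += count
    return dict(totals)


if __name__ == "__main__":
    docs = ["Good good bad", "awful bad great"]
    pairs = [pair for text in docs for pair in map_phase(text, EXAMPLE_VALENCES)]
    print(reduce_phase(pairs))  # {'positive': 1, 'negative': 1}
```

Because each document is scored independently of every other document, the mapping step needs no shared state, which is why a Map/Reduce decomposition fits this kind of model: only the per-label counts have to be aggregated at the end.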

Keywords


English sentiment classification; parallel system; Cloudera; Hadoop Map and Hadoop Reduce; sentiment scores of lexicons; Kuhns-II coefficient.


