
English Sentiment Classification using An Ochiai Similarity Measure and The One-dimensional Vectors in a Parallel Network Environment

Vo Ngoc Phu, Vo Thi Ngoc Tran


Sentiment analysis is important in everyday life, for example in political activities, commodity production, and commercial activities. In this work, we propose a novel model for opinion analysis of large-scale data sets. We use the Ochiai coefficient (OC), a similarity measure from the clustering technologies of the data mining field, to cluster each document of our English testing data set into either the positive or the negative polarity based on our English training data set. The testing data set contains 6,000,000 documents (3,000,000 positive and 3,000,000 negative), and the training data set contains 5,000,000 documents (2,500,000 positive and 2,500,000 negative). No opinion lexicons are used in this study. We do not use any multi-dimensional vectors based on a vector space model (VSM) and sentiment lexicons; we use only one-dimensional vectors based on the VSM. A one-dimensional vector is clustered into the positive or the negative polarity if the OC similarity coefficients show that it is very close to that polarity, which indicates that the vector is clearly very similar to it. Each document of the testing data set is then classified as positive, negative, or neutral based on its one-dimensional vectors. We first tested the proposed model in a sequential environment and then in a distributed network system. The model achieved an accuracy of 87.58% on the testing data set. The execution time of the model in the parallel network environment is shorter than that in the sequential system. This study used many similarity coefficients from the data mining field, and its results can be used in many other applications and studies.
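The abstract does not spell out how a one-dimensional vector is built or how the OC scores are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each sentence of a document is reduced to a set of terms (our reading of a "one-dimensional vector"), that this set is compared with every training document by the Ochiai coefficient |A ∩ B| / sqrt(|A|·|B|), that the mean coefficient decides whether the sentence leans positive or negative, and that the document label is a majority vote over its sentences, with ties reported as neutral. The tokenizer, the helper names (`tokenize`, `mean_similarity`, `classify_vector`, `classify_document`), and the tiny toy training sets are hypothetical, and the Hadoop MapReduce distribution is not shown.

```python
import math
from typing import Iterable, List, Set

TermSet = Set[str]


def tokenize(text: str) -> TermSet:
    # Hypothetical, deliberately naive tokenizer: lowercase, split on whitespace.
    return set(text.lower().split())


def ochiai(a: TermSet, b: TermSet) -> float:
    # Ochiai coefficient |A ∩ B| / sqrt(|A| * |B|); defined here as 0.0 for empty sets.
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def mean_similarity(vector: TermSet, corpus: List[TermSet]) -> float:
    # Assumption: average the Ochiai coefficients against every training document.
    if not corpus:
        return 0.0
    return sum(ochiai(vector, doc) for doc in corpus) / len(corpus)


def classify_vector(vector: TermSet, positive: List[TermSet], negative: List[TermSet]) -> int:
    # +1 if the vector is closer to the positive set, -1 if closer to the negative set, 0 on a tie.
    pos = mean_similarity(vector, positive)
    neg = mean_similarity(vector, negative)
    if pos > neg:
        return 1
    if neg > pos:
        return -1
    return 0


def classify_document(sentences: Iterable[str], positive: List[TermSet], negative: List[TermSet]) -> str:
    # Assumption: majority vote over the sentence-level vectors; a zero score is neutral.
    score = sum(classify_vector(tokenize(s), positive, negative) for s in sentences)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical toy training data standing in for the 5,000,000-document training set.
POSITIVE = [tokenize("great phone excellent battery"), tokenize("i love this great product")]
NEGATIVE = [tokenize("terrible phone awful battery"), tokenize("i hate this bad product")]

print(classify_document(["the battery is excellent", "great screen"], POSITIVE, NEGATIVE))  # positive
```

In this sketch each document is classified independently of every other document, which is what would let the per-document work be spread across the map phase of a distributed job; how the authors actually partitioned the work on Cloudera/Hadoop is not stated in the abstract.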


English sentiment classification; distributed system; parallel system; OCHIAI similarity measure; Cloudera; Hadoop Map and Hadoop Reduce; clustering technology
