
A Novel Two-Way Clustering based on Multivariate Outlier Detection

G.S. David Sam Jayakumar, Bejoy John Thomas, A. Sulthan

Abstract



Clustering is an important task in a wide variety of application domains, especially in management and social science research. This paper proposes an iterative two-way clustering procedure based on multivariate outlier detection, using the squared Mahalanobis distance (T² distance) and step-wise multiple discriminant analysis. First, the squared Mahalanobis distance (T² distance) is calculated for every observation in the sample, and an upper control limit (UCL) is fixed from the T² statistic. Observations above the UCL are treated as outliers and grouped into an outlier cluster; the remaining inliers form an inlier cluster. Next, a step-wise multiple discriminant analysis is performed with cluster membership as the dependent variable and the variables already used to detect the multivariate outliers as the independent variables. The dominant sub-set of discriminating variables is retained as variable cluster 1, and the remaining variables are left for the next iteration. The outliers are then removed from the sample, the dominant sub-set is removed from the original variable set, and the same procedure is repeated on the remaining inliers and variables until one of the following holds: (i) there are no outliers in the final cluster, (ii) the set of predictors in the last iteration is empty or no sub-set emerges, or (iii) the variance-covariance matrix of the variables in the final cluster becomes singular. At each iteration, a multivariate test of means is used to check the discrimination between the outlier cluster and the inliers, and multivariate control charts are used to visualize the iterations and the clustering process. The procedure is applied to cluster 275 customers of a well-known four-wheeler brand in India on 19 attributes of the vehicle and its company.
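The abstract does not name the multivariate test of means applied at each iteration. One standard choice for comparing an outlier cluster with the inliers is the two-sample Hotelling T² test with a pooled covariance matrix. The Python sketch below assumes that test; the function name `hotelling_two_sample` and the synthetic data are illustrative only and are not taken from the paper.

```python
import numpy as np
from scipy import stats


def hotelling_two_sample(A, B):
    """Two-sample Hotelling T^2 test of H0: mean(A) == mean(B).

    Returns (T2, F, p_value); F has (p, n1 + n2 - p - 1) degrees of freedom.
    """
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    n1, n2, p = A.shape[0], B.shape[0], A.shape[1]
    diff = A.mean(axis=0) - B.mean(axis=0)
    # Pooled covariance from the within-group SSCP matrices (valid even if n1 == 1)
    ca, cb = A - A.mean(axis=0), B - B.mean(axis=0)
    S_pooled = (ca.T @ ca + cb.T @ cb) / (n1 + n2 - 2)
    T2 = n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(S_pooled, diff)
    # Convert T^2 to an exact F statistic under multivariate normality
    F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
    p_value = stats.f.sf(F, p, n1 + n2 - p - 1)
    return T2, F, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(size=(250, 4))            # synthetic data, not the survey data
    outliers = rng.normal(loc=1.5, size=(25, 4))   # hypothetical shifted outlier cluster
    T2, F, p_val = hotelling_two_sample(outliers, inliers)
    print(f"T2={T2:.2f}  F={F:.2f}  p={p_val:.3g}")
```

With a single variable the statistic reduces to the square of the pooled two-sample t statistic, which gives a quick sanity check on the implementation.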
The proposed technique yields 6 customer clusters and 5 variable clusters at the 5% significance level, and 5 customer clusters and 5 variable clusters at the 1% significance level.
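The full iteration can be sketched in Python under assumptions that the abstract does not settle. The UCL here uses the Beta-distribution limit for individual observations when the mean and covariance are estimated from the same sample, and the step-wise multiple discriminant analysis is replaced by a simpler forward selection on Wilks' lambda with a partial F-to-enter test at the same α (no variable-removal step). All function names are hypothetical, and the data are synthetic. The significance level `alpha` is the parameter that separates the 5% and 1% runs reported above.

```python
import numpy as np
from scipy import stats


def t2_distances(X):
    """Squared Mahalanobis (T^2) distance of every row of X from the sample mean."""
    centered = X - X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))   # (n - 1)-divisor covariance
    return np.einsum("ij,jk,ik->i", centered, np.linalg.inv(S), centered)


def t2_ucl(n, p, alpha):
    """Assumed UCL: T^2 ~ ((n - 1)^2 / n) * Beta(p/2, (n - p - 1)/2)."""
    return (n - 1) ** 2 / n * stats.beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)


def wilks_lambda(X, labels):
    """Wilks' lambda det(W) / det(T), via log-determinants for numerical stability."""
    c = X - X.mean(axis=0)
    T = c.T @ c                                  # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(labels):
        cg = X[labels == g] - X[labels == g].mean(axis=0)
        W += cg.T @ cg                           # pooled within-group SSCP matrix
    return np.exp(np.linalg.slogdet(W)[1] - np.linalg.slogdet(T)[1])


def forward_stepwise(X, labels, alpha):
    """Forward selection by Wilks' lambda with a partial F-to-enter test.

    Assumed stand-in for step-wise discriminant analysis (no removal step).
    """
    n, k = X.shape
    g = len(np.unique(labels))
    selected, lam = [], 1.0
    while len(selected) < k:
        q = len(selected)
        cands = {j: wilks_lambda(X[:, selected + [j]], labels)
                 for j in range(k) if j not in selected}
        best = min(cands, key=cands.get)         # variable giving the smallest lambda
        partial = cands[best] / lam              # partial lambda of that variable
        df2 = n - g - q
        F = df2 / (g - 1) * (1 - partial) / partial
        if df2 <= 0 or stats.f.sf(F, g - 1, df2) >= alpha:
            break                                # best candidate is not significant
        selected.append(best)
        lam = cands[best]
    return selected


def two_way_clustering(X, alpha=0.05):
    """Return (customer clusters as row indices, variable clusters as column indices)."""
    X = np.asarray(X, dtype=float)
    rows = np.arange(X.shape[0])                 # remaining inliers
    cols = list(range(X.shape[1]))               # remaining variables
    case_clusters, var_clusters = [], []
    while cols:                                  # rule ii: predictor set exhausted
        sub = X[np.ix_(rows, cols)]
        n, p = sub.shape
        if n <= p + 1:
            break                                # rule iii: covariance not estimable
        if np.linalg.matrix_rank(np.atleast_2d(np.cov(sub, rowvar=False))) < p:
            break                                # rule iii: singular covariance matrix
        outlier = t2_distances(sub) > t2_ucl(n, p, alpha)
        if not outlier.any():
            break                                # rule i: no outliers left
        chosen = forward_stepwise(sub, outlier.astype(int), alpha)
        if not chosen:
            break                                # rule ii: no dominant sub-set emerged
        case_clusters.append(rows[outlier])      # outlier cluster of this iteration
        var_clusters.append([cols[j] for j in chosen])
        rows = rows[~outlier]
        cols = [c for c in cols if c not in var_clusters[-1]]
    case_clusters.append(rows)                   # final inlier cluster
    return case_clusters, var_clusters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(275, 19))               # synthetic stand-in for 19 attributes
    for a in (0.05, 0.01):
        cases, variables = two_way_clustering(X, alpha=a)
        print(f"alpha={a}: {len(cases)} customer clusters, {len(variables)} variable clusters")
```

In this sketch, outliers flagged in an iteration that yields no dominant variable sub-set stay in the final cluster, and variables never selected remain unassigned; the abstract does not say how either case is handled.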

Keywords


Multivariate outliers, Mahalanobis distance, T² distance, Upper Control Limit, Step-wise discriminant analysis, Variance-Covariance matrix, Multivariate test of means
