Missing Data Imputation in a t-Distribution with Known Degrees of Freedom Via Expectation Maximization Algorithm and Its Stochastic Variants

P.K. Kinyanjui, C.L. Tamba, J.O. Okenye


During data collection, field personnel encounter unavoidable challenges that make it difficult to gather all the required information. Because most statistical tools assume complete data, the analyst must pre-treat the data before analyzing it. Model-based imputation techniques are widely used because they use all the collected information and preserve the original distribution of the data. This paper considers missing data imputation in a multivariate t-distribution with known degrees of freedom using the expectation maximization (EM), stochastic EM (SEM) and Monte Carlo EM (MCEM) algorithms. The mean square error (MSE) is used to compare the imputation efficiencies of the three procedures. In this study, SEM yields the most efficient imputations of the three methods. Across iterations, SEM explores a wide region of the log-likelihood surface, which enables it to avoid becoming trapped at local maxima and to converge to the best attainable points. Further, the efficiency of all three methods improves as the correlation between variables increases.


expectation maximization (EM), stochastic EM, Monte Carlo EM, missing values, imputation
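
To make the EM procedure named in the abstract concrete, the following is a minimal sketch of EM imputation under a multivariate t model with known degrees of freedom, together with the MSE criterion used for comparison. It follows the textbook weighted-update form of EM for the t-distribution with missing values and is not necessarily the authors' implementation. The function names `em_impute_t` and `imputation_mse`, the starting values, and the stopping rule are assumptions made for illustration; the stochastic variants (SEM, MCEM) are not shown.

```python
import numpy as np

def em_impute_t(X, nu, n_iter=200, tol=1e-8):
    """Impute np.nan entries of X (n x p) under a multivariate t model
    with known degrees of freedom nu. Returns (X_imputed, mu, sigma)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Assumed starting values: observed column means and variances.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    X_hat = np.where(miss, mu, X)
    for _ in range(n_iter):
        w = np.ones(n)            # expected latent scale weights
        C_sum = np.zeros((p, p))  # summed conditional covariances
        # E-step: weights, conditional means, conditional covariances.
        for i in range(n):
            m, o = miss[i], ~miss[i]
            if not o.any():       # fully missing row: fall back to the mean
                X_hat[i] = mu
                C_sum += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            r = X[i, o] - mu[o]
            S_oo_inv_r = np.linalg.solve(S_oo, r)
            delta = r @ S_oo_inv_r                  # Mahalanobis distance
            w[i] = (nu + o.sum()) / (nu + delta)
            if m.any():
                S_mo = sigma[np.ix_(m, o)]
                X_hat[i, m] = mu[m] + S_mo @ S_oo_inv_r
                C_sum[np.ix_(m, m)] += (
                    sigma[np.ix_(m, m)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)
                )
        # M-step: weighted mean and scale matrix.
        mu_new = (w[:, None] * X_hat).sum(axis=0) / w.sum()
        R = X_hat - mu_new
        sigma_new = ((w[:, None] * R).T @ R + C_sum) / n
        change = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if change < tol:
            break
    return X_hat, mu, sigma

def imputation_mse(X_true, X_imputed, miss):
    """Mean square error over the entries that were missing."""
    return float(np.mean((X_true[miss] - X_imputed[miss]) ** 2))
```

In a simulation, one would generate complete data, delete some entries, call `em_impute_t`, and pass the true data, the imputed data, and the missingness mask to `imputation_mse`; a lower value indicates more efficient imputation.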

