Validation study of consumer scanner and retail scanner data

Meike Wocken


The aim of this study is, first, to analyse consumer scanner panel data with regard to measurement error in price records and, second, to classify the potential biases that result from it. Studies that use such data commonly assume that the data are unbiased. In the consumer scanner data used here, participating households record, after each purchase, every article bought, its price, and the retail chain visited. For validation, I use retail scanner data, which contain sales information at the store level. Because the two datasets cannot be matched at the store level, the analysis is restricted to articles that are priced uniformly across all stores of a chain in a given calendar week. For these articles the specific store visited is irrelevant, and knowing the chain is sufficient. Overall, I find an additive, normally distributed measurement error in the consumer price data that is correlated with the price level, so the assumption of a classical measurement error has to be rejected. In addition, I observe statistically significant effects of age and income on the measurement error. The calculated reliability ratio is statistically significantly less than 1 for every product category except UHT milk. Hence, estimates from linear regression models are biased, both when the mismeasured price is an explanatory variable and when it is the dependent variable regressed on age or income. Because the measurement error is non-classical, the bias is rather complex. Since only a restrictive selection of articles is analysed, the detected measurement error is only a lower bound; in general, the measurement error and the resulting bias are expected to be larger.
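The reliability ratio and the two bias cases described above can be illustrated with a small Monte Carlo sketch. It uses only synthetic, hypothetical numbers (true prices around 2, a hypothetical outcome with slope `beta`, a hypothetical age effect `gamma` on the reporting error) and is not the estimation procedure of this study. It assumes the common definition of the reliability ratio as λ = Cov(x*, x) / Var(x), i.e. the slope from regressing the true price x* on the reported price x, which reduces to Var(x*) / Var(x) under classical error. If the reporting error is uncorrelated with the regression error, the OLS slope on the reported price converges to β·λ.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    # Sample covariance with an (n - 1) denominator.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ols_slope(x, y):
    # Slope of a simple OLS regression of y on x (with intercept).
    return cov(x, y) / cov(x, x)


def reliability_ratio(true_x, reported_x):
    # Assumed definition: lambda = Cov(x*, x) / Var(x), the slope from
    # regressing the true value on the reported value.
    # Under classical error this equals Var(x*) / Var(x).
    return cov(true_x, reported_x) / cov(reported_x, reported_x)


def price_as_regressor(n=20000, beta=-2.0, sigma_u=0.5, rho=0.0, seed=1):
    """Mismeasured price used as an explanatory variable (synthetic data).

    rho = 0 gives a classical error; rho != 0 makes the error correlated
    with the true price level, i.e. a non-classical error.
    """
    rng = random.Random(seed)
    true_p = [rng.gauss(2.0, 0.5) for _ in range(n)]
    m = mean(true_p)
    reported_p = [p + rho * (p - m) + rng.gauss(0.0, sigma_u) for p in true_p]
    # Hypothetical outcome that depends on the true price only.
    y = [beta * p + rng.gauss(0.0, 0.1) for p in true_p]
    lam = reliability_ratio(true_p, reported_p)
    return lam, ols_slope(true_p, y), ols_slope(reported_p, y)


def price_as_outcome(n=20000, gamma=0.01, seed=2):
    """Mismeasured price regressed on a covariate such as age (synthetic data).

    The true price does not depend on age, but the reporting error does
    (gamma per year), so the estimated age effect is biased towards gamma.
    """
    rng = random.Random(seed)
    age = [rng.uniform(20, 80) for _ in range(n)]
    true_p = [rng.gauss(2.0, 0.5) for _ in range(n)]
    reported_p = [p + gamma * (a - 50) + rng.gauss(0.0, 0.3)
                  for p, a in zip(true_p, age)]
    return ols_slope(age, true_p), ols_slope(age, reported_p)


if __name__ == "__main__":
    for rho in (0.0, 0.3):
        lam, b_true, b_rep = price_as_regressor(rho=rho)
        print(f"rho={rho}: lambda={lam:.3f}, "
              f"slope(true)={b_true:.3f}, slope(reported)={b_rep:.3f}")
    print("age slopes (true, reported):", price_as_outcome())
```

With `rho=0` and `sigma_u=0.5`, Var(x*) = Var(u) = 0.25, so λ should be close to 0.5 and the slope on the reported price close to -1 instead of the true -2. With `rho` different from 0 the error is correlated with the price level, and Var(x*) / Var(x) no longer describes the bias, whereas β·λ with the covariance-based λ still does. In `price_as_outcome`, the reported price picks up an age slope of about `gamma` even though the true price does not depend on age.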


