Character segmentation of Assamese printed and handwritten words using a classifier-based sliding window technique

Amlan Jyoti Basumatari


Development of optical character recognition (OCR) for Indian scripts has been an active area of research. Despite a substantial body of independent research, only a few commercial applications are available. A major reason is the complex structure of these scripts, which leads to poor segmentation accuracy even when isolated character recognition accuracy is very high. This paper explores character segmentation and proposes a novel character segmentation scheme for Assamese word images, both printed and handwritten, that operates as a sliding window guided by a classifier. The method extends the conventional role of Support Vector Machine (SVM) classifiers beyond recognition to segmentation. A small window is passed over the word image, and the word segment inside the current window is fed to a trained SVM. Segmentation points are determined from the probability estimate returned by the SVM: when the estimate is higher than a predefined threshold, the current window is assumed to hold a segmentation point; otherwise, the window is widened and the segment is fed to the SVM again. This process is repeated until the window has passed over the entire word. On self-made datasets, the system achieved character-level accuracies of 87.48% for printed words and 82.07% for handwritten words. The technique fails on words that contain slanted characters.
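The sliding-window loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation. The abstract does not give the initial window width, the increment, the threshold value, or exactly what the SVM's probability estimate measures, so `min_width`, `step`, `threshold=0.9` and the `prob_fn` interface are hypothetical. The sketch also assumes that the segmentation point lies at the right edge of an accepted window. `toy_prob` is a toy scorer standing in for a trained SVM. With scikit-learn, for instance, an `SVC(probability=True)` model exposes class probability estimates through `predict_proba`.

```python
from typing import Callable, List

# A binarized word image as a list of pixel rows (1 = ink, 0 = background).
Image = List[List[int]]


def crop_columns(image: Image, start: int, end: int) -> Image:
    """Return the sub-image covering columns [start, end)."""
    return [row[start:end] for row in image]


def segment_word(
    image: Image,
    prob_fn: Callable[[Image], float],
    threshold: float = 0.9,  # hypothetical value; the abstract gives none
    min_width: int = 2,      # hypothetical initial window width (columns)
    step: int = 1,           # hypothetical window increment (columns)
) -> List[int]:
    """Classifier-guided sliding-window segmentation (illustrative sketch).

    prob_fn is a hypothetical stand-in for the trained SVM's probability
    estimate for the current window. Returns the column indices of the
    segmentation points found inside the word.
    """
    if min_width < 1 or step < 1:
        raise ValueError("min_width and step must be positive")
    width = len(image[0]) if image else 0
    cuts: List[int] = []
    start, size = 0, min_width
    while start < width:
        end = min(start + size, width)
        if prob_fn(crop_columns(image, start, end)) > threshold:
            # Estimate above threshold: take the window's right edge as a
            # segmentation point (unless it is the end of the word), then
            # restart a minimum-width window from there.
            if end < width:
                cuts.append(end)
            start, size = end, min_width
        elif end == width:
            # Window reached the end of the word without a confident match.
            break
        else:
            # Estimate too low: widen the window and query the classifier again.
            size += step
    return cuts


def toy_prob(window: Image) -> float:
    # Toy stand-in, NOT an SVM: "confident" when the window contains ink
    # and ends on a blank column.
    has_ink = any(any(row) for row in window)
    ends_blank = all(row[-1] == 0 for row in window)
    return 0.95 if has_ink and ends_blank else 0.1


# Three 3-column "characters" separated by single blank columns.
row = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
word = [row[:] for _ in range(3)]
print(segment_word(word, toy_prob))  # [4, 8]
```

Growing the window one column at a time until the score clears the threshold mirrors the "increment and re-query" step in the abstract. Each accepted window resets the width, so the next character is searched for from the new segmentation point.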


Character segmentation, Support vector machine, Sliding window technique, Assamese script.

