Hierarchical Recognition System for Machine Printed Kannada Characters


Dinesh Achaya U, N V Subba Reddy, Krishnamoorthi


Vol. 8  No. 11  pp. 44-53


Extensive research has been done on optical character recognition (OCR) over the last few decades. Most efforts have gone into developing OCR systems for foreign languages and scripts such as English, Japanese, Roman, and Arabic, and many commercial OCR systems for them are available in the market. For Indian languages, the majority of reported work is on Hindi and Bangla, and very few reports are available on South Indian languages. This paper describes a character recognition system that can handle machine-printed text documents in Kannada, the official language of the South Indian state of Karnataka. First, the scanned image is preprocessed to remove noise. Lines, words, and character components are then segmented using a two-stage segmentation technique. Classification of the character components is also done in two stages. In the first stage, the character components are grouped into small subsets by a feature-based tree classifier. In the second stage, the characters in each group are recognized using a nearest neighbor classifier. We adopted this hybrid approach instead of using only a tree classifier because it is nearly impossible to find a set of stroke features that are simple to compute, robust and reliable to detect, and sufficient to classify a large number of basic and complex-shaped compound characters. The system was tested on a data set containing 8400 characters in different fonts and sizes. On average, the system recognizes characters with an accuracy of about 92.68%.
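The two-stage classification described above can be illustrated with a minimal sketch. It is not the authors' implementation: the features (aspect ratio, hole count, stroke density), the decision rules in `coarse_group`, the group names, and the labels are all hypothetical, chosen only to show how a cheap tree-style test can narrow the candidates before a nearest neighbor search runs within the selected group.

```python
import math
from collections import defaultdict


def coarse_group(features):
    """Stage 1: a hand-written decision tree over two cheap features.

    The features and thresholds are illustrative assumptions,
    not the stroke features used in the paper.
    """
    aspect_ratio, hole_count = features[0], features[1]
    if hole_count == 0:
        return "no_hole_wide" if aspect_ratio > 1.0 else "no_hole_narrow"
    return "one_hole" if hole_count == 1 else "multi_hole"


class HybridClassifier:
    """Tree-based grouping followed by 1-nearest-neighbor inside the group."""

    def __init__(self):
        self.groups = defaultdict(list)

    def fit(self, samples, labels):
        # Store each training sample under the group the tree assigns it to.
        for x, y in zip(samples, labels):
            self.groups[coarse_group(x)].append((tuple(x), y))
        return self

    def predict(self, x):
        # Stage 2: compare only against training samples in the same group.
        candidates = self.groups.get(coarse_group(x))
        if not candidates:
            return None  # no training data reached this leaf
        query = tuple(x)
        _, label = min(candidates, key=lambda c: math.dist(c[0], query))
        return label


# Hypothetical feature vectors: (aspect_ratio, hole_count, stroke_density)
train_x = [(1.2, 0, 0.30), (1.3, 0, 0.55), (0.8, 1, 0.40), (0.9, 1, 0.70)]
train_y = ["A", "B", "C", "D"]

clf = HybridClassifier().fit(train_x, train_y)
result_wide = clf.predict((1.25, 0, 0.33))
result_hole = clf.predict((0.85, 1, 0.65))
result_empty = clf.predict((1.0, 3, 0.50))
```

Because the nearest neighbor search only sees one group at a time, the distance comparison has to separate a handful of similar shapes rather than the full character set, which is the motivation the abstract gives for combining the two classifiers.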


Character recognition, Structural features, Direction code, Binary decision tree, k-Nearest Neighbor, Multi-stage classifier