(than what was obtained using IM_i) recognition accuracy is obtained. Otherwise, IM_i is reloaded and Phase-II terminates.
It is observed that, for a few iterations of Phase-II, newer versions of the immune memory continue to produce better recognition accuracy, after which accuracy degrades, signaling negative (or over-) learning in the system. In fact, instead of the training antigen set, a separate validation set could be used in this refinement phase; this modification will be considered in a future extension of the present study.
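To make the loop concrete, the following Python sketch gives one plausible reading of Phase-II: refined versions of the immune memory are accepted as long as accuracy on the evaluation antigens improves, and the last accepted memory is reloaded once accuracy degrades. The helpers refine_memory and accuracy are hypothetical placeholders standing in for the paper's Phase-II operations, not the authors' implementation.

# Sketch of the Phase-II refinement loop.
# refine_memory and accuracy are hypothetical stand-ins for the
# paper's refinement and evaluation steps.
def phase_two(immune_memory, antigens, labels, refine_memory, accuracy):
    """Iteratively refine the immune memory; stop (and roll back
    to the last accepted memory) once accuracy degrades."""
    best_acc = accuracy(immune_memory, antigens, labels)
    while True:
        candidate = refine_memory(immune_memory, antigens, labels)
        cand_acc = accuracy(candidate, antigens, labels)
        if cand_acc > best_acc:
            # Better accuracy: accept the refined memory and continue.
            immune_memory, best_acc = candidate, cand_acc
        else:
            # Degradation detected: keep the previous memory and terminate.
            return immune_memory, best_acc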
Classification strategy: Classification is implemented by a k-nearest neighbor (k-NN) approach. For a target antigen ag, the k (an odd number) memory cells closest to ag are selected from the immune memory IM. Closeness is measured by the stim function, i.e., stim(ag, m_i) for all m_i ∈ IM. Next, the k selected m_i's are grouped by their class labels, and the class of the largest group (a majority-voting strategy) identifies ag, as sketched below.
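A minimal Python sketch of this decision rule follows. It assumes the immune memory is a list of (feature_vector, class_label) pairs and that stim is any similarity function supplied by the caller; the paper's own stim measure is the one defined earlier in the text.

from collections import Counter

def classify(ag, immune_memory, stim, k=5):
    """k-NN over the immune memory: select the k memory cells most
    stimulated by the antigen ag and vote by class label."""
    # immune_memory: list of (feature_vector, class_label) pairs.
    # Rank memory cells by decreasing stimulation stim(ag, m_i).
    ranked = sorted(immune_memory, key=lambda m: stim(ag, m[0]), reverse=True)
    top_k = ranked[:k]  # k is chosen odd to reduce ties
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]  # majority-voted class

For a quick test, a toy stimulation measure such as stim = lambda a, b: -sum((x - y) ** 2 for x, y in zip(a, b)) (negated squared Euclidean distance) can be passed in place of the paper's measure.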
3 Experimental Details
Two datasets (DS1 and DS2) [16] have been used to test the proposed classification approach based on the clonal selection algorithm (CSA). DS1 and DS2 contain samples of handwritten numerals in two major Indic scripts, namely Devanagari (Hindi) and Bengali, respectively. Unlike for English, Chinese, Japanese, etc., studies on Indic-script handwriting recognition are rare, and this provides additional motivation for the present work to deal with datasets of handwriting in Indian languages. Moreover, datasets containing large numbers of handwritten digit samples in Indic scripts have recently become available [16] in the public domain, which facilitates training and testing an approach and comparing it with competing methods.
Both datasets contain real samples collected from different kinds of handwritten documents, such as postal mail, job application forms, railway ticket reservation forms, and passport application forms. For our experiment, each dataset consists of 12,000 samples (an equal number of samples per class). DS1 samples are randomly selected from a collection of 22,556 Devanagari numerals written by 1,049 persons, and DS2 samples are taken from a set of 12,938 Bengali numerals written by 556 persons. Some samples of each digit class are shown in Fig. 1. Each dataset is divided into six equal-sized partitions. Training is conducted on samples from five partitions and classification is tested on the sixth. This realizes a six-fold experiment resulting in six test runs; the results reported next are averaged over these six runs, following the protocol sketched below.
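The following Python sketch illustrates one way to realize this six-fold protocol. It assumes samples and labels are NumPy arrays, while train and classify are placeholders for the CSA trainer and the k-NN classifier described above.

import numpy as np

def six_fold_accuracy(samples, labels, train, classify, folds=6):
    """Train on five partitions, test on the sixth, rotate the test
    partition, and average the six accuracies."""
    # Shuffle indices and split them into six equal-sized partitions.
    idx = np.random.permutation(len(samples))
    parts = np.array_split(idx, folds)
    accs = []
    for f in range(folds):
        test_idx = parts[f]
        train_idx = np.concatenate([parts[j] for j in range(folds) if j != f])
        model = train(samples[train_idx], labels[train_idx])
        correct = sum(classify(model, samples[i]) == labels[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return float(np.mean(accs))  # accuracy averaged over the six runs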
Experiments are carried out under two different training policies: L1, in which training is single-pass, and L2, the proposed method that employs the refinement process. Recognition accuracies under these two settings are reported in Table 1, where it is observed that L2 outperforms L1 by a significant margin. However, L2 generates a slightly larger immune memory than L1. A significant difference is observed in the time required for training: on a Pentium-IV (733 MHz, 128 MB RAM) PC, L1 takes considerably less CPU time than L2, which involves the additional refinement phase. However, there is hardly any difference in the time needed for classification by the two approaches; the system can classify about 50 characters per second. Absolute time units taken during training and testing are outlined in Table 2 below.