(than what was obtained using IM_i) recognition accuracy is obtained. Otherwise, IM_i is reloaded and Phase-II terminates.
It is observed that, for a few iterations of Phase-II, newer versions of the immune memory continue to produce better recognition accuracy, after which accuracy degrades, signaling negative (or over-) learning in the system. In fact, instead of the training antigen set, a separate validation set could be used in this refinement phase; this modification will be considered in a future extension of the present study.
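To make the loop concrete, the following Python sketch gives one plausible reading of Phase-II: refined versions of the immune memory are accepted as long as accuracy on the evaluation antigens improves, and the last accepted memory is reloaded once accuracy degrades. The helpers refine_memory and accuracy are hypothetical placeholders standing in for the paper's Phase-II operations, not the authors' implementation.

# Sketch of the Phase-II refinement loop.
# refine_memory and accuracy are hypothetical stand-ins for the
# paper's refinement and evaluation steps.
def phase_two(immune_memory, antigens, labels, refine_memory, accuracy):
    """Iteratively refine the immune memory; stop (and roll back
    to the last accepted memory) once accuracy degrades."""
    best_acc = accuracy(immune_memory, antigens, labels)
    while True:
        candidate = refine_memory(immune_memory, antigens, labels)
        cand_acc = accuracy(candidate, antigens, labels)
        if cand_acc > best_acc:
            # Better accuracy: accept the refined memory and continue.
            immune_memory, best_acc = candidate, cand_acc
        else:
            # Degradation detected: keep the previous memory and terminate.
            return immune_memory, best_acc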
Classification strategy: Classification is implemented by a k-nearest neighbor (k-NN) approach. For a target antigen ag, the k (an odd number) memory cells closest to ag are selected from the immune memory IM. Closeness is measured by the stim function, i.e., stim(ag, m_i) for all m_i ∈ IM. Next, the k selected m_i's are grouped by their class labels, and the class of the largest group (a majority-voting strategy) identifies ag, as sketched below.
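A minimal Python sketch of this decision rule follows. It assumes the immune memory is a list of (feature_vector, class_label) pairs and that stim is any similarity function supplied by the caller; the paper's own stim measure is the one defined earlier in the text.

from collections import Counter

def classify(ag, immune_memory, stim, k=5):
    """k-NN over the immune memory: select the k memory cells most
    stimulated by the antigen ag and vote by class label."""
    # immune_memory: list of (feature_vector, class_label) pairs.
    # Rank memory cells by decreasing stimulation stim(ag, m_i).
    ranked = sorted(immune_memory, key=lambda m: stim(ag, m[0]), reverse=True)
    top_k = ranked[:k]  # k is chosen odd to reduce ties
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]  # majority-voted class

For a quick test, a toy stimulation measure such as stim = lambda a, b: -sum((x - y) ** 2 for x, y in zip(a, b)) (negated squared Euclidean distance) can be passed in place of the paper's measure.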
3 Experimental Details
Two datasets (DS1 and DS2) [16] have been used to test the proposed classification approach based on the clonal selection algorithm (CSA). DS1 and DS2 contain samples of handwritten numerals in two major Indic scripts, namely Devanagari (Hindi) and Bengali, respectively. Unlike for English, Chinese, Japanese, etc., studies on Indic-script handwriting recognition are rare, and this provides additional motivation for the present work to deal with datasets of handwriting in Indian languages. Moreover, datasets containing large numbers of handwritten digit samples in Indic scripts have recently become available [16] in the public domain, which facilitates training and testing an approach and comparing it with competing methods.
Both datasets contain real samples collected from different kinds of handwritten documents, such as postal mail, job application forms, railway ticket reservation forms, and passport application forms. For our experiment, each dataset consists of 12,000 samples (an equal number of samples per class). DS1 samples are randomly selected from a collection of 22,556 Devanagari numerals written by 1,049 persons, and DS2 samples are taken from a set of 12,938 Bengali numerals written by 556 persons. Some samples of each digit class are shown in Fig. 1. Each dataset is divided into six equal-sized partitions. Training is conducted on samples from five partitions and classification is tested on the sixth. This realizes a six-fold experiment resulting in six test runs; the results reported next are averaged over these six runs, following the protocol sketched below.
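The following Python sketch illustrates one way to realize this six-fold protocol. It assumes samples and labels are NumPy arrays, while train and classify are placeholders for the CSA trainer and the k-NN classifier described above.

import numpy as np

def six_fold_accuracy(samples, labels, train, classify, folds=6):
    """Train on five partitions, test on the sixth, rotate the test
    partition, and average the six accuracies."""
    # Shuffle indices and split them into six equal-sized partitions.
    idx = np.random.permutation(len(samples))
    parts = np.array_split(idx, folds)
    accs = []
    for f in range(folds):
        test_idx = parts[f]
        train_idx = np.concatenate([parts[j] for j in range(folds) if j != f])
        model = train(samples[train_idx], labels[train_idx])
        correct = sum(classify(model, samples[i]) == labels[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return float(np.mean(accs))  # accuracy averaged over the six runs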
Experiments are carried out under two different training policies: L1, in which training is single-pass, and L2, the proposed method that employs the refinement process. Recognition accuracies under these two settings are reported in Table 1, where it is observed that L2 outperforms L1 by a significant margin. However, L2 generates a slightly larger immune memory than L1. A significant difference is observed in the time required for training: on a Pentium-IV (733 MHz, 128 MB RAM) PC, L1 takes considerably less CPU time than L2, which involves the additional refinement phase. However, there is hardly any difference in the time needed for classification by the two approaches; the system can classify about 50 characters per second. Absolute time units taken during training and testing are outlined in Table 2 below.