Let U = (X_ds, T_ds) denote the training set extracted from a given dataset
and U_j the subsets obtained by randomly dividing U into several groups with
an equal number of samples, such that

|U| = r + \sum_{j=1}^{L} |U_j| ,                                   (6.10)
where L is the number of subsets and r the remainder. This division is per-
formed in each epoch of the learning phase. The partition of the training
set into subsets reduces the probability of the algorithm getting trapped in
local minima since it is performed in a random way. The subsets are sequen-
tially presented to the learning algorithm, which applies to each one, in batch
mode, the respective back-propagation and subsequent weight update.
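A minimal sketch of one such training epoch is given below, assuming a generic batch-mode routine backprop_update that performs the back-propagation and weight update; the function and variable names are illustrative, not those of [202]:

import numpy as np

def batch_sequential_epoch(X, T, L, backprop_update, rng):
    # One epoch of the MEE batch-sequential (MEE-BS) scheme: randomly
    # divide the training set into L equally sized subsets (leaving a
    # remainder of r samples aside) and present each subset, in batch
    # mode, to the back-propagation and weight-update step in turn.
    n = len(X)
    perm = rng.permutation(n)          # random division, redone every epoch
    subset_size = n // L               # r = n - L * subset_size samples left over
    for j in range(L):
        idx = perm[j * subset_size:(j + 1) * subset_size]
        backprop_update(X[idx], T[idx])   # batch update on subset U_j

Calling batch_sequential_epoch once per epoch, with a fresh random permutation each time, reproduces the re-partitioning described above.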
One of the advantages of using the batch-sequential algorithm is the de-
crease of algorithm complexity. The complexity of the original MEE algo-
rithm is O(|U|^2); with the MEE-BS algorithm the complexity is proportional
to L(|U|/L)^2, which means that, in terms of computational time, one achieves
a reduction proportional to L . The number of subsets, L , is influenced by the
size of the training set. One should avoid using subsets with fewer than 40
samples [202].
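The source of this reduction is easy to check: since the MEE risk estimate involves all pairs of samples in a batch (hence the O(|U|^2) cost), processing L subsets of |U|/L samples costs on the order of

L (|U|/L)^2 = |U|^2 / L

operations per epoch instead of |U|^2. As an illustrative figure (not one reported in [202]), |U| = 1000 with L = 5 yields subsets of 200 samples, well above the lower bound of 40, and roughly 5 · 200^2 = 2 × 10^5 pairwise evaluations per epoch instead of 10^6.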
Classifier performance using the MEE-BS algorithm seems to be quite
insensitive to "reasonable" choices of the number of subsets, L . Table 6.3
shows the best results in experiments reported in [202] on four real-world
datasets. No statistically significant variation of the error rate statistics (mean
and standard deviation over 20 repetitions of the hold-out method) was found
for two different values of L . The same conclusion was found to hold when
the number of epochs and of hidden neurons used in these experiments were
varied. Regarding the processing time per epoch, as also shown in
Table 6.3, the MEE-BS algorithm was found to be up to six times faster than
the MEE-VLR algorithm.
The batch-sequential algorithm can also be implemented with variable
learning rate. However, the simple "global" updating rule described in the
previous section cannot be applied. The reason is simple: MEE-VLR compares
the error entropy of a given epoch with its value in the previous epoch for the
same samples, whereas the batch-sequential algorithm uses a different set of
subsets in each epoch.
Instead of the simple procedure described in the preceding section, one may
vary the learning rate by comparing the gradients obtained in consecutive
iterations. Two learning
rate updating rules can then be incorporated into the MEE-BS algorithm:
either Silva and Almeida's rule [210] (MEE-BS(SA)) or the resilient back-
propagation rule [186] (MEE-BS(RBP)). Both variants of the MEE-BS algo-
rithm are described in detail in [202]. An example of the training phase using
the three methods (MEE-BS, MEE-BS(SA), and MEE-BS(RBP)) is shown
in Fig. 6.9.
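Both rules adapt each weight's learning rate from the sign of its gradient in consecutive iterations. As a rough illustration of this idea (the factors u and d, the bounds, and the names below are illustrative assumptions, not the exact settings of MEE-BS(SA) or MEE-BS(RBP) given in [202]):

import numpy as np

def silva_almeida_step(w, grad, prev_grad, lr, u=1.2, d=0.8,
                       lr_min=1e-6, lr_max=1.0):
    # Per-weight learning-rate adaptation in the style of Silva and
    # Almeida's rule: increase the rate where the gradient keeps its
    # sign between consecutive iterations, decrease it where the sign
    # flips (u > 1, d < 1).
    same_sign = grad * prev_grad > 0
    lr = np.where(same_sign, lr * u, lr * d)
    lr = np.clip(lr, lr_min, lr_max)
    w = w - lr * grad                  # gradient-descent step with adapted rates
    return w, lr

The resilient back-propagation variant differs mainly in using only the sign of the gradient (not its magnitude) in the weight update, with similarly adapted per-weight step sizes.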