Applications - Minimum Error Entropy Classification

The Smoothing Parameter

The smoothing parameter h plays an important role in computing the entropy. Other works that use Renyi's quadratic entropy to perform clustering [117,84] assume that the smoothing parameter is selected experimentally and must be fine-tuned to achieve acceptable results. Formula (6.8), $h_{fat} = 25\sqrt{c/n}$, was proposed in [203] and was shown to produce good results in neural network classification using error entropy minimization, as mentioned in Sect. 6.1.1.1. For the LEGClust algorithm we need a formula that reflects the standard deviation of the data. Following the approach described in Sect. 6.1.1.1, a new formula, inspired by (6.7), was proposed in [198]:

$$ h_{op} = 2\,s \left( \frac{4}{(d+2)\,n} \right)^{\frac{1}{d+4}}, \tag{6.55} $$

where s is the mean value of the sample standard deviations for all d dimensions. All experiments with the entropic clustering algorithm reported in [204] were performed using formula (6.55).
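As a minimal sketch of how formula (6.55) could be evaluated, the following Python function computes the smoothing parameter for a list of d-dimensional points. The function name `h_op` and the list-of-tuples data layout are assumptions made for illustration, and the sample standard deviation is taken with the usual n - 1 denominator, which the text does not state explicitly.

```python
from statistics import stdev


def h_op(data):
    """Smoothing parameter of formula (6.55).

    data: list of n points, each a sequence of d coordinates (n >= 2).
    Assumption: "sample standard deviation" uses the n - 1 denominator.
    """
    n = len(data)
    d = len(data[0])
    # s: mean of the per-dimension sample standard deviations
    s = sum(stdev([point[j] for point in data]) for j in range(d)) / d
    # h_op = 2 s (4 / ((d + 2) n))^(1 / (d + 4))
    return 2.0 * s * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    print(h_op(square))
```

Because s scales with the data, rescaling every coordinate by a constant rescales the resulting h by the same constant, which is the sense in which the formula reflects the standard deviation of the data.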

Although the value of the smoothing parameter is important, it is not crucial for obtaining good results. As h increases, the kernel becomes smoother and the entropic proximity matrix becomes similar to the Euclidean distance proximity matrix. Extremely small values of h produce undesirable behavior because the entropy then has high variability. Using h values in a small interval near the $h_{fat}$ value does not affect the final clustering results.
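The dependence on h can be explored with a small sketch. It uses the standard Parzen-window estimator of Renyi's quadratic entropy with an isotropic Gaussian kernel, which is an assumption here: the excerpt does not spell out the estimator, and this is not the entropic proximity matrix of LEGClust itself. The function name `renyi_quadratic_entropy` is hypothetical.

```python
import math


def renyi_quadratic_entropy(data, h):
    """Parzen-window estimate of Renyi's quadratic entropy.

    Assumed estimator: H = -ln V, where V is the mean over all pairs
    (i, j) of a Gaussian kernel with variance 2 h^2 per dimension,
    evaluated at the difference x_i - x_j.
    """
    n = len(data)
    d = len(data[0])
    two_var = 4.0 * h * h  # twice the kernel variance 2 h^2
    norm = (math.pi * two_var) ** (-d / 2.0)
    potential = 0.0
    for x in data:
        for y in data:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            potential += norm * math.exp(-sq_dist / two_var)
    potential /= n * n
    return -math.log(potential)


if __name__ == "__main__":
    points = [(0.0,), (1.0,), (1.2,), (3.0,)]
    for h in (0.01, 0.1, 0.5, 1.0, 5.0):
        print(h, renyi_quadratic_entropy(points, h))
```

Under this estimator, when h is much smaller than the distances between points only the i = j terms contribute, so the estimate is driven by h and n rather than by the structure of the data. That is one way to see why extremely small values are undesirable.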

Minimum Number of Connections

The minimum number of connections, k, required to join clusters in consecutive steps of the algorithm is the third parameter that must be chosen. To limit the effect of outliers and noise, especially when they are located between clusters, one should not use k = 1. If the elementary clusters have a small number of points, high values of k are also not recommended, because it may become impossible to join clusters for lack of a sufficient number of connections. Experimental evidence provided in [204] shows that good results are obtained when using either k = 2 or k = 3.

An alternative is simply to join at each step the two clusters with the highest number of connections between them.
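The two joining rules can be sketched as follows, assuming the number of connections between each pair of clusters has already been counted. The dictionary layout (cluster-label pairs mapped to counts) and the function names `pairs_to_join` and `most_connected_pair` are hypothetical and serve only to illustrate the selection step, not the full algorithm.

```python
def pairs_to_join(connections, k):
    """Return the cluster pairs linked by at least k connections.

    connections: dict mapping a pair (a, b) of cluster labels to the
    number of connections between them (hypothetical precomputed input).
    """
    return sorted(pair for pair, count in connections.items() if count >= k)


def most_connected_pair(connections):
    """Alternative rule: the single pair with the most connections."""
    if not connections:
        return None
    return max(connections, key=connections.get)


if __name__ == "__main__":
    counts = {(0, 1): 3, (0, 2): 2, (1, 2): 1}
    print(pairs_to_join(counts, 2))
    print(most_connected_pair(counts))
```

With k = 1 the pair (1, 2), linked by a single connection, would also be joined. A single connection of this kind is what an outlier lying between two clusters can produce, which is why k = 1 is discouraged.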
