The Smoothing Parameter
The smoothing parameter h is very important when computing the entropy.
In other works [117, 84] that use Renyi's quadratic entropy to perform
clustering, the smoothing parameter is assumed to be selected experimentally
and must be fine-tuned to achieve acceptable results. Formula (6.8),
h_fat = 25 c/n, was proposed in [203] and shown to produce good results
in neural network classification using error entropy minimization, as
mentioned in Sect. 6.1.1.1. For the LEGClust algorithm we need a formula that
reflects the standard deviation of the data. Following the approach described
in Sect. 6.1.1.1, a new formula, inspired by (6.7), was proposed in [198]:
$$h_{op} = 2\,s \left( \frac{4}{(d+2)\,n} \right)^{\frac{1}{d+4}}, \tag{6.55}$$
where s is the mean value of the sample standard deviations for all d
dimensions. All experiments with the entropic clustering algorithm reported
in [204] were performed using formula (6.55).
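As a minimal sketch of how formula (6.55) might be evaluated, the following Python function computes the smoothing parameter from a small data set. It assumes that n is the number of data points and that the sample standard deviation uses the n - 1 denominator; the names `h_op` and `points` are illustrative and do not come from the text.

```python
import statistics


def h_op(data):
    """Smoothing parameter following formula (6.55).

    `data` is a list of n points, each a sequence of d coordinates.
    Assumption: n is the number of points in the data set.
    """
    n = len(data)
    d = len(data[0])
    # s: mean of the sample standard deviations over the d dimensions
    per_dimension = [
        statistics.stdev([point[j] for point in data]) for j in range(d)
    ]
    s = statistics.mean(per_dimension)
    return 2.0 * s * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))


# Hypothetical toy data: four points in two dimensions
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
h = h_op(points)
print(round(h, 3))
```

Because s scales with the data, rescaling all coordinates by a constant factor rescales the resulting h by the same factor.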
Although the value of the smoothing parameter is important, it is not
crucial for obtaining good results. As we increase the h value, the kernel
becomes smoother and the entropic proximity matrix becomes similar to the
Euclidean distance proximity matrix. Extremely small values of h will produce
undesirable behaviors because the entropy will have high variability. Using
h values in a small interval, near the h_fat value, does not affect the final
clustering results.
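The sensitivity to h can be illustrated with the standard Parzen-window (Gaussian kernel) estimator of Renyi's quadratic entropy for one-dimensional samples. This is only a sketch under that assumption: it is not the entropic dissimilarity computed by LEGClust, and the function name and sample values are hypothetical.

```python
import math


def renyi_quadratic_entropy(samples, h):
    """Parzen-window estimate of Renyi's quadratic entropy (1-D samples).

    With a Gaussian kernel of width h, the pairwise sum uses a Gaussian
    of standard deviation h * sqrt(2).
    """
    n = len(samples)
    norm = 1.0 / (2.0 * h * math.sqrt(math.pi))
    potential = 0.0
    for xi in samples:
        for xj in samples:
            potential += norm * math.exp(-((xi - xj) ** 2) / (4.0 * h * h))
    potential /= n * n
    return -math.log(potential)


# Hypothetical toy samples, evaluated for three very different h values
samples = [0.0, 1.0, 2.0, 3.0]
entropies = {h: renyi_quadratic_entropy(samples, h) for h in (0.1, 1.0, 10.0)}
for h, value in entropies.items():
    print(h, round(value, 3))
```

For the smallest h in this toy example the estimate is dominated by the self-pairs (each sample paired with itself), so it reflects h and n rather than the arrangement of the data; for the largest h the kernel is so wide that all pairs contribute almost equally.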
Minimum Number of Connections
The minimum number of connections, k, to join clusters in consecutive steps
of the algorithm is the third parameter that must be chosen. One should not
use k = 1, so as to avoid the influence of outliers and noise, especially if
they are located between clusters. If the elementary clusters have a small
number of points, high values of k are also not recommended, because it may
then become impossible to join clusters for lack of a sufficient number of
connections. Experimental evidence provided in [204] shows that good results
are obtained when using either k = 2 or k = 3.
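The role of k can be sketched as a thresholded merge step. The code below assumes a hypothetical representation in which `connections` maps a pair of cluster indices to the number of connections between them; neither this data structure nor the function name comes from the text, and the union-find bookkeeping is just one convenient way to track the joins.

```python
def merge_clusters(n_clusters, connections, k):
    """Join every pair of clusters linked by at least k connections.

    `connections` maps a pair (a, b) of cluster indices to the number of
    connections between them (hypothetical representation).
    Returns one label per input cluster.
    """
    parent = list(range(n_clusters))

    def find(a):
        # Follow parent links to the representative, halving the path
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (a, b), count in connections.items():
        if count >= k:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    return [find(a) for a in range(n_clusters)]


# Clusters 1 and 2 are linked by a single connection, e.g. a noise point
connections = {(0, 1): 3, (1, 2): 1, (2, 3): 2}
labels_k1 = merge_clusters(4, connections, k=1)
labels_k2 = merge_clusters(4, connections, k=2)
print(labels_k1)  # [0, 0, 0, 0]
print(labels_k2)  # [0, 0, 2, 2]
```

With k = 1 the single stray connection is enough to fuse everything into one cluster, whereas k = 2 keeps the two groups apart.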
An alternative is simply to join at each step the two clusters with the
highest number of connections between them.
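This alternative can be sketched as a single greedy step. As an assumption for illustration, `connections` maps a pair of cluster indices to a connection count, and after the join the counts of the absorbed cluster are added to those of the surviving one; the text does not specify how counts are updated, and the function name is hypothetical.

```python
def join_most_connected(connections):
    """Join the two clusters with the most connections between them.

    Returns the joined pair and the updated connection counts, where the
    second cluster of the pair has been absorbed into the first.
    """
    a, b = max(connections, key=connections.get)
    updated = {}
    for (u, v), count in connections.items():
        u = a if u == b else u
        v = a if v == b else v
        if u == v:
            continue  # connections inside the newly joined cluster
        key = (min(u, v), max(u, v))
        updated[key] = updated.get(key, 0) + count
    return (a, b), updated


connections = {(0, 1): 3, (1, 2): 1, (2, 3): 5, (1, 3): 2}
pair, remaining = join_most_connected(connections)
print(pair)       # (2, 3)
print(remaining)  # {(0, 1): 3, (1, 2): 3}
```

Repeating this step until the desired number of clusters remains needs no threshold k, at the cost of always joining some pair, even a weakly connected one.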
 