3.3 Variability Due to Sampling
A first question to be addressed is whether the entropy functions are stable from one randomly-generated matrix to another; this is, in effect, a question of the signal-to-noise ratio of the functions being studied. To this end, we have run two experiments.
- In the first experiment, we assume a background density of 5% for random connections, overall network sizes of 1000 to 10000 nodes in increments of 1000 nodes, and a cluster size of 1000 nodes with a cluster density of 80%.
- In the second experiment, we assume a network size fixed at 10000 nodes,
a single cluster of 1000 to 5000 nodes in increments of 1000 nodes, and the
same background and cluster densities as in the first experiment.
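The matrix generation described in the two experiments can be sketched as follows. The paper does not reproduce its generator, so the function name, argument names, and the placement of the cluster in the upper-left block are assumptions; the sketch plants a single dense cluster inside a sparse random background, matching the parameter settings above.

```python
import random

def make_matrix(n, cluster_size, p_background=0.05, p_cluster=0.80, seed=None):
    """Generate an n-by-n 0/1 connection matrix with background density
    p_background, then overwrite the (assumed) upper-left block with a
    planted cluster of density p_cluster.  Illustrative sketch only."""
    rng = random.Random(seed)
    a = [[1 if rng.random() < p_background else 0 for _ in range(n)]
         for _ in range(n)]
    for i in range(cluster_size):
        for j in range(cluster_size):
            a[i][j] = 1 if rng.random() < p_cluster else 0
    return a
```

Repeating this call ten times with different seeds but fixed parameters yields the ten iterations per setting used below.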
In both cases we do ten iterations and compute the Shannon entropy (equation (1)), the Rényi entropy (equation (3)), the mutual Shannon entropy for q = 2 (equation (2)), the mutual Rényi entropy for q = 2 (equation (4)), and the difference between the latter two mutual entropies.
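The two base measures can be sketched with their standard definitions for a discrete distribution p. Equations (1)–(4) of the paper apply these (and their mutual variants, not reproduced here) to distributions derived from the connection matrices; the function names below are assumptions.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H = -sum_i p_i log2 p_i, skipping zero entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi_entropy(p, q=2.0):
    """Renyi entropy H_q = log2(sum_i p_i^q) / (1 - q), for q != 1.
    As q -> 1 this tends to the Shannon entropy."""
    return math.log2(sum(x ** q for x in p)) / (1.0 - q)
```

For q = 2 the Rényi entropy is never larger than the Shannon entropy of the same distribution, which is worth keeping in mind when reading the differences reported later.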
We did not conduct a thorough statistical analysis, because this did not seem necessary. If we naively compute the difference between the maximum and minimum values and divide by the average value within each parameter setting, we obtain a measure of the relative error from using different random samples with all other variables held constant.
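The naive relative-error measure just described is simply the range divided by the mean across the ten iterations at one parameter setting:

```python
def relative_spread(values):
    """(max - min) / mean: the naive relative-error measure computed
    across the repeated random iterations at a fixed parameter setting."""
    return (max(values) - min(values)) / (sum(values) / len(values))
```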
The result of both experiments seems to be that the differences arising from sampling are very small. There were a few instances in which this relative error was as large as 2.0 × 10⁻⁴, but for the most part the relative errors were even smaller than this, often less than 10⁻⁶. As long as the predictive use of entropy as an indicator of anomaly is based on observed changes significantly larger than 1 in 10000, say, we would not expect sampling variations to have a significant effect.
3.4 The Entropy Functions Themselves
We turn next to the entropy functions themselves.
Single-Cluster Matrices: In our first simulation we computed entropies for all matrices with
- network size 100 to 1000 in increments of 20, with constant background density 0.05
- a single cluster of size 20 to 1000 in increments of 5
- cluster densities from 0.50 to 0.80 in increments of 0.10
We present a plot that provides a heuristic view of the functions. Figure 1 shows the standard Shannon entropy for cluster density 0.80. The plot is quite similar for other cluster densities and for the Rényi entropy at various densities. As one would expect, the entropy is high for networks in which either few or most