1. The overall size of the network. This is the number of rows (and also of columns)
in the matrix C. We will study sizes ranging from approximately 100 up
through approximately 10000.
2. The background density. This is the probability that one node will be connected
to another at random. We will assume, until shown to be in error, that
these probabilities fall in the range 0.05 to about 0.15.
3. The number of clusters in the network.
4. The sizes of the clusters.
5. The densities of the clusters.
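As a purely illustrative sketch, the five parameters above might be gathered into a single configuration object; the names and defaults below are our own choices, not taken from the software described later:

```python
from dataclasses import dataclass, field

@dataclass
class SimulationParameters:
    """Illustrative container for the five experimental variables.
    Names and defaults are ours; the paper does not specify an interface."""
    network_size: int = 1000            # rows (and columns) of C, ~100 to ~10000
    background_density: float = 0.10    # random connection probability, 0.05 to 0.15
    cluster_sizes: list = field(default_factory=lambda: [100, 50, 25])
    cluster_densities: list = field(default_factory=lambda: [0.75, 0.75, 0.75])  # 0.50 to 0.90
```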
It is the latter three variables that require justification. We assume that in a
large network, such as a university, departments, colleges, and other units
will appear in the connectivity matrix as clusters, because the nodes in these
units have reason to communicate with each other more frequently
than would be observed for the background random activity. If one had
complete information about the network traffic (which would require an NP-
complete computation), then one could, for any chosen threshold that
defines a cluster, rearrange the matrix C into a block-diagonal form. In the
absence, at present, of any real data contradicting this assumption, we will assume
that the number of clusters of a given size follows a Zipf-like distribution,
varying inversely with cluster size, and we will generate simulated
data accordingly. For our initial experiments we have chosen cluster densities in
the range 0.50 to 0.90. We have chosen initially to study two types of cluster
structure. The first is a single cluster of varying size that could in fact be the
entire network. This follows the approach of Gudkov et al. in examining the difference of
entropies for a single cluster as it grows from a small size eventually to become
the entire network. The second study is motivated by an assumption about how
C might change for a network experiencing an anomaly. We begin with a series
of clusters of decreasing size, computing the entropies as we go, to establish the
parameters for a “normal” state. We then introduce a moderately large cluster
(on the order of 10% of the entire network) that we might postulate to arise
from a newly-infected computer that has begun an attack.
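The Zipf-like assumption above can be sketched as a small sampling routine. The function name, size bounds, and use of Python's `random.choices` are our illustrative choices, not the paper's code:

```python
import random

def sample_cluster_sizes(n_clusters, max_size, min_size=5, seed=0):
    """Draw cluster sizes so that the expected number of clusters of a
    given size varies inversely with that size (a Zipf-like distribution).
    Illustrative sketch only; the paper's generator is not shown."""
    rng = random.Random(seed)
    sizes = list(range(min_size, max_size + 1))
    # weight each candidate size s by 1/s, per the inverse-size assumption
    weights = [1.0 / s for s in sizes]
    return [rng.choices(sizes, weights=weights)[0] for _ in range(n_clusters)]
```

With such a sampler, small clusters dominate the draw while large clusters remain possible, matching the stated inverse relationship.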
3.2 The Software Artifact
A brief description of the software is in order. Our program takes as input a
set of parameters that includes the matrix size, the background density, and the
number, size, and density of the clusters to be simulated. Calls to rand() are
made to fill in the background of a symmetric matrix of the appropriate density,
and the background entropy is computed. Following this, the simulated clusters
are added one at a time and the entropy recomputed. An overall outer loop
controls the number of such tests to be made. The entropy calculations
themselves are effected by a simple double loop through the rows and columns of
the matrix (which, for programming convenience, is represented in dense
form). The code was written for simplicity and flexibility, not for performance;
since even for the larger matrices the running times were at worst a few minutes,
we made no attempt to improve the efficiency of the code where that would have
added complexity or decreased the flexibility.
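A minimal sketch of the pipeline just described, namely background fill, cluster insertion, and an entropy pass over the dense matrix, might look as follows. The entropy shown here is a placeholder (Shannon entropy of the degree distribution); the paper's actual calculation, following Gudkov et al., may well differ, and all names are ours:

```python
import math
import random

def make_matrix(n, background_density, rng):
    """Symmetric 0/1 connectivity matrix in dense form, no self-loops."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < background_density:
                c[i][j] = c[j][i] = 1
    return c

def add_cluster(c, nodes, density, rng):
    """Raise the connection density among the chosen cluster nodes."""
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            if rng.random() < density:
                i, j = nodes[a], nodes[b]
                c[i][j] = c[j][i] = 1

def entropy(c):
    """Placeholder entropy: Shannon entropy of the degree distribution,
    computed by a double loop over the dense matrix (via row sums)."""
    degrees = [sum(row) for row in c]
    total = sum(degrees)
    h = 0.0
    for d in degrees:
        if d:
            p = d / total
            h -= p * math.log(p)
    return h
```

A run in the spirit of the paper's experiments would compute the background entropy first, then add clusters one at a time and recompute, e.g. `c = make_matrix(200, 0.10, rng)`, `h0 = entropy(c)`, then `add_cluster(c, list(range(20)), 0.80, rng)` followed by a second call to `entropy`.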