if less than 1% of the data instances lie within a cluster, it is labeled as anomalous. We leverage the distance notation from Formula 1, writing d(v_i, C) to denote the distance from a feature vector to a cluster (represented by its centroid). The algorithm to cluster the fingerprints, as described in [24, 9], consists of three steps:
1. The set S of clusters is first initialized to the empty set.
2. A fingerprint v_i = (v_{i,0}, ..., v_{i,n-1}) is taken from the (still unlabeled) set of fingerprints.
IF the set S is still empty, then the fingerprint creates a new cluster C and v_i becomes its centroid.
ELSE the cluster C with the smallest distance, arg min_{C ∈ S} d(v_i, C), is selected, provided that the fingerprint does not surpass the maximal cluster width. If such a cluster is found, the fingerprint is inserted; otherwise a new cluster is generated and v_i becomes its centroid.
3. The second step is repeated for all remaining fingerprints.
Fig. 3. Clustering as described in [24, 9]
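As a minimal illustration of the procedure in Fig. 3, the following Python sketch implements the single-pass clustering under the assumption that fingerprints are numeric feature vectors, that d(v_i, C) is the Euclidean distance to the cluster centroid, and that the maximal cluster width is passed as a parameter max_width; the class and parameter names are illustrative and not taken from [24, 9].

    import math

    class Cluster:
        """A cluster represented by the fingerprint that created it (its centroid)."""
        def __init__(self, centroid):
            self.centroid = centroid              # fixed centroid, as in Fig. 3
            self.members = [centroid]

        def distance(self, fingerprint):
            # d(v_i, C): Euclidean distance from a fingerprint to the centroid
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(fingerprint, self.centroid)))

    def cluster_fingerprints(fingerprints, max_width):
        clusters = []                             # step 1: S is initialized to the empty set
        for v in fingerprints:                    # steps 2 and 3: process every fingerprint
            if not clusters:                      # S is still empty: v opens a new cluster
                clusters.append(Cluster(v))
                continue
            nearest = min(clusters, key=lambda c: c.distance(v))   # arg min over S
            if nearest.distance(v) <= max_width:  # within the maximal width: insert
                nearest.members.append(v)
            else:                                 # otherwise v becomes the centroid of a new cluster
                clusters.append(Cluster(v))
        return clusters

Note that, as in the figure, the centroid of a cluster is simply the first fingerprint inserted into it and is not recomputed.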
Detecting Abnormal Entities and False-Positives. Clusters containing fewer fingerprints than the user-specified threshold are automatically labeled as outliers. The fingerprints within them, and the entities they represent, are then also labeled as anomalous. For each entity there are two ways an anomaly alert can be created: (i) through a change in its own behaviour, or (ii) by being substantially different from other entities of the same type. The idea behind (i) is that the system collects fingerprints for a single entity over a period of time, e.g., hours or months, and clusters them. If an entity does not change its behaviour, its fingerprints remain in the same dense cluster c. The more changes an entity undergoes (stored in its behavioural profile), the more its fingerprints change. Eventually a generated fingerprint exceeds the maximal distance to the centroid of c and results in an anomaly alert. In case (ii), fingerprints are used to compare entities with each other. A user who exhibits a significantly different usage pattern forms a separate cluster and is labeled as anomalous. When a new user, service, or host is introduced to the system, it can be determined automatically whether that entity is abnormal, simply by comparing its fingerprint.
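The labeling of outlier clusters can be sketched along the same lines, reusing the Cluster objects from the sketch above and assuming the user-specified threshold is given as a fraction of all collected fingerprints (e.g., the 1% mentioned earlier); both assumptions are only for illustration.

    def label_anomalous(clusters, threshold=0.01):
        # Clusters holding fewer fingerprints than the threshold fraction of all
        # instances are labeled as outliers; their fingerprints, and the entities
        # they represent, inherit the anomalous label.
        total = sum(len(c.members) for c in clusters)
        return [c for c in clusters if len(c.members) < threshold * total]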
Through the use of the domain model, entities are put in relation to each other, i.e., users to hosts, or services to workflows. Anomalies are thus put into context, and alerts are propagated upwards. For instance, abnormal services, hosts, and users determine the security status of the workflows they are assigned to. Vice versa, drilling down on an abnormal workflow (e.g., too much network traffic or too many document queries) exposes the abnormal entities, e.g., anomalous services, users, and hosts, and speeds up root-cause analysis.
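How such alerts propagate along the domain model can be illustrated with a small sketch; the Entity and Workflow classes, and the rule that a single abnormal entity flags the whole workflow, are assumptions made for this example only.

    class Entity:
        # A monitored entity (user, host, or service) with an anomaly flag.
        def __init__(self, name, kind):
            self.name, self.kind = name, kind
            self.anomalous = False

    class Workflow:
        # A workflow aggregates the entities assigned to it via the domain model.
        def __init__(self, name, entities):
            self.name = name
            self.entities = entities

        def security_status(self):
            # Alerts propagate upwards: any abnormal entity flags the workflow.
            return "abnormal" if any(e.anomalous for e in self.entities) else "normal"

        def drill_down(self):
            # Drilling down on an abnormal workflow exposes the anomalous entities.
            return [e for e in self.entities if e.anomalous]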
It is possible, even likely, that some clusters detected as anomalous are in fact not anomalous. Groups of file servers will, for instance, have different fingerprints than mail servers or time servers. It is therefore important to consider various degrees of optimization to prevent false positives. There are a couple of options, since the clustering algorithm is parameterized by two variables,