The assumption that a given monitored data set contains multiple types of
outliers is motivated by the domain of web-based attacks from remote
computers, which may involve, for example, several types of Trojans (we have
a large dataset that includes, for example, temporal data flows covering
around 13 types of Trojans). To this end, we propose a two-stage,
expertise-based fusion protocol:
1. Offline Stage: Identify groups/clusters of outliers within an initial data set
and associate expert ADAs with each of the outlier clusters, based on their
classification/decision on the instances of the initial data set. Namely, if an
ADA has correctly classified a certain outlier type in most cases, then it is
considered an expert for that type of outlier. Moreover, a certain ADA may be
found to be an expert for multiple outlier types. At the end of this stage we
should have a set of outlier types and, for each such outlier type, a list
(which might be empty) of ADAs that are associated with it and are assumed to
be experts in detecting it.
2. Online Phase: For any new given instance, identify its nearest outlier
cluster/type, then combine the decisions of the ADAs using an expertise-based
weighting function in order to reach the final decision/score. The
expertise-based weighting function aims to promote the decisions of the ADAs
that were found to be experts for the given instance's type, with the aim of
achieving a more accurate decision.
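The online phase can be illustrated with a minimal sketch. The weighting
function is not specified at this point in the text, so the one below is an
assumption made purely for illustration: experts receive a fixed weight
`boost` and all other ADAs receive weight 1, and the result is a weighted
average of the scores. Representing each outlier cluster by a centroid,
measuring proximity with Euclidean distance, and all names (`nearest_cluster`,
`fuse_scores`, `ada1`, `type_a`, ...) are likewise hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_cluster(instance, centroids):
    """Return the label of the outlier cluster whose centroid is closest.

    Assumption: each cluster is summarized by a single centroid.
    """
    return min(centroids, key=lambda label: dist(instance, centroids[label]))


def fuse_scores(scores, experts, boost=2.0):
    """Weighted average of ADA scores.

    Assumption: experts for the instance's type get weight `boost`,
    every other ADA gets weight 1.0.
    """
    weights = {ada: (boost if ada in experts else 1.0) for ada in scores}
    total = sum(weights.values())
    return sum(weights[ada] * scores[ada] for ada in scores) / total


# Hypothetical output of the offline stage.
centroids = {"type_a": (0.0, 0.0), "type_b": (10.0, 10.0)}
experts_by_type = {"type_a": {"ada1"}, "type_b": {"ada2", "ada3"}}

# A new instance and the (hypothetical) scores the ADAs assigned to it.
instance = (9.0, 8.5)
scores = {"ada1": 0.2, "ada2": 0.9, "ada3": 0.7}

cluster = nearest_cluster(instance, centroids)
final_score = fuse_scores(scores, experts_by_type.get(cluster, set()))
```

With these made-up numbers the instance falls nearest to `type_b`, so the
scores of `ada2` and `ada3` count twice as much as that of `ada1`, which pulls
the fused score above the plain average of the three scores.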
The offline stage may be performed using either supervised or unsupervised
methods. The motivation for using an unsupervised offline stage is the common
assumption that in some environments (e.g., big data) in which anomaly
detection algorithms operate, the anomalies are unexpected and unknown;
therefore, we cannot assume that tagged or classified data is available for
use.
For the supervised case we propose to first obtain the list of anomaly types
(if available). Next, for each anomaly type and for each instance of it in the
available data, compare the score/decision of each member of the ensemble
(i.e., each ADA) to the correct decision. Each ADA that is found to have
relatively high performance with regard to a certain outlier type will be
referred to as an expert for that particular anomaly type.
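The supervised expert assignment described above can be sketched as follows.
This is only an illustration under stated assumptions: "relatively high
performance" is interpreted here as a per-type detection rate that reaches a
hypothetical `threshold`, only the tagged outlier instances are considered,
and the function and variable names are invented for the example.

```python
from collections import defaultdict


def find_experts(labels, decisions, threshold=0.8):
    """Map each outlier type to the ADAs considered experts for it.

    labels:    outlier-type label of each tagged outlier instance.
    decisions: ADA name -> list of booleans (True = flagged as outlier),
               aligned with `labels`.
    threshold: assumed minimum detection rate for an ADA to be an expert.
    """
    # Number of tagged instances per outlier type.
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1

    experts = {label: [] for label in counts}
    for ada, flags in decisions.items():
        # Number of instances of each type that this ADA flagged correctly.
        hits = defaultdict(int)
        for label, flagged in zip(labels, flags):
            if flagged:
                hits[label] += 1
        for label, n in counts.items():
            if hits[label] / n >= threshold:
                experts[label].append(ada)
    return experts


# Hypothetical tagged data: five outliers of two types, two ADAs.
labels = ["trojan_a", "trojan_a", "trojan_b", "trojan_b", "trojan_b"]
decisions = {
    "ada1": [True, True, False, False, True],
    "ada2": [False, True, True, True, True],
}
experts = find_experts(labels, decisions)
```

Note that a type may end up with an empty expert list, which matches the
remark in the offline stage that the list of associated ADAs might be empty.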
For the unsupervised case, on the other hand, the process of identifying
experts is much more complicated. In particular, in order to overcome the fact
that the initial data set is not classified/tagged, we follow a procedure that
was proposed by Schubert et al. [6]. According to Schubert's initialization
procedure, in order to identify the anomalous instances we take the k
top-scored instances according to each ADA. Next, we take the union of the
instances identified by each ADA to create the group of outliers. Once this
initial classification is derived, we continue by identifying the expertise
of each ADA as in the supervised case.
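The initialization step described above (top k instances per ADA, then the
union) can be sketched as follows. This is a minimal illustration and not the
procedure of [6] in full; it assumes that a higher score means "more
anomalous", and the names `top_k_union` and `scores_by_ada` are hypothetical.

```python
def top_k_union(scores_by_ada, k):
    """Return the union of the k highest-scored instance indices of every ADA.

    scores_by_ada: ADA name -> list of anomaly scores, one per instance.
    Assumption: a higher score means the instance is more anomalous.
    """
    outliers = set()
    for scores in scores_by_ada.values():
        # Instance indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        outliers.update(ranked[:k])
    return outliers


# Hypothetical scores of two ADAs on five untagged instances.
scores_by_ada = {
    "ada1": [0.1, 0.9, 0.3, 0.8, 0.2],
    "ada2": [0.7, 0.2, 0.1, 0.95, 0.3],
}
pseudo_outliers = top_k_union(scores_by_ada, k=2)
```

The resulting set plays the role of the tagged outliers, so the expert
assignment can then proceed as in the supervised case.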
In the online phase, for each new given instance E, we propose to calculate
the nearest anomaly cluster and this similarity measurement s and the specific
 