Class label ambiguities in training databases;
Computational and storage constraints; and
Skewness of the databases.
Class label ambiguities arise naturally in application scenarios, especially
when domain expert knowledge is sought for classifying the training data
instances. We use a belief theoretic technique for addressing this issue. It en-
ables the proposed ARM-KNN-BF classifier to conveniently model the class
label ambiguities. Each generated rule is then treated as a BoE providing
another 'piece of evidence' for purposes of classifying an incoming data in-
stance. The final classification result is based upon the fused BoE generated
by DRC. Skewness of the training data set can also create significant difficulties
in ARM because the majority classes tend to overwhelm the minority
classes in such situations. The partitioned-ARM strategy we employ creates
an approximately equal number of rules for each class label, thus solving this
problem. Using rules generated from only the nearest neighbors (instead
of the complete rule set) means that significantly fewer rules enter
the BoE combination stage. This makes our classifier more computationally
efficient. Applications where these issues are of critical importance
include threat detection and assessment scenarios.
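
The fusion step described above can be illustrated with a minimal sketch. It assumes that DRC denotes Dempster's rule of combination and that each rule contributes a BoE represented as a mapping from sets of class labels to masses. The threat classes, the mass values, and the function name combine_dempster are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from itertools import product

def combine_dempster(m1, m2):
    """Fuse two mass functions (dict: frozenset of labels -> mass)
    with Dempster's rule of combination."""
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            fused[common] = fused.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence: rule is undefined")
    # Renormalize so that the fused masses sum to one.
    return {focal: mass / (1.0 - conflict) for focal, mass in fused.items()}

# Hypothetical class labels and rule BoEs (assumptions, not from the source).
LOW, MED, HIGH = "low", "medium", "high"
FRAME = frozenset({LOW, MED, HIGH})

# An ambiguous label such as {medium, high} receives mass directly.
rule_1 = {frozenset({HIGH}): 0.6, frozenset({MED, HIGH}): 0.3, FRAME: 0.1}
rule_2 = {frozenset({MED}): 0.5, frozenset({MED, HIGH}): 0.4, FRAME: 0.1}

fused_boe = combine_dempster(rule_1, rule_2)
for focal, mass in fused_boe.items():
    print(sorted(focal), round(mass, 4))
```

Because the number of pairwise products grows with every additional BoE, restricting the fusion to rules from the nearest neighbors keeps this stage small, which is the efficiency argument made above.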
In contrast to other classifiers (such as C4.5 and KNN), belief theoretic
classifiers capture much richer information at the decision-making
stage. Furthermore, how neighbors are defined in the ARM-KNN-BF classifier
is different from the strategy employed in the KNN-BF and KNN classifiers.
Because the rules in the ARM-KNN-BF classifier are generated via
ARM, they capture the associations within the training data instances.
Thus, the classifier is able to overcome 'noise' effects that could be induced by individual
data instances, which results in better decisions. Of course, a much smaller rule
set in the classification stage also significantly reduces the storage and computational
requirements, a factor that plays a major role when working with huge
databases.
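
To make the 'richer information content' concrete, the following sketch computes the standard belief and plausibility of each class from a single BoE. A conventional classifier would report one label, whereas the pair (belief, plausibility) bounds the support for every class. The BoE values and class names are hypothetical.

```python
def belief(mass, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)

def plausibility(mass, hypothesis):
    """Total mass that does not contradict the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)

# Hypothetical fused BoE over three threat classes (an assumption).
classes = ["low", "medium", "high"]
boe = {
    frozenset({"high"}): 0.5,
    frozenset({"medium", "high"}): 0.3,
    frozenset(classes): 0.2,  # mass left on the whole frame: ignorance
}

for label in classes:
    singleton = frozenset({label})
    print(label, belief(boe, singleton), plausibility(boe, singleton))
```

The gap between belief and plausibility for a class reflects how much of the evidence is ambiguous with respect to that class, information that a single predicted label cannot convey.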
The work described above opens up several interesting research issues that
warrant further study. In security monitoring and threat classification, it is
essential that one errs on the side of caution. In other words, it is better
to overestimate the threat level than to underestimate it. The development of
strategies that favor overestimating the threat level over underestimating it
is therefore warranted.
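
One possible shape for such a strategy, offered purely as an illustration and not as a result of this work, is a decision rule that picks the most severe class whose plausibility reaches a threshold. The function name cautious_decision, the threshold value, and the BoE below are all assumptions.

```python
def cautious_decision(mass, levels_by_severity, threshold=0.3):
    """Return the most severe level whose plausibility reaches the threshold.

    mass: dict mapping frozenset of labels -> mass (a BoE).
    levels_by_severity: labels ordered from least to most severe.
    """
    def pl(level):
        # Plausibility of a single label: mass of every set containing it.
        return sum(m for focal, m in mass.items() if level in focal)

    most_severe_first = list(reversed(levels_by_severity))
    for level in most_severe_first:
        if pl(level) >= threshold:
            return level
    # Fallback: highest plausibility, ties resolved toward the more severe level.
    return max(most_severe_first, key=pl)

# Hypothetical BoE: most mass points to "medium", but "high" is not ruled out.
levels = ["low", "medium", "high"]
boe = {
    frozenset({"medium"}): 0.55,
    frozenset({"medium", "high"}): 0.30,
    frozenset(levels): 0.15,
}

print(cautious_decision(boe, levels))                 # errs toward "high"
print(cautious_decision(boe, levels, threshold=0.5))  # stricter: "medium"
```

A maximum-belief decision would select "medium" for this BoE; the plausibility-based rule trades some false alarms for a lower chance of underestimating the threat, and the threshold controls that trade-off.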
Another important research problem involves the extension of this work
to accommodate more general types of imperfections in both class labels and
features. The work described herein handles ambiguities in class labels only;
ways to handle general belief theoretic class label imperfections [28] would be
extremely useful. Development of strategies that can address general belief
theoretic imperfections in features would further enhance the applicability of
this work. Some initial work along this line appears in [11, 12].