Using Association Rules for Classification from Databases Having Class Label Ambiguities: A Belief Theoretic Method - Data Mining: Foundations and Practice


help of several domain experts who would classify the feature vectors (of the

instances) into different threat level classes. In such a situation, it is likely

that the domain experts would arrive at conflicting threat levels, which

essentially introduces ambiguities into the class labels of the instances in the

training data set. In addition, although the number of training data instances

that are classified as having a heightened threat level would likely be very

small, identification of targets possessing a heightened threat level would be

of critical importance. For example, suppose the threat classes for an airport

terminal security monitoring system are the following:

{NotDangerous, OfConcern, Dangerous, ExtremelyDangerous}. (1)

In the training data set, one is likely to encounter a large number of instances

labeled as NotDangerous and very few labeled as ExtremelyDangerous. The

classification results may then be biased toward the majority class.
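The bias toward the majority class can be illustrated with a small sketch. The label counts below are hypothetical and chosen only for illustration; they do not come from any data set discussed in this chapter. A trivial classifier that always predicts the most frequent class reaches high overall accuracy while never identifying the rarest class.

```python
from collections import Counter

# Hypothetical, highly skewed label counts (illustrative assumption only).
labels = (
    ["NotDangerous"] * 950
    + ["OfConcern"] * 35
    + ["Dangerous"] * 12
    + ["ExtremelyDangerous"] * 3
)

# A trivial baseline: always predict the most frequent class.
majority_class, _ = Counter(labels).most_common(1)[0]
predictions = [majority_class] * len(labels)

# Overall accuracy looks good because the majority class dominates.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the rare but critical class is zero.
rare = "ExtremelyDangerous"
rare_total = labels.count(rare)
rare_hits = sum(p == y == rare for p, y in zip(predictions, labels))
rare_recall = rare_hits / rare_total

print(f"majority class: {majority_class}")
print(f"accuracy: {accuracy:.3f}")
print(f"recall on {rare}: {rare_recall:.3f}")
```

Overall accuracy alone is therefore a poor measure when the heightened threat classes are the ones that matter most.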

In essence, a classifier for such a scenario needs to effectively address the

following characteristics:

(C1) The training data set may contain ambiguities in the class labels due

to the conflicting conclusions made by different domain experts.

(C2) The computational and storage requirements should be tolerable so

that classification can be carried out in real-time.

(C3) The threat class distribution in the training data set can be highly

skewed.

In this chapter, a classifier that can effectively take into consideration the

above characteristics typical of a threat detection and assessment scenario is

proposed [31]. To address (C1), several different and effective approaches are

available, for example, rough set theory [29,30] and belief theory. The

relationship between belief theory and other mechanisms is discussed in [8,15,17,26].

In our proposed classifier, belief theoretic notions are adopted. This is mainly

motivated by the fact that belief theory provides an easy and convenient way

for handling ambiguities. A classifier equipped with belief theoretic notions

can improve the overall classification accuracy while providing a quantitative

'confidence interval' on the classification results.
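As a rough illustration of how belief theory can represent an ambiguous class label, the sketch below uses the standard Dempster-Shafer definitions of belief and plausibility over the threat classes in (1). The mass values are hypothetical, and reading the pair [belief, plausibility] as the 'confidence interval' is an assumption made here for illustration; the chapter's own formulation may differ.

```python
# Frame of discernment: the four threat classes from (1).
FRAME = frozenset(
    {"NotDangerous", "OfConcern", "Dangerous", "ExtremelyDangerous"}
)

# Hypothetical mass assignment for one training instance (assumption):
# some experts said Dangerous, some could not decide between Dangerous
# and ExtremelyDangerous, and the rest expressed no opinion at all.
mass = {
    frozenset({"Dangerous"}): 0.5,
    frozenset({"Dangerous", "ExtremelyDangerous"}): 0.3,
    FRAME: 0.2,  # mass on the whole frame models complete ignorance
}


def belief(mass, hypothesis):
    """Total mass of focal sets that are subsets of the hypothesis."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)


def plausibility(mass, hypothesis):
    """Total mass of focal sets that intersect the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


for cls in sorted(FRAME):
    h = frozenset({cls})
    print(f"{cls}: [{belief(mass, h):.2f}, {plausibility(mass, h):.2f}]")
```

The gap between the two values reflects how much of the evidence is ambiguous with respect to a given class: a crisp label would give equal belief and plausibility, while a label on which the experts disagree gives a wider interval.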

To address (C2), the classifier is developed to operate on a rule set

extracted by an ARM algorithm that has been appropriately modified to handle

class label ambiguities. This rule set is significantly smaller than the

original database. This is the main difference between our proposed

classifier and the KNN-BF classifier in [4]. ARM has demonstrated its

capability of discovering interesting and useful co-occurring associations among

data in large databases [1,14,19,25]. The classifier in [18] also uses

a modified ARM method to extract the association rules. However, it does

not effectively address (C2).

To address (C3), the proposed ARM algorithm is applied to different

partitions of the database where the partitioning is based on the class labels. This
