simple modification results in an algorithm that generates an approximately
equal number of rules from each class irrespective of whether it is a majority
class or not.
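To make the idea concrete, generating an approximately equal number of rules from each class can be achieved by partitioning the training data by class label and mining each partition separately. The sketch below is a minimal, hypothetical illustration (single-item antecedents and the function name are assumptions for illustration, not the chapter's actual algorithm):

```python
from collections import Counter, defaultdict

def mine_rules_per_class(transactions, rules_per_class):
    """Partition the training data by class label and extract the
    `rules_per_class` most frequent single-item antecedents from each
    partition, so every class (majority or minority) contributes an
    approximately equal number of rules."""
    partitions = defaultdict(list)
    for items, label in transactions:
        partitions[label].append(items)
    rules = []
    for label, rows in partitions.items():
        counts = Counter(item for items in rows for item in set(items))
        for item, cnt in counts.most_common(rules_per_class):
            # Support is computed within the class partition only, which is
            # what prevents minority classes from being drowned out.
            rules.append((item, label, cnt / len(rows)))
    return rules
```

Because support is measured relative to each partition rather than the whole database, a minority class with few instances still yields the same number of rules as a majority class.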
The rest of this chapter is organized as follows. Our proposed classifier,
which we refer to as the ARM-KNN-BF classifier, is discussed in Sect. 2; a
primer on belief theory and the strategy we employ to accommodate highly
skewed databases are also included in Sect. 2. Section 3 presents the experi-
mental results. The conclusion, which includes several interesting research
directions, appears in Sect. 4.
2 The Proposed ARM-KNN-BF Classifier
Although ARM in its original form can be deployed for extracting rules from
large databases based on minimum support and minimum confidence condi-
tions [1], it does not effectively address all the requirements (C1-C3) stated
in Sect. 1. For example, one may develop a classifier based on rules that are
generated by simply ignoring all the training data instances possessing class
label ambiguities. But this strategy can potentially exclude a large portion
of the training data instances that would have otherwise provided extremely
crucial information. Moreover, since the training data set is highly skewed,
a classifier built on it tends to favor the majority classes at the expense of
the minority classes. Avoiding this scenario is of paramount importance,
since such a bias could have devastating consequences in a threat classification
environment.
As mentioned previously, we use belief theoretic notions to address (C1).
One could alleviate the computational and storage burdens (C2) as well as
the problems due to skewness of the database (C3) significantly by using a
coherent set of rules in the classifier that effectively captures the re-occurring
patterns in the database [31]. An effective ARM mechanism, as demonstrated
in [18], can produce such a set of rules.
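For intuition, the classic support/confidence filtering that underlies such an ARM mechanism can be sketched as follows. This is a hedged illustration restricted to single-item antecedents; the actual mechanism of [18] is more elaborate:

```python
from collections import defaultdict

def mine_class_rules(transactions, min_support, min_confidence):
    """Mine single-antecedent class association rules (item -> class)
    that satisfy minimum support and minimum confidence.
    `transactions` is a list of (feature_items, class_label) pairs."""
    n = len(transactions)
    item_count = defaultdict(int)   # support count of each antecedent item
    pair_count = defaultdict(int)   # support count of (item, class) pairs
    for items, label in transactions:
        for item in set(items):
            item_count[item] += 1
            pair_count[(item, label)] += 1
    rules = []
    for (item, label), cnt in pair_count.items():
        support = cnt / n                     # fraction of all instances
        confidence = cnt / item_count[item]   # P(class | item), empirically
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, label, support, confidence))
    return rules
```

Note that a global minimum support of this kind is exactly what disadvantages minority classes, which motivates the partitioned variant described next.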
Each stage of the proposed algorithm can be summarized as follows: The
training phase consists of partitioned ARM, rule pruning and rule refinement.
The partitioned ARM mechanism generates an approximately equal number of
rules in each class. The rule pruning and refinement processes use the training
data set to select the important rules. Dempster-Shafer belief theoretic
notions [24] are then utilized in the classification stage, where we introduce
a classifier capable of taking certain types of ambiguities into account when
classifying an unknown instance.
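Since Dempster-Shafer notions are central to the classification stage, a minimal sketch of Dempster's rule of combination may be helpful. Masses are represented here as dicts keyed by frozensets of hypotheses; this encoding, and the two-hypothesis example below, are assumptions for illustration rather than the chapter's own formulation:

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments (masses over frozensets of
    hypotheses drawn from the frame Theta) using Dempster's rule:
    m(A) is proportional to the sum of m1(B)*m2(C) over all B, C with
    B & C == A, normalized by 1 - K, where K is the total mass assigned
    to conflicting (empty) intersections."""
    combined = {}
    conflict = 0.0
    for B, mb in m1.items():
        for C, mc in m2.items():
            A = B & C
            if A:
                combined[A] = combined.get(A, 0.0) + mb * mc
            else:
                conflict += mb * mc   # mass lost to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}
```

For example, combining one mass function that mildly supports a hypothesis with a second, more discriminating one redistributes the conflicting mass and yields a normalized assignment over the surviving focal elements.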
2.1 Belief Theory: An Introduction
Let Θ = {θ1, θ2, ..., θn} be a finite set of mutually exclusive and exhaustive
'hypotheses' about the problem domain. It signifies the corresponding 'scope