2.1 Explicit Class Labels
A first, straightforward interpretation considers class labels as just additional items
in the transactional data, i.e. I' = I ∪ C, and imposes a syntactic constraint on
itemsets being mined from it: each itemset has to include exactly one of those class-
label items. This has the tremendous advantage that existing techniques for FIM can
be used directly, e.g. Apriori [2], Eclat [40], or FPGrowth [20].
The typical FIM approach identifies interesting itemsets by using a minimum
support threshold that the itemsets' support has to exceed, and chooses relevant
rules by using a minimum confidence threshold. Since specializations of patterns,
e.g. extensions of an itemset with additional items, have support less than or equal
to that of the pattern itself, the search space can be pruned, allowing for exhaustive
enumeration.
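This anti-monotonicity property can be illustrated with a minimal sketch over a hypothetical toy transaction database (the transactions and item names below are illustrative, not from the text):

```python
# Hypothetical toy transaction database; each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in db) / len(db)

# Anti-monotonicity: extending an itemset can never increase its support,
# so once an itemset falls below the minimum support threshold, every
# superset of it can be pruned from the search space.
assert support({"a", "b"}, transactions) <= support({"a"}, transactions)
```

Because of this property, a miner that enumerates itemsets level-wise never needs to visit any extension of an infrequent itemset.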
This can be adapted by using class labels explicitly as items. It allows settings
with more than two classes to be treated in a straightforward way:

For all class labels c ∈ C:
1. Mine all itemsets including c that exceed the minimum support threshold.
2. Retain all association rules r → c that exceed the minimum confidence
threshold.
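The two steps above can be sketched as a brute-force enumeration over a hypothetical toy database (a real miner would use Apriori-style candidate generation instead; data, thresholds, and names are illustrative):

```python
from itertools import combinations

# Hypothetical labeled transactions: an itemset plus exactly one class label.
data = [
    ({"a", "b"}, "C1"),
    ({"a"}, "C1"),
    ({"a", "b"}, "C2"),
    ({"b"}, "C2"),
]
items = {"a", "b"}
classes = {"C1", "C2"}
min_supp, min_conf = 0.25, 0.6

def supp(itemset, label=None):
    """Support of itemset, optionally restricted to one class label."""
    hits = [1 for t, c in data
            if itemset <= t and (label is None or c == label)]
    return sum(hits) / len(data)

cars = []
for c in classes:                           # for all class labels
    for k in range(1, len(items) + 1):      # enumerate candidate rule bodies
        for body in combinations(sorted(items), k):
            body = set(body)
            s = supp(body, c)               # support of body together with c
            # step 1: minimum support; step 2: minimum confidence
            if s >= min_supp and s / supp(body) >= min_conf:
                cars.append((frozenset(body), c))
```

On this toy data the procedure retains the rules {a} → C1 and {b} → C2; all other candidates fail the confidence threshold.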
The resulting association rules are referred to as class association rules (cars) and
are restricted to having only the class item as their right-hand side. Their quality is
usually evaluated using confidence, as in the case of general association rules:
Definition 2.3 Given a set of items I and a set of class labels C, a class association
rule is of the form r → c, r ⊆ I, c ∈ C. r is called its left-hand side (LHS), an-
tecedent, or rule body; c its right-hand side, consequent, or rule head. Its confidence
is defined as conf(r → c) = supp_D(r ∪ {c}) / supp_D(r).
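The confidence of a class association rule per this definition can be computed directly; the labeled database D below is a hypothetical example:

```python
# Hypothetical labeled database D: (transaction, class-label) pairs.
D = [({"x", "y"}, "pos"), ({"x"}, "pos"), ({"x", "y"}, "neg")]

def supp_rule(body, label, db):
    """supp_D(r ∪ {c}): transactions containing the body AND carrying label."""
    return sum(body <= t and c == label for t, c in db) / len(db)

def supp_body(body, db):
    """supp_D(r): transactions containing the body, regardless of label."""
    return sum(body <= t for t, c in db) / len(db)

# conf({"x"} -> "pos") = supp_D({"x"} ∪ {"pos"}) / supp_D({"x"}) = (2/3)/(3/3)
conf = supp_rule({"x"}, "pos", D) / supp_body({"x"}, D)
```

Here every transaction contains x, but only two of the three carry the label pos, giving a confidence of 2/3.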
Prominent examples of classification learners that build upon class association
rules are the CBA [27] and CMAR [25] algorithms. The Harmony algorithm, in-
troduced by [38], also takes this view of class labels, as does the ART technique
[17]. As a direct application of FIM techniques, these methods are somewhat limited
by typically using only a single minimum support and confidence threshold, which
might be inappropriate in the case of skewed class distributions. They can, however,
benefit from all developments in FIM research, such as better rule quality measures
(replacing confidence) and more efficient algorithms.
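The skew problem can be made concrete with hypothetical numbers; the per-class scaling shown is only one possible remedy, not a method prescribed by the text:

```python
from collections import Counter

# Hypothetical skewed class distribution: 90 records of C1, 10 of C2.
labels = ["C1"] * 90 + ["C2"] * 10
freq = Counter(labels)
n = len(labels)

# A single global minimum support of 0.1 means a rule must cover 10 records;
# no rule for the minority class C2 can ever exceed it, since C2 itself only
# reaches support 0.1 with a perfectly covering (empty-body) rule.
global_min_supp = 0.1

# One possible remedy: scale the threshold by each class's relative frequency,
# so minority-class rules are judged against an attainable bar.
per_class_min_supp = {c: global_min_supp * k / n for c, k in freq.items()}
```

Under this scaling, rules for C1 face a threshold of 0.09 while rules for C2 face 0.01, proportional to the class sizes.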
2.2 Classes as Data Subsets
A second interpretation of different classes in the data is to consider each class a
separate data set, and the whole database the union of those subsets:
Definition 2.4 Given a labeled data set D_C and a set of class labels C, the subsets
D_i = {(d, C_i) ∈ D_C} for all C_i ∈ C are called classes.
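This partitioning is a simple grouping operation; the labeled data set below is a hypothetical example:

```python
from collections import defaultdict

# Hypothetical labeled data set D_C: (transaction, class-label) pairs.
D_C = [({"a"}, "C1"), ({"b"}, "C2"), ({"a", "b"}, "C1")]

# Definition 2.4: each class label C_i induces the subset D_i of records
# labeled C_i; the whole database is the union of these subsets.
subsets = defaultdict(list)
for d, c in D_C:
    subsets[c].append((d, c))

# The class subsets partition D_C: their sizes sum to the database size.
assert sum(len(s) for s in subsets.values()) == len(D_C)
```

Viewing classes this way allows each subset D_i to be mined independently, e.g. with its own support threshold relative to |D_i|.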