Databases Reference
In-Depth Information
Apriori-TFP
1
based (i.e., [11-13,16,17], etc.), and (2) FP-tree based (i.e.,
[9, 24, 27, 39], etc.).
Mining ARs from MFIs
It is apparent that the size of a complete set of FIs can be very large. The
concept of MFI [47] was proposed to find several “long” (super) FIs in
D
T
,
which avoids the redundant work required to identify “short” FI. The con-
cept of vertical mining has also been effectively promoted in this category [54].
Vertical mining, first mentioned in [32], deals with a vertical transaction data-
base
D
T
, where each data-record represents an item that is associated with a
list of its relative transactions (the transactions in which it is present). Typ-
ical MFI algorithms include: MaxEclat/Eclat [54], MaxClique/Clique [54],
Max-Miner [47], Pincer-Search [37], MAFIA [8], GenMax [26], etc.
Mining ARs from FCIs
Algorithms belonging to this category extract ARs through generating a set
of FCIs from
D
T
. In fact the support of some sub-itemsets of an MFI might
be hard to identified resulting in a further di
culty in the computation of
confidence. The concept of FCI [44] is proposed to improve this property
of MFI, which avoids the di
culty of identifying the support of any sub-
itemsets of a relatively “long” FI. A FCI
f
is an itemset
S
∈
D
T
,where
f
f
and
f
shares a common support with
f
. The relationship between FI, MFI and FCI is that MFI
itemset
f
⊃
is frequent and
¬∃
FI [8].
In this category typical algorithms include: CLOSET [44], CLOSET+ [51],
CHARM [55], MAFIA [8], etc.
⊆
FCI
⊆
2.2 Classification Rule Mining
CRM deals with
D
C
,where
D
C
is founded as
D
R
∪
D
E
. It discovers a set of
CRs in
D
R
from which to build a classifier to classify “unseen” data records in
D
E
.A
D
R
consists of
n
data-attributes and
m
data records. By convention the
last data-attribute in each data-record usually indicates its pre-defined class,
noted as the class attribute. CRM can thus be described as the process of
assigning a Boolean value to each pair (
d
j
,c
i
)
D
E
is an “unseen” data-record,
C
as declared in Sect. 1 is a set of pre-defined
classes, and (
d
j
,c
i
) is a data-record in
D
E
to be labeled.
∈
D
E
×
C
,whereeach
d
j
∈
The Cover Algorithm
In CRM a number of approaches have been proposed to generate a classifier
from a set of training data-records. For example the Cover Algorithm [41]
takes
D
R
as its input and aims to generate a complete set of minimal non-
redundant CRs. We define the Cover algorithm in Fig. 1 as follows.
1
Apriori-TFP and its related softwares may be obtained from http://www.csc.liv.
ac.uk/frans/KDD/Software.