Mining Efficiently Significant Classification Association Rules - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

Apriori-TFP 1 based (i.e., [11-13,16,17], etc.), and (2) FP-tree based (i.e.,

[9, 24, 27, 39], etc.).

Mining ARs from MFIs

It is apparent that the size of a complete set of FIs can be very large. The

concept of MFI [47] was proposed to find several “long” (super) FIs in D T ,

which avoids the redundant work required to identify “short” FI. The con-

cept of vertical mining has also been effectively promoted in this category [54].

Vertical mining, first mentioned in [32], deals with a vertical transaction data-

base D T , where each data-record represents an item that is associated with a

list of its relative transactions (the transactions in which it is present). Typ-

ical MFI algorithms include: MaxEclat/Eclat [54], MaxClique/Clique [54],

Max-Miner [47], Pincer-Search [37], MAFIA [8], GenMax [26], etc.

Mining ARs from FCIs

Algorithms belonging to this category extract ARs through generating a set

of FCIs from D T . In fact the support of some sub-itemsets of an MFI might

be hard to identified resulting in a further di culty in the computation of

confidence. The concept of FCI [44] is proposed to improve this property

of MFI, which avoids the di culty of identifying the support of any sub-

itemsets of a relatively “long” FI. A FCI f is an itemset S

∈

D T ,where f

f and f shares a common support with

f . The relationship between FI, MFI and FCI is that MFI

itemset f ⊃

is frequent and

¬∃

FI [8].

In this category typical algorithms include: CLOSET [44], CLOSET+ [51],

CHARM [55], MAFIA [8], etc.

⊆

FCI

⊆

2.2 Classification Rule Mining

CRM deals with D C ,where D C is founded as D R ∪

D E . It discovers a set of

CRs in D R from which to build a classifier to classify “unseen” data records in

D E .A D R consists of n data-attributes and m data records. By convention the

last data-attribute in each data-record usually indicates its pre-defined class,

noted as the class attribute. CRM can thus be described as the process of

assigning a Boolean value to each pair ( d j ,c i )

D E

is an “unseen” data-record, C as declared in Sect. 1 is a set of pre-defined

classes, and ( d j ,c i ) is a data-record in D E to be labeled.

∈

D E ×

C ,whereeach d j ∈

The Cover Algorithm

In CRM a number of approaches have been proposed to generate a classifier

from a set of training data-records. For example the Cover Algorithm [41]

takes D R as its input and aims to generate a complete set of minimal non-

redundant CRs. We define the Cover algorithm in Fig. 1 as follows.

1 Apriori-TFP and its related softwares may be obtained from http://www.csc.liv.

ac.uk/frans/KDD/Software.

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home