contains a set of items I, where each T_i ∈ T satisfies T_i ⊆ I. In ARM, two threshold values are
usually used to determine the significance of an AR:
Support: The frequency with which the items occur or co-occur in T. A support
threshold σ, defined by the user, is used to distinguish frequent items from
the infrequent ones. A set of items S is called an itemset, where S ⊆ T_i ⊆ I,
and the items a_i ∈ S co-occur in T_i. If the number of occurrences of some S in T exceeds σ,
we say that S is a Frequent Itemset (FI).
Confidence: Represents how “strongly” an itemset X implies another itemset Y,
where X, Y ⊆ I and X ∩ Y = {}. A confidence threshold α,
supplied by the user, is used to distinguish high confidence ARs from low
confidence ARs.
An AR X ⇒ Y is valid when the support for the co-occurrence of X and
Y exceeds σ, and the confidence of this AR exceeds α. The computation of
support is count(X ∪ Y) / |T|, where |T| is the size of the set T and
count(X ∪ Y) is the number of transactions in which X and Y co-occur. The computation
of confidence is Support(X ∪ Y) / Support(X). Informally, X ⇒ Y can be interpreted as “if
X exists, it is likely that Y also exists”. With regard to the history of ARM
investigation, three major categories of serial (non-parallel) ARM algorithms
can be identified: (1) mining ARs from all possible FIs, (2) mining ARs from
Maximal Frequent Itemsets (MFIs), and (3) mining ARs from Frequent Closed
Itemsets (FCIs).
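The support and confidence computations described above can be sketched as follows. This is an illustrative sketch only: the transaction database T, the item names, and the example AR are invented, not taken from the text.

```python
# A minimal sketch of the support and confidence computations for an AR X => Y.
# The transaction database T and the item names are invented for illustration.

T = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    # Fraction of transactions in which all items of `itemset` co-occur
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(X, Y, transactions):
    # Support(X | Y) / Support(X): how strongly X implies Y
    return support(X | Y, transactions) / support(X, transactions)

# For the AR {bread} => {milk}:
print(support({"bread", "milk"}, T))       # 0.5  (co-occur in 2 of 4 transactions)
print(confidence({"bread"}, {"milk"}, T))  # 0.5 / 0.75 = 0.6666...
```

A user-defined σ and α would then be compared against these two values to decide whether the AR is significant.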
Mining ARs from FIs
In the past decade, many algorithms have been introduced that mine ARs
from identified FIs. These algorithms can be further grouped into different
“families”, such as Pure-apriori like, Semi-apriori like, Set Enumeration Tree
like, etc.
Pure-apriori like where FIs are generated based on the level-by-level
generate-prune iteration that was first promulgated in the Apriori algo-
rithm [2]. In this “family” archetypal algorithms include: Apriori, AprioriTid
and AprioriHybrid [2], Partition [49], DHP [43], Sampling [50], DIC [7],
CARMA [31], etc.
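The generate-prune iteration that characterises this “family” can be sketched as follows. This is a minimal illustration under invented data and an invented support threshold, not the full Apriori algorithm of [2]:

```python
from itertools import combinations

# Minimal sketch of the Apriori generate-prune, level-by-level iteration.
# Transactions and the support threshold are invented for illustration.

def apriori(transactions, min_count):
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent 1-itemsets
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) >= min_count]
    frequent = list(current)
    k = 2
    while current:
        # Generate: join frequent (k-1)-itemsets whose union has size k
        # (equivalently, pairs sharing k-2 items)
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate survives only if every (k-1)-subset is frequent
        # (the closure property: all subsets of a frequent itemset are frequent)
        prev = set(current)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count support and keep only candidates meeting the threshold
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) >= min_count]
        frequent.extend(current)
        k += 1
    return frequent

T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted(tuple(sorted(s)) for s in apriori(T, min_count=3)))
# [('a',), ('a', 'b'), ('a', 'c'), ('b',), ('b', 'c'), ('c',)]
```

Here {a, b, c} is generated as a candidate at level 3 but fails the support count, so the iteration stops; the prune step is where the closure property of itemsets is exploited.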
Semi-apriori like where FIs are generated by enumerating candidate item-
sets, but without applying the Apriori generate-prune iterative approach
founded on (1) the join procedure, and (2) the prune procedure that em-
ploys the closure property of itemsets: if an itemset is frequent then all
its subsets will also be frequent; if an itemset is infrequent then all its su-
persets will also be infrequent. In this “family” typical algorithms include:
AIS [1], SETM [33], OCD [40], etc.
Set Enumeration Tree like where FIs are generated through constructing
a set enumeration tree structure [48] from D_T, which avoids the need
to enumerate a large number of candidate itemsets. In this “family” a
number of approaches can be further divided into two main streams: (1)