Database Reference
In-Depth Information
Fig. 2.4
Execution tree of
Apriori
algorithm
node is potentially of the order of the total number of items. An example of such an
implementation is provided in [
12
], and it seems to work quite well. An algorithm
that shares some similarities to the
Apriori
method, was independently proposed in
[
44
], and subsequently a combined work was published in [
3
].
Figure
2.4
illustrates the execution tree of the join-based
Apriori
algorithm over
the toy transaction database mentioned in Table
2.1
for minimum support value 3.
As mentioned in the pseudocode of
Apriori
, a candidate
k
-patterns are generated
by joining two frequent itemset of size (
k
−
1). For example, at level 3, the pattern
{
. After generating the candidate
patterns, the support of the patterns is computed by scanning every transaction in
the database and determining the frequent ones. In Fig.
2.4
, a candidate patterns is
shown in a box along with its support value. A frequent candidate is shown in a solid
box, and an infrequent candidate is shown in a dotted box. An edge represents the
join relationship between a candidate pattern of size
k
and a frequent pattern of size
(
k
a
,
b
,
c
}
is generated by joining
{
a
,
b
}
and
{
a
,
c
}
1) such that the latter is used to generate the earlier. The figure also illustrates the
fact that a pair of frequent patterns are used to generate a candidate pattern, whereas
no candidates are generated from an infrequent pattern.
−
2.1.1
Apriori Optimizations
Numerous optimizations were proposed for the
Apriori
algorithm [
1
] that are referred
to as
AprioriTid
and
AprioriHybrid
respectively. In the
AprioriTid
algorithm, each
transaction is replaced by a shorter transaction or null transaction) during the
k
th
phase. Let the set of
k
+
1-candidates in
C
k
+
1
that are contained in transaction
T
be
denoted by
R
(
T
,
C
k
+
1
). This set
R
(
T
,
C
k
+
1
) is added to a newly created transaction
T
k
. If the set
database
C
k
+
1
) is null, then clearly, a number of different tradeoffs
exist with the use of such an approach.
R
(
T
,