Databases Reference
In-Depth Information
TID
Item-lists
Item-vectors
1
1235
11101
01111
00111
11111
2
2345
3
345
4
12345
HIL
HIV
Tid-lists
Tid-vectors
1
2
3
4
5
12 3 4 5
1
1
11
111
1
0
1
1
11
2
1
0
00
11
4
22
3
2
4
3
4
3
11 1 11
4
4
VTL
VTV
Fig. 3. Examples of database layouts
computing the supports of itemsets is simpler and faster with the vertical
layout (VTL or VTV) since it involves only the intersections of tid-lists or
tid-vectors.
In a general point of view, rule mining algorithms employ a combination
of a traverse strategy (breadth-first or depth-first) and a form of the database
layout. Examples of algorithms for horizontal mining with a breadth strategy
previously presented are a priori, a prioriTID, DIC [3] and with a depth strat-
egy are different version applying FP-Trees [1]. Other algorithms considering
a VTL layout are Partition, with a breadth strategy [10], and Eclat, with a
depth strategy [22].
In this chapter we evaluate a compressed form of the VTV layout, improv-
ing the performance of the itemset generation, applicable to those algorithms
with an a priori-like approach.
The problem of mining frequent itemsets was first introduced by Agrawal
et al. [2]. To achieve e cient mining frequent patterns, an antimonotonic
property of frequent itemsets, called the a priori heuristic, was formulated
in [3]. The a priori heuristic can dramatically prune candidate itemsets.
A priori is a breadth-first search algorithm, with a HIL organization, that
iteratively generates two kinds of sets: C k and L k . The set L k contains the
large itemsets of size k ( k -itemsets). Meanwhile, C k is the set of candidate
k -itemsets, representing a superset of L k . This process continues until a null
set L k is generated.
The set L k is obtained scanning the database and determining the support
for each candidate k -itemset in C k . The set C k is generated from L k− 1 with
the following procedure:
C k =
{
c
|
Join ( c,L k− 1 )
Prune ( c,L k− 1 )
}
(2)
Search WWH ::




Custom Search