Database Reference
In-Depth Information
Fig. 11.3 Vertical format of
the sequence database and
fragments of the SPADE
mining process. (Adapted
from [ 3 ])
a
b
SIDEID Items
1
SID EID
1
SID EID
1
a
12
1 23
1332
2135
2445
32
4
1
2
1
2
abc
1
3
ac
1
4
d
1
5
cf
2
1
ad
2
2
c
3
2
3
bc
2
4
ae
ab
ba
3
1
ef
SID EID(a) EID(b)
SID EID(b) EID(a)
3
2
ab
1
1
2
1
2
3
3
3
df
2
1
3
2
3
4
3
4
c
3
2
5
3
5
b
4
3
5
4
1
e
4
2
g
4
3
af
aba
4
4
c
SID EID(a) EID(b)
EID(a)
4
5
b
1
1
1
2
3
4
4
6
c
2
3
pairs. For example, item “ b ” is associated with (SID, EID) pairs as follows: (1, 2),
(2, 3), (3, 2), (3, 5), (4, 5), as shown in Fig. 11.3 . This is because item b appears
in sequence 1, event 2, and so on. Frequent single items “ a ” and “ b ” can be joined
together to form a length-two subsequence by joining the same sequence_id with
event_ids following the corresponding sequence order. For example, subsequence
ab contains a set of triples (SID, EID( a ), EID( b )), such as (1, 1, 2), and so on. Fur-
thermore, the frequent length-2 subsequences can be joined together based on the
Apriori heuristic to form length-3 subsequences, and so on. The process continues
until no frequent sequences can be found or no such sequences can be formed by
such joins. The detailed analysis of the method can be found in [ 11 ].
3.2.2
SPAM
SPAM [ 12 ] uses a novel search strategy that integrates a depth-first traversal of the
search space with effective pruning mechanisms. The candidate sequences are stored
in a lexicographic tree and each sequence in the sequence tree can be considered as
either a sequence-extended sequence (via an S-step ) or an itemset-extended sequence
(via an I-step ). A sequence-extended sequence is a sequence generated by adding a
new transaction consisting of a single item to the end of its parent's sequence in the
tree. An itemset-extended sequence is a sequence generated by adding an item to the
last itemset in the parent's sequence, such that the item is greater than any item in
that last itemset.
SPAM traverses the sequence tree described above in a standard depth-first
manner. At each node, the support of each sequence-extended child and each itemset-
extended child is checked. If the support of a generated sequence s is greater than
or equal to min _ suppor t , SPAM stores that sequence and repeats DFS recursively
Search WWH ::




Custom Search