Database Reference
In-Depth Information
4th scan, 6 candidates
4 length-4 sequential patterns
......
<a(bc)a>
<(ab)dc>
<efbc>
3rd scan, 64 candidates
21 length-3 sequential patterns
13 candidates not appear in database at all
......
<aab>
<a(ab)>
<aac>
2nd scan, 51 candidates
.
.
.
.
.
.
......
.
.
.
.
.
.
<aa>
<ab>
<af>
<ba>
<bb>
<ff>
<(ab)>
<(ef)>
22 length-2 sequential patterns
9 candidates not appear in database at all
1st scan, 7 candidates
<a>
<b>
<c>
<d>
<e>
<f>
<g>
6 length-1 sequential patterns
Candidate cannot pass support threshold
Candidate does not appear in database at all
Fig. 11.2 Candidates, candidate generation, and sequential patterns in GSP. (Adapted from [ 3 ])
item: support ”):
a
:4,
b
:4,
c
:4,
d
:3,
e
:3,
f
:3,
g
:1.
By
filtering
the
infrequent
item g ,
we
obtain
the
first
seed
set L 1
=
{
, with each member in the set representing a 1-element
sequential pattern. Each subsequent pass starts with the seed set found in the previ-
ous pass and uses it to generate new potential sequential patterns, called candidate
sequences.
For L 1 , a set of 6 length-1 sequential patterns generates a set of 6
a
,
b
,
c
,
d
,
e
,
f
}
×
6
+
6
×
5
=
51 candidate sequences, C 2
={
aa
,
ab
, ... ,
af
,
ba
,
bb
, ... ,
ff
,
2
.
The multi-scan mining process is shown in Fig. 11.2 . The set of candidates is
generated by a self-join of the sequential patterns found in the previous pass. In the
k -th pass, a sequence is a candidate only if each of its length-( k -1) subsequences is
a sequential pattern found at the ( k -1)-th pass. A new scan of the database collects
the support for each candidate sequence and finds the new set of sequential patterns.
This set becomes the seed for the next pass. Clearly, the number of scans is at least
the maximum length of sequential patterns. It needs one more scan if the sequential
patterns obtained in the last scan still generate new candidates.
GSP, though benefits from the Apriori pruning, still generates a large number of
candidates. In this example, 6 length-1 sequential patterns generate 51 length-2 can-
didates, 22 length-2 sequential patterns generate 64 length-3 candidates, etc. Some
candidates generated by GSP may not appear in the database at all. For example, 13
out of 64 length-3 candidates do not appear in the database.
( ab )
,
( ac )
, ... ,
( ef )
}
3.1.3
PSP
PSP [ 10 ] is another Apriori-based algorithm, which resumes the general procedures
of GSP [ 9 ] but utilizes a different hierarchical structure for organizing candidate
Search WWH ::




Custom Search