Information Technology Reference
In-Depth Information
example, the cell number 5 (with value 6) corresponds to the line number 6 of the first
sequence of size 5 encoded in the Bitmap. The table NB stores the frequency of each
distinct sequence in the database. Thus the sequence HSMH occurs 20 times in the
database.
In the first stage of the algorithm, INDEX, SV , NB and the Bitmap are built on the
fly during one pass. At each insertion of a sequence, the Bitmap matrix may become
larger, and a set of shifting operations are applied to the bit values stored in this table.
H = Home
W = Work
S = School
M = Market
R = Restaurant
SV
H W S M H R W H
Index
Bitmap
NB
6
1 1 0 0 0 1 1 1
15
20
30
40
60
100
1 0 1 1 0 0 0 1
5
1 0 1 0 0 0 0 1
1 0 1 0 0 1 0 0
3
2
0 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1
Fig. 8.1. The data structure
IBM ( sequence database DB , threshold t )
01 SV =
-- SV empty at the beginning
02 For each sequence s in DB
03 Merge-sequence-vector ( s ) -- merge s with SV
04 Update NB
05 If s
φ
IBM
06
Encode and Insert s in the IBM
07
Update Index
Shift the IBM table according to the new SV
09 k = 1
10 Gen-frequent-sequences ( t ) -- check the frequent of size 1
11 While there exist a frequent sequence of size k
12 k = k +1
13 Generate Ck
14 Gen-frequent-sequences ( t )
Fig. 8.2. IBM algorithm
Fig. 8.2 shows the general IBM algorithm that takes as parameters: the sequence
database DB and a threshold t . This value ( t ) stands for the minimum subsequences
frequency taken into account for the generation of the candidates. Then for each se-
quence s that it reads from the database during the scan, SV (line 03) is generated
using a merging process (as detailed in Fig. 8.3). Three situations may hold. The first
Search WWH ::




Custom Search