Information Technology Reference
In-Depth Information
is when the sequence is already encoded in
the Bitmap
which only requires the update
of
NB
table (line 04): the line corresponding to this sequence in
NB
is incremented to
maintain its frequency counting. The second case is when the sequence itself is not
present in
the Bitmap
, but may be represented in
SV
. In this case,
s
will be encoded
and inserted in
the Bitmap
as a new line, involving the update of
Index
and
NB
ac-
cordingly (lines 06 and 07). The last case is when the sequence cannot be represented
in
SV
, i.e. it is affected by the Merge-sequence-vector(
s
) function (Fig. 8.4). In this
case, the algorithm adds the process of shifting the
Bitmap
table in order to adapt
existing sequences coding to the new
SV
(line 08). Once all the data have been
encoded in this structure, new candidates (line 10) are generated (see candidates gen-
eration section) and compared to the data stored in the
Bitmap
(line 11) with a fast
access thanks to the
Index
(see support counting section).
8.3.2 Generation of the Sequence Vector
The sequence vector is generated during the unique scan of the database according to
the algorithm of Fig. 8.3, which depicts the Merge-sequence-vector function which
builds the
SV
vector according to the sequences present in the database.
Merge-sequence-vector (sequence
s
)
01
For
each item
a
of
s
--
SV
is fetched by a position pointer initially set to 0
02
SV suffix -- after the current position
03
If
(∃
b
∈
s
suffix such that
b
∈
SV
suffix)
If
a
∉
insert
a
before
b
04
Else
insert
a
at the end of
SV
05 update the current position of
SV
to the position of
a
Fig. 8.3.
Generation of the Sequence Vector
It takes as parameter a sequence s of the database. It fetches the sequence items one
by one checking their presence in SV in the right order (line 01). If a given item a is
not presents in
SV
after the current position (line 02), the function checks if their exist
an item
b
in
SV
such that
b
in located after
a
in
s
and also located after the
current position in
SV
. If those two conditions hold, item
a
will be inserted before
item
b
in
SV
(line 03). If only the first condition holds,
a
will be positioned at the end
of
SV
(line 04). The process will continue for the next item starting by the position of
item
a
in
SV
(line 05). Notice that SV will stay unchanged when the sequence items
belong to SV suffix. This holds when the current SV is sufficient to encode the new
sequence. In the example of Fig. 8.4, the arrow with value 1 shows an insertion of a
sequence HSH. This operation does not change SV. Since this sequence already exists
in the Bitmap, only the corresponding frequency in the NB table is incremented. The
arrow with value 2 shows an insertion of the sequence HWMWH. This operation
modifies the SV vector; then, shifting operations are applied to the Bitmap in order to
preserve the existing encoded sequences, and a new line is added in the Bitmap for
the new sequence. Finally, the frequency of this new sequence is set to one.