Databases Reference
In-Depth Information
The next definition extends the concept of generator itemset to the do-
main of sequences. Different sequences can have the same sequential closure,
i.e., they are represented in the same closed sequence. Among the sequences
with the same sequential closure, the shortest sequences are called generator
sequences.
Definition 7 (Generator Sequence).
An arbitrary sequence X in
D
is a
generator sequence iff there is not a sequence Y in
D
such that (i) Y
Ψ
X
and (ii)sup
Φ
(
X
)=
sup
Φ
(
Y
)
.
Special cases of the above definitions are the
contiguous closed sequence
and the
contiguous generator sequence
, where the matching functions in set
Ψ
define a contiguous subsequence relation. Instead, we have an
unconstrained
closed sequence
and an
unconstrained generator sequence
when
Ψ
defines an
unconstrained subsequence relation.
Knowledge about generators associated to a closed sequence
X
allow
generating all sequences having
X
as sequential closure. For example, let
closed sequence
X
be associated to a generator sequence
Z
.Consideran
arbitrary sequence
Y
with
Z
Ψ
X
. Then,
X
is the sequen-
tial closure of
Y
. From Property 2, it follows that
sup
Φ
(
Z
)
Ψ
Y
and
Y
≥
sup
Φ
(
Y
)and
sup
Φ
(
Y
)
sup
Φ
(
X
). Being
X
the sequential closure of
Z
,
Z
and
X
have
equal support. Hence,
Y
has the same support as
X
. It follows that sequence
X
is the sequential closure of
Y
according to Definition 6.
In the example dataset,
ADBA
is a contiguous closed sequence with sup-
port 33.33% under the maximum gap constraint 2.
ADBA
represents con-
tiguous sequences
BA
,
DB
,
DBA
,
ADB
,
ADBA
which satisfy the same gap
constraint.
BA
and
DB
are contiguous generator sequence for
ADBA
.
In the context of association rules, an arbitrary itemset has a unique clo-
sure. The property of uniqueness is lost in the sequential pattern domain.
Hence, for an arbitrary sequence
X
the sequential closure can include sev-
eral closed sequences. We call this set the
closure sequence set
of
X
, denoted
CS
≥
(
X
). According to Definition 6, the sequential closure for a sequence
X
is
defined based on the pair of matching functions (
Ψ,Φ
). Being a collection of
sequential closures, the closure sequence set of
X
is defined with respect to
the same pair (
Ψ,Φ
).
Property 3.
Let X be an arbitrary sequence in
D
and
CS
(
X
)
the set of
sequences in
which are the sequential closure of X. The following properties
are verified. (i) If X is a closed sequence, then
D
CS
(
X
)
includes only sequence
X. (ii) Otherwise,
CS
(
X
)
may include more than one sequence.
In Property 3, case (i) trivially follows from Definition 5. We prove case (ii)
by means of an example. Consider the contiguous closed sequences
ADCA
and
ACA
, which satisfy maximum gap 2 in the example dataset. The generator
sequence
C
is associated to both closed sequences. Instead,
D
is a generator
only for
ADCA
. From Property 3 it follows that a generator sequence can
generate different closed sequences.