Databases Reference
In-Depth Information
3.2 Sequence Support
The concept of support is bound to dataset
D
. In particular, for a sequence
X
the support in a dataset
which
contain
X
[4]. Hence, we need to define when an input-sequence contains a
sequence. Analogously to the concept of sequence containment introduced
in Definition 3, an input-sequence
S
contains a sequence
X
when the events
in
X
match the events in
S
based on a given matching function. However,
in an input-sequence
S
events are characterized by their position within
S
.
This information can be exploited to constrain the occurrence of an arbitrary
sequence
X
in the input-sequence
S
.
Commonly considered constraints are maximum and minimum gap con-
straints and windows constraints [17, 25]. Maximum and minimum gap con-
straints specify the maximum and minimum number of events in
S
which
may occur between two consecutive events in
X
. The window constraint spec-
ifies the maximum number of events in
S
which may occur between the first
and last event in
X
. For example sequence
ADA
occurs in the input-sequence
S
=
ADCBA
, and satisfies a minimum gap constraint equal to 1, a maximum
gap constraint equal to 3 and a window constraint equal to 4.
In the following we formalize the concept of gap constrained occurrence
of a sequence into an input-sequence. Similarly to Definition 3, we introduce
a set of possible matching function to check when an input-sequence
S
in
D
is the number of input-sequences in
D
D
contains an arbitrary sequence
X
. With respect to Definition 3, these matching
functions may incorporate gap constraints. Formally, a gap constraint on a
sequence
X
and an input-sequence
S
can be formalized as
Gapθ K
,where
Gap
is the number of events in
S
between either two consecutive elements of
X
(i.e.,
maximum and minimum gap constraints), or the first and last elements of
X
(i.e., window constraint),
θ
is a relational operator (i.e.,
θ
∈{
>,
≥
,
=
,
≤
,<
}
),
and
K
is the maximum/minimum acceptable gap.
Definition 4 (Gap Constrained Subsequence).
Let X
=(
x
1
,...,x
m
)
be
an arbitrary sequence and S
=(
s
1
,...,s
l
)
an arbitrary input-sequence in
D
,
with arbitrary length m
l.LetΦ be a set of matching functions between two
arbitrary sequences, and Gap θ K be a gap constraint. Sequence X occurs in
S under the constraint Gap θ K, written as X
Φ
S, if there is a function
ϕ
≤
Φ
S and (b) depending on the constraint type, ϕ
satisfies one of the following conditions
∈
Φ such that (a) X
•∀
j
∈{
1
,...,m
−
1
}
,
(
ϕ
(
j
+1)
−
ϕ
(
j
))
≤
K, for maximum gap constraint
•∀
j
∈{
1
,...,m
−
1
}
,
(
ϕ
(
j
+1)
−
ϕ
(
j
))
≥
K, for minimum gap constraint
•
(
ϕ
(
m
)
−
ϕ
(1))
≤
K, for window constraint
When no gap constraint is enforced, the definition above corresponds to
Definition 3. When consecutive events in
X
are adjacent in input-sequence
S
,
then
X
is a
string sequence
in
S
[32]. This case is given when the maximum
gap constraint is enforced with maximum gap
K
= 1. Finally, when set
Φ
is the