3.4 Specializations of the Sequential Classification Framework
In the following we discuss some specializations of our (Ψ,Φ)-constrained
framework for sequential classification rule mining. They correspond to
particular cases of the constrained frameworks for sequence mining proposed
in previous works [5, 17, 25]. Each specialization is obtained from particular
instances of the function sets Ψ and Φ.
Containment between two arbitrary sequences is commonly defined by
means of either the unconstrained subsequence relation or the contiguous
subsequence relation. In the former, set Ψ is the complete set of all possible
matching functions. In the latter, set Ψ includes all matching functions of the
form ψ(j) = offset + j. It can be easily seen that both notions of sequence
containment satisfy Property 1.
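The two containment notions can be illustrated with a minimal Python sketch. It assumes that every event is a single hashable item compared by equality and that a matching function is represented as a list `psi`, where `psi[j]` is the position in S matched by the j-th event of X. The function names are hypothetical and are not part of the framework.

```python
from typing import List, Optional, Sequence


def unconstrained_match(x: Sequence, s: Sequence) -> Optional[List[int]]:
    """Return one matching of x in s as a list of positions, or None.

    Any strictly increasing matching is allowed (unconstrained subsequence).
    The leftmost matching is found greedily.
    """
    psi = []
    pos = 0
    for item in x:
        # Advance in s until the current event of x is found.
        while pos < len(s) and s[pos] != item:
            pos += 1
        if pos == len(s):
            return None
        psi.append(pos)
        pos += 1
    return psi


def contiguous_match(x: Sequence, s: Sequence) -> Optional[List[int]]:
    """Return a matching of the form psi(j) = offset + j, or None."""
    for offset in range(len(s) - len(x) + 1):
        if all(s[offset + j] == x[j] for j in range(len(x))):
            return [offset + j for j in range(len(x))]
    return None


s = "abcab"
loose = unconstrained_match("acb", s)   # events may be separated in s
strict = contiguous_match("acb", s)     # events must be adjacent in s
```

Here `"acb"` is matched by the unconstrained relation at positions 0, 2 and 4 of `"abcab"`, while no single offset matches it contiguously.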
Commonly considered constraints to define the containment between an
input-sequence S and a sequence X are the maximum gap, minimum gap, and
window constraints. The gap-constrained occurrence of X within S is usually
formalized as: X is contained in S, and X satisfies the gap constraint in S.
Hence, in the Φ-constrained containment relation between X and S, set Φ is
the universe of all possible matching functions and X satisfies Gap θ K in S.
Window constraint. Between the first and last events in X the gap is
lower than (or equal to) a given window-size. It can be easily seen that an
arbitrary subsequence of X is contained in S within the same window-size.
Thus, Property 2 is verified. In particular, Property 2 is verified both for
unconstrained and contiguous subsequence relations.
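As a small illustration of the window constraint, the sketch below works on the positions of one fixed occurrence of X in S and assumes that the gap between two events is the difference of their positions. The helper names and the example positions are hypothetical.

```python
from itertools import combinations


def satisfies_window(positions, window_size):
    """True if the gap between the first and last event is <= window_size."""
    if len(positions) < 2:
        return True
    return positions[-1] - positions[0] <= window_size


def sub_occurrences(positions):
    """Yield every non-empty sub-list of positions, order preserved."""
    for k in range(1, len(positions) + 1):
        for chosen in combinations(positions, k):
            yield list(chosen)


occurrence = [1, 3, 4, 6]   # positions in S matched by the events of X
window_size = 5

window_ok = satisfies_window(occurrence, window_size)
# Dropping events can only shrink the span between first and last event.
all_subs_ok = all(
    satisfies_window(p, window_size) for p in sub_occurrences(occurrence)
)
```

Both `window_ok` and `all_subs_ok` evaluate to `True` for this occurrence, in line with the argument that every subsequence of X fits in the same window.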
Minimum gap constraint. Between two consecutive events in X the gap is
greater than (or equal to) a given size. It directly follows that any pair of
non-consecutive events in X also satisfies the constraint. Hence, an arbitrary
subsequence of X is contained in S within the minimum gap constraint.
Thus, Property 2 is verified. In particular, Property 2 is verified both for
unconstrained and contiguous subsequence relations.
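The minimum gap argument can be checked on a concrete occurrence. The sketch assumes, as an illustration only, that the gap between two events is the difference of their positions in S; `satisfies_mingap` and the example values are hypothetical.

```python
from itertools import combinations


def satisfies_mingap(positions, mingap):
    """True if every pair of consecutive events is at least mingap apart."""
    return all(b - a >= mingap for a, b in zip(positions, positions[1:]))


occurrence = [0, 2, 5, 7]   # positions in S matched by the events of X
mingap = 2

mingap_ok = satisfies_mingap(occurrence, mingap)

# Removing events merges neighbouring gaps, so the remaining gaps can only
# grow; every sub-list of the occurrence therefore still satisfies mingap.
all_subs_ok = all(
    satisfies_mingap(list(sub), mingap)
    for k in range(1, len(occurrence) + 1)
    for sub in combinations(occurrence, k)
)
```

For this occurrence both checks hold, since each remaining gap is a sum of original gaps that are already at least `mingap`.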
Maximum gap constraint. Between two consecutive events in X the gap is
lower than (or equal to) a given gap-size. Unlike the two cases above, for
an arbitrary pair of non-consecutive events in X the constraint may not
hold. Hence, not all subsequences of X are contained in input-sequence S
under the constraint. Instead, Property 2 is verified when considering
contiguous subsequences of X.
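A small counterexample makes the maximum gap case concrete. The sketch assumes single-item events, a gap defined as the difference of positions in S, and a brute-force enumeration of all occurrences, which is exponential and only meant for tiny inputs. All names and the example sequences are hypothetical.

```python
from itertools import combinations


def occurrences(x, s):
    """Yield every strictly increasing position tuple where x occurs in s."""
    for positions in combinations(range(len(s)), len(x)):
        if all(s[p] == event for p, event in zip(positions, x)):
            yield positions


def contained_with_maxgap(x, s, maxgap):
    """True if some occurrence of x in s keeps consecutive gaps <= maxgap."""
    return any(
        all(b - a <= maxgap for a, b in zip(pos, pos[1:]))
        for pos in occurrences(x, s)
    )


s = ["a", "x", "b", "x", "c"]
x = ["a", "b", "c"]
maxgap = 2

full = contained_with_maxgap(x, s, maxgap)                  # gaps 2 and 2
skipping = contained_with_maxgap(["a", "c"], s, maxgap)     # gap 4
prefix = contained_with_maxgap(["a", "b"], s, maxgap)       # contiguous in X
suffix = contained_with_maxgap(["b", "c"], s, maxgap)       # contiguous in X
```

X itself and its contiguous subsequences are contained in S under `maxgap = 2`, while the subsequence that skips the middle event is not, because its only occurrence has a gap of 4.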
The above instances of our framework find application in different contexts.
In biological application domains, some works address finding DNA
sequences where two consecutive DNA symbols are separated by gaps of more
or less than a given size [36]. In the web mining area, approaches have been
proposed to predict the next web page requested by the user. These works
analyze web logs to find sequences of visited URLs where consecutive URLs
are separated by gaps of less than a given size or are adjacent in the web log
(i.e., maxgap = 1) [32]. In the context of text mining, gap constraints can be